EMC VNX – List of Useful NAS Commands

Verify ‘NAS’ Services are running:
Log in to the Control Station as ‘nasadmin’ and run the command /nas/sbin/getreason from the CS console. The reason code output should be as follows (see the detailed list of Reason Codes below):
10 - slot_0 primary control station
11 - slot_1 secondary control station
5 - slot_2 contacted
5 - slot_3 contacted
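
The check above can also be scripted. What follows is a minimal Python sketch, assuming the getreason output has been captured as text in the format shown; the function names and the HEALTHY_CODES set are illustrative and not part of any EMC tool.

```python
import re

# Codes treated as healthy in this sketch, taken from the Reason Codes list:
# 10 = primary Control Station, 11 = secondary Control Station,
# 5 = Data Mover in contact with the Control Station.
HEALTHY_CODES = {5, 10, 11}

def parse_getreason(text):
    """Return a list of (code, slot, description) tuples from getreason output."""
    rows = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s*-\s*(slot_\d+)\s+(.*)", line)
        if match:
            rows.append((int(match.group(1)), match.group(2), match.group(3).strip()))
    return rows

def unhealthy_slots(text):
    """Return the slots whose reason code is not in HEALTHY_CODES."""
    return [slot for code, slot, _ in parse_getreason(text) if code not in HEALTHY_CODES]

sample = """10 - slot_0 primary control station
11 - slot_1 secondary control station
5 - slot_2 contacted
5 - slot_3 contacted"""

print(unhealthy_slots(sample))  # prints [] for the healthy sample above
```

Any slot returned by unhealthy_slots can then be looked up in the Reason Codes list.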

Complete a full ‘Health Check’:
/nas/bin/nas_checkup
Location of the output log:
cd /nas/log/
ls
cat /nas/log/checkup-rundate.log

Confirm the EMC NAS version installed and the model name:
/nasmcd/bin/nas_version
/nas/sbin/model

Stop ‘NAS’ Services:
/sbin/service nas stop
Start ‘NAS’ Services:
/sbin/service nas start

Check the running status of the Data Movers and view which slot is active/standby:
nas_server -info -all

Verify connectivity to the VNX storage processors (SPs) from the Control Stations:
/nas/sbin/navicli -h SPA_IP domain -list
/nas/sbin/navicli -h SPB_IP domain -list

Confirm the VMAX/VNX is connected to the NAS:
nas_storage -check -all
nas_storage -list

View VNX NAS Control LUN Storage Group details:
/nas/sbin/navicli -h SP_IP storagegroup -list -gname ~filestorage

List the disk table to ensure all of the Control Volumes have been presented to both Data Movers:
nas_disk -list

Check the File Systems:
df -h

View detailed information for the attached storage:
/nas/bin/nas_storage -info -all

View the trunking devices created (LACP, EtherChannel, FSN):
server_sysconfig server_2 -virtual
Example: view the interface named “LACP_NAS”:
server_sysconfig server_2 -virtual -info LACP_NAS
server_2 :
*** Trunk LACP_NAS: Link is Up ***
*** Trunk LACP_NAS: Timeout is Short ***
*** Trunk LACP_NAS: Statistical Load Balancing is IP ***
Device    Local Grp  Remote Grp  Link  LACP  Duplex  Speed
------------------------------------------------------------------------
fxg-1-0   10002      51840       Up    Up    Full    10000 Mbs
fxg-1-1   10002      51840       Up    Up    Full    10000 Mbs
fxg-2-1   10002      51840       Up    Up    Full    10000 Mbs
fxg-2-0   10002      51840       Up    Up    Full    10000 Mbs
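
To check a trunk like the one above without reading every row, the output can be parsed. This is a rough Python sketch, assuming each device row has the eight whitespace-separated fields shown in the example; the function name is made up for illustration.

```python
def trunk_ports_not_up(output):
    """Return device names whose Link or LACP column is not 'Up'."""
    not_up = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed row layout: device, local grp, remote grp, link, lacp,
        # duplex, speed, unit. Header and banner lines do not match it.
        if len(fields) == 8 and fields[1].isdigit() and fields[2].isdigit():
            device, link, lacp = fields[0], fields[3], fields[4]
            if link != "Up" or lacp != "Up":
                not_up.append(device)
    return not_up

sample = """Device Local Grp Remote Grp Link LACP Duplex Speed
fxg-1-0 10002 51840 Up Up Full 10000 Mbs
fxg-1-1 10002 51840 Up Up Full 10000 Mbs"""

print(trunk_ports_not_up(sample))  # prints []
```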

Check Code Levels:
List the datamovers: nas_server -list
Check the DART code installed on the Data Movers: server_version ALL
Check the NAS code installed on the Control Station: nas_version

View Network Configuration:
To display the interface parameters on the Control Station and the Data Movers, type:
Control Station: /sbin/ifconfig (eth3 is the mgmt interface)
Data Movers: server_ifconfig server_2 -all
cat ifcfg-eth3

View VNX SP IP Addresses from the CS console:
grep SP /etc/hosts | grep A_
grep SP /etc/hosts | grep B_

Verify Control Station Comms:
/nas/sbin/setup_enclosure -checkSystem

Confirm the unified FLAG is set:
/nas/sbin/nas_hw_upgrade -fc_option -enable

Date & Time:
Control Station: date
Data Movers: server_date ALL

Check IP & DNS info on the CS/DM:
nas_cs -info
server_dns ALL

Log Files:
Log file location: /var/log/messages
Example of NAS services starting successfully:
grep -A10 "Starting NAS services" /var/log/messages*
Output:
Dec 8 19:07:27 emcnas_i0 S95nas: Starting NAS services
Dec 8 19:07:46 emcnas_i0 EMCServer: nas_mcd: MCD will monitor CS IPMI connection.
Dec 8 19:08:46 emcnas_i0 EMCServer: nas_mcd: slot 0 missed 10 heartbeats from slot 1.
Dec 8 19:08:50 emcnas_i0 EMCServer: nas_mcd: Install Manager is running on slot 0, skipping slot 1 reboot
Dec 8 19:08:50 emcnas_i0 EMCServer: nas_mcd: Slot 0 becomes primary due to timeout
Dec 8 19:08:52 emcnas_i0 mcd_helper: All NBS devices are up
Dec 8 19:09:08 emcnas_i0 kernel: kjournald starting. Commit interval 5 seconds

Check the Data Mover Logs:
server_log server_2

Failing over a Control Station:
Failover:
/nas/sbin/cs_standby -failover
Takeover:
/nasmcd/sbin/cs_standby -takeover
Or reboot:
nas_cs -reboot

Shut down the Control Station:
/sbin/shutdown -h now

Power off CS1 from CS0:
/nas/sbin/t2reset pwroff -s 1

List on which VMAX3 directors each CS and DM are located:
nas_inventory -list

List Data Mover parameters:
/nas/bin/server_param server_2 -info
/nas/bin/server_param server_3 -info
/nas/bin/server_param server_2 -facility -all -list
/nas/bin/server_param server_3 -facility -all -list

Determine the failover status of the Blades (Datamovers):
/nas/bin/nas_server -info -all

Initiate a manual failover of server_2 to the standby Datamover:
server_standby server_2 -activate mover

List the status of the Datamovers:
nas_server -list

Review the information for server_2:
nas_server -info server_2
All DMs: nas_server -info ALL

Shutdown Datamover (Xblade):
/nas/bin/server_cpu server_2 -halt now

Power on the Datamover (Xblade):
/nasmcd/sbin/t2reset pwron -s 2

Restore the original primary Datamover:
server_standby server_2 -restore mover

To monitor an immediate cold restart of server_2:
server_cpu server_2 -reboot cold -monitor now
A cold reboot or a hardware reset shuts down the Data Mover completely before restarting, including a Power on Self Test (POST).

To monitor an immediate warm restart of server_2:
server_cpu server_2 -reboot -monitor now
A warm reboot or a software reset performs a partial shutdown of the Data Mover, and skips the POST after restarting. A software reset is faster than the hardware reset.

Clean Shutdown:
Shut down the Control Stations and Data Movers:
/nasmcd/sbin/nas_halt -f now
The shutdown has finished when the message “exited on signal” appears.

Power down the entire VNX, including the Storage Processors:
nas_halt -f -sp now

Check if Product Serial Number is Correct:
/nasmcd/sbin/serial -db_check
Remove inconsistency between the db file and the enclosures:
/nasmcd/sbin/serial -repair

List All Hardware components by location:
nas_inventory -list -location
nas_inventory -list -location | grep "DME 0 Data Mover 2 IO Module"

Use location address to view specific component details:
nas_inventory -info "system:VNX5600:CKM001510001932007|enclosure:xpe:0|mover:VNX5600:2|iomodule::1"

List of Reason Codes:
0 – Reset (or unknown state)
1 – DOS boot phase, BIOS check, boot sequence
2 – SIB POST failures (that is, hardware failures)
3 – DART is loaded on Data Mover, DOS boot and execution of boot.bat, boot.cfg.
4 – DART is ready on Data Mover, running, and MAC threads started.
5 – DART is in contact with Control Station box monitor.
6 – Control Station is ready, but is not running NAS service.
7 – DART is in panic state.
9 – DART reboot is pending or in halted state.
10 – Primary Control Station reason code
11 – Secondary Control Station reason code
13 – DART panicked and completed memory dump (single Data Mover configurations only, same as code 7, but done with dump)
14 – This reason code can be set for the Blade for any of the following:
• Data Mover enclosure-ID was not found at boot time
• Data Mover’s local network interface MAC address is different from MAC address in configuration file
• Data Mover’s serial number is different from serial number in configuration file
• Data Mover was PXE booted with install configuration
• SLIC IO Module configuration mismatch (Foxglove systems)
15 – Data Mover is flashing firmware. DART is flashing BIOS and/or POST firmware. Data Mover cannot be reset.
17 – Data Mover Hardware fault detected
18 – DM Memory Test Failure. BIOS detected memory error
19 – DM POST Test Failure. General POST error
20 – DM POST NVRAM test failure. Invalid NVRAM content error (checksum, WWN, etc.)
21 – DM POST invalid peer Data Mover type
22 – DM POST invalid Data Mover part number
23 – DM POST Fibre Channel test failure. Error in blade Fibre connection (controller, Fibre discovery, etc.)
24 – DM POST network test failure. Error in Ethernet controller
25 – DM T2NET Error. Unable to get blade reason code due to management switch problems.

Mail Flow Troubleshooting

NSLOOKUP
At the command prompt, type “nslookup”.
This will give you a “>” prompt.
Type “set type=MX”.
Type “microsoft.com”.
You should now get the MX records for “microsoft.com”:
Non-authoritative answer:
microsoft.com MX preference = 10, mail exchanger = microsoft-com.mail.protection.outlook.com

microsoft-com.mail.protection.outlook.com internet address = 207.46.163.215
microsoft-com.mail.protection.outlook.com internet address = 207.46.163.247
microsoft-com.mail.protection.outlook.com internet address = 207.46.163.138

Or Use who.is:
http://who.is/

Telnet:
Using Telnet from Windows to test port 25, from the command prompt window type:

telnet server port
Example:
telnet mail.eircom.net 25
(or the mail server of whichever service provider is in use)

TELNET – BLACKLISTED:
521 connection rejected
Connection to host lost.
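
The same port 25 test can be run where no Telnet client is installed. Below is a minimal Python sketch using only the standard socket module; the function names are illustrative. It treats a reply starting with 220 as the normal SMTP greeting and 521 as the rejection shown above, and any other code is reported as unknown rather than guessed at.

```python
import socket

def classify_smtp_banner(banner):
    """Classify the first line an SMTP server sends after the connection opens."""
    code = banner.strip()[:3]
    if code == "220":
        return "ready"      # standard SMTP greeting
    if code == "521":
        return "rejected"   # the blacklisted response shown above
    return "unknown"

def fetch_banner(host, port=25, timeout=10):
    """Connect to host:port and return the first line of the server's reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = conn.recv(1024).decode("ascii", errors="replace")
    lines = data.splitlines()
    return lines[0] if lines else ""

# Example call (needs network access, so it is left commented out):
# print(classify_smtp_banner(fetch_banner("mail.eircom.net")))
print(classify_smtp_banner("521 connection rejected"))  # prints rejected
```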

Blacklist lookup/release:
http://www.fortiguard.com/static/ip_lookup.html

Tools:
http://mxtoolbox.com/NetworkTools.aspx

CISCO UCS Blades – Memory Troubleshooting

This is a guest blog post kindly contributed by Eric Daly @daly_eric.

To pinpoint the specific DIMMs within UCS that are throwing an error, follow these simple steps. Be aware that an entire memory channel with up to 3 DIMMs may show as disabled because of a single DIMM with uncorrectable errors.

1) Check the Inventory > Memory tab to see which DIMMs are not registering. Make a note of the DIMM location (F0, F1, F2) as shown below:
[Screenshot: TS_MEM1]
2) Review the SEL log and search for the specific DIMM throwing an “uncorrectable” “memory error”. In this case you will see from the image below that the F2 DIMM was causing the issue. If nothing shows in the SEL log, perform steps 3-5.
[Screenshot: TS_MEM2]
3) Reset the CIMC controller of the blade (Recover Server > Reset CIMC (Server Controller)). Wait a minute or two.
4) Re-acknowledge the blade. This takes 2-3 minutes.
5) Review the SEL log again as per step 2 in order to identify the faulting DIMM.

DEEPER ANALYSIS
1) Download the techsupport bundle for the specific chassis where the suspect blade is located.
[Screenshot: TS_MEM3]
2) Extract the tar and then extract the relevant zip file for the suspect blade. Two files give a clear picture of memory DIMM failures: MrcOut.txt and DimmBl.log.
3) Locate the DimmBl.log file and open it with Word (not Notepad).
[Screenshot: TS_MEM4]
4) The first page gives a summary telling you whether the blade has any DIMMs with uncorrectable errors:

====================== SUMMARY OF DIMM ERRORS ======================
NO DIMM ECC ERRORS ON THIS BLADE

====================== DIMM BL RAM DATABASE DUMP ========================

====== RAM DB DUMP =====
--- Control Header :
DataBaseFormatVersion : 2
FaultSensorInitDone : 0x00
SyncTaskInitDone : 0x01
DimmBLEnabledBySAM : FALSE
MostRecentHostBootTime : Sat Jun 28 19:07:28 2014
PreviousHostBootTime : Sat Jun 28 02:47:42 2014
MostRecentHostShutdownTime : Sat Jun 28 02:58:12 2014
ErrorSamplingIntervalLength : 1209600
DBSyncPeriod : 3600
CurrentIntervalIndex : 0

---------------------- PER DIMM ERROR COUNTS -----------
           CORRECTABLE ERRORS     UNCORRECTABLE ERRORS
DIMM ID    Total    This Boot     Total    This Boot
-----------------------------------------------------------
A0         0        0             0        0
A1         0        0             0        0
A2         0        0             0        0
B0         0        0             0        0
B1         0        0             0        0
B2         0        0             0        0
C0         0        0             0        0
C1         0        0             0        0
C2         0        0             0        0
D0         0        0             0        0
D1         0        0             0        0
D2         0        0             0        0
E0         0        0             0        0
E1         0        0             0        0
E2         0        0             0        0
F0         0        0             0        0
F1         0        0             0        0
F2         0        0             0        0
G0         0        0             0        0
G1         0        0             0        0
G2         0        0             0        0
H0         0        0             0        0
H1         0        0             0        0
H2         0        0             0        0
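
With many blades to review, the per-DIMM table can be scanned programmatically. This is a small Python sketch, assuming each data row is a DIMM ID followed by four counters in the column order shown above (correctable total, correctable this boot, uncorrectable total, uncorrectable this boot); the function name is illustrative and the non-zero row in the sample is made up for demonstration.

```python
def dimms_with_errors(log_text):
    """Return {dimm_id: (correctable_total, uncorrectable_total)} for DIMMs with errors."""
    flagged = {}
    for line in log_text.splitlines():
        fields = line.split()
        # A data row is a DIMM ID such as "F2" followed by four integer counters.
        if len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            correctable_total = int(fields[1])
            uncorrectable_total = int(fields[3])
            if correctable_total or uncorrectable_total:
                flagged[fields[0]] = (correctable_total, uncorrectable_total)
    return flagged

# Hypothetical excerpt: the F2 counters are invented for this example.
sample = """DIMM ID Total This Boot Total This Boot
F0 0 0 0 0
F1 0 0 0 0
F2 3 1 2 2"""

print(dimms_with_errors(sample))  # prints {'F2': (3, 2)}
```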

EMC VNXe – Troubleshooting NFS Connectivity

Step1 Check the Health status of the Link Aggregation:

uemcli -d 10.0.0.1 -u Local/admin -p Password123# /net/la show -detail

1: ID = la0_SPA
SP = SPA
Ports = eth2_SPA,eth3_SPA
Health state = OK (5)

2: ID = la0_SPB
SP = SPB
Ports = eth2_SPB,eth3_SPB
Health state = OK (5)

Step2 Ensure the Network Interface for NFS is correctly configured:
uemcli -d 10.0.0.1 -u Local/admin -p Password123# /net/if show -detail

ID = if_2
Port = eth2_SPA
VLAN ID = 0
IPv4 mode = static
IPv4 address = 10.0.0.11
IPv4 subnet mask = 255.255.255.0
IPv4 gateway = 10.0.0.254
MAC address = 08:00:00:00:00:00
SP = SPA

Step3 Check the Health Status and MTU Value set on the Ports:
uemcli -d 10.0.0.1 -u Local/admin -p Password123# /net/port show

ID = eth2_SPA
Role = frontend
SP = SPA
Supported types = iscsi, net
MTU size = 9000
Speed = 1 Gbps
Health state = OK (5)
Aggregated port ID = la0_SPA

ID = eth3_SPA
Role = frontend
SP = SPA
Supported types = iscsi, net
MTU size = 9000
Speed = 1 Gbps
Health state = OK (5)
Aggregated port ID = la0_SPA

ID = eth2_SPB
Role = frontend
SP = SPB
Supported types = iscsi, net
MTU size = 9000
Speed = 1 Gbps
Health state = OK (5)
Aggregated port ID = la0_SPB

ID = eth3_SPB
Role = frontend
SP = SPB
Supported types = iscsi, net
MTU size = 9000
Speed = 1 Gbps
Health state = OK (5)
Aggregated port ID = la0_SPB
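
Comparing MTU and health across many ports by eye is error-prone, so the output can be parsed instead. The following Python sketch assumes the "key = value" layout shown above, with each record starting at an ID line; the function names and the expected MTU default are illustrative, and the second record in the sample is altered on purpose to show a mismatch.

```python
def parse_records(text):
    """Split uemcli-style 'key = value' output into one dict per record."""
    records, current = [], {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Drop a leading record number such as "1:" in front of the ID key.
        key = key.split(":")[-1].strip()
        if key == "ID" and current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records

def port_problems(records, expected_mtu="9000"):
    """Return a list of human-readable findings for ports that look wrong."""
    problems = []
    for rec in records:
        port = rec.get("ID", "?")
        if rec.get("MTU size") != expected_mtu:
            problems.append(f"{port}: MTU size is {rec.get('MTU size')}, expected {expected_mtu}")
        if not rec.get("Health state", "").startswith("OK"):
            problems.append(f"{port}: health state is {rec.get('Health state')}")
    return problems

# The eth3_SPA MTU below is changed to 1500 purely for illustration.
sample = """ID = eth2_SPA
MTU size = 9000
Health state = OK (5)

ID = eth3_SPA
MTU size = 1500
Health state = OK (5)"""

print(port_problems(parse_records(sample)))
```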

For a more detailed analysis of Frontend|Backend:
uemcli -d 10.0.0.1 -u Local/admin -p Password123# /net/port -role frontend show -detail

Step4 Check that jumbo MTU is set correctly on the Cisco switches:
SwitchA#show system mtu
If a change is required, issue the command: system mtu jumbo 9198
Save the configuration and then reload the switch.

Step5 Check that the Shared Folder Server has NFS enabled and note its Interface ID:
uemcli -d 10.0.0.1 -u Local/admin -p Password123# /net/nas/server show

ID = file_server_2
Name = NFS_01
Health state = OK (5)
SP = SPA
CIFS enabled = no
NFS enabled = yes
Interface = if_2

Step6 Run a ping test from the VNXe NFS interface if_2 to the ESXi NFS IP, and a vmkping from the ESXi NFS vmk to the VNXe server:
Gather ESXi Host details:
uemcli -d 10.0.0.1 -u Local/admin -p Password123# /remote/host show -detail

ID = 1003
Name = ESXi_01
Type = host
Address = 10.0.0.50
OS type = esx

Ping the vmkernel of the ESXi host to ensure proper connectivity:
uemcli -d 10.0.0.1 -u Local/admin -p Password123# /net/util ping -srcIf if_2 -addr 10.0.0.50
Operation completed successfully.
Ping the NFS Server from the vmkernel of the ESXi host:
vmkping -s 8972 -d 10.0.0.11

Failure here normally implies a network configuration issue. Verify that the subnets, VLANs and any firewall configs are correct.
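
The 8972-byte size passed to vmkping above follows from the 9000-byte MTU: the ICMP payload has to leave room for a 20-byte IPv4 header (no options) and an 8-byte ICMP header, and -d sets the don't-fragment bit so an undersized MTU anywhere on the path makes the ping fail. A short Python sketch of the arithmetic, with an illustrative function name:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header

def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

print(max_ping_payload(9000))  # prints 8972 (jumbo frames)
print(max_ping_payload(1500))  # prints 1472 (standard MTU)
```

If the 8972-byte ping fails while a 1472-byte one succeeds, some device on the path is probably not passing jumbo frames.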

Step7 Check the status and details of the NFS Datastore
uemcli -d 10.0.0.1 -u Local/admin -p Password123# /stor/prov/vmware/nfs show -detail
ID = app_1
Name = NFS-01
Health state = OK (5)
Health details = "The component is operating normally. No action is required."
Server = file_server_2
Storage pool = NFS-01
Size = 858993459200 (800.0G)
Size used = 241740808192 (225.1G)
Maximum size = 4417404272640 (4.0T)
Thin provisioning enabled = no
Cached = no
Current allocation = 858993459200 (800.0G)
Protection size = 42949672960 (40.0G)
Protection size used = 0
Maximum protection size = 17592184995840 (16.0T)
Protection current allocation = 0
Auto-adjust protection size = yes
Local path = /NFS_01
Export path = /NFS_01
Default access = na
Root hosts = 1003[10.0.0.50]
Replication destination = no
Deduplication enabled = no
Creation time = 2014-06-11 09:55:00
Last modified time = 2014-06-11 09:55:00
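
To see how full the datastore is, the byte counts in the output above can be compared directly. This is a minimal Python sketch, assuming the "field = bytes (human-readable)" layout shown; the function name is illustrative.

```python
import re

def size_bytes(text, field):
    """Return the byte count from a 'field = <bytes> (<size>)' line, or None."""
    pattern = rf"^\s*{re.escape(field)}\s*=\s*(\d+)"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None

sample = """Size = 858993459200 (800.0G)
Size used = 241740808192 (225.1G)
Maximum size = 4417404272640 (4.0T)"""

used_fraction = size_bytes(sample, "Size used") / size_bytes(sample, "Size")
print(f"{used_fraction:.1%}")  # roughly 28% used for the values above
```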

Step8 If a firewall is in place, ensure the following port is open:
2049 – TCP/UDP – NFS – Required for NFS

Step9 Add the NFS file system manually to the ESXi host:
Log on to vCenter or the ESXi host and select the ESXi host. From the Configuration tab choose Storage and then Add Storage:
1. Enter the IP address of the NFS server – 10.0.0.11
2. Enter the Folder name, which is the export path from the VNXe – /NFS_01
3. Enter the datastore name that vCenter/ESXi will use to present as – NFS-01