VxVerify – Pre-upgrade Health Check Tool
VxVerify is an incredibly useful tool to have at your disposal while working with VxRail. Complementary to the native VxRail Manager checks, VxVerify performs a health-check analysis on a VxRail cluster, and it is highly recommended to run it in advance of upgrades, expansions or general maintenance operations. When run on VxRail Manager, VxVerify deploys what are referred to as ‘minions’ (small Python programs) to each of the VxRail nodes in the cluster; these minions in turn perform host checks on each node. In addition to the ESXi host-specific tests, VxVerify also performs checks on VxRail Manager, VMs, the vCSA and at the cluster level. Here is just a sample of what gets tested:
- Maintenance mode status
- Check hostd service
- IPMI tool for VxRail hardware
- NTP status and time delta
- Secure Boot status
- vSAN Health
- Service Datastore available capacity
- ESXi version consistency
- Free space
- Password for unsupported characters
- Check free space in root and /tmp
- Host Lockdown Mode
- Reboot required for ESXi host
- Check for RecoverPoint
- Check for external vCenter
- vMotion compatibility
- Mounted ISO on VM
- VM-to-host affinity rules
- Docker Servers Running
- VCF installation type, if present
- Check time difference between VxRM & VC
It is highly recommended that you run the VxVerify tool before proceeding with a VxRail upgrade; more details on how to download and run VxVerify can be found here:
https://www.dell.com/support/kbdoc/en-ie/000021527/vxrail-how-to-run-vxverify
There is a VxVerify download available for each of the following VxRail versions:
- VxVerify 1.xx.xxx is for VxRail 4.0 only
- VxVerify 2.xx.xxx is for VxRail 4.5, 4.7 & 7.0.000
- VxVerify 3.xx.xxx is for VxRail 7.0.010+
Note: Always download the latest edition, as new checks are added regularly. In addition, each version of VxVerify is time-limited to prevent the use of outdated versions.
Running VxVerify
In this example VxRail is running code version 7.0.100, so we begin by downloading the VxVerify 3.x edition:
Once the download completes, extract the .zip and review the readme.txt for all of the recent updates:
Before uploading VxVerify to VxRail Manager, we need to create a directory with the required permissions. Log in to VxRail Manager via SSH, elevate to the root user and execute the following commands to create a ‘vxv’ directory in /tmp and set its permissions:
#mkdir /tmp/vxv
#chmod 777 /tmp/vxv
#cd /tmp/vxv
Upload the ‘.pyc’ file to the newly created directory (in this example I am leveraging WinSCP to perform the upload to VxRail Manager):
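If you prefer the command line to WinSCP, a standard scp copy from your workstation achieves the same result (the VxRail Manager address below is a placeholder; substitute your own):
scp vxverify.pyc root@<vxrail-manager-ip>:/tmp/vxv/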
Once uploaded, navigate to /tmp/vxv/ and use --help to expose all of the VxVerify command-line arguments:
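For example, using the same filename as the upload above (adjust if your download is named differently):
python vxverify.pyc --help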
Simply running the .pyc script as root will kick off VxVerify:
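A minimal run from within /tmp/vxv looks like this:
python vxverify.pyc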
As can be seen from the results table below, all host and cluster tests completed successfully. You will also notice the status 3 critical alert notifying that ‘VC login failed’. The VC management credentials are decrypted automatically from the VxRail Manager database at runtime (although they can also be specified with -u & -p), so if those are failing you’ll need to check the vxv.log.
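For reference, a run that passes the VC management credentials explicitly via -u & -p might look like the following; the username shown is purely an illustrative placeholder:
python vxverify.pyc -u administrator@vsphere.local -p '<vc-management-password>'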
Entering the vCenter root credentials enables the respective vCSA tests, such as the free-space tests for the VCSA partitions:
The results table below is now inclusive of the vCenter health checks:
VxVerify creates a number of files in ‘/tmp/vxv/’. These include .log files, which can be used for troubleshooting purposes; the summary report seen on the previous screen can also be recalled by viewing the ‘/tmp/vxv/vxverify.txt’ file:
Another really useful output here is the ‘vxtii.txt’ file, which includes detailed information on each ESXi host in the VxRail cluster, providing a nice overview of the hardware:
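Both outputs can be reviewed directly on VxRail Manager, for example:
cat /tmp/vxv/vxverify.txt
cat /tmp/vxv/vxtii.txt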
Creating a dummy failure by attaching an .iso (residing on local storage) to the vCenter VM results in the following failure scenario. You will note that for each warning or failure there is an associated support.dell.com KB referenced in the results table, which will assist with resolving any issue highlighted by VxVerify (do not attempt an upgrade until these yellow or red event codes are investigated):
Special mention to our Escalation Engineering team for maintaining such a valuable tool! Thank you!
Great post David, should VxVerify be run in advance of VCF on VxRail upgrades also? Or is this just for native VxRail upgrades?
Thanks Steve, great question – It is highly recommended to run VxVerify in advance of all upgrades including those of ‘VCF On VxRail’.
Just to add – I would run it on ALL WLD domains as well, not just the MGMT domain.
Great point! Thanks Gearoid!
Great post David!
Thanks Victor!
Excellent, David. I was recently involved in a major migration activity, and I must say VxVerify was a great help.
Great to hear VxVerify was of assistance! Thanks for the feedback!
Hey David. That’s a really good article and thanks for that.
I would add though that the login error listed under VxRM would be for the VC management credentials, whereas the VCSA root credentials are used for the tests listed under VCSA (such as the free-space tests for the VCSA partitions).
The VC management credentials should be decrypted automatically from the VxRM DB at runtime (although they can also be specified with -u & -p), so if those are failing, you’ll need to check the vxv.log.
Thank you Dave! It’s a great tool!
Hi David, nice post.
Do the output files in /tmp/vxv have consistent names? Purely so I can grab those files for analysis at a later date? It looks a great tool, and I hope to run it soon to see the results.
Hi Kevin, thanks for the feedback. Great Q. I’ll update the post to address this; file names are consistent, and you can choose to save the VxVerify output if you leverage the -l or --log option along with a path that the logs should be saved to, for example:
python vxverify.pyc -l /tmp/vxv1logs
Does that help?
Thanks for the excellent post David. I am unable to run VxVerify unless I have manually enabled SSH on the nodes beforehand. I just get a timeout when the script tries to push the minions to the nodes; results for VxRail Manager and vCenter are green. VxVerify is 3.10.212 and VxRail is 7.0.131.
Thanks Marc, VxVerify has in-built logic to enable SSH, including retaining each host’s SSH status from before VxVerify is run. I have forwarded this to Engineering and will revert ASAP. In the meantime the following may help you, especially if working on a large cluster: https://davidring.ie/2016/04/14/vmware-powercli-enabledisable-ssh/
Thanks David, I already use PowerCLI to enable SSH on all nodes, even if the cluster on which I tested VxVerify is not that big with its 8 nodes 😉 If I can help with the log files just let me know.
Awesome post! Thanks for sharing!
Thanks for the feedback Raj!
Thank you!!!! Great article.
Wonderful post! I am thankful for the effort you put into your writing. Can you help me resolve an issue I get when doing the precheck? I am getting this warning: “VxRail: VxRM health-check fails for test ‘vc_pw_char’”.
I ran this tool before when doing a previous upgrade and all came back fine; this time around I’m getting the following failure, which I can’t seem to find any information for online:
“Failure 193300 dcism ism is not running”
Any help understanding this would be appreciated, thanks.
Hi Angel,
Please see the following Dell KB:
https://www.dell.com/support/kbdoc/en-ie/000193300/vxrail-node-health-check-fails-for-test-dcism?lang=en
If hostd is stopped/restarted, then iSM may not come back into a running state, despite being listed as active.
For example, on ESXi:
/etc/init.d/dcism-netmon-watchdog status
iSM is active (not running)
Resolution
Restart the iSM agent with:
/etc/init.d/dcism-netmon-watchdog restart
After about 2 minutes, iSM should now report as running:
/etc/init.d/dcism-netmon-watchdog status
iSM is active (running)