This week, while doing some maintenance on several ESXi hosts (vSphere 6.5), I noticed that vMotion and DRS were not migrating some of the VMs on some hosts: vMotion failed to migrate, and EVC failed to enable.
The first step was to troubleshoot the VMkernel by pinging all ESXi hosts over the vMotion network with the vmkping command.
First, let us check the VMkernel adapters to identify the vMotion VMkernel adapter:
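From the ESXi Shell, one way to list the VMkernel adapters with their IPs and MTU is the following sketch (the guard only keeps it harmless when run on a non-ESXi machine):

```shell
# Show every VMkernel NIC with its IP address, netmask and MTU;
# the adapter tagged for vMotion (vmk1 in this cluster) is the one
# we will ping from.
if command -v esxcfg-vmknic >/dev/null 2>&1; then
    VMK_LIST=$(esxcfg-vmknic -l)
else
    VMK_LIST="esxcfg-vmknic not found: run this on an ESXi host"
fi
printf '%s\n' "$VMK_LIST"
```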
Having identified the vMotion VMkernel adapter (vmk1) and its IP address, let us ping the other ESXi hosts (if you don’t have all the IPs, run the above command on every ESXi host to display its vMotion IP address).
In this case the range was 10.0.28.37 to 10.0.28.48 (all the ESXi hosts in the same Cluster between which the VMs should migrate).
vmkping -I vmk1 10.0.28.38
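To test every host in the range without typing each command by hand, a small loop can generate one vmkping per vMotion IP (IPs taken from this cluster; on an ESXi host, pipe the output to sh to actually run the pings):

```shell
# Build one vmkping command per vMotion IP in the cluster
# (10.0.28.38 through 10.0.28.48 in this environment).
VMK_IF=vmk1
CMDS=""
for last in $(seq 38 48); do
    CMDS="$CMDS
vmkping -I $VMK_IF 10.0.28.$last"
done
printf '%s\n' "$CMDS"
```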
All vMotion VMkernel IPs answered pings from all ESXi hosts, so no issues were found here.
As we can see in the output above, the MTU is 9000, which means Jumbo Frames are enabled. So I ran the same vmkping test with jumbo-sized packets to make sure all ESXi hosts could still ping each other even with Jumbo Frames, and rule out issues at the network level.
vmkping -4 -c 6 -d -s 8972 -v 10.0.28.38
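The -s 8972 value is not arbitrary: it is the jumbo-frame MTU minus the IPv4 and ICMP headers, and -d sets the don’t-fragment bit, so an MTU mismatch anywhere on the path shows up as a failed ping instead of silent fragmentation:

```shell
# Jumbo-frame MTU minus the IPv4 header (20 bytes) and the ICMP
# header (8 bytes) gives the largest payload that fits in one
# unfragmented frame, which is what vmkping -d -s needs.
MTU=9000
IP_HDR=20
ICMP_HDR=8
PAYLOAD=$((MTU - IP_HDR - ICMP_HDR))
echo "$PAYLOAD"   # prints 8972
```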
With no issues found in the vMotion network, the problem had to be somewhere else.
I rechecked DRS and tried manual hot vMotions: migrations to most ESXi hosts worked without issues, but moving VMs to one or two particular ESXi hosts was not possible. Every hot migration to those ESXi hosts failed, while cold migrations worked fine.
When I tried again to manually migrate a VM from a working host to one of the hosts that refused migrations, I saw this message:
“The target host does not support the virtual machine current hardware requirements. To resolve CPU incompatibilities, use a cluster with Enhanced vMotion Compatibility (EVC) enabled”
A strange error, and it was strange that I would need to enable EVC to perform this migration when all the servers and CPUs in this Cluster are the same.
Strange as it was, I tried to enable EVC to perform the migration, but none of the VMware EVC modes were compatible with the CPU, and I got: “The host’s CPU hardware does not support the cluster’s current Enhanced vMotion Compatibility mode. The host CPU lacks features required by that mode.”
Again, this was strange behavior, since all the Dell servers are the same model with the same CPUs.
Then I started looking at other possibilities. Since this was a new Cluster and some updates had already been applied (but not all), I double-checked whether the Spectre/Meltdown and L1 Terminal Fault – VMM vulnerability patches had been applied on all ESXi hosts.
Since these patches affect the ESXi host CPU behavior and also the VMs’ Guest OS, I double-checked the build of each ESXi host.
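A quick way to read the exact version and build on a host is vmware -vl from the ESXi Shell; the sketch below only falls back to a message when it is not run on ESXi:

```shell
# Print the exact ESXi version and build on this host; a build
# mismatch between hosts in the same cluster is what we are
# hunting for.
if command -v vmware >/dev/null 2>&1; then
    BUILD=$(vmware -vl 2>/dev/null || echo "could not query build")
else
    BUILD="vmware command not found: run this on each ESXi host"
fi
printf '%s\n' "$BUILD"
```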
As we can see above, there is a build mismatch on two of the ten ESXi hosts in this particular Cluster.
vSphere 6.5 Build 893507 doesn’t have the cpu-microcode patch ESXi650-201808402-BG for the L1 Terminal Fault – VMM vulnerability. Since this patch affects the CPU, I was pretty sure this was the issue in this Cluster: VMs on patched hosts were not allowed to migrate to a host that didn’t yet have the proper patch applied.
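Whether the microcode patch is actually installed can also be confirmed per host by looking for the cpu-microcode VIB (a sketch; run it on each ESXi host):

```shell
# Look for the cpu-microcode VIB delivered by ESXi650-201808402-BG;
# if grep finds nothing, the L1TF microcode update is missing.
if command -v esxcli >/dev/null 2>&1; then
    MICROCODE=$(esxcli software vib list 2>/dev/null \
        | grep cpu-microcode || echo "cpu-microcode VIB not installed")
else
    MICROCODE="esxcli not found: run this on each ESXi host"
fi
printf '%s\n' "$MICROCODE"
```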
This was the main reason why vMotion failed to migrate and EVC failed to enable.
The next step was to go to Update Manager, run a scan on these ESXi hosts, and check the missing patches.
As we can see above, those are the missing patches on these ESXi hosts, including the cpu-microcode patch. Next, I applied the patches to the faulty ESXi hosts, rebooted them, and tried hot migrations to those ESXi hosts again.
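For hosts that cannot be remediated through Update Manager, the same fix can be sketched from the command line. The depot path and bundle name below are hypothetical (use your own patch bundle), and the commands are printed here rather than executed; run each line on the ESXi host being patched:

```shell
# CLI alternative to an Update Manager remediation (sketch):
# enter maintenance mode, install the offline patch bundle, reboot.
DEPOT=/vmfs/volumes/datastore1/ESXi650-201808001.zip  # hypothetical path
PLAN="esxcli system maintenanceMode set --enable true
esxcli software vib update -d $DEPOT
reboot"
printf '%s\n' "$PLAN"
```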
Now all VMs can migrate between all ESXi hosts inside the Cluster, and there are no more compatibility issues.
Once again, it is proven that we should ALWAYS apply patches to all ESXi hosts and keep them all up to date, particularly the security patches. We should not have mismatched vSphere builds inside a Cluster, or even inside the same vCenter.
Particularly with the recent Spectre/Meltdown problems and the Intel CPU L1TF security issues, properly patching your ESXi hosts and also your VMs is very important.
In the past weeks, VMware informed customers about a bug in VMware Tools 10.3.0 that was causing PSODs with Windows guest OSes, and version 10.3.2 was released in the last few days to fix the issue.
Again, to keep a clean and safe VMware environment, updates should always be applied as soon as possible to your vCenter, vSphere hosts, and Virtual Machines (VMware Tools and Virtual Hardware).
Note: Share this article if you think it is worth sharing.