Unfortunately, VMware did it again with ESXi 6.0: another CBT bug, and this is the second in 6 months. In my opinion this has a big impact on VMware's image. For the last year, year and a half, not many people have been happy with VMware. Some are even thinking of moving away from VMware because of these releases that are not completely tested before launch.
In my opinion VMware needs to give proper attention to ESXi and vCenter, and not only to NSX, vSAN or the Cloud products, which we know are the future of virtualization. Most of these products still depend on ESXi and vCenter, and most VMware customers use these 2 products. By not giving proper attention to ESXi releases, VMware is losing credibility as a product and as a company. Honestly, VMware needs to fix these Quality Control issues; if not, it will start to lose customers, because confidence in VMware is extremely important for customers.
Personally, it is getting more and more difficult here to explain, and justify, all of these issues to management: the outages, the reconfiguration of the Backups, and the maintenance that we need across around 1000 VMs to fix most of the issues that ESXi 6.0 is giving us.
But let's focus on the CBT issue.
This Bugfix included some fixes for CBT in the VSAN product:
“Attempts to upgrade a Virtual SAN cluster On-Disk format version from version 2.0 to 3.0 fails when you Power On CBT-enabled VMs. Also, CBT-enabled VMs from a non-owning host might fail due to on-disk lock contention on the ctk files and you might experience the following issues:
- Deployment of multiple VMs from same CBT enabled template fail.
- VMs are powered off as snapshot consolidation fails.
- VM does not Power On if the hardware version is upgraded (for example, from 8 or 9 to 10) before registering the VM on a different host”
So we don’t know if this was the part that brought the CBT issues back to our backups, but the CBT issue is back.
VMs are getting this error in Veeam Backup:
“Error: An error occurred while saving the snapshot: Failed to quiesce the virtual machine. ”
And in the VM log (vmware.log) we get this:
“SNAPSHOT:SnapshotBranchDisk: Failed to acquire current epoch for disk /vmfs/volumes/2f6294e0-bc15bc5d/edeacwdp2vnf01/vmname.vmdk : Change tracking is not active for this disk 572.”
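To check many VMs for this symptom, the log message above can be searched for automatically. The following is only a minimal sketch in Python, assuming you have copied the vmware.log files somewhere readable; the function names find_cbt_errors and scan_log_file are hypothetical helpers written for this example, and the pattern is based only on the single log line quoted above, so other variations of the message may not match.

```python
import re
from pathlib import Path

# Message fragment taken from the vmware.log line quoted in this article.
CBT_MARKER = "Change tracking is not active for this disk"

# Captures the disk path between "for disk " and the " : " separator.
# Assumption: the log line always has the same shape as the quoted example.
DISK_RE = re.compile(r"for disk (?P<disk>.+?) : " + re.escape(CBT_MARKER))


def find_cbt_errors(log_text):
    """Return the sorted, de-duplicated disk paths reported with inactive CBT."""
    disks = set()
    for line in log_text.splitlines():
        match = DISK_RE.search(line)
        if match:
            disks.add(match.group("disk").strip())
    return sorted(disks)


def scan_log_file(path):
    """Read one vmware.log file (a local copy is assumed) and scan it."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_cbt_errors(text)
```

Calling scan_log_file on each collected vmware.log gives a list of the affected disks, which can then be matched against the VMs whose Backups are failing.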
If your VMs are already affected by this CBT issue, it will affect all Backups that use Incremental. Incremental Backups will not be consistent anymore, and when you need to restore a VM using an Incremental restore point it may not be consistent and may not work.
This CBT issue will only be present if you use VMware Tools quiescence in your Backup Job. In Veeam Backup, for example, this is the VMware Tools quiescence option in the Backup Job settings.
Note: This issue will also appear in other Backup Tools (all of them use CBT to back up VMs) if they use VMware Tools quiescence.
If you already have VMs with this issue, besides the workaround to prevent the issue, you will need to reset the CBT. I have written an article on how to do this for the previous CBT issue: CBT bug and fix – Reset CBT.
Note: No reboot is needed to reset CBT in the VMs, since the CBT reset is done on powered-on VMs.
You can encounter other symptoms with this CBT issue. You may have an abnormal amount of data to back up in a normal Incremental Job (we did not notice this problem in our environment). The size is almost that of a Full Backup. And of course Backups will take longer than usual for an Incremental Job.
Until now I have only seen this issue in our environment in VMs with Virtual HW 9, 10 and 11, and only in VMs with a Windows Guest OS. So far I have not seen any Linux VMs with this issue. In the original article from Andreas Lesslhumer, where I read about this issue for the first time (after being informed by Anton Gostev in his Veeam weekly newsletter), he also saw the issue with the same HW versions and Guest OS.
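One way to spot the oversized Incremental symptom is to compare each Incremental against the last Full Backup of the same VM. The sketch below is only an illustration of that comparison: the function flag_oversized_incrementals, the input tuples and the 0.8 threshold are all assumptions made up for this example, and the sizes would have to come from your own backup reports.

```python
def flag_oversized_incrementals(jobs, threshold=0.8):
    """Return (vm_name, ratio) for Incrementals close to Full Backup size.

    jobs: iterable of (vm_name, full_bytes, incremental_bytes) tuples.
    threshold: hypothetical cut-off; 0.8 means the Incremental is at
    least 80% of the size of the Full Backup.
    """
    suspects = []
    for vm_name, full_bytes, incr_bytes in jobs:
        if full_bytes <= 0:
            # Skip entries without a usable Full Backup size.
            continue
        ratio = incr_bytes / full_bytes
        if ratio >= threshold:
            suspects.append((vm_name, round(ratio, 2)))
    return suspects


if __name__ == "__main__":
    # Invented sample sizes in bytes, for illustration only.
    sample_jobs = [("vm-a", 100, 95), ("vm-b", 100, 5)]
    print(flag_oversized_incrementals(sample_jobs))
```

A VM flagged this way is not proof of the CBT bug, since a VM with a lot of real change will also produce large Incrementals, but it is a reasonable place to start looking.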
The workaround for now to prevent this issue is to disable VMware Tools quiescence (not an option for us) or to roll back the ESXi version (to before the Express patch).
For now these are the only options we have to fix or prevent the issue.
Finally, thanks again to Andreas Lesslhumer for providing the feedback and knowledge by publishing this issue for the first time.
Final Note: I hope VMware fixes this issue quickly, but also sorts out its Quality Control when releasing new ESXi versions, patches, or bugfixes.
Note: Share this article if you think it is worth sharing.