vSphere 7 Update 2 loses connection with SD Cards: a Workaround

If you are one of the unlucky ones hit by vSphere 7 Update 2 losing connection with its SD card, then I feel your pain. If you upgraded from vSphere 6.7 to 7 Update 2, or installed vSphere 7 Update 2 on an SD card, you either already have this huge problem or will get it soon.

Note: This problem is not related to the new vSphere 7 partition layout, or to the /scratch or /coredump partitions when running on an SD card. The VMKernel.BOOT.allowCoreDumpOnUsb issue that we had in vSphere 7 U1 was fixed in U2 and is not related to this issue.

Information about those issues: kb2077516 and kb83376, or the What's New notes.

More information about this issue: KB83963 and KB83782.

UPDATE 20/01/2022

There is already a patch that fixes this issue. Use the patch instead of this workaround.

ESXi 7.0 Update 2d | 14 SEP 2021 | Build 18538813

Check my other blog post here about the patches for this issue and how to apply them.

Note: Some unofficial statements from VMware employees say that a new patch to fix this issue (and others) will be launched on the 15th of July. Let us hope so.

I moved all our scratch and coredump partitions to a storage datastore, so none of those partitions or logs run on the SD cards.
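
For reference, this is a minimal PowerCLI sketch of how the scratch location can be pointed at a datastore. The vCenter, host, datastore, and folder names below are placeholders (the folder must already exist on the datastore), and relocating the coredump is a separate step done with esxcli on the host, not shown here.

# Minimal sketch, placeholder names: adjust the vCenter, host, datastore and folder.
# The new scratch location only takes effect after the host is rebooted.
Connect-VIServer -Server "vcenter.lab.local"

$esx = Get-VMHost -Name "esx01.lab.local"
$scratchPath = "/vmfs/volumes/Datastore01/.locker-esx01"

Get-AdvancedSetting -Entity $esx -Name "ScratchConfig.ConfiguredScratchLocation" |
    Set-AdvancedSetting -Value $scratchPath -Confirm:$false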

What is the problem?

When vSphere 7 Update 2 runs on an SD card, it simply loses the connection to the SD card and the ESXi host freezes. Since there is no access to the SD card and the system partitions, the ESXi host hangs and it is impossible to do anything. All VMs continue to work, but you cannot power them down, power them up, or migrate them.

Then the ESXi host reaches 100% CPU, and all VMs take a huge performance hit. Since it is impossible to power off or migrate any VMs, the only option is to hard reset the server and let HA restart the VMs on another ESXi host.

As everyone knows, doing this in a Production environment with hundreds of VMs has a huge impact on the company and its running systems and applications.

This is the first thing you see when your ESXi host has the issue.

[Screenshot: the first warning shown on an ESXi host hit by the issue]

Checking the logs, you will see a lot of entries like this:

Particularly this one: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; meaning that the SD card is no longer available to the system.

After that, you will lose your ESXi host.

VMware first stated that it was a vendor problem (directing me and others to troubleshoot in the wrong place). Wrong: this is a VMware issue, and it seems to be the vmkusb driver that triggers it.

This is one of the worst bugs I have seen in VMware in a long time, and it was also one of the worst issues I have had to work on and troubleshoot to find the root cause. I have spent more hours on this issue than I can count.

Initially, I thought this was a problem with HPE (I only had this problem on HPE servers, since my Dell servers run vSphere on local SSD disks), either with the controller driver (vibs) or even the iLO driver (we had updated it a week or so before).

I replaced them with these updated vibs:

  • Broadcom-ELX-lpfc_12.8.329.1-1OEM.700.1.0.15843807_17657023.zip
  • fc-enablement-component_700.3.7.0.5-1_17477831.zip
  • hpessacli-component_5.10.45.1-7.0.0_17771110.zip
  • ilo-driver_700.10.7.0.6-1OEM.700.1.0.15843807_17481969.zip
  • sutComponent_700.2.8.0.20-0-signed_component-17782108.zip

I replaced some vibs from HPE, applied new updates, and thought I was good and the problem was fixed. WRONG!

The problem is that the issue needs many hours to trigger. It can take 24 to 48 hours to return, so when you apply a solution or workaround, you need to wait to see whether the system breaks again or stays good.

In the forums, we see many people with this problem (on HPE and Dell servers) and many systems down because of it. In our case, we have almost 40 servers with this problem. It is a huge problem with a huge impact on us.

VMware then stated that this problem was not only in vSphere 7 Update 2 but also in U1. Well, I think that statement is not 100% true. We had our systems running without a problem on U1. Only when we updated to U2 did the problems start. And reading many of the forums out there, everyone says the same: with U1, no one had this problem. It only started after applying U2.

We could roll back to U1 and all would be good. But the problem is that when you apply new vibs and updates, some of them change the build number, and then you cannot roll back because it will not land on U1 but on the initial U2.

In your case, if you did not apply any new drivers or updates and you can still roll back to U1, do it. It will save you a lot of headaches. Check HERE how to do that.

So the only solution, for now, is to reinstall all ESXi hosts with vSphere 7 U1. For environments with 100 or more ESXi hosts, this is of course a huge problem. Even for my 40 it is a problem. I already reinstalled a couple to check the behavior, and even one with a fresh vSphere 7 Update 2 install (since all of the ESXi hosts had been upgraded from 6.7).

So, let us talk about the workaround.

Disclaimer: Please perform these tasks carefully and always test before you do anything in your production environment.

The only way to bypass this issue without reinstalling the ESXi host is the following workaround.

First, when an ESXi host is frozen and you cannot do anything to migrate the VMs without an outage, log in to the ESXi host console and run: esxcfg-rescan -d vmhba32 and then esxcfg-rescan -a vmhba32.

You will need to run the first command a couple of times until it finishes without an error.

Give it a few minutes between each run. Be patient and try again after 2 to 5 minutes.

After all the errors are gone and the command finishes without any error, you should see in the logs that "mpx.vmhba32:C0:T0:L0" was mounted in rw mode, and you should be able to do some work on the ESXi host again.

If you still have some issues, restart the management agents.

You then should see this error on the ESXi hosts.

[Screenshot: the error shown on the ESXi host after the SD card is remounted]

After this, you should be able to migrate your VMs to another ESXi host and reboot this one. Until it breaks again in 24 to 48 hours.

Once all your ESXi hosts are running without the issue and you see no quick stats error on the ESXi host Monitor tab, you can apply a workaround proposed by VMware.

This is to enable the RamDisk, or create it if it doesn't exist (with the issue and after remounting the SD card, it can disappear).

You can check that by running the command: esxcli system settings advanced list -o /UserVars/ToolsRamdisk

The Int Value is disabled (0) by default, and we need to set it to enabled (1). You can do that with this command: esxcli system settings advanced set -o /UserVars/ToolsRamdisk -i 1

[Screenshot: checking and setting /UserVars/ToolsRamdisk]
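
If the hosts are managed through vCenter, a quick way to check the current value on all of them is a PowerCLI one-liner (a sketch assuming an existing Connect-VIServer session):

Get-VMHost | Get-AdvancedSetting -Name "UserVars.ToolsRamdisk" | Select-Object Entity, Name, Value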

But if you have many ESXi hosts (dozens or hundreds), doing this manually is not practical. So I created a small script that does this for you.

The script checks if the RamDisk setting exists. If not, it will create it, and it checks if it is set to enabled; if not, it enables it. It also creates a small text file to track what was changed and on which ESXi hosts.

Note: The actions are commented out; run the script first and check the file and the information it creates. If you are ok with it, uncomment the actions and rerun it.
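
The script itself is not reproduced here, but the following PowerCLI sketch shows the general idea, assuming an existing vCenter connection; the log path is a placeholder. The change action is commented out, as described above, and if the setting does not exist at all the sketch only reports it, since creating it has to be done on the host itself with esxcfg-advcfg (per KB83782).

# Minimal sketch of the check described above (not the exact script from this post).
# Assumes you are already connected with Connect-VIServer.
$logFile = "C:\Temp\ToolsRamdisk-report.txt"

foreach ($esx in Get-VMHost) {
    $setting = Get-AdvancedSetting -Entity $esx -Name "UserVars.ToolsRamdisk" -ErrorAction SilentlyContinue

    if (-not $setting) {
        # Option missing on this host: it has to be created on the host itself
        # (esxcfg-advcfg -A ToolsRamdisk ..., per KB83782), so only record it here.
        Add-Content -Path $logFile -Value "$($esx.Name): UserVars.ToolsRamdisk is missing"
    }
    elseif ($setting.Value -eq 0) {
        Add-Content -Path $logFile -Value "$($esx.Name): ToolsRamdisk is disabled (0)"
        # Action commented out on purpose: review the log first, then uncomment and rerun.
        # Set-AdvancedSetting -AdvancedSetting $setting -Value 1 -Confirm:$false
    }
    else {
        Add-Content -Path $logFile -Value "$($esx.Name): ToolsRamdisk is already enabled (1)"
    }
}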

But honestly, I cannot say 100% that this workaround works on all systems. I have applied it and don't see any issues now, but I would not be surprised if the issues return after more than 24 hours.

So, it is ok for now, and let us hope this workaround works in all systems.

UPDATE 24/05/2021:

Just a quick update on this issue. After applying my workaround to all 30 ESXi hosts, one ESXi host had the issue again after 4 days. I need to double-check why this one, and whether I missed anything on it.

Also, I would like to say that I am not using the new vmkusb vib driver that VMware support is providing (I have not seen any feedback that this driver fixes the issue).

I have systems where I applied the workaround running both driver versions.

ESXi hosts where I applied the latest VMware updates are using: 0.1-1vmw.702.0.0.17867351
ESXi hosts with no updates are using: 1-1vmw.702.0.0.17630552 (from Update 2 only)

The vmkusb vib driver provided by VMware support is: 0.1-2vmw.702.0.20.45179358

UPDATE 05/06/2021:

After a couple of weeks on this issue, the workaround holds, but only until someone tries to upgrade VMware Tools. That triggers the issue immediately.

I asked the Linux and Windows teams to upgrade VMware Tools on some VMs on a specific ESXi host, as a test, and 24 hours later, voilà, that server had the issue.

ESXi hosts will hit the issue 20 to 24 hours after trying to upgrade VMware Tools, even when the ToolsRamdisk above is set properly. Then we again need to run esxcfg-rescan -d vmhba32 to fix the ESXi host, and then reboot it.

More than a month after vSphere 7 Update 2 was launched, VMware has still not presented a fix for this huge problem.

So from what I have seen until now, the issue is in the vmkusb vib driver but also in the ToolsRamdisk.

Latest UPDATE 05/07/2021:

As I have already stated in my comments on this blog post, these are the tasks that I notice can trigger the issue faster:

    • Upgrading VMware Tools on VMs.
    • Deploying OVA/OVF appliances on these ESXi hosts.
    • Backing up VMs with our Veeam tool.
    • When using vCloud Director on these ESXi hosts, the tasks and processes in vCD will trigger the issue faster.

If the above tasks are done, we notice the issue in less than 24 hours on one (or more) of the ESXi hosts in the cluster.

If these kinds of tasks are not done, we have 1 or 2 occurrences per week in some of the clusters (2 or 3 in others).

Lastly, in a couple of vSAN clusters we rarely see this issue. That is something I have not been able to understand yet.

The only thing we can do, besides reinstalling all ESXi hosts with vSphere 7 Update 1, is wait for the fix from VMware. It was promised that it would be addressed in Update 3, but I think they will launch a fix before that. I hope so.

I hope this blog post about vSphere 7 Update 2 losing connection with SD cards is useful for troubleshooting and applying a workaround for this big problem with vSphere 7 Update 2.

Meanwhile, you can check a good blog post from David about vSphere 7 Update 2, all the changes to the partitions, and also this issue.

Share this article if you think it is worth sharing. If you have any questions or comments, comment here or contact me on Twitter.

©2021 ProVirtualzone. All Rights Reserved
May 21st, 2021 | VMware Posts, vSphere | 190 Comments

About the Author:

I have over 20 years of experience in the IT industry. I have been working with virtualization for more than 15 years (mainly VMware). I recently obtained certifications including VCP DCV 2022, VCAP DCV Design 2023, and VCP Cloud 2023. Additionally, I hold VCP6.5-DCV and VMware vSAN Specialist, have been vExpert vSAN, vExpert NSX, and vExpert Cloud Provider for the last two years and vExpert for the last 7 years, and am an old MCP. My specialties are virtualization, storage, and virtual backup. I am a Solutions Architect in the VMware, Cloud, and Backup/Storage areas. I am employed by ITQ, a VMware partner, as a Senior Consultant. I am also a blogger, the owner of the blog ProVirtualzone.com, and recently a book author.

190 Comments

  1. Lou Corriero 21/05/2021 at 16:49

    There is a patch for this if you contact support.

    • Luciano Patrao 21/05/2021 at 17:10

      Sorry I have been in contact with VMware support for some days now, for this particular issue there is no fix. They have a new vmkusb driver that we can install, but the issue happens again.

    • Mike 29/07/2021 at 20:01

      To be exact, this is what you’ll get from VMWare as of July 28, 2021.

      This is XXXX from VMware Technical Support and I will be assisting you in the support request #XXXXXXXXX.

      Kindly note that this issue is top priority for our engineering team and it will be fixed in the next release which is 7.02P03. Unfortunately, we don’t have the exact ETA but the engineering confirmed that it should be very soon.

      You can subscribe to this kb article to be notified of the release: https://kb.vmware.com/s/article/2143832.

      As for now until the release, to workaround this issue you can use the commands below if the issue happened again:
      esxcfg-rescan -d vmhba32 and then esxcfg-rescan -a vmhba32 (You will need to run the first command a couple of times until it finishes without an error.)

      Then perform a restart of the management agents by running the following commands:
      /etc/init.d/hostd restart
      /etc/init.d/vpxa restart

      Another workaround is that, if the previous version that was installed on the host was ESXi 7.0 U1 we can roll back to that version and the issue should go away.
      To roll back, please check this KB article: https://kb.vmware.com/s/article/1033604?lang=en_US

      Please let me know if you have any question and if any assistance is required from our end.

      • Luciano Patrao 29/07/2021 at 20:23

        The reply and the “solution” that VMware provided make me smile. Almost a month after I wrote that here 😉

        • Mike Stone 03/08/2021 at 20:07

          Also funny, in describing my issue I simply noted the url to your forum post. LOL

          • Luciano Patrao 03/08/2021 at 20:12

            Well, it is not the first time that VMware support proposes my blog as a possible solution.
            Earlier this year they replied to one of my support tickets with a blog post that I had written. I had to remind them that I wrote it 🙂

            A couple of years ago, they did the same for an issue I had with vCenter and for which I had found a workaround; they replied that there was no solution yet but there was a workaround, and sent me my own blog post 🙂

            It is ok by me, as long as it helps people with issues. But they should have their own solutions, I think 😉

  2. Lou Corriero 21/05/2021 at 16:51

    Hi Luciano, we are actually a VMware Cloud Partner Provider and this hit us really hard. I would like to know if we can connect to discuss this issue?

  3. Lou Corriero 21/05/2021 at 16:54

    Cause:
    As of 7.0 Update 1, the format of the ESX-OSData boot data partition has been changed. Instead of using FAT it is using a new format called VMFS-L. This new format allows much more and faster I/O to the partition.

    This ESX-OSData partition is where frequent data is written and combines the product locker and scratch log partitions which were used in previous versions of ESXi.

    This partition is more commonly seen as the /scratch partition.
    • The level of read and write traffic is overwhelming and corrupting many less capable SD cards.
    • The current versions of ESXi OS 7.0 Update 1/Update 2 are no longer throttling I/O to local boot drives.
    • The advanced setting to throttle I/O to local boot drives has been removed in 7.0 U1/U2
    Please feel free to get back to me if you have any questions.

    • Luciano Patrao 21/05/2021 at 17:15

      This is not the same issue. That issue was fixed in U2.

      This issue will happen in new implementations or upgrades, as long as you use U2. If you are using U1, you will never see this problem. The /scratch and /coredump is a different issue that can be fixed by moving the partitions and using the allowCoreDumpOnUsb boot option on the ESXi host.

  4. Lou Corriero 21/05/2021 at 17:17

    They did that to us as well and we finally were put in contact with the storage team and they provided new bootbank vmkusb drivers for U1 and U2 – this throttled the IO and resolved our issue; however, we did have to replace a lot of SD cards that were corrupted. Also, the patches had to be installed fresh on some “broken” hosts.

    • Luciano Patrao 21/05/2021 at 17:25

      On some of the broken ones, I reinstalled with U1: no issues with the SD cards, no card corruption, and no issues with the vmkusb (on DL360 G9 and BL430 G9). Only if I use U2.

      So for me it is not an SD corruption issue, because the SD card is good (it works fine with U1); it is the driver.
      But yes, the SD card can get corrupted in the process. That is not the main issue here, though.

      But with the workaround that I explain here, I don't see any issues until now. So I will wait until Monday to be 100% sure that this worked.

  5. Sergey U 24/05/2021 at 13:43

    Big thanks for your work and workaround!
    I have been struggling with this problem for a long time on Dell servers, and a server can work for at least 2-3 days before this problem occurs; for example, a server froze today with 30 days of uptime. I spent a lot of time updating all the drivers/firmwares and stuff.
    I use a Dell custom image build 17630552

    Recently VMware released 7.0U2A
    have you tried this image? HPE Custom Image for ESXi 7.0 U2 Install CD 2021-05-18

    I’m waiting for when it comes out for Dell

    • Luciano Patrao 24/05/2021 at 17:23

      Hi Sergey,

      Thanks for your message.

      Yes, I did a fresh install from that VMware-ESXi-7.0.2-17867351-HPE-702.0.0.10.7.0.52-May2021. No issues on that server. But I also applied the workaround on this server, so honestly I cannot say whether it would fix the issue without the workaround. Since I am tired of this issue, I will not make any more changes that could break my servers again. The workaround is working, so until I have issues again, or VMware releases an official update for this issue, I will not touch or change anything more on my servers.

      There is also a new vmkusb version that VMware provides when you open a support ticket, VMW_bootbank_vmkusb_0.1-2vmw.702.0.20.45179358, that they say will fix it. But I did not try it and, honestly, I will not try it for now. I will wait for some feedback or for an official update with that new vib driver.

      If you need any help, you can send me an email I will try to help you.

  6. Oliver Antwerpen 24/05/2021 at 22:51

    I am facing the issue in HPE Custom ISO fpr 7 Update 2a and Synergy SSP2021.05.01. We have opened a Case at HPE and I think we will downgrade to 7 Update1 latest until a fix is published.

    • Luciano Patrao 24/05/2021 at 23:03

      Hi Oliver,

      If you still can, then yes, that should be your first option. Better than applying workarounds or drivers that are not yet 100% tested.

  7. David Onken 25/05/2021 at 16:19

    Support provided an older vmkusb .vib (701.0.0.44485813) and this has resolved our issues (along with the ramdisk changes mentioned). Obviously we’re not proceeding further with 7.0.2 until this is completely addressed by VMware.

    • Luciano Patrao 25/05/2021 at 16:50

      Hi David,

      Thanks for the update and for sharing the information. All information is very important in this issue.
      Did you apply this old version on your U2? If yes, how long have they been working without any issues?

      Thanks again for the update.

      LP

      • David Onken 25/05/2021 at 17:02

        Yes, we applied it to U2. Not ideal situation but two weeks now and no issues. PS: We have HPE hardware.

  8. Iwik 31/05/2021 at 09:22

    What is strange, we have seen this issues on HPE running 6.7. Issue started two patches ago. It is not so frequent, it was after 20-30 days and on 4 servers. Opened vmware support case and result was hardware failure – I don’t think so.

    • Luciano Patrao 02/06/2021 at 08:55

      Hi,

      I remember there was a similar problem back in 6.5, I think (I believe I wrote something about it). We have a lot of 6.7 hosts and did not see this issue on any of them. And I did not see anyone in the forums having this issue with 6.7.
      So I don't think your issues are related to this particular one.

  9. Javier Flores 01/06/2021 at 10:35

    Luciano, thank you very, very much for putting all this information online. It saved us from a big outage today.

  10. Kris 02/06/2021 at 17:43

    We have a Cisco UCS and updated to 7.0 U2 last week and the issue came up yesterday. Already had the scratch disks moved so trying the Ramdisk workaround. Thank you for posting this information. We were able to keep our production workload online even when the host showed disconnected and recovered. Any update on a new driver?

    • Luciano Patrao 03/06/2021 at 14:46

      Hi Kris,

      Until now I don't have any updates. Still waiting for VMware to launch vSphere 7 Update 2b (I hope). I have heard that they plan to launch Update 3 in August, and I hope they do not wait that long to launch a fix for this.

  11. Jean-Sebastien D'Amours 02/06/2021 at 22:26

    I had that issue on May 21st with a brand-new Dell R740 freshly installed with ESXi 7.0 U2a 24h earlier. The problem occurred during the offline migration of a VM. I did not understand what was happening. I thought I was unlucky and had a faulty PERC controller, bad disks, or other hardware problems. VMware completed the migration of the VM and removed it from the source server. Once done, I tried to start it but everything seemed frozen. Finally the VM started, and when I shut it down, nothing happened. I tried to force-stop it, but the options became greyed out. I tried to browse the datastore and the cursor kept spinning and nothing showed. I tried to restart the ESXi host via the vSphere client and nothing happened. I tried to restart it from the ESXi console and a message appeared saying I don't have the rights to do that… Finally I pushed the power button until it powered off and restarted it. At that moment, I was not sure if my new server was better than the old one, I had doubts!

    Once back to normal, I found my migrated VM “orphaned”. I tried to reintegrate it into vCenter and it disappeared from vCenter and from the datastore. I had to restore it from my backups.

    Finally, I completed the migration without any problems over the weekend. On Tuesday the 25th, Veeam was performing backups and the whole process froze. I looked into it to understand what was happening, having used that software since 2014 without any issue, and it appeared the problem was back. Searching on Google, I found your blog with a description of the problem I had. I was reassured that it was not a hardware problem, and glad to learn the cause and the workaround command lines.

    Thank you very much for that post.

    • Luciano Patrao 03/06/2021 at 14:50

      Hi Jean,

      I am glad to help.

      Regarding the orphaned VMs: they should still exist in the datastore. If you browse it looking for that VM's folder/files, you should find them, and you just need to add them back to the inventory (removing the orphaned entry first). Since you restored from backup, you should double-check your datastores so that you don't have any trash files (orphaned VMs that are no longer used) consuming space in your storage. Veeam ONE has a good feature to find these trash files.

  12. Jeff Creek 03/06/2021 at 14:48

    FYI –
    esxcli system settings advanced list -o /UserVars/ToolsRamdisk
    Unable to find option ToolsRamdisk

    Dell hardware
    VMware-VMvisor-Installer-7.0.0.update02-17867351.x86_64-DellEMC_Customized-A03.iso

    So, rolling back to U1 may be my only option.

    • Luciano Patrao 03/06/2021 at 14:54

      Hi Jeff,

      If you read the script, that is what it does. It double-checks if the setting exists; if not, it will create the ToolsRamdisk. But like I said, if you have the option to roll back, I would do it.

      • Jeff Creek 03/06/2021 at 15:10

        Hi Luciano,

        This is preprod. So, I plan on going back to 7u1.

        Thanks!

  13. Jeff Creek 03/06/2021 at 14:50

    Looks like you have to add it yourself.

    https://kb.vmware.com/s/article/83782

    • Luciano Patrao 03/06/2021 at 14:56

      In my infrastructure, only one ESXi host did not have the ToolsRamdisk setting; the rest were all ok, no need to add it.

  14. Jone Merakus 03/06/2021 at 20:47

    Thanks for this! We encountered the problem yesterday after updating this weekend to 7 U2.

    What exactly does creating the RAM Disk do however? I’m a little confused about *why* that resolves the issue.

    • Luciano Patrao 03/06/2021 at 22:14

      Hi Jone,

      ToolsRamDisk doesn't fix this particular issue. It fixes one of the issues that we encounter when using vSphere 7 Update 2.
      It is part of the changes and tasks we need to do to work around the issue with SD cards.

      About the Ramdisk, to keep it short, you can find a small explanation of what it is and what it does HERE

  15. Jone Merakus 04/06/2021 at 19:37

    Thank you for the explanation. As a point of clarification, do you have to reboot the host after setting the RAM disk?

    Thanks!

  16. Enrique Alonso 07/06/2021 at 10:39

    This post is pure gold! thank you very much!
    Thanks to it I was able to bring our hosts back to life and migrate the VMs without problems.
    I had a support case with vmware and the lack of interest and effort they put into it was remarkable.

    • Luciano Patrao 07/06/2021 at 22:06

      Thanks for the reply.

      Yes, I agree. I still do not fully understand VMware's position on this huge issue.

  17. Fred Vertuel 07/06/2021 at 15:09

    Hello Luciano,

    Same issue on 18 Dell Poweredge r440, with Dell custom image 7.0.2a…

    The ToolsRamdisk option didn't fix the issue for at least 5 servers (I activated the option 2 days ago). I have installed the vmkusb vib update provided by VMware and mentioned in your post. Servers with the updated VIB have been working for 2 days now, but I got warnings on those specific servers… My case is still open with VMware… Wait and see

    Thanks for your post, it allowed me to get the faulty host back to life and at least reboot it properly

    • Luciano Patrao 07/06/2021 at 22:07

      Mine is also open, and now I received an email saying that since there were no updates, it went to the archive 🙁

  18. Patrick Long 08/06/2021 at 06:10

    Luciano – this is incredibly well-written and great information that really helped me. I’m in a bit of a dilemma as I can’t roll back. I upgraded my HPE Synergy hosts to U2 (image HPE-Custom-Syn-AddOn_702.0.0.10.7.5-14) to escape an issue I was having with Synergy nodes on prior image HPE-Custom-Syn-AddOn_700.0.0.10.5.6-19 where under I/O load hosts were showing very high KAVG values – even a single vm of minimal I/O was showing >1 KAVG which should never happen. Upgrading resolved the KAVG issue but just tonight I have had my first host lose connectivity to micro-SD card boot device – which I resuscitated using your information above. I have a support call with VMware tomorrow and will advise here if I get any additional information or confirmation of the /UserVars/ToolsRamdisk workaround.

    • Luciano Patrao 09/06/2021 at 08:37

      Hi Patrick,

      Yes, vSphere 7 Update 1 had some issues, also with the boot partitions, which is why VMware decided to change the partition structure in Update 2. A big change, and it should have been done differently, not in an update.
      What I know so far is that after vSphere 7 Update 1c, customers started seeing issues. In my case, I was running that version before on almost 100 ESXi hosts without any problems.

      So if you can't roll back, you have only two options here: continue with Update 2 and fix the issues from time to time until there is a proper fix, or install all ESXi hosts from scratch with Update 1 or Update 1c.

      Unfortunately, that is the only option we have at the moment.

      Thanks for your update.

  19. fvertuel 08/06/2021 at 13:01

    Have you finally tried to install the updated VIB? And if yes, what was the result on your side? Thx

    • Luciano Patrao 09/06/2021 at 08:38

      No, I decided not to put more stress and issues in my infrastructure. Since I have been very busy, did not even test this in a test server.

  20. Georgi Petkov 08/06/2021 at 16:28

    Please add this command:
    esxcfg-advcfg -A ToolsRamdisk --add-desc "Use VMware Tools repository from /tools ramdisk" --add-default "0" --add-type 'int' --add-min "0" --add-max "1"

    🙂 thanks for this you've saved my life 🙂

    • Luciano Patrao 09/06/2021 at 08:42

      Hi Georgi,

      That is what my script does. It adds the ToolsRamdisk setting if it doesn't exist on ESXi hosts running on SD cards.

  21. Adrian James 08/06/2021 at 19:37

    Same issue here on Cisco UCS B200 blade servers booting from SD card. Thanks for your rescan workaround, it helps to recover the host enough to evacuate it. I rolled all prod back to u1 but 4 hosts had the same u2 update on both bootbanks, so they will be a rebuild back to u1. My ticket with VMware is also going nowhere.

  22. Patrick Long 09/06/2021 at 20:21

    I am implementing the /UserVars/ToolsRamdisk recommendations now because I can't roll back to 7.0 U1 (installed Nimble NCM after upgrading, so now both bootbanks are 7.0 U2a), but it is interesting that KB 2149257 describing this process does NOT even mention ESXi 7.x in the "Related Versions:" section, only ESXi 6.5 and 6.7. I am interested if anyone knows whether the "read operations to the SD card to access VMware Tools" could still be an issue for installations that have redirected productLocker to a shared storage location, as we have, seen in ls -n: productLocker -> "/vmfs/volumes//SharedLocker", or whether this method is an equally effective alternative to loading vmtools in a ramdisk in terms of reducing I/O to the SD boot media? Another question I have is regarding the new DRS mechanism using the vCLS VMs (over which you as the vSphere admin have NO control over initial deployment location) – perhaps these system-generated DRS VMs are ending up on the SD card boot media somehow? See https://www.yellow-bricks.com/2020/10/09/vmware-vsphere-clustering-services-vcls-considerations-questions-and-answers/ . I find these VMs littered about throughout my cluster's datastores, even those meant for short-term existence like mounted SAN snapshot datastores. Super-frustrating that there is not an option to limit the scope of these deployments via a "Use datastores only from the specified list" similar to the HA heartbeat datastores selection policy.

    • Luciano Patrao 10/06/2021 at 16:33

      Hi Patrick,

      For the ToolsRamDisk and /scratch partition move to a datastore, unfortunately, yes we need. All my ESXi hosts have their locker in a partition file set in a Datastore and still, I need the ToolsRamDisk implemented.
      Like I said in my article, if you are using SD Cards for your ESXi host installation, then you always need to move the .locker to a datastore so that you don’t have issues (regardless of the update 2 issue).

      For the SD cards vs DRS vCLS VMs, how can those VMs move to SD Cards? That could be true if you are creating a datastore with the free space of the SD Cards(something you should never, never do). But vCLS is set in shared Storage to run, only uses local datastore if you do not have a Shared Datastore.

  23. Patrick Long 11/06/2021 at 18:09

    Agreed about the vCLS VMs – they could not be on the SD card unless there is a datastore there, which should never be done. My only point is that this new "feature" for DRS is not terribly well documented, so who knows what hooks, if any, there could be to a host's boot device, heartbeating it for host liveness, etc. Probably none, but I'd like to know for sure.

    On the other issue /scratch partition, yes I also agree the symbolic link to scratch should ALWAYS be pointed to a datastore backed by high-endurance media so that .locker is not on the USB or SD boot media – and I do this as well. But what I was talking about instead was the symbolic link to productLocker, which I believe is where the host will look for it’s local vmTools bits to compare against the running tools version on the vms it hosts. By default in a vanilla ESXi installation this will be pointed to vmTools on the boot media; Per KB 2129825, I have changed this symlink to a folder on a datastore accessible by all my hosts so that when a new vmTools is released I simply upload the new tools bits once to the shared folder and restart mgmt agents on my hosts (or move vm’s around to new hosts), and instantly all my vm’s tools current version status is compared against the new version in the shared location – NOT whatever tools version came with the ESXi version on the host that the guest vm lives on. In this way, I can have hosts of various versions in the same cluster (during a round of ESXi upgrades for example) or even entire clusters of different versions (like a cluster of Gen8 running ESXi 6.5) and no matter which host a vm moves to, its tools versions is always compared against the tools version in the shared location, not the local version on the host. In this way they do not switch VMware Tools Version Status from “Current” to “Upgrade Available” as they move between different hosts or different clusters if such a move becomes necessary.

    I’m not sure how common it is to do what I described above, but where it IS relevant to this discussion of lowering I/O to the boot device to prevent disconnection when vmTools upgrade is invoked (as you have found) is that I suspect having the vmTools on a shared datastore location is already *functionally equivalent* and just as good of a remediation as enabling /UserVars/ToolsRamdisk, since my hosts should not be hitting the local tools bits on SD for either vm tools upgrades or for tools version comparisons against running vms. I have asked GSS for their opinion on this. I will enable UserVars/ToolsRamdisk on a host and see if the productLocker symlink changes to something other than my shared datastore location…

    • Luciano Patrao 11/06/2021 at 18:32

      Good point, Patrick, but I have never used that kind of configuration, so I cannot answer that question.
      First, it is too much manual work; second, we do not mix versions in clusters, and we should not, since it is not best practice.

      And honestly, I never touched the symlink. If I had some time, I could test that solution to see if I get any results when upgrading VMware Tools. But at the moment I don't have time for that.

      Would like to finish by thanking you for your great contribution on this subject and for providing very good information.

    • Leo Kurz 20/06/2021 at 12:52

      Any news on activating the ToolsRamdisk after redirecting the ProductLocker to a shared disk?
      I have no experience with ESXi 7, but I have installed ESXi on SD/USB devices for many years and there's no way of replacing all boot devices in every server just to update to version 7. From what I understand so far, redirecting the scratch partition (KB1033696) and redirecting the ProductLocker (KB2129825) to a shared (SAN) device should solve the problem. I used both up to 6.7, but I'm not totally aware of the implications with the new partition layout. Perhaps someone could help/clarify:
      – Would redirecting both links with adv. settings to a capable shared disk/LUN solve the problem?
      – Would the RAMdisk be still necessary?
      – Up to now, redirecting scratch also redirected log and coredump. Is this still valid?
      – From what I understand, when you set the advanced parameter “/UserVars/ProductLockerLocation” and reboot, the redirection of the symlink is not necessary
      As I use scripts to assist deployments, the above changes would not be a major effort and would solve the problem in a supported (KB) way w/o any workarounds, downgrades or special patches from support.
      Any ideas/input?
      __Leo

      • Luciano Patrao 21/06/2021 at 17:29

        Hi Leo,

        Lot of questions, let me try to answer.

        First, best practices say that you should always have the locker/coredump on a datastore (if you are using SD cards). That predates this U2 issue. So you should always set that.
        — Would the RAMdisk still be necessary?
        Yes; in the KB from VMware I don't see that it is one or the other.

        I have it on my list to create a second blog post about this issue with some updates, and also scripts that will make all these changes automatically. But time is always an issue and I have also been a bit ill, which is why I don't reply quickly and did not write any big updates in the last 2 weeks.

  24. Perttu 14/06/2021 at 13:47

    Many thanks for this blog post. It helped us a lot!

  25. Philipp Menzi 14/06/2021 at 15:59

    I have the same problem here at two customers. One has HPE hardware with an SD card (upgraded from 6.7x to the newest 7.x version), the other customer has Cisco HW (upgraded from 6.7x to the newest 7.x version). We have the same problem at both customers! No new patches are available to fix the problem. Thanks for your blog, I will try your fix and hopefully VMware will release a fix soon!

    • Luciano Patrao 15/06/2021 at 11:36

      There is no official date for launching Update 3 with the fix. But a VMware employee wrote in the VMware Communities forum that it will be launched in July.

  26. Wal Dimer 15/06/2021 at 02:26

    Thanks so much Luciano, we were tearing our hair out with this one.

    Just finished an argument with the VMware support team who couldn’t understand why I thought two SD cards in two different server generations wouldn’t just start breaking within a week of each other and that there might be more to it. Then I found your and others research.

    I too cannot understand why VMware haven’t dropped everything to fix this or generate a solid workaround. Must be too busy deprecating AD auth.

    • Luciano Patrao 15/06/2021 at 11:40

      Yes, in my opinion, not the best support handling of this issue, no.
      But I am glad that at least I am able to help some people with the workaround and get their systems back while we wait for the promised fix in July.

  27. Johannes Weidacher 17/06/2021 at 15:43

    Thank you for you post and also your PS but there is a small error
    If ($Setting.Value -eq $false)  {

    should be
    If ($Setting.Value -eq 0)  {

    • Luciano Patrao 21/06/2021 at 17:24

      Hi Johannes,

      Thanks for the command. But since 0 is false and 1 is true, the comparison evaluates to false or true either way, so you can use both and it will work.

  28. Luke 18/06/2021 at 17:42

    We have encountered another novel error with v7.0.2: unable to correctly remove snapshots and consolidate disks. With every snapshot created and removed, a disk consolidation warning appears and the machine must be turned off (!) to successfully consolidate. This is on (3) DL380 Gen9 servers booting from microSD and an MSA2040 SAN. After all the problems we have read about and the SD issues, we reverted back to 6.7 and will sit and wait patiently, as we don't really want to beta test this for VMware on production servers.

    • Luciano Patrao 21/06/2021 at 17:21

      Hi Luke,

      When I applied U2, I got a couple of those, but since this happens from time to time anyway, I didn't pay much attention to it in relation to U2. And I did not get more of them after those initial ones.
      So I cannot state that this is an issue in U2.

    • Lukas Lang 29/06/2021 at 10:25

      We had the consolidation issue after upgrading from 6.7 EP15 to U2a. A fresh install of ESXi directly to Update 2a resolved the issue (along with the hanging boot at "vmw_satp_alua loaded successfully" bug)

      • Luciano Patrao 29/06/2021 at 12:33

        Hi Lucas,

        Always monitor those U2a installations. We have at least 10x new installs with U2a and still have issues.

  29. David Pasek 22/06/2021 at 15:23

    Hi Luciano,
    first of all, thanks for your blog post. Very informative and useful, mainly because the workaround which works for my customer experiencing the same issue.

    Disclaimer: I work for VMware as TAM

    You mention in your post not using the new vmkusb vib driver VMware support is providing. Is there any reason behind your decision?

    We are trying to get new vmkusb vib driver and validate if it resolves the issue.

    David.

    • Luciano Patrao 22/06/2021 at 17:39

      Hi David,

      First thanks for dropping by and for your message.

      I will give you a couple of reasons. First, my support ticket was not handled properly, and not 100% honestly, when I showed what the problem was and that it was not related to a vendor or other issues; support wrongly pointed me down different paths, wasting time troubleshooting when the issue was completely different.
      Secondly, I have wasted so many hours on this issue, troubleshooting, testing, fixing, and finding a workaround so that we at least have a way to put servers back in production without a huge outage for our running VMs, that I do not want to use a vmkusb version that is still not 100% certain to fix, or even reduce, this issue.

      Even though we have dozens of ESXi hosts, I do not have any ESXi hosts where I can test this properly (out of production), so honestly, I will not spend many hours on this again trying to fix it, when at least I have a minimally stable environment (the issue still appears from time to time on a couple of servers, but it is manageable).

      Talking to some other customers using the new vmkusb, it did not fix the issue 100%.

      • David Pasek 22/06/2021 at 17:48

        Thanks for explanation. It makes perfect sense and thanks again for your hard work because I feel your pain.

        Anyway, I will work with my customer and VMware GSS to fully understand the root cause and fix because it is really annoying issue. I’ll keep you updated.

        David.

  30. David Pasek 22/06/2021 at 15:29

    Btw, recently I wrote the blog post “vSphere 7 – ESXi boot media partition layout changes” where is the section about various known problems you can observe when using USB or SD media.

    I’ve referenced your blog post and your workaround. I believe you don’t mind.

    My blog post is available at
    https://www.vcdx200.com/2021/06/vsphere-7-esxi-boot-media-parition.html

    • Luciano Patrao 22/06/2021 at 17:41

      Of course you can. It is all about sharing content and helping. I will also include a link to your blog post in my original blog post.

  31. Jarrad 24/06/2021 at 03:28

    We just got ESX 7.0.2 build 18049868 from support that contains vmkusb 0.1-4vmw
    They said public release will be U3 ~15th Jul.
    Haven’t had a chance to deploy yet due to change control lead times so no idea if it’ll fix the problem

  32. floritto 24/06/2021 at 10:31

    Hi

    Thank you for this post Luciano, it helped us a lot.

    After we got no help from VMware for several weeks on this issue, other than “wait for the next release”, we escalated through management.

    Now it turns out there is a hotpatch available for this. We only learned about this after escalating, support did not tell us about it.

    The hotpatch has to be approved on a per-customer basis. We didn't get it yet, so I can't tell if it really fixes the problem. Just wanted to let you know there might be something available that solves this bug. Ask VMware support about it.

    • Luciano Patrao 24/06/2021 at 14:04

      Hi Florito,

      Yes, I know that they are providing some new beta releases to some customers. But honestly, I am not implementing any of those untested versions. And for sure I will not use my production environment as a beta test for VMware.

      But thanks for sharing.

  33. Patrick Long 24/06/2021 at 18:42

    For Jarrad and anyone else with knowledge or who actually has their hands on the new ESXi build 18049868 with new vmkusb 0.1-4vmw due for public release mid-July – is there any indication of exactly HOW the issue was addressed? Does it resume the I/O throttling to USB present in prior releases or deal with the issue in some other more complex way? Were you given anything in terms of release notes or a fixlist that defines exactly HOW this problem was mitigated in the new build/driver?

    • Luciano Patrao 25/06/2021 at 15:43

      Hi Patrick,

      Those questions of course need to be asked to VMware 😉

      But we hope we get more official information soon.

  34. Lukas Lang 29/06/2021 at 10:39

    Thanks for the great post and the Updates regarding this annoying issue. We have a Test Host installed on SD (BL460c Gen10) running about 35 VMs without any issues for 11 days, for now. We redirected productLocker for many years since there was no practical way to manage and centralize VMware Tools. Like others said before, redirecting the location of the Tools seems to reduce the IO Load heavily. Redirecting /scratch and coredump should also reduce unnecessary load. Hope they will get a fix and proper documentation on this, since our upgrade project is frozen now. In my 10 year VMware career I never experienced such a big issue and the initial U2 release is now almost !4! Months old. We used to have PSODs with faulty drivers, that were fixable, not SD cards getting stuck and shot in the nirvana.

    • Luciano Patrao 29/06/2021 at 12:34

      We have in one environment 10x BL460c Gen9 and 10x Gen10; at least 2-3 times a week we have issues.

      And yes, I agree that this is one of the worst bugs I have seen since I started working with VMware. And that is since v2/2.5 😉

  35. robert 01/07/2021 at 23:54

    I’m facing this issue as well and have a support ticket open with vmware. So far they only provided the two rescan commands and restart commands as a workaround they say but after reading this article I’m guessing those only get it out of the unresponsive state and don’t fix the problem. Just wanted to post here so I could subscribe!

    • Luciano Patrao 02/07/2021 at 23:41

      Yes, the workaround was copied from here 😉

      No issue; it is not the first time that VMware support provides one of my blog posts as a solution. I have even opened tickets with VMware and they proposed a solution that I wrote on my blog 🙂

      PS: You don't need to comment to subscribe. But glad that you did; every share is important.

  36. Matt 05/07/2021 at 12:23

    FYI: vmware support gave me this kb and a note that a patch is expected this month: https://kb.vmware.com/s/article/83963

    • Luciano Patrao 05/07/2021 at 15:24

      Thanks Matt, I will update the blog post with that KB.
      The information that I have is that it is to be launched around the 15th of July. Let us hope so.

  37. Alex W. 05/07/2021 at 16:10

    Hi,

    Is there any update on this? Can you send an HPE case number for reference? We have the same issues on a few ESXi hosts. Can you explain what exactly the problem is coming from? We have redirected the locker folder to a local store, but the problem still comes back after 24-48 hours. If I redirect all these folders to local stores, where is the "heavy load condition"?

    Alex

    • Luciano Patrao 05/07/2021 at 22:32

      Hi Alex,

      I don't have any HPE ticket open, only with VMware, since it is not a vendor issue but a VMware issue.
      Changing the locker to a datastore is best practice when using SD cards, regardless of this bug.

      The latest VMware KB explains a bit about the issue.

      Also, as I stated in my blog post, upgrading VMware Tools, deploying OVA appliances, and also backups (I noticed and tested this just a week ago) can trigger the issue faster.

      I think it is because of the rw on the SD cards (explained a bit in the KB).

      At the moment, after the changes that I explain in my blog post, I have the issue 2 or 3 times per week across about 50 ESXi hosts.
      I also have a couple of vSAN clusters using BL460c G9 blades with 7 servers each, and the issue is rarely triggered there. I still do not understand why (all have the same settings that I explain here).

      • Marco Corleone 06/07/2021 at 10:02

        We have the same issue on two of three hosts (brand-new Dell servers from June 21). The trigger here is a vSphere shutdown task before backup with Veeam. This issue does not happen every time, but when it does, the shutdown task had been started at that time.

  38. Adam Tyler 13/07/2021 at 17:19

    This is unreal. I have this problem too, spent hours of my life reloading ESXi and removing VIBs I thought may be the root cause. Completely unacceptable that VMware still distributes this broken build of ESXi. I’m on VMware ESXi, 7.0.2, 17867351 ..

    Regards,

    Adam Tyler

    • Luciano Patrao 13/07/2021 at 22:26

      Hi Adam,

      Yes, I feel your pain. And I also don't understand why VMware is still providing this version with this big issue.
      Honestly, I think they are trying to do what they have done since the beginning: blame the vendors and say the issue in vSphere 7.0.2a is just a consequence of a vendor fault, not directly theirs. That is the only logic here, and it is also what they have said to many customers when replying to support tickets (like mine).

  39. Adam Tyler 13/07/2021 at 23:03

    Do we know what previous build of ESXi is not impacted by this issue? I’m looking at downgrading at this point. Going to be painful, but….

    I’m on build: VMware ESXi, 7.0.2, 17867351
    it is definitely broken.

    Currently installed vmkusb vib:
    vmkusb 0.1-1vmw.702.0.0.17867351

    • Luciano Patrao 13/07/2021 at 23:16

      It is in the blog post 😉

      Version VMware ESXi, 7.0.1.x

      Up to version ESXi 7.0 Update 1c I did not see any issues.

  40. Adam Tyler 13/07/2021 at 23:30

    Man they have so many versions. Feel like their unpaid beta tester at this point.
    So you are saying “VMware-ESXi-7.0U1c-17325551-depot” and newer, bad?

    So release “VMware-ESXi-7.0U1b-17168206-depot” and older are good?

    Wonder how security patching figures into all of this. Like to patch the latest exploit in the 7.x branch do you need to be running the latest build of 7u2 or do they release security patches for the 7u1 branch?

    Regards,
    Adam Tyler

    • Luciano Patrao 15/07/2021 at 19:16

      Apply the latest 7.0 U1 ISO using Lifecycle Manager (previously VUM) and you are ok. If you apply all automatic patches instead of doing a manual upgrade with an ISO, you will get U2a.
      And then wait for U3.

  41. Rob 15/07/2021 at 15:51

    Its the 15th! Any word on the patch release?

  42. Jason 15/07/2021 at 17:10

    Great article!!!! I kept running into this issue during snapshots being taken. I thought it was an issue with my backup software. You saved me a lot of hassle going through VMWare support. FYI, Dell has told us that they will not support vSphere 7 on SD cards. That probably explains the vendor finger-pointing that VMWare seems to be doing, and their “lack” of enthusiasm fixing this. Looks like both side feels that customers should not be using SD cards, which is odd. We’ve used them for years without an issue.

    • Luciano Patrao 15/07/2021 at 19:21

      Yes, unfortunately there is finger-pointing now between VMware and the vendors. But Dell also cannot just say they do not support SD cards for vSphere 7 when they sold servers with them, and for that purpose. Yes, it is not their fault, but still, they cannot say that at this stage.
      If you can, and if your backup tool has the feature, use storage snapshots instead of VM snapshots. At least you will reduce how often the issue appears.

      • Jason 23/07/2021 at 16:13

        Okay, wanted to give an update. Our host ran successfully for 8 days after applying the workaround. Backups ran normally. No issues. Then, we started to see “Aborted Disk Commands” on the USB device from our RMM monitoring. I looked into the host. I could still login. But when I went to logs, the screen would lock up for over 5 minutes. Even when on console window on the host via the iDRAC, looking at the logs menu would lock up the screen. When the menu became responsive again, I was able to enable SSH. I ran the “esxcfg-rescan -d vmhba32” and “esxcfg-rescan -d vmhba32” commands. I did get errors and was able to clear them. VMWare logs started to display again. “NMP: nmp_DeviceStartLoop:740: NMP Device “mpx.vmhba32:C0:T0:L0″ is blocked. Not starting I/O from device.”. So, it looks like this fix is temporary. Found an article on VNinja that shows someone who came across the 8 day issue, same as me. “https://vninja.net/2021/06/01/esxi-7.0-sd-card-issue-temporary-workaround/”. Looks like we HAVE to wait for that VMWare patch, which we will be applying ASAP.

        • Luciano Patrao 24/07/2021 at 02:15

          As I say in my blog post, my workaround is just that: a workaround to reactivate the ESXi host and be able to move VMs and reboot the server. It is not a fix, not even a temporary one.
          I am tracking my ESXi hosts' issues and it is random. It can take 48/72h or a week to have an issue. But never on the same ESXi host within at least 2 weeks.
          So yes, we are all anxiously waiting for the VMware fix for this issue.

  43. Jonny 15/07/2021 at 20:21

    Patch is postponed until late August …

  44. JD 16/07/2021 at 19:01

    Got an update from our TAM. ESXi 7.0 U3 has been pushed out Sept. 21, 2021 based on the beta testing results. Still waiting on issue/resolution for bootbank errors and SD card getting unregistered.

    • Luciano Patrao 16/07/2021 at 22:13

      Different dates, different rumors. My inside connections say the end of August.
      Either way, it is too long, too long…

  45. Adam Tyler 16/07/2021 at 23:09

    This sucks. At this point it looks like downgrading OR installing a single traditional SATA/SAS disk in each host is the only option. I mean, other than the workaround posted in this article.

    Can anyone explain to me how security patches work with vSphere? For example, if I downgrade to a build of vSphere 7u1 that doesn’t have this SDcard/USB problem, am I choosing to run a vulnerable ESXi build?

    It’s my understanding that the last ESXi 7 release that didn’t break SDcards/USB was ESXi-7.0U1b-17168206-standard (Build 17168206). Is that accurate?

    • Luciano Patrao 19/07/2021 at 15:42

      Yes, you can downgrade to that version, since it is the latest I tested that was working without this bug.

      Check the vSphere 7 U2a release notes for what was fixed. Some security fixes, yes, but nothing special.

      Afterwards, if you want to apply the security fixes, pay close attention to the updates you are applying. Apply just the security patches, not the update patches, etc., or you will end up back on U2a again.

  46. kamil 20/07/2021 at 06:29

    they didn't release the patch but they published part of your workaround 🙂
    https://kb.vmware.com/s/article/83963?lang=en_US

  47. Fish 23/07/2021 at 18:46

    Depending on the scale of your 7.02 deployments you all might consider putting in place nightly reboots of your clusters. Luckily we only have one cluster at this revision. We are supposedly on a pre-release of the patch with vmware or whatever support calls it. They were saying mid August for official release. I don’t know if that means a single small patch or U3. Thanks for posting this work around it is helpful!

  48. Adam Tyler 23/07/2021 at 23:40

    So this command has been saving my bacon lately.
    esxcfg-rescan -d vmhba32

    I really don’t want to downgrade and rebuild hosts. Is there no cron mechanism in ESXi? Seems like it would be a pretty easy fix to just run this command every couple of hours with a local cron job if supported. The PowerShell method is probably better if you have hundreds of hosts, but that isn’t me.

    Something like this maybe?
    https://vswitchzero.com/2021/02/17/scheduling-tasks-in-esxi-using-cron/

    Regards,
    Adam Tyler

    • Luciano Patrao 24/07/2021 at 02:19

      That is not a feasible option, since the commands can and should only be run when you have the issue. Running the commands without having the issue will not prevent it from appearing.
      And even when you have the issue, you should immediately reboot the ESXi host after you recover the SD card.

  49. adam@tylerlife.us 24/07/2021 at 02:57

    Well, I’m going to give it a shot. My cron file looks like this on all of my hosts now. Running the rescan every hour.

    #min hour day mon dow command
    1 1 * * * /sbin/tmpwatch.py
    1 * * * * /sbin/auto-backup.sh
    0 * * * * /usr/lib/vmware/vmksummary/log-heartbeat.py
    */5 * * * * /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh
    00 1 * * * localcli storage core device purge
    */10 * * * * /bin/crx-cli gc
    * */1 * * * esxcfg-rescan -d vmhba32 && esxcli system syslog mark --message="Running esxcfg-rescan"

    Will let you know how it goes. Had some unplanned downtime this AM with two hosts going offline from vCenter’s perspective.

  50. Brandon M 26/07/2021 at 18:38

    We had this issue after updating a UCS cluster (8 blades, each with its own SD card) to 7.0.2 and were told by VMware to apply the ramdisk workaround, but after a couple of weeks we had another host go down. So I can confirm that the workaround is NOT going to last forever and we really need a fix from VMware. My idea is to move away from SD cards and set up ESXi boot from SAN. Has anyone else tried this method?

    • Luciano Patrao 27/07/2021 at 11:35

      The workaround has two steps. The first is to recover your ESXi host so you can reboot it and clear the issue (until it breaks again). The second one, the ramdisk, is meant to fix the issue, or at least make it happen less often. In some cases I have (2 or 3), it has fixed it so far. But it is not 100% reliable; it works maybe 70/80% of the time.

      But as stated in the blog post, none of these workarounds is meant to fix the issue permanently.

    • Luciano Patrao 27/07/2021 at 16:48

      Sorry, I did not reply to your boot from SAN question.

      We have some Dell servers where we use boot from SAN. It works fine, but I have only ever used it when the storage has high availability (LUN replication, etc.),
      because if you lose your storage, your whole environment goes down.

      It depends on the type of HA you want in your environment. But it works fine, with no issues.

  51. Luke 27/07/2021 at 12:10

    Hi Brandon, regarding your question about booting off the SAN, yes, we did it several years ago with the first generation blades (I believe BL460 and BL490) using QLogic iSCSI mezzanines and setting the controllers up with the boot LUNs on a P2000 G3 SAN. This worked fine and would still do, of course. I set up 8 dedicated LUNs (one per blade, as we had a C3000) with a 32GB partition each. Never had an issue, even if the cold boot sequence is not too fast, but that's understandable. At the moment we are transitioning to M.2 SATA SSDs on Gen10 servers installed on the riser and ditching the SD. For older Gen9 and Gen8 machines the solution is installing boot disk(s) or opting for a SATA SSD workaround, which we need to test.
    It is quite obvious to me from VMware's reluctant behaviour on releasing a quick fix for this issue, which is not vendor related, that the “smaller” fish are not of much interest any longer; it's all big corporate projects and installs they focus on, at least that's my take. This raises a big red flag.

  52. David Pasek 27/07/2021 at 22:49

    Hi Brandon. Cisco UCS blade servers in particular were designed by the Cisco UCS designers (Silvano Gai and his team) to leverage Boot from SAN as a preferred boot method. Such a method allows “Cisco UCS stateless computing”. Cisco UCS Service Profiles (logical server specifications) and Boot from Fibre Channel SAN were my typical recommendation, design decision, and implementation when I worked for Cisco Advanced Services as a UCS Consultant. That was 10 years ago, but it is still in use at some customers I am in touch with to this day, with huge success, because it is the biggest UCS advantage over other server vendors. To be honest, it was not my nor Dell's typical recommendation on Dell servers when I worked for Dell Consulting Services, as it is not as native as on Cisco UCS, but it is still a design option. Now I work for VMware, and we are hardware agnostic; however, ESXi 7 and newer require an ESX-OSData storage device, ideally 240GB+. In your particular case, if you have 2TB of free storage space (8x240GB) on your FC storage, I would definitely consider Cisco UCS with Boot from SAN and leverage ~250 GB FC LUNs as boot devices. This would quickly solve your current challenge with the VMKUSB driver. Btw, if you leverage UCS Service Profiles, such a design would enable you to rip & replace any physical UCS server by just reassigning the Service (Server) Profile, and you are done in a few minutes, as it can preserve HBA WWNs, NIC MAC addresses, hardware UUID, etc. A UCS service profile also contains firmware and hardware settings, but that is off-topic. In terms of the ESXi 7 boot device, I have expressed my design thoughts in my blog post at https://www.vcdx200.com/2021/06/vsphere-7-esxi-boot-media-parition.html Hope you will find this helpful.

    • Luciano Patrao 29/07/2021 at 15:16

      Good reply David.

      Thanks again for contributing to the discussion and with good input to this blog post.

  53. Ross 28/07/2021 at 11:01

    One host got this issue again today, but this time even vmhba32 is gone. esxcfg-rescan gives “Error: Invalid adapter specified or unable to get adapter ‘vmhba32’”. The host still seems to be responding and the VMs seem to be running fine. Is a reboot the only option?

    • Luciano Patrao 29/07/2021 at 15:15

      You will always get errors. You need to wait a few minutes and try again. After 3/4 runs of the command, the error will disappear. After that you can vMotion the VMs and reboot the server. Yes, a reboot is the only option to fix the problem and get your ESXi host back into operation.
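
      For illustration, a minimal retry sketch (ESXi shell), assuming vmhba32 is the affected adapter and that esxcfg-rescan returns a non-zero exit status while the error above persists:

      # Keep retrying the rescan every 5 minutes until it succeeds
      while ! esxcfg-rescan -d vmhba32; do
          echo "rescan failed, retrying in 300 seconds..."
          sleep 300
      done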

  54. Florian 29/07/2021 at 15:21

    Similar problem here. HPE standalone host on Update 2(a). Boot from internal USB drive.
    After a few days, suddenly no Veeam backup (application error, could not initiate NFS filestream from datastore), no manual snapshot creation on the host itself (stuck at 0%), and VM status not reported correctly (guest shut down, VM still shown as active).
    Log files inaccessible using SSH; the session keeps hanging when accessing/listing the filesystem.

    Current workaround: manual shutdown of every single VM, reset host, cold boot.

    As this happens every 2-3 weeks we’ll try to get along with periodic reboots until Update 3. Hope this will be fixed soon! Total mess.

    • Luciano Patrao 29/07/2021 at 15:48

      If you run the workaround I describe here (esxcfg-rescan -d vmhba32), there is no need to shut down the VMs.
      Once you have your ESXi host back, you can put it in maintenance mode so that all VMs vMotion to a working ESXi host, and then you can reboot the host.

      Sometimes, when we do not run esxcfg-rescan -d vmhba32 for a long time and the host has had the issue for more than +/- 12h, more VMs start getting stuck, and some can even become invalid or orphaned. But after you fix the host, they go back to normal.

      If you don't want to power off the VMs (in my case most of my production VMs cannot afford a power off outside the maintenance window), you need to be patient and try and retry the command to recover the host without the need to power off any VMs.
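
      For reference, a condensed sketch of that recovery sequence (ESXi shell); it assumes vmhba32 is the USB/SD adapter, and the full step-by-step workaround in the post above remains the authoritative version:

      # Rescan the USB/SD adapter (repeat if it still errors), then restart the
      # management agents so vCenter can manage the host again
      esxcfg-rescan -d vmhba32
      /etc/init.d/hostd restart
      /etc/init.d/vpxa restart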

  55. Robert 29/07/2021 at 23:31

    I’ve never had to reboot my hosts after running the 4 commands. They run for days or weeks after that until I need to run the commands again. Veeam seemed to trigger the problem a couple times but hasn’t lately.

  56. Patrick Long 29/07/2021 at 23:49

    Is it my imagination, or did https://kb.vmware.com/s/article/83963 previously include the esxcfg-rescan and mgmt agent restarts almost verbatim from this blog post in the Workaround section… but this information has since been removed from the KB? I have been following so many pages regarding this issue that they are all getting blurry now 😉 I continue to encounter this issue sporadically (despite having implemented, years ago, all known mitigations to reduce I/O to the SD boot device), but luckily I only upgraded a small portion of my environment to 7.0 U2a, so it is manageable for me in the short term. The workarounds from this blog post have worked every time.

    • Luciano Patrao 30/07/2021 at 17:10

      Yes, it was. It seems it was removed.

      So many strange things about this bug, this support, and how VMware is handling it.

  57. Markus 02/08/2021 at 13:43

    Please find the link to the Dell EMC PowerEdge servers / SD card compatibility matrix with VMware vSphere ESXi 7, if this helps. The problem when checking: there is no way to read the exact type of SD card from the system without opening it physically, as it is part of a special USB design. You/we have to open every single host and check.

    https://www.dell.com/support/manuals/de-de/vmware-esxi-7.x/vmware_7.0.x_vsphere_compmatrix_pub/Dell-EMC-PowerEdge-serversSD-card-compatibility-matrix?guid=guid-89b7699f-9dbe-4efd-a325-d4cdf9cfd927&lang=en-us

    Please let me know if somebody has found a way to check it online; I would highly appreciate that.

    Thanks, Markus

    • Luciano Patrao 02/08/2021 at 14:39

      I don't know of any way to check it without physically looking at the SD card in the server. In the BIOS, you don't get much information.
      For us it is easy because we keep track of all the SD cards we have in our servers.

    • Andy 04/08/2021 at 16:16

      If you access the Dell Support page with the service tag of your PowerEdge server, you can view the entire configuration of the system (Quick links > View product specs). When you expand the row referring to the SD cards (not the IDSDM card reader; there is a different line for the SD cards themselves!), e.g. 16GB microSDHC/SDXC Card, you will find the part number of the SD cards installed in your server, e.g.:

      FH2KP ASSY,FSD,SDIG,16G,UHS,IDSDM,KN 2

      This part number is referenced in the (confidential) VMware SD card compatibility matrix PDF that is making the rounds on the internet.

    • Andy 05/08/2021 at 09:55

      Go to the Dell Support site and enter the Service Tag of your PowerEdge server. On the right under “Quick Links” you can view the system’s configuration as shipped from factory. There you should find an entry like “16GB microSDHC/SDXC Card”. Expand it and you have the part number of the installed cards that can be cross-referenced to the compatibility matrix.

  58. Adam Tyler 02/08/2021 at 19:00

    Can someone explain to me why it matters what kind of SD card is used? I realize that some SD cards are better than others. Faster or can handle more writes, etc.. But I was under the impression that this bug is related to the USB controller behind the SD card. All SD cards should work as far as I understood. If you use a crappy one, yes it is going to be slow and may fail to write, but it won’t go offline constantly like we are seeing.

    Am I off base here?

    • Luciano Patrao 03/08/2021 at 20:01

      It is the same as with any hardware: it needs to be on the HCL. Many are not supported but still work with VMware. For SD cards in particular, it is about the quality and how many reads/writes and how much I/O they can handle.

      But I agree, that is not the root cause of this bug.

  59. ultrium 02/08/2021 at 21:57

    Same error here with Bull hardware (BullSequana S400). Opened a ticket with VMware, but already expecting the same answer as the others. The workaround saved us from having downtime on 200 VMs. Thanks!

  60. ultrium 03/08/2021 at 20:42

    Response from VMware:

    I have received the ESXi's logs (thanks for uploading them)

    Based on the findings and the issue at hand everything points at this document

    https://kb.vmware.com/s/article/83376

    Basically the fix is to modify the way ESXi handles the SD cards

    Note: Keep in mind that a reboot is required

    Here are the commands you need to run

    esxcfg-advcfg -A ToolsRamdisk --add-desc "Use VMware Tools repository from /tools ramdisk" --add-default "0" --add-type 'int' --add-min "0" --add-max "1"

    esxcli system settings advanced set -o /UserVars/ToolsRamdisk -i 1

    PS: This should be fixed in 7.0 U2c (should be released mid-September 2021)
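
    For illustration, a quick check after the reboot that the option took effect; these verification commands are an assumption on my part and are not part of the KB:

    # Confirm the advanced option exists and is set to 1
    esxcli system settings advanced list -o /UserVars/ToolsRamdisk
    # Confirm a /tools ramdisk is now mounted
    esxcli system visorfs ramdisk list | grep -i tools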

  61. fvertuel 04/08/2021 at 16:03

    Quick update… After upgrading 19 hosts to 7.0.2, I reinstalled 14 of them from scratch back to 6.7. The other 5 have VMs with VM hardware version 19. VMware support provided me a release of 7.0.2, build number 18112519… No issues for 3 weeks on those five hosts… Good luck!

  62. Roy Debets 06/08/2021 at 11:55

    We are hitting the same bug on 1 cluster in our environment. This was the first cluster to be upgraded to ESXi 7u2a (build 17867351). We are on Dell R640 rack servers.

    VMware support provided the well known workaround and we have been executing it for 8 weeks now. Normally we are hitting the bug around 2 times a week. Last week we almost hit it every day!

    Also, requesting a private hotfix was no use according to VMware support. They told me that the patch release will be released earlier than the private hotfix, but that was 8 weeks ago.

    Hopefully VMware will fix this soon!!
    Downgrading to ESXi 7u1 is an option, but I'm afraid that after we downgrade, VMware will release the patch. So I'd rather wait for the patch release.

  63. Soufiene 06/08/2021 at 16:20

    I started to update our infrastructure. I did 90% (30 servers) of the servers to version 7.0.2 Update b, all of them HPE Gen9 and Gen10. With one of them, which has an SD card, I hit the issue; I tried the workaround and managed to fix the problem with difficulty. We still have 10 servers not updated, still running version 7.0.1. I don't know if I should upgrade them or wait until VMware finds the fix for this issue.

    • Luciano Patrao 09/08/2021 at 02:16

      It is up to you, but I would not. Why add more issues to your environment? Just wait for a good version and then upgrade all to the same version.

  64. Steez 09/08/2021 at 10:26

    Does this affect only SD cards? We have a case where, under certain load, a datastore along with the controller goes offline, with datastores reporting 0B in size, affecting all our VMs. Haven't tried rescanning the controller/disks, but will give it a try next time. Also running 7.0 U2.

  65. Adam Tyler 09/08/2021 at 16:20

    I’m just in disbelief that VMware hasn’t resolved this problem yet in an official release. We’ve completely given up on vSphere 7 on SSD/USB boot media at this point. We’ll stay on vSphere 6.7 as long as possible and build into our budget new internal 120Gb SSD drives (RAID 1) for each host moving forward. Unbelievable.

  66. jflint 16/08/2021 at 16:17

    Has there been any news of U3 being released?

    • Luciano Patrao 18/08/2021 at 09:10

      Unfortunately no 🙁

    • djf2884 19/08/2021 at 09:12

      I had been in contact last week with VMware about the very same issue, which happened on three of our ESXi hosts in the same vSAN cluster; it was starting to become complicated… (bad luck here). They told me that the ETA for 7.0 U2c, which is supposed to correct this issue (and not 7.0 U3), was the 24th of August…

      They also offered me a new VIB to correct the issue, but I decided to wait for the final release. Let's see.

      • Luciano Patrao 19/08/2021 at 11:57

        Hi,

        Well, they have said so many different things to different customers that at the moment I don't know what the correct information is.

        As for the VIB (vmkusb), it works for some people and not for others, so I can't say whether it would work in your environment.

        Regarding your infrastructure, strange or not, I have one vSAN system with HPE BL460c blades; there were some issues in the beginning, but after I applied some of the changes I describe here in my blog post, I think I had issues only twice in 2 months.

        Still, I don't understand why that is, since all the other systems have the same configuration (except the type of servers, not blades; although one system is also blades, not vSAN, and has issues every week), just not vSAN.

  67. […] the update 3, where this issue is fixed. But workaround exist. More about workaround you can see here […]

  68. Jiayu 18/08/2021 at 08:33

    You SAVED MY LIFE!!!!

    Thank you VERY MUCH!!!

  69. Nick Eoannidis 22/08/2021 at 15:54

    I have a support ticket with VMware for this, they tell me that ESXi 7.0 P03/U2c tentative ETA is 08/24/2021.

  70. Matthew 23/08/2021 at 13:55

    Just found VC 7.0 U2c released on the downloads page; the linked release notes are not yet online ( https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-vcenter-server-70u2c-release-notes.html )
    Downloading now

    • Luciano Patrao 23/08/2021 at 16:16

      URL is not working.

      Also in ESXi downloads, there is no new update. Maybe during the week. Let us wait.

      • Matthew 23/08/2021 at 20:26

        Visit https://customerconnect.vmware.com/patch/ , select VC and 7.0.2, and you will see the 7.0 U2c patch with a release date of 24/8, available for download. I have mine already installed.

        • Luciano Patrao 23/08/2021 at 20:45

          vCenter is not the same as ESXi.

          The issue happens at the ESXi level (even on standalone hosts). So what we need to wait for is an ESXi 7.0.x patch, not a vCenter one.

          • Luciano Patrao 23/08/2021 at 20:47

            They have launched vCenter, so vSphere should be next.

            Because we should always update/upgrade vCenter before vSphere.

  71. Matthew 23/08/2021 at 20:20

    It's there, go to https://customerconnect.vmware.com/patch/ , select VC and version 7.0.2, and you will see the patch
    with release date 24/8. My dev is already upgraded without an issue (now let's wait for the release notes).

    • Djf2884 24/08/2021 at 15:12

      It has been removed 🙁 I saw it yesterday, but the release notes were returning a 404 and there is no trace of ESXi 7.0 U2c.

  72. matthew Koeman 24/08/2021 at 18:29

    Patches are now posted and the release notes are working!
    The ESXi patches are also there (not yet visible on the VMware site, but use this: https://esxi-patches.v-front.de/ for the links and release notes). Looks like the fix we are looking for is in there.

  73. Manta Watts 24/08/2021 at 22:42
  74. Brandon M 25/08/2021 at 05:21

    I see two updates released yesterday under Lifecycle Manager: 7.0 U2c (build 18426014) and 7.0 U2sc (build 18295176). I will apply this update in the next few days and see if the issue is resolved. We use SD cards, not USB drives, but the symptoms are the same. Fingers crossed!

    https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u2c-release-notes.html#resolvedissues

    PR 2777003: If you use a USB as a boot device for ESXi 7.0 Update 2a, ESXi hosts might become unresponsive, and you see host not-responding and boot bank is not found alerts
    USB devices have a small queue depth and due to a race condition in the ESXi storage stack, some I/O operations might not get to the device. Such I/Os queue in the ESXi storage stack and ultimately time out. As a result, ESXi hosts become unresponsive.
    In the vSphere Client, you see alerts such as Alert: /bootbank not to be found at path ‘/bootbank’ and Host not-responding.
    In vmkernel logs, you see errors such as:
    2021-04-12T04:47:44.940Z cpu0:2097441)ScsiPath: 8058: Cancelled Cmd(0x45b92ea3fd40) 0xa0, cmdId.initiator=0x4538c859b8f8 CmdSN 0x0 from world 0 to path “vmhba32:C0:T0:L0”. Cmd count Active:0 Queued:1.
    2021-04-12T04:48:50.527Z cpu2:2097440)ScsiDeviceIO: 4315: Cmd(0x45b92ea76d40) 0x28, cmdId.initiator=0x4305f74cc780 CmdSN 0x1279 from world 2099370 to dev “mpx.vmhba32:C0:T0:L0” failed H:0x5 D:0x0 P:0x0 Cancelled from path layer. Cmd count Active:1
    2021-04-12T04:48:50.527Z cpu2:2097440)Queued:4

    This issue is resolved in this release.

    • Luciano Patrao 25/08/2021 at 11:04

      Yes, it is out. Let us see if this fixes the issue and/or does not bring more bugs 😉

      It's not the first time that we get a bug fix in a new version but then get other bugs with it. It happened with the latest U2b: they fixed the partition issue, but then we got this huge issue.

      For my part, I will apply this in a couple of clusters (not in all environments) and then check for a week or so that all is ok.

  75. John Mccarthy 25/08/2021 at 18:02

    Just got off with VMware support on a different issue; the patch is out, 7.0 U2c. We're only seeing it in 1 cluster on the same blade server, so we'll wait to apply it.

  76. Philipp Gruhn 26/08/2021 at 10:57

    Any idea if it is recommended to upgrade a Fujitsu custom image with the generic VMware patch? We're currently running FJT-Addon-for-FujitsuCustomImage_7.0.2-520.1.0 (Fujitsu) and I'm not sure if I want to wait until the custom image is out in about a month or three. This is the only ESXi 7 machine I'm running at our locations, so there isn't much of a failover solution as we only have a small shop there.

    Anyone running another OEM image who updated it to U2c?

    • Luciano Patrao 26/08/2021 at 14:34

      At the moment there is no ISO to download, only an update. So you can apply the update with vCenter Lifecycle Manager, or you can download the patch HERE, select ESXi and 7.0, and do it manually.

      A bit busy at the moment, but plan to write a blog post today or tomorrow about this new patch.
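
      For illustration, a minimal sketch of the manual route (ESXi shell), assuming the offline bundle has been copied to a datastore; the depot file name and profile name below are placeholders to adjust to the actual U2c bundle:

      # Put the host in maintenance mode first, then apply the image profile
      esxcli software profile update \
          -d /vmfs/volumes/datastore1/VMware-ESXi-7.0U2c-18426014-depot.zip \
          -p ESXi-7.0U2c-18426014-standard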

  77. FJ TSE team 26/08/2021 at 15:06

    Hi Philipp,

    About your question regarding the Fujitsu ESXi custom image:

    Your installation was made with FJ CI v520-1, which is based on ESXi 7.0 U2.
    Sure, you are allowed to install ESXi 7.0 U2c to get the latest critical/security fixes.
    If your ESXi installation is on 7.0 U1 or even 7.0, we do advise updating using the Custom Image/Offline Bundle v520-1 first.

    Before actually updating, you could take a backup of the existing ESXi configuration. In case of an emergency, you can reinstall with the identical vmkernel build (!) and restore the existing ESXi configuration.
    Instructions: https://kb.vmware.com/s/article/204214
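
    For illustration, one common way to take that backup from the ESXi shell (the restore path below is a placeholder, and the restore must be done on the identical build with the host in maintenance mode):

    vim-cmd hostsvc/firmware/sync_config
    vim-cmd hostsvc/firmware/backup_config
    # Restore later on the identical build, host in maintenance mode:
    # vim-cmd hostsvc/firmware/restore_config /tmp/configBundle.tgz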

    • Luciano Patrao 27/08/2021 at 12:21

      You can apply the patch; if anything goes wrong, you can always roll back to the previous version (the one that you have today). In my view, there is no need to back up the ESXi configuration. But if you want, it is always better to create a Host Profile and, if you need to reinstall, reapply the host profile to the new ESXi host.

  78. John Kekatos 27/08/2021 at 00:56

    If I am running an HPE customized image, do I need to wait for HPE to release their version, or can I apply this latest patch?

  79. Albert 30/08/2021 at 12:36

    Hi, so this has been fixed in the latest release? I have a Dell custom image installed on RAID1 SD cards. What is the correct procedure to fix this issue? Thanks a lot

  80. Steven Smith 30/08/2021 at 22:58

    Luciano Patrao, are you saying that if we currently have a Dell or Lenovo custom image of 7.0 U2b, we can just apply the non-custom image 7.0 U2c without any issues? Is it only for a new build that we would need the custom image of the required version?

  81. Sandeep 06/09/2021 at 19:15

    Error we are facing : Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0. As a result, host configuration changes will not be saved to persistent storage.

    ESXi – 6.7 P05 – 17700523
    Cisco HX240C-M4SX

    any solution…?

    • Luciano Patrao 07/09/2021 at 16:32

      That could be a corrupted SD card. It can happen.
      In 6.7 this USB/SD bug doesn't happen. We can have SD issues, but not ones related directly to this type of bug.

      Over the years we have had to replace some SD cards (particularly the ones in HPE servers) with better ones.
      I suggest you open a ticket with VMware support.

  82. Jason 07/09/2021 at 17:00

    7.0 U2c has just been released publicly to address this issue.

  83. Suresh Kumar 12/09/2021 at 08:31

    The new patch seems to be stable. I did not find any issues in the 5 days since applying it.

  84. JEff 14/09/2021 at 18:26

    Update 2c, while it fixes the SD issue, brings more problems along with it. ESXi 7.x is a nightmare and should be pulled from download. A 64-node vanilla cluster was decimated by it.

    • Luciano Patrao 17/09/2021 at 07:13

      You need to be more specific, because I have hundreds of ESXi servers and had issues with only 2 or 3. And those were not related to the update itself, but to ESXi profiles or some VIB mismatch.
      So if you say it brings a lot of problems, please share what problems you have.

  85. Nathan 15/09/2021 at 13:48

    7.0U2c resolved this issue for me.

  86. Ken 19/11/2021 at 23:17

    Thanks, your workaround saved me a lot of time and heartache. I was able to migrate all my VMs to another host without any downtime and update the host which was having problems with the SD card to 7.0 U2c. Our next step is to replace all our SD cards with SSD disks and update all hosts.

  87. Organick Organicksen 20/12/2021 at 00:13

    I just updated to VMware ESXi 7.0.2, build 18538813, and am running on a Dell SD card. Do you know if this issue persists?

    • Luciano Patrao 24/12/2021 at 15:09

      Hi,

      As long as you don't use the cards that still have the issue, you are ok. But you should read the full disclosure about using SD cards in ESXi. We can still use them, but with some changes. And in the future they will not be supported anymore.

  88. Ganesh Bhosale 14/02/2022 at 12:33

    Big thanks for this solution, you saved us a lot of downtime 🙂
    This issue is present with 7.0.3 build 18644231 as well.

    • Luciano Patrao 14/02/2022 at 17:24

      Can you provide more details on that? Because I have upgraded some (at least 10 so far) and did not get any issues.
      What servers were they? What type of SD cards?

  89. ballhawk45 20/02/2022 at 15:54

    Ganesh Bhosale – I too am interested to hear about any issues you see that are persisting into 7.0 U3c. Luciano, how has your limited rollout of 7.0 U3c 19193900 been going, and have you installed it on any of your diskless servers (USB/SD/microSD boot device)? Have you reached any level of comfort with that version? I note that the security patches released this past Tuesday 2/15/21 were included in the 7.0 U3c release. Like you, I agree that this past year has been… frustrating from a long-time vSphere admin perspective. Have you reached the point of being able to unequivocally recommend 7.0 U3c to other admins with diskless servers? Or would you recommend sticking with 7.0 U2d which I have found to be quite stable (or now U2e from this past week.) Of course a retrofit with local M.2 is in our future for these servers, but is still months away due to lengthy procurement process and will be a major effort given our # of hosts.

    • Luciano Patrao 20/02/2022 at 16:21

      As I said, until now I have not found any issues.

      Besides the 10 servers with SD cards, my own lab is also running on 16GB SD cards and I don't get any issues. But of course, my lab doesn't have the amount of usage of a production environment.
      In that production environment, we have 10 HPE DL360 Gen10 and until now all are stable. But they are still running with the changes that I describe here regarding the scratch and core dump partitions.

      New version or not, any ESXi running from SD/USB should have those changes made so that we have a stable environment.
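
      For illustration, a minimal sketch of moving scratch and the core dump file off the SD card (ESXi shell); the datastore name and folder are placeholders, a reboot is needed for the scratch change, and the full procedure in the post above is the reference:

      # Point the scratch location at a folder on a VMFS datastore
      mkdir -p /vmfs/volumes/datastore1/.locker-esx01
      esxcli system settings advanced set -o /ScratchConfig/ConfiguredScratchLocation -s /vmfs/volumes/datastore1/.locker-esx01
      # Create and activate a core dump file on the datastore instead of the SD card
      esxcli system coredump file add -d datastore1 -f esx01-coredump
      esxcli system coredump file set --smart --enable true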

      I understand that companies and professionals are now a bit afraid of the new versions. But in my personal opinion, VMware has learned its lesson (the hard way) and will make sure no broken version is released. There will be some issues, which is normal for any system, OS, or new software version, but nothing as serious as what we have seen in that version.

      So the decision to go for new versions or not is up to you. We can only test and say whether we found any issues or not. Particularly since I am not a VMware employee 😉

  90. ballhawk45 13/07/2022 at 17:23

    Luciano – ESXi 7.0 U3f was just released and includes a NEW vmkusb driver, VMware-vmkusb_0.1-7vmw.703.0.50.20036589. Unfortunately the release notes entry for this item only says “Updates the vmkusb VIB.” so no useful information. I wanted to post this here as the change introduced with the vmkusb driver in 7.0 U2 was the one that unleashed unthrottled I/O on USB devices and caused all this trouble to begin with, and now we have a new version with absolutely no indication of what issues it addresses or improvements it makes. Presumably it maintains the I/O throttling of post-7.0 U2 versions that stabilized ESXi environments with USB-based boot devices. Note this version is the first newly-released version of vmkusb driver since vmkusb_0.1-6vmw.703.0.20.19193900 released with ESXi 7.0 U3c in Jan 2022 (U3d and U3e did not update this driver), so be vigilant for any changes in behavior.

    • Luciano Patrao 20/07/2022 at 22:50

      Hi Patrick,

      Thank you for your message.

      Since I am on vacation, I am not doing any updates for now, not even testing any new versions.
      When I am back, I will test these new updates and update the blog post if needed.

      Thank you again for your support on this subject.
