Today I got a strange error on three of my ESXi hosts. One was a physical ESXi host, and two were nested ESXi hosts.
All of them showed the warning: “All shared datastores failed on the host xxx.xxx.xxx.xxx”
I had not changed anything in the iSCSI configuration. I double-checked it anyway: the iSCSI vmkernel ports, the iSCSI Software Initiator, the port bindings, etc., and everything was OK. All datastores were mounted on the hosts and all VMs were running without any issues, so it was a strange error.
I did not find any error related to this issue in the ESXi log or the vmkernel log. So I started by removing the iSCSI port binding interfaces, adding them back, and doing a rescan, but the warning was still on the hosts. I then removed all interfaces and recreated the whole iSCSI configuration from scratch, still with no luck.
The next step was to restart the vCenter agents on the ESXi host: “/etc/init.d/hostd restart” and “/etc/init.d/vpxa restart”. Again, this did not work, so I moved the VMs off the host and rebooted it.
After the restart the error persisted, so I decided to remove the ESXi host from vCenter (remove the host, not just disconnect it) and add it back again. When I added the host, I got a different error regarding vDS host sync. So I added the host back to the main Virtual Distributed Switch (vDS) and both errors were gone.
On the nested ESXi hosts I tried a different approach: I enabled HA (High Availability) and then disabled it again. I knew this would refresh the HA agents and also the management network, and it cleared the datastore warning.
I searched Google and the VMware KB for this issue and did not find anything related to it.
I did not notice this error having any impact on the existing datastores on these ESXi hosts; all VMs were running without any problems.
I hope this helps you remove this annoying error.
Note: Share this article, if you think it is worth sharing.
Are you using software iSCSI? If yes, the issue can be related to high CPU load.
Hi Adrian,
First, thank you for your comment.
Regarding the issue: yes, it is iSCSI, and honestly I don’t think this could be a CPU issue. Why? First, I did not receive any warning regarding CPU load; second, the issue was on different ESXi hosts. The physical one could indeed have a high CPU load, since it is one of my home lab ESXi hosts, but the nested ESXi hosts do not have any VMs powered on (they have one VM that is powered off).
One could argue that if the physical CPU was overloaded the nested hosts would be too, but they run on different ESXi hosts.
But honestly, I did not check for or notice any CPU issue, so if there was one, I did not notice it and did not get any warnings.
When I have more time, I can try to troubleshoot this better, provoke a high CPU load, and check whether I get this issue again.
Thanks again for your comment.
Luciano Patrão
What happens when you use vmkping to test connectivity to your iSCSI storage? Check jumbo frames as well.
Hi Paul,
Thank you for your comment.
There were no issues with the storage connections. As I said in the article, all iSCSI port bindings and connections were recreated, and the connection to the storage was not an issue.
Jumbo frames are not enabled in this environment.
Thank You
Luciano Patrao
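For readers who want to run the vmkping check suggested above, here is a minimal sketch that builds the commands. The interface name `vmk1` and the target address are placeholders, not values from this environment, and the helper names are purely illustrative. The payload size assumes an IPv4 header without options (20 bytes) plus the 8-byte ICMP header, which is why a 9000-byte MTU is normally tested with a payload of 8972 bytes.

```python
# Sketch: build vmkping commands for testing an iSCSI vmkernel port.
# "vmk1" and the target IP are placeholders; adjust them to your own setup.

IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(vmk: str, target: str, mtu: int = 1500) -> str:
    """Return a vmkping command that sends a full-size, don't-fragment ping."""
    size = icmp_payload_for_mtu(mtu)
    # -I selects the vmkernel interface, -d sets don't-fragment, -s sets the payload size
    return f"vmkping -I {vmk} -d -s {size} {target}"


if __name__ == "__main__":
    print(vmkping_command("vmk1", "192.168.10.20"))            # standard MTU
    print(vmkping_command("vmk1", "192.168.10.20", mtu=9000))  # jumbo frames
```

Run the printed commands from the ESXi shell. If the standard-size ping works but the jumbo-size one fails, some device in the path is probably not configured for the larger MTU.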
Hello, do you get this message when you connect directly to your host?
Hello André, since most of my readers do not speak Portuguese, I will reply in English so that others can understand and learn too.
This message is shown whether you connect to vCenter or directly to the ESXi host.
So if you connect to vCenter and click on one of the hosts, you see the message on the host itself, and if you connect to the ESXi host directly you will see the same message. But this is a host issue, not a vCenter issue.
Hello André,
I just noticed that my replies were not sending an email to the user to let them know that I had answered their comment.
That being so, you should now receive that notification.
Luciano Patrão
Hello Luciano Patrao,
I faced the same issue and followed your article to resolve it. Thanks, it helped me.
Regards,
Mithil Surve
Hi Mithil,
Glad that I could help
LP
Hello Patro,
We have three hosts in an EVC cluster. We faced “all the shared datastores failed to the host” and the VMs went offline. We have NFS storage and standard vSwitches. Once we disabled one of the uplinks to the storage, all the VMs came back up and the host error was gone as well. What could be the problem? There are no error logs on the network switch side.
The vmkernel log has some errors like the following: “Vmnic:Ntg3HandleTxMappingErr:360:Avoid frag of 4096 bytes at 0x5dfffff000”
Does this happen when you disable an interface on any of the ESXi hosts? Which versions and builds of ESXi and vCenter are you using? Are those the network interfaces used for the storage network?
It could be related to the network adapter firmware version, but I can only tell what it might be by checking the full vmkernel.log.
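As a starting point for going through a full vmkernel.log, here is a minimal sketch that counts driver messages shaped like the one quoted in the comment above. The regular expression is an assumption derived from that single line, and the function name is purely illustrative; real log lines carry timestamps and other prefixes, which the search simply skips over.

```python
import re
from collections import Counter

# Matches messages shaped like the one quoted in the comment:
#   Vmnic:Ntg3HandleTxMappingErr:360:Avoid frag of 4096 bytes at 0x5dfffff000
# The pattern is an assumption based on that one line, not on driver documentation.
NTG3_PATTERN = re.compile(r"(Ntg3\w+):(\d+):")


def count_ntg3_messages(lines):
    """Count ntg3 driver messages per reporting function name."""
    counts = Counter()
    for line in lines:
        match = NTG3_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Vmnic:Ntg3HandleTxMappingErr:360:Avoid frag of 4096 bytes at 0x5dfffff000",
        "an unrelated log line",
    ]
    for name, total in count_ntg3_messages(sample).items():
        print(name, total)
```

To use it on a real log, pass an open file instead of the sample list, for example `count_ntg3_messages(open("vmkernel.log"))` with the path to your own copy. A high count on the storage uplinks would support looking at the network adapter driver or firmware first.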