Fix wrong vSAN Cluster partitions


My vSAN environment was showing warnings about vSAN cluster partitions. Checking the vSAN health, I noticed the cluster was split into partitions.

As we can see in the above image, we have two different partitions. To understand why, we need to check the vSAN cluster state on each ESXi host.

Connect to each ESXi host through SSH and run “esxcli vsan cluster get” to check its cluster membership (see the output sketch after the list below).

  • vSAN ESXi Node 04

  • vSAN ESXi Node 02

  • vSAN ESXi Node 03

  • vSAN ESXi Node 01
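
On each node, the output of “esxcli vsan cluster get” looks roughly like the sketch below (the UUIDs shown are the ones from my environment discussed next; the exact field list varies slightly between vSAN versions, and values I am not reproducing here are shown as “...”):

    esxcli vsan cluster get
    Cluster Information
       Enabled: true
       Local Node UUID: ...
       Local Node State: MASTER
       Sub-Cluster Master UUID: c13dab5a-fff4-d93e-0f0e-0050569646fa
       Sub-Cluster Backup UUID: ...
       Sub-Cluster UUID: 52b57974-6769-70cc-346a-b99c5762a232
       Sub-Cluster Member Count: 2
       Sub-Cluster Member UUIDs: ...

Comparing the Sub-Cluster Master UUID and the member count across the four nodes is what reveals the partition.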

As we can see in the above information, somehow I have two different vSAN cluster partitions, each one with two ESXi hosts. Nodes 03 and 01 are in one partition of the vSAN cluster with the UUID 52b57974-6769-70cc-346a-b99c5762a232, and nodes 04 and 02 are in another partition with the same cluster UUID but with the Sub-Cluster Membership UUID 23dab5a-e0be-04e2-ce76-005056968e4b, when the Sub-Cluster Master UUID should be c13dab5a-fff4-d93e-0f0e-0050569646fa on all nodes.

Each vSAN cluster has its own Master and Backup. For a four-node vSAN cluster, I should have 1 Master, 1 Backup, and 2 Agents, all using the same vSAN Cluster UUID and Sub-Cluster Master UUID.

This can happen if you remove an ESXi host from the cluster and add it again, or if a host loses its connection and you need to reconnect it. Since this is a vSAN playground, I am pretty sure some of my tests caused this.

How to fix it.

To fix wrong vSAN cluster partitions, we need to remove the ESXi hosts from the wrong partition and join all nodes to a single vSAN cluster.

First, I decided to remove ESXi node 02 from its partition and add it to the proper vSAN cluster.

Use the following command to remove the ESXi node from its vSAN cluster:
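
A minimal sketch (run on the node you want to remove, here node 02; this is the standard esxcli syntax rather than a capture from my session):

    # On ESXi node 02: leave the current vSAN cluster partition
    esxcli vsan cluster leave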

Immediately after I removed this ESXi node from that partition, it showed up as a member of the other vSAN cluster, which I found strange.

Checking nodes 01 and 03, both now also show 3 members, with node 02 included in the vSAN cluster partition with the Sub-Cluster Master UUID c13dab5a-fff4-d93e-0f0e-0050569646fa.

Now I need to do the same for ESXi node 04 and then display the vSAN cluster information for that node.

But this ESXi node did not automatically join the other vSAN cluster. That is actually the typical behavior: when we remove a node from a vSAN cluster, it should become an ESXi node without any vSAN cluster.

Normally, we need to run the join command to add a node to a vSAN cluster.

First, I ran the get command on one of the healthy nodes to obtain the proper cluster UUID, then ran the join command on node 04 using that UUID.
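
A sketch of the two steps (the UUID is the Sub-Cluster UUID reported by the healthy nodes in my environment):

    # On a healthy node (e.g. node 01): read the Sub-Cluster UUID
    esxcli vsan cluster get

    # On ESXi node 04: join the existing vSAN cluster using that UUID
    esxcli vsan cluster join -u 52b57974-6769-70cc-346a-b99c5762a232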

But surprisingly, ESXi node 04 did not join the vSAN cluster as an Agent, as it was supposed to; instead, it created a new vSAN cluster again and added itself as Master.

I removed the ESXi node from that vSAN cluster, also removed the host from the vCenter vSAN cluster, and tried again, with the same result. I needed to recheck all the information to find where the problem was.

That is when I noticed that the ESXi node 04 System UUID was the same as on ESXi node 03, which explained the strange vSAN cluster behavior. Since ESXi nodes 01, 02, and 03 were already fixed and running in one vSAN cluster, any changes needed to be made on ESXi node 04.

Run the following command to check the System UUID on all ESXi nodes and confirm that ESXi node 04 and ESXi node 03 have the same UUID.
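
For example (run this on each node and compare the values):

    # Show this host's ESXi System UUID
    esxcli system uuid get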

Note: Since this is a Nested vSAN with Nested ESXi, maybe I initially forgot to recreate the System UUID when I deployed the template, or when I restored one of the ESXi hosts (discussed in my previous article). Regardless of the root cause, we need to fix it.

How to fix wrong System UUID?

There is a procedure to fix this, as I have shown in a previous article, How to deploy Nested vSAN.
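
In short, the idea is to delete the stored /system/uuid entry from /etc/vmware/esx.conf so that ESXi generates a new one on the next boot. A minimal sketch, assuming the standard esx.conf layout (check the previous article for the full procedure before running this):

    # On ESXi node 04: remove the stored System UUID from esx.conf
    # (ESXi regenerates /system/uuid on the next boot)
    sed -i '/^\/system\/uuid/d' /etc/vmware/esx.conf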


Afterward, just reboot the ESXi host to get a new System UUID.

After the reboot, check whether the System UUID did change (using “esxcli system uuid get” again).

Now I can try to rejoin the host to the vSAN cluster. First, I will check the vSAN Cluster UUID once more.

Displaying the vSAN cluster information for the previously faulty ESXi node, I can now see that it belongs to the right vSAN cluster and that I have a 4-node vSAN cluster.
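
The key fields to look for in the “esxcli vsan cluster get” output at this point (a sketch; node 04 now joins as a regular member instead of electing itself Master):

    Local Node State: AGENT
    Sub-Cluster Master UUID: c13dab5a-fff4-d93e-0f0e-0050569646fa
    Sub-Cluster UUID: 52b57974-6769-70cc-346a-b99c5762a232
    Sub-Cluster Member Count: 4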

Note: If you move the ESXi host back into the vCenter vSAN cluster, it will also rejoin the vSAN cluster.

Now we have a vSAN cluster working again, and all VMs are available (while the vSAN partition issues existed, the VMs were in an inaccessible state).

Note: Afterwards, I got some warnings regarding vSAN Disk Balance. That is normal, since we had issues in the partitions, and now we need to run “Proactive Rebalance Disks.” I will come back to this and other health procedures to fix some of the vSAN issues in a future article.

I hope this article gives you some help on how to troubleshoot some of these vSAN issues and how to fix them.

Note: Share this article if you think it is worth sharing.
