
How to fix wrong vSAN Cluster partitions

In this article, I will explain how to check for and fix vSAN Cluster partition issues in your vSAN Cluster.

In my vSAN environment, I was getting some warnings about vSAN cluster partitions. Checking the vSAN partitions, I noticed the vSAN Cluster was split into partitions.

(Screenshot: vSAN health check showing two cluster partitions)

As we can see in the above image, we have two different partitions. To understand why, we need to check the vSAN Cluster membership on each ESXi host.

Connect to each ESXi host through SSH and run “esxcli vsan cluster get” to check each ESXi Cluster member.

  • vSAN ESXi Node 04

  • vSAN ESXi Node 02

  • vSAN ESXi Node 03

  • vSAN ESXi Node 01
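The membership check on each node can be scripted. A minimal sketch of pulling the partition-relevant fields out of the “esxcli vsan cluster get” output; the sample output and UUIDs below are illustrative placeholders, not this cluster's real values:

```shell
# Illustrative 'esxcli vsan cluster get' output saved to a file; on a real
# node you would run:  esxcli vsan cluster get > /tmp/cluster-get.txt
# (the UUIDs below are placeholders)
cat > /tmp/cluster-get.txt <<'EOF'
   Sub-Cluster Master UUID: 00000000-aaaa-bbbb-cccc-000000000001
   Sub-Cluster Backup UUID: 00000000-aaaa-bbbb-cccc-000000000002
   Sub-Cluster UUID: 52b57974-6769-70cc-346a-b99c5762a232
   Sub-Cluster Member Count: 2
EOF

# A partition shows up as different Master UUIDs (or member counts) on
# nodes that should belong to the same cluster.
master_uuid=$(awk -F': *' '/Sub-Cluster Master UUID/ {print $2}' /tmp/cluster-get.txt)
member_count=$(awk -F': *' '/Sub-Cluster Member Count/ {print $2}' /tmp/cluster-get.txt)
echo "Master UUID:  $master_uuid"
echo "Member count: $member_count"
```

Comparing the Master UUID and member count across all four nodes makes the split obvious at a glance.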

As we can see in the above information, I somehow have two different vSAN Clusters, each with two ESXi hosts. Nodes 03 and 01 are in one vSAN Cluster with the UUID:52b57974-6769-70cc-346a-b99c5762a232, and nodes 04 and 02 are in another vSAN Cluster with the same UUID:52b57974-6769-70cc-346a-b99c5762a232, but with the Sub-Cluster Membership UUID:23dab5a-e0be-04e2-ce76-005056968e4b, when the Master UUID should be:c13dab5a-fff4-d93e-0f0e-0050569646fa.

Each vSAN Cluster has its Master and Backup. For a four-node vSAN Cluster, I should have 1 Master, 1 Backup, and 2 Agents, all using the same vSAN Cluster UUID and Sub-Cluster Master UUID.

This can happen if you remove an ESXi host from the Cluster and add it again, or if a host loses its connection and you need to reconnect it. Since this is a vSAN playground, I am pretty sure one of my tests caused this.

How to fix it

To fix the wrong vSAN Cluster partitions, we need to remove the ESXi hosts from the rogue partition and join all nodes to a single vSAN Cluster.

First, I removed ESXi node 02 from the cluster and added it to the proper vSAN Cluster.

Use the “esxcli vsan cluster leave” command to remove the ESXi node from the vSAN Cluster.
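The removal command is “esxcli vsan cluster leave”. A minimal sketch, guarded so it is a no-op anywhere other than an ESXi shell:

```shell
# Remove this node from its current vSAN cluster. Run over SSH on the
# ESXi host itself; the guard makes the snippet harmless elsewhere.
if command -v esxcli >/dev/null 2>&1; then
    esxcli vsan cluster leave
else
    echo "esxcli not found - run this over SSH on the ESXi host"
fi
```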

Immediately after I removed this ESXi node from that vSAN Cluster, it showed up as a member of the other vSAN Cluster. Something I thought was strange.

Both Node 01 and 03 show 3 members with Node 02 included in the vSAN Cluster with the UUID c13dab5a-fff4-d93e-0f0e-0050569646fa.

Now I need to do the same for ESXi Node 04 and display the vSAN Cluster information for the node.

But this ESXi node was not automatically added to the other vSAN Cluster. That is the typical behavior: if we remove a node from the vSAN Cluster, it should become an ESXi node without a vSAN Cluster.

Usually, we should run the “esxcli vsan cluster join” command to add it to a vSAN Cluster.

First, I ran the get command on one of the healthy nodes to get the proper Cluster UUID, then ran the join command on Node 04 using that UUID.
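A sketch of that sequence, shown here as a dry run; the saved output and its UUID are placeholders (on a real node you would capture the actual “esxcli vsan cluster get” output):

```shell
# Step 1 - on a healthy node, capture the cluster UUID. On a real node:
#   esxcli vsan cluster get > /tmp/good-node.txt
# Here we use illustrative output with a placeholder UUID.
cat > /tmp/good-node.txt <<'EOF'
   Sub-Cluster UUID: 00000000-aaaa-bbbb-cccc-000000000099
EOF
cluster_uuid=$(awk -F': *' '/Sub-Cluster UUID/ {print $2}' /tmp/good-node.txt)

# Step 2 - on the node being re-added, leave the rogue cluster and join
# the proper one. Echoed here as a dry run; on the ESXi host you would
# run the two esxcli commands directly.
echo "esxcli vsan cluster leave"
echo "esxcli vsan cluster join -u $cluster_uuid"
```

The `-u` flag of `esxcli vsan cluster join` takes the Sub-Cluster UUID of the cluster the node should become a member of.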

But surprisingly, ESXi Node 04 did not join the vSAN Cluster as an Agent, as it was supposed to; instead, it created a new vSAN Cluster again and added itself as Master.

I removed the ESXi node from that vSAN Cluster, removed the host from the vCenter vSAN Cluster, and tried again and again with the same result. I needed to recheck all the information to find where the problem was.

That is when I noticed the ESXi Node 04 system UUID was the same as that of ESXi Node 03. This was the cause of the strange vSAN Cluster behavior. Since ESXi Nodes 01, 02, and 03 were already fixed and running in one vSAN Cluster, any changes needed to be made on ESXi Node 04.

Run the command to check the System UUID on all ESXi Nodes and confirm that ESXi Node 04 and ESXi Node 03 have the same UUID.
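The check itself is “esxcli system uuid get” on every node. A quick way to spot a duplicate, assuming the four UUIDs were collected into one file (the values below are placeholders, with Node 03 and Node 04 deliberately identical to mimic the problem):

```shell
# UUIDs collected from the four nodes (run 'esxcli system uuid get' on
# each host over SSH); placeholder values, node03/node04 duplicated.
cat > /tmp/node-uuids.txt <<'EOF'
node01 11111111-1111-1111-1111-111111111111
node02 22222222-2222-2222-2222-222222222222
node03 33333333-3333-3333-3333-333333333333
node04 33333333-3333-3333-3333-333333333333
EOF

# Any UUID printed here appears on more than one node.
awk '{print $2}' /tmp/node-uuids.txt | sort | uniq -d
```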

Note: Since this is a Nested vSAN with Nested ESXi, I probably forgot to recreate the System UUID when I deployed the Template or restored one of the ESXi hosts (discussed in my previous article). Regardless of the root cause, we need to fix it.

How to fix the wrong System UUID?

There is a procedure to fix this, as I have shown in a previous article How to deploy Nested vSAN.
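My understanding of that procedure is that it boils down to removing the /system/uuid line from /etc/vmware/esx.conf so that ESXi generates a fresh UUID on the next reboot. A sketch against a sample file (an assumption based on that article's method; test on a copy before touching a real host):

```shell
# Sample of the relevant line in /etc/vmware/esx.conf (placeholder UUID).
cat > /tmp/esx.conf <<'EOF'
/system/uuid = "33333333-3333-3333-3333-333333333333"
/system/hostname = "esxi04"
EOF

# Remove the /system/uuid line; on the real host the target file is
# /etc/vmware/esx.conf, and ESXi generates a new UUID on reboot.
sed -i '/^\/system\/uuid/d' /tmp/esx.conf
cat /tmp/esx.conf
```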


Afterward, reboot the ESXi to have a new System UUID.

After the reboot, check the System UUID to see if it did change.

Now I can try to rejoin the host to vSAN. First, I will check the vSAN Cluster UUID once more.

Displaying the vSAN Cluster information for the faulty ESXi Node, I can now see that it belongs to the right vSAN Cluster and that I have a 4-node vSAN Cluster.

Note: Moving the ESXi host back to vCenter vSAN Cluster will also rejoin the vSAN Cluster.

Now we have a working vSAN Cluster again, and all VMs are available (while the vSAN partition issue existed, the VMs were in an inaccessible state).


Note: Afterwards, I got some warnings regarding vSAN Disk Balance. That is normal, since we had issues in the partitions, and now we need to run “Proactive Rebalance Disks.” I will come back to this and other health procedures to fix some of the vSAN issues.

I hope this article helps you troubleshoot and fix some of these vSAN issues.

Share this article if you think it is worth sharing. If you have any questions or comments, comment here, or contact me on Twitter.

©2018 ProVirtualzone. All Rights Reserved

 

By | 2023-06-09T12:43:32+02:00 March 16th, 2018|Storage, vCenter, VMware Posts, vSAN|2 Comments

About the Author:

I have over 20 years of experience in the IT industry. I have been working with Virtualization for more than 15 years (mainly VMware). I recently obtained certifications, including VCP DCV 2022, VCAP DCV Design 2023, and VCP Cloud 2023. Additionally, I have VCP6.5-DCV, VMware vSAN Specialist, vExpert vSAN, vExpert NSX, vExpert Cloud Provider for the last two years, vExpert for the last 7 years, and an old MCP. My specialties are Virtualization, Storage, and Virtual Backup. I am a Solutions Architect in the areas of VMware, Cloud, and Backup/Storage. I am employed by ITQ, a VMware partner, as a Senior Consultant. I am also a blogger, owner of the blog ProVirtualzone.com, and recently a book author.

2 Comments

  1. Reza 08/08/2020 at 17:45

    You saved my day! Thank You
