vCenter crash: duplicate key value violates unique constraint

The vCenter crash was triggered by a storage network switch failure that affected all the iSCSI LUNs in my home lab. The failure was the result of changes I made to the switch’s VLAN capacity, which required a reboot. I approved the reboot request without remembering that my VMs were running.

The switch came back, but some of the LUNs were not reconnected or recognized by the hosts. So I had to reboot the two ESXi hosts to recover the iSCSI LUNs while some VMs were still running. Unfortunately, one of those VMs was the vCenter.

After the hosts were back, had recovered all iSCSI LUNs, and recognized all VMs, I powered on vCenter, and it was full of problems. The VM could not identify the virtual network switch (a Standard Switch) and complained that the switch needed to be ephemeral (which is now the only vDS type we should use when adding the vCenter network). The problem is that this was not a vDS but a Standard Switch.

Somehow the crash left vCenter confused. I had other management networks that I could attach to the vCenter, but none of them could be added. Even when I checked the network in the vCenter config GUI (using the VM console), there was no IP and no gateway. Everything about that network was a mess.

Since I didn’t have many options, I decided to remove the network interface from the vCenter VM and add a new one. After that, vCenter recognized the new interface as eth0, but there were still issues with the networks. I created a new Standard port group on the ESXi host and attached it to the vCenter. After that, I could power on the vCenter and it had a network.

But as expected, vCenter services did not start properly, and I began to see a lot of issues with vCenter and its services (not starting, or starting and then stopping, etc.).

When I opened vCenter in the browser, I got “no healthy upstream” and nothing happened. So it was time to check the logs to see what had happened inside the vCenter.

First, I checked the vpxd.log located in /var/log/vmware/vpxd, and that is when I started to see a lot of “duplicate key value violates unique constraint” errors.

Some examples of errors in the vpxd.log

So I have some duplicate keys in the vCenter DB.
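To make the error concrete, here is a minimal sketch of what the database is rejecting. It uses Python's built-in sqlite3 module as a stand-in for vPostgres, and the `entity` table, its columns, and the id values are all hypothetical, not the real vCenter schema.

```python
import sqlite3

# Stand-in database: sqlite3 instead of vPostgres, with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO entity (id, name) VALUES (100, 'existing row')")


def try_insert(entity_id, name):
    """Insert a row and report whether the unique constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO entity (id, name) VALUES (?, ?)", (entity_id, name)
        )
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert(101, "new row"))        # id is free, so the insert is accepted
print(try_insert(100, "same id again"))  # id is already taken, so it is rejected
```

In the vCenter case it is the service itself that attempts the insert, which is likely why it keeps failing until the conflicting row is dealt with.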

The next step is to check the vPostgres logs in /storage/log/vmware/vpostgres/.

Check the latest logs.

Now search for anything related to duplicate keys

Logs 04 and 05 had a lot of entries regarding the problem.
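Searching several log files by hand gets tedious, so here is a minimal sketch of how the search could be automated. It assumes the standard PostgreSQL wording for this error (an ERROR line naming the constraint, followed by a DETAIL line with the key); the function names `find_duplicate_keys` and `scan_file` are my own and not part of any VMware tooling.

```python
import re
from collections import Counter

# PostgreSQL reports a unique-constraint violation as an ERROR line,
# normally followed by a DETAIL line naming the offending key.
ERROR_RE = re.compile(r'duplicate key value violates unique constraint "([^"]+)"')
DETAIL_RE = re.compile(r"Key \(([^)]+)\)=\(([^)]+)\) already exists")


def find_duplicate_keys(lines):
    """Count (constraint, column, value) triples in an iterable of log lines."""
    hits = Counter()
    constraint = None
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            constraint = match.group(1)  # remember it for the DETAIL line
            continue
        match = DETAIL_RE.search(line)
        if match and constraint is not None:
            hits[(constraint, match.group(1), match.group(2))] += 1
            constraint = None
    return hits


def scan_file(path):
    """Run find_duplicate_keys over one log file."""
    with open(path, errors="replace") as handle:
        return find_duplicate_keys(handle)
```

Calling `scan_file()` on each recent log in the vPostgres log directory returns a counter keyed by constraint name, column, and value, which makes it easy to see which ID keeps failing.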

So I need to double-check the vPostgres DB and fix any duplicate keys that exist for this ID, 38661.

Note: Before logging in to the vPostgres DB, stop the vpxd service with: service-control --stop vmware-vpxd

Connect to vCenter vPostgres DB.

Then search for the ID in VPX_ENTITY

Found one entry, and it says “vcenter”, which seems to be the network I created after the crash.

Check the parent_id to make sure this was a network.

The format is not very good, but I can see that these are all networks. So the problem is in that network, “vcenter”.

To double-check, I also looked up type_id 19, which is the type this network is using.

So it is confirmed this is the network I created and is a Standard Switch portgroup.

So what I need to do next is to delete the duplicate key.

Important Note: Never touch the vCenter DB without first creating a snapshot of the vCenter VM or, if you prefer, a backup of the DB (check the VMware KB for this).

After deleting the entry, we can start the service again with service-control --start vmware-vpxd, but I have often seen that this is not enough, so it is better to reboot the vCenter.

After vCenter was powered on, the vCenter service still would not start. So I checked the vpxd logs again to see if there was anything else to fix, and I found this:

And in the vPostgres logs, I have this:

So I have more duplicate keys. This time in “pk_n_vm_config_info”. This one is different because it is related to VMs.

I connected to the vPostgres DB again and checked the id 38663.

Note: Again, don’t forget to create a backup or a snapshot and stop the vpxd service.

Strange, this is one of the DRS vCLS VMs. Checking the type_id, I see that two exist, so one is a duplicate (the 38663).

So I need to check the vpx_vm_text regarding this VM id.

I found 5 rows here, so I need to delete everything for this VM id.

Here, honestly, I didn’t know for sure which tables I needed to clean up for this VM id, because it is not just vpx_vm_text; there are entries in other tables too. I found a very useful blog post from Kabir, a colleague at my company ITQ. He had a similar issue and listed all the tables that need entries deleted. Since I had a backup, I gave it a try.

I then checked whether any rows with VM_ID='32684' still existed, and there were none. So all is clean for the vCLS VM.
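The check above has to be repeated for every table that was cleaned. A minimal sketch of generating those checks follows. The helper name `leftover_checks` is hypothetical, the `vm_id` column name is taken from the check above and may not apply to every table, and the table list is left to the caller because vpx_vm_text is the only table named in this post.

```python
import re

# Only plain identifiers are allowed, so nothing unexpected ends up in the SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def leftover_checks(tables, vm_id, column="vm_id"):
    """Return one SELECT COUNT(*) statement per table for the given VM id."""
    vm_id = int(vm_id)  # raises ValueError for anything that is not an integer
    for name in [column, *tables]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    return [
        f"SELECT COUNT(*) FROM {table} WHERE {column} = {vm_id};"
        for table in tables
    ]


# Assumption: the caller supplies the full table list from their own notes.
for statement in leftover_checks(["vpx_vm_text"], 32684):
    print(statement)
```

The statements are read-only, so they can be pasted into the database session to confirm that every count is zero before restarting the service.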

Note: Thanks, Kabir, for the tips. This is why the #vcommunity and sharing knowledge are important.

Reboot the vCenter again and wait to see if all is good.

After the vCenter was rebooted, I checked the vpxd and vPostgres logs, and there were no more duplicate key errors. So this problem was solved, but I still could not get vCenter working or log in. The vCenter service was running and the login page was shown, but when I tried to log in, it just kept thinking and thinking, and nothing happened.

So I went to the vpxd logs again and saw some issues regarding certificates and bad passwords. A strange error, but since I had tried to change (or add, since it was empty) the IP and gateway manually and not through VAMI (VMware recommends making this change through VAMI, or we can have some issues), maybe I had also messed up the certificates. So I decided to recreate all of them.

How to recreate vCenter certificates? Use the certificate manager that is here: /usr/lib/vmware-vmca/bin/certificate-manager

Here you have the option to recreate one or all certificates. Since I didn’t know what had happened, I decided to recreate all certificates using option 8.

This is straightforward: just use the defaults and add your vCenter IP and FQDN for the hostname and VMCA. You can check the VMware KB for how to do it.

It takes some minutes to finish (depending on the size of your vCenter), and then it reboots automatically.

After the reboot, I logged in again, and… VOILÀ!!! Finally, I have the vCenter back.

There were some minor issues regarding the networks that I needed to fix, but nothing special, and then vCenter was fully functional.

As we can see, by working through the vCenter logs, we can find the root cause of the problem. In this case, since it involved duplicate keys in the vPostgres DB, the fix itself was the tricky part.

Share this article if you think it is worth sharing. If you have any questions or comments, comment here, or contact me on Twitter.

©2023 ProVirtualzone. All Rights Reserved
February 6th, 2023 | vCenter, VMware Posts, vSphere

About the Author:

I have over 20 years of experience in the IT industry. I have been working with Virtualization for more than 15 years (mainly VMware). I recently obtained certifications, including VCP DCV 2022, VCAP DCV Design 2023, and VCP Cloud 2023. Additionally, I have VCP6.5-DCV, VMware vSAN Specialist, vExpert vSAN, vExpert NSX, vExpert Cloud Provider for the last two years, vExpert for the last 7 years, and an old MCP. My specialties are Virtualization, Storage, and Virtual Backup. I am a Solutions Architect in the areas of VMware, Cloud, and Backup / Storage. I am employed by ITQ, a VMware partner, as a Senior Consultant. I am also a blogger, owner of the blog ProVirtualzone.com, and more recently a book author.
