This week we had an outage with the vPostgres DB of our vCenter Appliance 6.0. After some storage issues, vCenter stopped working and could not be recovered, so I decided to write this article, vCenter 6.0 Appliance (VCSA) fix and restore, to help others with the same issue.
This article will not go through how to troubleshoot the issue or what caused it (it was not possible to identify the root cause with 100% certainty). It covers only how to recover and restore the vCenter.
We use Veeam to back up most of our virtual infrastructure, but this vCenter is one of those not included in our backup plan. To be on the safe side I had taken a backup of this vCenter a while ago, but that meant the only backup I had to restore it from was too old (40 days).
The DB on the vCenter was running, but ODBC was broken, some services (mainly vpxd) were not running and could not be restarted, and the connection to the DB kept dropping. After I gave up trying to fix it, it was time to restore.
How to restore?
That was the question, since the backup was 40 days old and restoring a vCenter that old is a big issue. All hosts will be out of sync, all VMs added to vCenter since the backup will show as orphaned, and there will be many other problems. Any changes made in the last 40 days through vCenter will be lost, and changes made directly on the ESXi hosts will not be in sync.
Here is an example of the errors you will encounter with a restore that old (errors on the ESXi hosts).
Fixing all these issues would take many, many hours. I would need to remove all hosts (around 50) one by one from vCenter and add them back again to correct the first two errors, and also remove the many orphaned VMs from the vCenter inventory and add them back again. This is manual work that takes many hours. I had to do this in the past, and I know how painful it is. If there were no other option I would do it, but I wanted to find a different approach that doesn't need this kind of effort.
So I needed a restore plan for this vCenter. I had a broken vCenter whose DB was still running (so the data was available), and I had an old VM backup that I could use.
So I decided to go with the following plan.
1 – Power on the broken vCenter Appliance.
2 – Check that the DB is running and back up the vPostgres DB.
3 – Power off the broken vCenter Appliance.
4 – Use Veeam Backup to restore the vCenter Appliance.
5 – Power on the restored vCenter Appliance.
6 – Update the vCenter Appliance (to the same build as the broken vCenter).
7 – Stop the vCenter Appliance services.
8 – Restore the vPostgres DB from the backup created in step 2.
9 – Reboot the vCenter Appliance and check consistency.
With the plan ready, it was time to go through the tasks one by one. Here is how to do each of them.
First, download the VMware scripts to back up and restore the vPostgres DB. You can download them from the VMware KB HERE.
Steps 1 to 3:
After downloading 2091961_linux_backup_restore.zip, you need to upload the script to the vCenter Appliance so that you can run it to back up the DB. I use WinSCP to move files from my laptop to the vCenter.
The zip file contains two scripts, backup_lin.py and restore_lin.py: one for the backup and one for the restore. For this step, upload the backup script to the broken vCenter.
I will show the process with some images.
Connect to your vCenter Appliance, open the /tmp folder (right pane), and drag and drop the backup script into it.
Note: To upload files to the vCenter Appliance and to log in to it over SSH, SSH needs to be enabled. Check HERE how to enable it.
Once the file is on your vCenter Appliance, make it executable with: chmod 700 /tmp/backup_lin.py
With the script executable, run it to back up the vPostgres DB: python /tmp/backup_lin.py -f /tmp/backup_VCDB.bak
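If you prefer to script steps 1 to 3 on the broken appliance, here is a minimal sketch (not part of the VMware KB tooling). It assumes you are logged in as root in the appliance's bash shell rather than the default appliance shell, that backup_lin.py is in /tmp, and that the vPostgres service is named vmware-vpostgres; the function names are hypothetical. It stops if the DB service check fails and complains if the backup file comes out missing or empty.

```shell
#!/bin/sh
# Sketch only: wraps the backup command from the VMware KB scripts.
# Assumptions: root in the appliance bash shell, backup_lin.py in /tmp,
# vPostgres service named vmware-vpostgres (adjust if yours differs).

# Hypothetical helper: succeed only if the file exists and is not empty,
# then print its size in whole MB.
verify_backup() {
  vb_file="$1"
  if [ ! -s "$vb_file" ]; then
    echo "ERROR: $vb_file is missing or empty" >&2
    return 1
  fi
  vb_bytes=$(wc -c < "$vb_file")
  echo "OK: $vb_file ($((vb_bytes / 1048576)) MB)"
}

# Hypothetical helper: check the DB service, run the backup script,
# then verify the output file. Defaults match the paths used above.
backup_vcdb() {
  bk_script="${1:-/tmp/backup_lin.py}"
  bk_file="${2:-/tmp/backup_VCDB.bak}"
  # Assumed service name for vPostgres on VCSA 6.0
  service vmware-vpostgres status || return 1
  chmod 700 "$bk_script" || return 1
  python "$bk_script" -f "$bk_file" || return 1
  verify_backup "$bk_file"
}
```

On the broken appliance you would source the file and call backup_vcdb; with the defaults it runs the same python /tmp/backup_lin.py -f /tmp/backup_VCDB.bak command shown above.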
After backing up the DB, download the backup file to your computer so you can later upload it to the restored vCenter Appliance.
As you can see in the next image, the backup file backup_VCDB.bak is 93 MB.
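Because the backup file is copied twice (broken VCSA to laptop, laptop to restored VCSA), it can be worth confirming it arrived intact before restoring it. A minimal sketch, assuming sha256sum from GNU coreutils is available on the appliance; backup_checksum and same_backup are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print only the SHA-256 hash of a file, so the value
# from the broken appliance can be compared with the one on the restored one.
backup_checksum() {
  if [ ! -f "$1" ]; then
    echo "ERROR: $1 not found" >&2
    return 1
  fi
  sha256sum "$1" | cut -d ' ' -f 1
}

# Hypothetical helper: succeed only if two local copies have the same hash.
same_backup() {
  sum_a=$(backup_checksum "$1") || return 1
  sum_b=$(backup_checksum "$2") || return 1
  [ "$sum_a" = "$sum_b" ]
}
```

For example, run backup_checksum /tmp/backup_VCDB.bak on the broken appliance before powering it off, and again on the restored appliance after uploading the file; the two hashes should be identical.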
Steps 4 to 6:
- After powering off the broken vCenter Appliance, it's time to restore the vCenter Appliance from the Veeam backup.
Note: I will not go through this process in this article. You can use Veeam or any other virtual backup tool you have in your environment.
- Power on the restored vCenter Appliance and check if everything is working.
- To be on the safe side, take a snapshot before making any changes to the restored vCenter.
- Log in to https://FQDN:5480 (or use the IP address) and update your vCenter Appliance to the same build as the broken vCenter.
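Step 6 only helps if the restored appliance ends up on exactly the same build as the broken one, and the broken VCSA is already powered off by then, so it is useful to record its build during step 2. The sketch below assumes that `vpxd -v` on the appliance prints a version string containing `build-<number>` (verify this on your own appliance); build_number and check_same_build are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: extract the number after "build-" from a version
# string such as the one assumed to be printed by `vpxd -v`.
build_number() {
  echo "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: compare the version string saved from the broken
# VCSA with the one from the restored VCSA; fail on mismatch.
check_same_build() {
  old_build=$(build_number "$1")
  new_build=$(build_number "$2")
  if [ -z "$old_build" ] || [ -z "$new_build" ]; then
    echo "ERROR: could not read a build number" >&2
    return 1
  fi
  if [ "$old_build" = "$new_build" ]; then
    echo "Builds match: $new_build"
  else
    echo "Build mismatch: broken=$old_build restored=$new_build" >&2
    return 1
  fi
}
```

For example, save the output of `vpxd -v` on the broken appliance in step 2, then on the updated appliance run check_same_build with that saved string and "$(vpxd -v)" before starting step 7.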
Steps 7 to 9:
After your vCenter Appliance is updated, it is time to restore the DB using the backup created in step 2.
Again using WinSCP, upload both the DB backup and the restore script to /tmp.
- Before you restore the DB, run the following commands on the VCSA:
1 – Make the restore script executable.
chmod 700 /tmp/restore_lin.py
2 – Stop VCSA services.
service vmware-vpxd stop
service vmware-vdcs stop
3 – Restore the vCenter Appliance vPostgres DB.
python /tmp/restore_lin.py -f /tmp/backup_VCDB.bak
- After the vPostgres DB is restored, start the services again and check for any issues.
service vmware-vpxd start
service vmware-vdcs start
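Steps 7 to 9 boil down to a fixed sequence of commands, so they can be scripted. Below is a minimal sketch that runs exactly the commands listed above, stops at the first failure, and supports a dry-run mode that only prints the commands. The function names and the DRY_RUN variable are hypothetical, and it assumes both files were uploaded to /tmp.

```shell
#!/bin/sh
# Sketch of the restore sequence: stop vCenter services, restore vPostgres
# with restore_lin.py, start the services again.
# Hypothetical: set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_vcdb() {
  rs_backup="${1:-/tmp/backup_VCDB.bak}"
  rs_script="${2:-/tmp/restore_lin.py}"
  # Check both uploads are present before touching any service
  for rs_f in "$rs_backup" "$rs_script"; do
    if [ ! -f "$rs_f" ]; then
      echo "ERROR: $rs_f not found" >&2
      return 1
    fi
  done
  run chmod 700 "$rs_script" || return 1
  run service vmware-vpxd stop || return 1
  run service vmware-vdcs stop || return 1
  # If the restore fails, the services stay stopped so you can investigate
  run python "$rs_script" -f "$rs_backup" || return 1
  run service vmware-vpxd start || return 1
  run service vmware-vdcs start
}
```

Running it first with DRY_RUN=1 lets you review the exact commands before anything is stopped. Leaving the services down after a failed restore is a deliberate choice: starting vpxd against a half-restored DB would only add noise to the troubleshooting.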
After all tasks from the plan are completed, reboot the vCenter for a clean start.
As you can see in the next image, vCenter is running with no issues or warnings on the ESXi hosts.
After this process, you have an up-to-date vCenter Appliance running.
Note: In this article I restore the VCSA from a Veeam backup (or another virtual backup tool), but if you don't have a backup of the VCSA you can follow the same plan: instead of restoring from a backup, install a new VCSA (same FQDN, IP, etc.) and then restore the DB. The process is similar; you just need to make some adjustments when installing the VCSA from scratch.
I hope this article helps you restore your broken VCSA.
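For the consistency check in step 9, besides looking at the hosts in the client, you can confirm from the shell that the two services stopped during the restore came back after the reboot. A minimal sketch, assuming the same `service` scripts used above also accept `status` and return a non-zero exit code when a service is down; check_services is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report each service as running or NOT running based
# on the exit code of "service <name> status"; returns 1 if any is down.
check_services() {
  cs_rc=0
  for cs_name in "$@"; do
    if service "$cs_name" status >/dev/null 2>&1; then
      echo "$cs_name: running"
    else
      echo "$cs_name: NOT running"
      cs_rc=1
    fi
  done
  return "$cs_rc"
}
```

Usage after the reboot: check_services vmware-vpxd vmware-vdcs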