In my previous Veeam Stories post, I wrote about my first encounter with Veeam in production and why it was essential to back up your entire Virtual Infrastructure, something that was not on the minds of many administrators in the early days of Virtual Infrastructures. I started this Veeam Stories series with the plan to write a story about my experience with Veeam every couple of months, but unfortunately, for many reasons, I didn't follow that plan.
In this Veeam Stories: vCloud Director disaster recovery, I talk about a similar story (also a full disaster recovery), but this time for vCloud Director. The behavior and mindset for handling Virtual Infrastructure back then were similar to how Cloud VMs were handled and backed up.
The mindset was: there is no need to back up the Virtual Machines, since we have backups of the data and of vCloud Director itself. It was the same 15 years ago: we had data backups and image backups of the physical servers.
Veeam launched support for vCloud Director in version 7.0, back in 2013/2014, and we installed it in the company around 2014/2015 (I can't remember the exact date). We had several Veeam licenses but never used Veeam with vCloud Director, only to back up VMware environments.
Our Veeam licenses belonged to the infrastructure department, which backed up all our Virtual Infrastructure. Still, many departments were using Virtual Infrastructure, particularly R&D, which tested Virtual Applications and how they worked in vCloud Director.
As usual in Enterprise companies, every department has its own budget, particularly for IT. So licenses that belonged to Infrastructure were to be used on Infrastructure systems. If any infrastructure outside our department needed backups (backups and support for other departments were regular requests, since we were the infrastructure team), that department was required to pay for the licenses used.
But as always, since departments fight for their budgets, they want to spend as little as possible, so they licensed backups only for their critical systems. For the rest, they would not pay for any licenses or spend any money.
But as I said, those teams were already testing many Applications in vCloud Director, and a lot of work and hours had been spent on it. I asked a couple of those teams if they wanted to start backing up their vCloud Directors, but this meant more licenses and payments to the Infrastructure department, and it needed to be approved by management.
Even though the teams said yes, the department management said no: there was no need to back up R&D infrastructure and work. The teams didn't like it, but that was management's decision.
Our Infrastructure backup plan included some of the R&D VMware infrastructures. We were backing up the VMs on those systems, including the VMs running vCloud Director itself (some were Linux machines and SQL Servers). So management's position was that this was more than enough.
When I rebuilt our backup Infrastructure, and since we had some Veeam licenses left, I decided to include one of those vCloud Directors (the biggest one, with the most work) in our backup plan. Even though it was not officially requested, I decided to do it to be safe from a possible disaster on that vCloud Director Infrastructure.
Backing up a single Tenant's vCloud Director VMs with Veeam is a simple design.
So now, besides our normal backups (around 800 VMs), I was also backing up one of the biggest and most critical vCloud Directors in R&D, with about 100 VMs and vApps.
The disaster happened
The backup had been running for about six months when I received a support request saying that one of their vCloud Directors had been completely wiped out.
They had a problem with the Storage, and all the internal vApps and VMs inside their vCloud Director were lost. There was a backup of vCloud Director itself, so they could restore that. But the main work, thousands of hours of testing on those VMs and vApps, was lost.
I let them soak in the panic for a while (some hours), with a lot of escalation about the problem and what it meant: not only for the work already done but also for the year's plans for R&D and the certification of the products running on vCloud Director.
Then I informed them that, even without management's agreement, I had a backup of the whole vCloud Infrastructure and could restore it in just a few hours. All the work was safe, and they could continue where they had left off.
Of course, many people were happy to learn that the systems would be working the next day and nothing was lost.
After this large disaster in a vCloud Director without any backups, management authorized and paid for licenses to back up all the critical vCloud Directors the teams were working on.
After backups had been running for several vCloud Directors for months, there was another disaster in another R&D team, where vCloud Director VMs and vApps were lost, and we restored them quickly and without panic.
Veeam restores for vCloud Director were used often in the following months, with many requests to restore individual vApps and VMs that didn't come up after changes (even though the teams used snapshots many times, sometimes the failure only appeared some days later).
Why so many disasters in these vCloud Directors? As I said, these were R&D and test teams, testing applications, vCloud Director, Storage, etc., which means a lot of changes, much of it tried for the first time, and sometimes systems don't power on after a change.
What did we learn in this Veeam Stories: vCloud Director disaster recovery? Technology is continuously improving, and new ways to store your applications and workloads appear every day, but one thing never changes: ALWAYS BACKUP your systems.
You should always back up your Infrastructure, whether it is an ESXi infrastructure, Cloud VMs (on-prem or on a Hyperscaler), or even newer technologies like Kubernetes and Containers, for which Veeam also provides backups with its Kasten tool. Regardless of your infrastructure, backups are essential and can save administrators and management many hours and headaches. Backups are crucial, and you only miss them when you need them. And sooner or later, you will need them.
Thanks for spending a few minutes reading this Veeam Stories post and this example of a disaster that was prevented and minimized, but could have been a significant problem and the loss of thousands of hours of work. Because there is still (to this day) some management that doesn't understand new technology, new systems, and how Virtual Infrastructures work.
Share this article if you think it is worth sharing. If you have any questions or comments, leave a comment here or contact me on Twitter.