On a Veeam Server you can, for several reasons, lose a backup repository disk, LUN, or volume. When this happens, backups stop working, and you need to reconfigure both the backup jobs and the Backup Repository. In this Veeam how-to on reconfiguring backups after a dead backup repository, I will explain how to do this.
In my case, I had 4 repositories in my Scale-Out Repository and lost 2.
In my Veeam Backup Repositories, the two disks show as unavailable.
When you run the job, you get this for every VM that has a full or incremental backup stored in the dead Backup Repositories.
What happened?
Well, this is not the first time I have had issues with disks formatted with ReFS. Once again, 2 of the iSCSI LUNs that were attached to my Veeam Server as local disks and formatted as ReFS had an issue, and the partitions went RAW. When a ReFS partition goes RAW, there is roughly a 50/50 chance that all your data is lost and cannot be recovered. Even if you use a data recovery tool, there is no guarantee that you will recover those backups. Unless, of course, you pay a company to recover that data.
What happened was that my Backup Repository storage (QNAP TVS-EC1680U-SAS-RP) raised warnings that it was full, with 0% free space. Most of the time, when a storage system reaches 100% usage, its volumes go read-only.
I logged in to the QNAP, released some space, and all was good. But somehow some of the LUNs that were formatted with ReFS on my Veeam Server didn't like that and went RAW. I have 6 ReFS partitions from that QNAP, and only two were corrupted. Whether the read-only state triggered this problem, I don't know, but I am pretty sure it did. What I do know is that I have now lost 60-70% of the backups from one area, and that 2 of the 6 ReFS partitions did not survive.
Sometimes it is possible to recover a ReFS partition using a tool called ReFSutil. Unfortunately, it is only available in Windows Server 2019. You cannot use it in Windows Server 2016.
I will not go through how to use ReFSutil and how to recover a ReFS partition; Andrew has a good article on how to recover this type of ReFS partition using ReFSutil. Before you give up, try to follow Andrew's tips and see if you can recover your ReFS partition and data. In my case, it was not possible to recover.
In my case the server was Windows Server 2016, so there was no ReFSutil to use. I attached the LUNs to a Windows Server 2019 machine and tried to recover them, with no luck. So after 2 days, I gave up on recovering the partitions. It was not possible to wait longer, since backups were stopped until all disks were available.
How to fix or reconfigure your Veeam Backup: clean up all the lost backups, remove the faulty disks, and add new ones.
First, if you want your backups to continue working (or at least to back up some of the VMs) while you try to recover the ReFS partitions (bear in mind that this took me two days before I gave up), you need to do some tasks.
Note: To continue backing up the VMs and using your Scale-Out Repository, you need enough free space to store the backups (mainly because the first backups will be full backups). In my case this was not possible, since I did not have enough free space in the remaining Backup Repositories (local disks), the ones that survived the ReFS issue.
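To make the free-space note concrete, here is a minimal Python sketch of the check I mean. It is not a Veeam tool or API; the `Extent` class, the `fits_full_backups` function, and the sizes are all hypothetical names and numbers for illustration. It assumes each full backup file has to fit whole on a single healthy repository, and it uses a simple largest-first placement, so a `False` result means this simple strategy could not place everything, not a strict proof that no placement exists.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical description of one backup repository in the Scale-Out Repository."""
    name: str
    free_gb: float
    healthy: bool = True


def fits_full_backups(extents, full_backup_sizes_gb):
    """Return True if every full backup can be placed whole on a healthy extent.

    Heuristic: place the largest backups first, each on the first
    extent (largest free space first) that still has room.
    """
    # Only the surviving repositories count; dead ones are ignored.
    free = sorted((e.free_gb for e in extents if e.healthy), reverse=True)
    for size in sorted(full_backup_sizes_gb, reverse=True):
        for i, available in enumerate(free):
            if available >= size:
                free[i] = available - size
                break
        else:
            # No healthy extent has room for this full backup.
            return False
    return True


extents = [
    Extent("repo1", free_gb=500),
    Extent("repo2", free_gb=300),
    Extent("repo3-dead", free_gb=4000, healthy=False),
]
print(fits_full_backups(extents, [400, 250]))       # True
print(fits_full_backups(extents, [400, 250, 200]))  # False
```

In my situation the second case applied: the surviving repositories simply did not have room for new full backups of everything.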
First, you need to seal the faulty Backup Repositories (extents) that are inside the Scale-Out Repository.
What does Seal Extent do?
It stops that repository from accepting new backups, so no job will try to use it to store any backup files.
Go to Scale-Out Repositories under Backup Infrastructure, click your Scale-Out Repository to show all its Backup Repositories, then right-click the faulty repository and select Seal extent.
In my case, I needed to do this for the two faulty Backup Repositories.
Note: As the warning says, if the extent (backup repository) holds any full backup of the job that you try to run, the next backup will be a full backup (regardless of the configuration you have in your job).
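The effect of sealing can be pictured with a small Python sketch. This is only my simplified model of the behaviour described above, not Veeam code: the function names `eligible_extents` and `next_backup_type` and the repository names are made up for the example.

```python
def eligible_extents(all_extents, sealed_extents):
    """Extents that may still receive new backup files (sealed ones are skipped)."""
    return [e for e in all_extents if e not in sealed_extents]


def next_backup_type(full_backup_extent, sealed_extents, scheduled_type="incremental"):
    """Decide what the next run of a job will be in this simplified model.

    If the job's current full backup lives on a sealed extent, the next
    run is forced to a full backup, whatever the job schedule says.
    """
    if full_backup_extent in sealed_extents:
        return "full"
    return scheduled_type


all_extents = ["repo1", "repo2", "repo3", "repo4"]
sealed = {"repo3", "repo4"}  # the two faulty repositories

print(eligible_extents(all_extents, sealed))  # ['repo1', 'repo2']
print(next_backup_type("repo3", sealed))      # full
print(next_backup_type("repo1", sealed))      # incremental
```

This is why free space matters so much at this stage: every job whose full backup sat on a sealed repository starts over with a new full.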
Besides this, I also enabled the Scale-Out Repository option “Perform full backup when required extent is offline”.
Note: If you have enough space on all your Backup Repositories, this option should always be enabled in normal daily operations. That way, if you lose a disk, backups continue to run.
In your Scale-Out Repository, select Properties, and under the Advanced options you will find this setting.
After the above changes, the Veeam Backup jobs should keep running while you try to fix your ReFS partitions, or any other issue that you may have with a Veeam backup repository (LUN or volume).
In my case it was not possible to recover, so I gave up, deleted the disks, created two new LUNs on the storage, and added them back to the Veeam Backup Server.
First, I needed to remove the dead Backup Repositories from the Scale-Out Repository.
Again, select Properties in your Scale-Out Repository, and in the Performance Tier tab, select the dead Backup Repositories and remove them.
In the same place, you can now add the new Backup Repositories.
Remove the VMs' dead retention points.
Even though your Veeam Backup Server is back and you can run normal backups, you still have a lot of dead VM backups registered in the Veeam Backup Server. Any full or incremental backup that was stored in those dead Backup Repositories is no longer accessible and needs to be removed from your Veeam Backup Server.
Go to Home, then Backups > Disk. Here you will see all the Backup Jobs and, drilling down, all the VM backups. Right-click a job and select Properties, and you will see the full backup chain for that job. All the retention points that are grayed out are the dead ones that were stored in the dead Backup Repositories. We need to remove them all.
There are two options: Forget and Remove from Disk. We can just click Forget, and everything will disappear from the inventory. But this will not delete any incremental that still exists on disk without its full backup, and such an incremental is pointless without the full backup file.
So the best option here, which also saves space, is Remove from Disk. Here we again have two choices: do it only for the VM we have selected, or for all unavailable backups. In this case it makes sense to apply it to all backups, since we want to clean up all the dead ones.
Note: You need to do this operation per job. This option only removes the VM retention points in the selected Backup Job.
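To show why Forget alone is not enough, here is a small Python sketch of how I think about a backup chain after losing repositories. It is an illustration only, not how Veeam stores or evaluates chains internally: `RestorePoint`, `classify`, and the sample chain are hypothetical, and the model assumes a simple forward incremental chain where each incremental depends on every point since the last full backup.

```python
from dataclasses import dataclass


@dataclass
class RestorePoint:
    """Hypothetical view of one retention point in a job's backup chain."""
    name: str
    kind: str    # "full" or "incremental"
    extent: str  # repository that stores the file


def classify(chain, dead_extents):
    """Split a chain (oldest first) into usable, unavailable, and orphaned points.

    unavailable: the file was on a dead repository (grayed out in the console)
    orphaned:    the file still exists, but something it depends on is gone
    """
    usable, unavailable, orphaned = [], [], []
    chain_ok = False
    for rp in chain:
        on_dead = rp.extent in dead_extents
        if rp.kind == "full":
            chain_ok = not on_dead
        elif on_dead:
            # A lost incremental breaks every later incremental in this chain.
            chain_ok = False
        if on_dead:
            unavailable.append(rp.name)
        elif chain_ok:
            usable.append(rp.name)
        else:
            orphaned.append(rp.name)
    return usable, unavailable, orphaned


chain = [
    RestorePoint("full-1", "full", "repo3"),
    RestorePoint("inc-1", "incremental", "repo1"),
    RestorePoint("full-2", "full", "repo1"),
    RestorePoint("inc-2", "incremental", "repo2"),
]
usable, unavailable, orphaned = classify(chain, dead_extents={"repo3"})
print(usable)       # ['full-2', 'inc-2']
print(unavailable)  # ['full-1']
print(orphaned)     # ['inc-1']
```

In this example `inc-1` is the kind of file that stays behind on a healthy disk and only wastes space, which is why I prefer Remove from Disk for the cleanup.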
After the above steps, our Veeam Backup Server is back to normal operation. Unfortunately, if you don’t have a second backup destination (as in this environment), then this is what happens. You lose all your backups.
Regarding ReFS, I think it is a good file system and very well suited to backups and this type of data, but honestly it still has a lot of issues. Even though Microsoft has fixed a lot of the bugs, it is still not 100% bulletproof (is there one that is?) and is very sensitive to the LUNs that are presented to the Windows Server, to read-only states, and even to RAIDs and RAID controllers.
This is the third time I have had issues with ReFS partitions on some of our Veeam Backup Servers, and I am thinking of moving back to NTFS.
I hope this information was useful.