One of our vCloud Director instances had an issue: it stopped working because the root partition was full.
When the team informed me that one vCD (a Linux version) was down, I needed to troubleshoot what was happening (it had been working an hour before).
The first thing was to log in to the vCD with ssh and check the services.
    [root@vCloud ~]# service vmware-vcd status
    vmware-vcd-watchdog is not running
    vmware-vcd-cell is not running
    [root@vCloud ~]#
So the service was down, and even before checking the logs, I checked the partitions.
    [root@vCloud VCD]# df -h
    Filesystem                      Size  Used Avail Use% Mounted on
    devtmpfs                        4.8G     0  4.8G   0% /dev
    tmpfs                           4.9G   16K  4.9G   1% /dev/shm
    tmpfs                           4.9G  9.0M  4.8G   1% /run
    tmpfs                           4.9G     0  4.9G   0% /sys/fs/cgroup
    /dev/mapper/centos_vCloud-root   50G   50G     0 100% /
    /dev/sda1                       497M  241M  257M  49% /boot
    /dev/mapper/centos_vCloud-home   95G   33M   95G   1% /home
    tmpfs                           984M     0  984M   0% /run/user/0
    [root@vCloud /]#
Immediately I knew what the problem was: the lack of space in the root partition. But why? What was consuming 50 GB of space?
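The same check can be scripted so a full partition is spotted before the cell goes down. This is only a minimal sketch: `over_threshold` is a hypothetical helper name of mine (not part of vCD), and it assumes `df -P` style input where the fifth column is the usage percentage and the sixth is the mount point.

```shell
# Sketch only: over_threshold is a hypothetical helper, not a vCD tool.
# It reads `df -P` style output on stdin and prints every mount point
# whose usage is at or above the given percentage.
# Assumes mount points contain no spaces.
over_threshold() {
    limit=$1
    awk -v limit="$limit" '
        NR > 1 {                        # skip the header line
            use = $5
            sub(/%/, "", use)           # "100%" -> "100"
            if (use + 0 >= limit + 0)
                print $6 " " $5         # mount point and usage
        }'
}
```

For example, `df -P | over_threshold 90` lists everything that is 90% full or more. The threshold of 90 is just an illustration; pick whatever fits your environment.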
I checked the log and tmp areas to see if big logs were consuming this space, but they were not using much. So the problem was not there.
    [root@vCloud /]# du -h --max-depth=1 /
    216M    /boot
    16K     /dev
    0       /home
    0       /proc
    9.0M    /run
    0       /sys
    37M     /etc
    72K     /root
    1.9G    /var
    2.7G    /usr
    0       /media
    0       /mnt
    46G     /opt
    0       /srv
    12K     /VCD
    48M     /tmp
    50G     /
    [root@vCloud /]#
But the /opt directory was consuming 46 GB, which is a huge size for this type of vCD (not small, but not so big).
So I started drilling down until I arrived at the folder that was filling up the root (/) partition: /opt/vmware/vcloud-director/data/transfer.
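The drill-down with du can be shortened by sorting the output. A minimal sketch, assuming GNU du (for `--max-depth`); `largest_dirs` is a hypothetical helper name of mine, not something shipped with vCD.

```shell
# Sketch only: largest_dirs is a hypothetical helper.
# Prints the N largest first-level entries under DIR, biggest first,
# as "size-in-KiB<TAB>path". Assumes GNU du.
largest_dirs() {
    dir=$1
    count=${2:-5}
    # -x stays on one filesystem, so other mounts are not counted;
    # -k reports KiB so a plain numeric sort works.
    du -xk --max-depth=1 "$dir" 2>/dev/null |
        awk -F '\t' -v top="$dir" '$2 != top' |   # drop the total for DIR itself
        sort -rn |
        head -n "$count"
}
```

Run it as `largest_dirs / 5`, then repeat on the biggest entry (`largest_dirs /opt 5`, and so on) until you reach the folder that holds the data.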
I immediately suspected that there was no shared folder mounted on the vCD data/transfer folder. The initial df -h above already showed this, since no such mount is listed, but at that point I was looking for the full partition and did not think to check for a missing mount.
    [root@vCloud /]# df -h /opt/vmware/vcloud-director/data/transfer
    Filesystem                      Size  Used Avail Use% Mounted on
    /dev/mapper/centos_vCloud-root   50G   50G     0 100% /
    [root@vCloud /]#
Checking the mount and the folder, I confirmed that the problem was in the transfer folder and that the transfer folder lived on the root (/) partition instead of on its own mount.
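This "is the folder really its own mount?" check is easy to script. A minimal sketch: `has_own_mount` is a hypothetical helper name of mine, and it simply compares the "Mounted on" column of `df -P` with the folder itself. It assumes the path has no spaces and is not a symlink.

```shell
# Sketch only: has_own_mount is a hypothetical helper.
# Succeeds (exit 0) only when DIR is itself a mount point, i.e. df
# reports DIR in the "Mounted on" column.
# Assumes the path contains no spaces and is not a symlink.
has_own_mount() {
    dir=${1%/}                  # drop one trailing slash, if any
    [ -n "$dir" ] || dir=/      # "/" became empty above
    mounted_on=$(df -P "$dir" | awk 'NR == 2 { print $6 }')
    [ "$mounted_on" = "$dir" ]
}
```

On the broken cell, `has_own_mount /opt/vmware/vcloud-director/data/transfer` would have failed, because df reported `/` as the mount point. Where util-linux is installed, `mountpoint -q <dir>` performs a similar check.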
Inside the transfer folder there were a lot of temporary vmdk files.
    [root@vCloud /]# cd /opt/vmware/vcloud-director/data/transfer
    [root@vCloud transfer]# ls
    115c128a-655b-42ad-9132-d6238b5caaa7  @Recently-Snapshot  @Recycle
    [root@vCloud transfer]# cd 115c128a-655b-42ad-9132-d6238b5caaa7
    [root@vCloud 115c128a-655b-42ad-9132-d6238b5caaa7]# ls -la
    total 42908132
    drwx------. 2 vcloud vcloud        4096 May  5 17:54 .
    drwxrwxr-x. 5 vcloud vcloud        4096 May  5 17:07 ..
    -rw-------. 1 vcloud vcloud 16230888960 May  5 17:54 vm-669e59f1-d54a-46ae-9a51-ff528ffd35d6-disk-1.vmdk
    -rw-------. 1 vcloud vcloud        8684 May  5 17:54 vm-669e59f1-d54a-46ae-9a51-ff528ffd35d6-disk-2.nvram
    -rw-------. 1 vcloud vcloud 11476079104 May  5 17:22 vm-a26f4fdc-9873-4025-b69b-2cb48f66cfb0-disk-0.vmdk
    -rw-------. 1 vcloud vcloud 16230888960 May  5 17:52 vm-a26f4fdc-9873-4025-b69b-2cb48f66cfb0-disk-1.vmdk
    -rw-------. 1 vcloud vcloud        8684 May  5 17:52 vm-a26f4fdc-9873-4025-b69b-2cb48f66cfb0-disk-2.nvram
    [root@vCloud /]#
So it was obvious that no mount had been added during the initial install of this vCloud Director, and the install had left the transfer folder on the root partition. Sooner or later, once users start doing many uploads, this partition runs out of space, and then the service stops because there is no space left in the system for the services, logs, etc.
I double-checked the vCenter tasks to see if there had been any uploads before the team complained that vCD was down.
The tasks showed several OVF template exports, which triggered the problem: the root (/) partition filled up with these temporary files, which should have been in a shared folder with enough space for this type of task.
So now I needed to create a new shared folder on one of our QNAPs and mount it on this vCD.
    mount -t nfs 192.168.6.42:/vCD-Shared-01 /opt/vmware/vcloud-director/data/transfer/
I changed the permissions and ownership of /opt/vmware/vcloud-director/data/transfer.
    chown -R vcloud:vcloud /opt/vmware/vcloud-director/data/transfer
    chmod -R 775 /opt/vmware/vcloud-director/data/transfer
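Since wrong ownership or permissions on this folder is a common cause of trouble, it can be worth verifying the result rather than trusting the commands. A minimal sketch, assuming GNU stat (`-c`); `check_owner_mode` is a hypothetical helper name of mine.

```shell
# Sketch only: check_owner_mode is a hypothetical helper.
# Succeeds when DIR has the expected "owner:group" and octal mode.
# Assumes GNU stat (the -c format option).
check_owner_mode() {
    dir=$1
    want_owner=$2
    want_mode=$3
    actual=$(stat -c '%U:%G %a' "$dir") || return 1
    [ "$actual" = "$want_owner $want_mode" ]
}
```

For this cell the call would be `check_owner_mode /opt/vmware/vcloud-director/data/transfer vcloud:vcloud 775`. It only looks at the top-level folder, not at the files inside it.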
I added the mount to fstab so that it is mounted again after a vCD reboot.
    192.168.6.42:/vCD-Shared-01 /opt/vmware/vcloud-director/data/transfer/ nfs defaults 0 0
I tested the fstab entry with mount -a, and all was ok. Then I rebooted the vCD (not mandatory, but I wanted to double-check the mount on a reboot).
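To catch a missing fstab entry on other cells before a reboot exposes it, the file itself can be checked. A minimal sketch: `fstab_has_mount` is a hypothetical helper name of mine, and it only looks at the second field (the mount point) of non-comment lines.

```shell
# Sketch only: fstab_has_mount is a hypothetical helper.
# Succeeds when FSTAB (default /etc/fstab) has a non-comment entry
# whose mount point is DIR. A trailing slash on either side is ignored.
fstab_has_mount() {
    dir=${1%/}
    [ -n "$dir" ] || dir=/
    fstab=${2:-/etc/fstab}
    awk -v want="$dir" '
        /^[ \t]*#/ { next }                     # skip comment lines
        NF >= 2 {
            mp = $2
            if (mp != "/") sub(/\/$/, "", mp)   # normalise trailing slash
            if (mp == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$fstab"
}
```

Usage: `fstab_has_mount /opt/vmware/vcloud-director/data/transfer || echo "transfer folder is not in fstab"`. This only confirms the entry exists; `mount -a` is still the real test that it works.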
After the reboot, all services were ok, and the cell was running without any issues.
So as we can see in this simple blog post, the /opt/vmware/vcloud-director/data/transfer folder continues to be one of the first places where vCD runs into issues. I can't count how often vCD had issues because of this mount (not mounted, wrong permissions, wrong ownership, etc.).
In this case, it was easy to identify and very easy (if you know where to look) to fix and start the services. Most of the time, a full partition is the reason services fail to start or stop unexpectedly.
I hope this blog post, "vCloud Director stop working with partition root full", was useful to troubleshoot and find the issues in your vCloud Director infrastructures.
Share this article if you think it is worth sharing. If you have any questions or comments, comment here or contact me on Twitter.