I recently had an issue with VMware vCenter where all virtual machines and hosts were showing as “Disconnected”. Nothing could be done in the management interface, although vCenter was otherwise running. A quick search online didn’t turn up many results, so I restarted the instance, hoping the issue would resolve itself. Instead, the instance didn’t come back up after the restart. Below, I explain how I resolved the problem and give some general tips for tracking down similar problems.
Connecting to the virtual machine’s console with the vSphere Client, I could see the system was stuck at the message “Waiting for vpxd to initialize”. After giving it a few minutes, I restarted the instance again, and it got stuck in the same place. Another search quickly turned up a few reasons an instance may not start: the host name may not resolve back to the correct computer, or the disk(s) may be full.
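For the first cause, a quick check is a forward-then-reverse lookup: the appliance’s name should resolve to an address, and that address should resolve back to the same name. This is a minimal sketch. It assumes ‘getent’ is available, which is true on most Linux systems, and that name lookups work, which may not be the case until networking is up. ‘check_name_roundtrip’ is a hypothetical helper name and is not part of vCenter.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a host name resolves to an address and
# that the address resolves back to the same name. getent consults
# /etc/hosts and DNS in the order set by /etc/nsswitch.conf.
check_name_roundtrip() {
    name="$1"
    # Forward lookup: keep the first address returned for the name.
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "FAIL: $name does not resolve" >&2
        return 1
    fi
    # Reverse lookup: keep the primary name returned for that address.
    back=$(getent hosts "$addr" | awk '{ print $2; exit }')
    if [ "$back" != "$name" ]; then
        echo "FAIL: $name -> $addr -> ${back:-nothing}" >&2
        return 1
    fi
    echo "OK: $name -> $addr -> $back"
}

# Example (on the appliance, once it has networking):
#   check_name_roundtrip "$(hostname -f)"
```

A mismatch between the name you pass in and the name that comes back points at a stale /etc/hosts entry or DNS record.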
Since the instance will not start normally, we need a bash shell to troubleshoot. (Note: any runlevel or option you select in GRUB still gives you a standard boot, which gets stuck in the same place.) Follow this VMware article to get to a bash prompt. [If the article link breaks: press ‘p’ in the GRUB menu and enter the vCenter password (“vmware” if you didn’t change it). Press ‘e’ to edit the boot entry, highlight the second line (the kernel line), press ‘e’ again, and append ‘init=/bin/bash’ to the end of it. Press Enter, then ‘b’ to boot. This will drop you to a bash prompt.]
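At this bash prompt nothing else has been started. On most Linux systems the kernel mounts the root file system read-only at boot (the ‘ro’ option) and leaves remounting it read-write to the normal startup scripts, which never ran here. ‘/proc’ may not be mounted either. The sketch below defines a hypothetical helper, ‘mount_opts_for’, that reads the mount table to show how a mount point is mounted. The recovery commands in the comments assume the appliance behaves like a typical Linux system.

```shell
#!/bin/sh
# Hypothetical helper: print the mount options of a mount point, read from a
# table in /proc/self/mounts format (device, mount point, type, options, ...).
# If a path is mounted more than once, the last entry is the one in effect.
mount_opts_for() {
    mnt="$1"
    table="${2:-/proc/self/mounts}"
    awk -v m="$mnt" '$2 == m { opts = $4 }
        END { if (opts == "") exit 1; print opts }' "$table"
}

# At the init=/bin/bash prompt (assumed typical Linux behaviour):
#   [ -e /proc/self/mounts ] || mount -t proc proc /proc   # df and mount need it
#   mount_opts_for /            # options starting with "ro" mean read-only
#   mount -o remount,rw /       # required before any file can be changed
```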
I found a VMware knowledge base article and a couple of blog posts stating that the system may not start correctly if a disk is full. The ‘df’ command showed no space issue on the disks noted in the article, but the root file system was out of space!
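The check itself is just ‘df -h’, which lists every mounted file system with its size, used space, and ‘Use%’. When a machine has many file systems, a small filter helps. This is a minimal sketch in which ‘fs_usage’ is a hypothetical helper name. It uses ‘df -P’ (POSIX output format), which keeps each file system on a single line so the columns stay predictable.

```shell
#!/bin/sh
# Hypothetical helper: print the used-space percentage (a bare number) of the
# file system containing the given path.
fs_usage() {
    # df -P prints a header, then one line per file system; column 5 is
    # the capacity (e.g. "61%").
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example:
#   df -h              # overview of every mounted file system
#   fs_usage /         # prints a number near 100 when the root file system is full
```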
OK, this puts us on the right path. We know the /dev/sda3 disk mounted at the root directory is full, but how do we find out what is taking up all the space? It may take some trial and error; this command can help you narrow it down:
du -xah / | sort -h | tail
This runs the ‘du’ (disk usage) command with three flags: ‘x’ keeps it on one file system (so it skips /proc and other mounts), ‘a’ lists every file as well as every directory, and ‘h’ prints sizes in a human-readable format such as 6.0G. The results are piped through ‘sort -h’, which understands those human-readable sizes, and the sorted list is piped into ‘tail’, which shows only the last ten lines, i.e. the largest entries. If you need more results, remove the ‘| tail’ part of the command. It is a good idea to save the output to a file so you can review it more easily, ideally on a disk other than the full one.
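‘sort -h’ is a GNU coreutils extension, so on a system whose ‘sort’ lacks it, a variant that sorts plain kilobyte counts numerically does the same job. The sketch below wraps that in a hypothetical helper, ‘biggest’, whose second argument controls how many lines ‘tail’ keeps. Printing more lines to the screen also avoids writing a results file onto the disk that is already full.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest files and directories under a path,
# without crossing into other file systems (-x). Sizes are in kilobytes (-k),
# so a plain numeric sort works even where "sort -h" is unavailable.
biggest() {
    dir="${1:-/}"
    count="${2:-10}"
    # 2>/dev/null hides "permission denied" noise; the largest entries end up last.
    du -xak "$dir" 2>/dev/null | sort -n | tail -n "$count"
}

# Example: show the 30 largest entries on the root file system.
#   biggest / 30
```

The directory you start from always appears last, because its total includes everything beneath it.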
In my case, the output showed that the /var/log/ldapmessages file was taking up a significant portion of the disk: a whopping 6.0 gigabytes of the allocated 9.8 gigabytes, or about 61%. This specific instance had been running for over a year, so the log contained a lot of old information.
Looking at the ldapmessages file, it appeared to be only a log, which should be safe to clear. Be sure to check a file before deleting or overwriting it! Once the log was emptied, disk usage was back to normal, and the system confirmed the freed space.
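Here is a minimal sketch of the “check, then empty” step. It assumes the root file system is writable (at the init=/bin/bash prompt that typically means running ‘mount -o remount,rw /’ first). ‘review_log’ and ‘empty_log’ are hypothetical helper names. Truncating the file in place, rather than deleting it, keeps its owner and permissions for whatever writes to it. This also matters on a running system, because space held by a deleted file is not released while a process still has it open.

```shell
#!/bin/sh
# Hypothetical helper: show a log's size and its most recent lines so you can
# confirm it really is just a log before clearing it.
review_log() {
    ls -lh "$1"
    tail -n "${2:-20}" "$1"
}

# Hypothetical helper: empty a file in place. The redirection truncates it to
# zero bytes but keeps the same file, owner and permissions.
empty_log() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    : > "$1"
}

# Example, using the file from this post:
#   review_log /var/log/ldapmessages
#   empty_log /var/log/ldapmessages
```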
Once I verified there was enough disk space for normal operation, I restarted the instance with the ‘reboot’ command, and vCenter started normally. Be sure to fix whatever is filling the disk; otherwise, this issue will eventually happen again.
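One way to catch the next runaway log early is a periodic check, for example from cron, that reports unusually large files under /var/log. This is a sketch and not a vCenter feature. ‘large_files’ is a hypothetical helper, and the 500 MB default threshold is an arbitrary example. The lasting fix is still to rotate or cap whichever log keeps growing.

```shell
#!/bin/sh
# Hypothetical helper: print regular files under a directory that are larger
# than a limit in megabytes, staying on that directory's file system (-xdev).
# The limit is converted to bytes because "-size +Nc" is the portable form.
large_files() {
    dir="${1:-/var/log}"
    limit_mb="${2:-500}"
    bytes=$((limit_mb * 1024 * 1024))
    find "$dir" -xdev -type f -size +"${bytes}"c -print
}

# Example: any output means a log needs attention.
#   large_files /var/log 500
```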