XenServer Host Stuck In Maintenance Mode (Server Still Booting)

Here I am with another XenServer issue that was easy to solve but quite difficult to troubleshoot. I have applied a few XenServer updates (I am running XenServer 6.5 FP1), but I started having issues after installing XS65ESP1022 and XS65ESP1023.

Specifically, after rebooting the host following the installation of these two updates, I noticed that my host would not come out of maintenance mode anymore. If I tried to do so from XenCenter, I would get an error message that said "Server still booting". This of course was not true, since I could access the console on the host.

After a lot of trial and error I found a solution, but it turns out there were at least a couple of problems that I needed to solve before being up and running again.

The background

The first thing that I noticed was that, if I restarted the toolstack, the server would exit maintenance mode, but as soon as I rebooted it, it would be stuck in maintenance mode all over again.
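For anyone who prefers the host console over XenCenter, here is a rough sketch of the same check. It assumes the stock xe-toolstack-restart script and the xe CLI are available in dom0, and that a host with enabled=false is what XenCenter displays as maintenance mode. The function name and the HOST_UUID argument are placeholders of mine, not part of XenServer.

```shell
#!/bin/sh
# Sketch only: restart the toolstack, then print the host's "enabled" flag.
# Assumption: enabled=false corresponds to maintenance mode in XenCenter.
restart_toolstack_and_check() {
    host_uuid="$1"   # placeholder: take the real value from `xe host-list`

    xe-toolstack-restart >/dev/null || return 1

    # xapi may need a few seconds before it answers again, so retry a few times
    tries=0
    until xe host-param-get uuid="$host_uuid" param-name=enabled 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -ge 10 ] && return 1
        sleep 3
    done
}
```

Calling restart_toolstack_and_check HOST_UUID should print true once the host is out of maintenance mode, at least until the next reboot in my case.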


In addition to this, even after restarting the toolstack, I noticed that one of my drives was unplugged. If I right-clicked on the drive and selected Repair… I would receive this error: Logical Volume mount/activate error.

This looked more like a Linux issue than a XenServer one, so I tried to solve it by running fsck on the drive, but no luck as I kept receiving this error instead:

fsck.ext3:

fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
fsck.ext2: Device or resource busy while trying to open /dev/sdc
Filesystem mounted or opened exclusively by another program?

So at this point I ran top to see whether something was already using the disk, and found that an fsck process was already running on it. I let it finish and tried repairing the disk again from XenCenter, but the repair failed with the same Logical Volume mount/activate error I was getting before.
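I used top, but a quicker way to spot this could look like the sketch below. It only relies on ps and grep; the function name is made up for illustration, and /dev/sdc is simply the device from the error above.

```shell
#!/bin/sh
# Sketch only: list running fsck-family processes that mention a given device.
fsck_on_device() {
    dev="$1"
    # The [f] trick keeps grep from matching its own command line;
    # -F treats the device path as a literal string.
    ps -eo pid,args | grep '[f]sck' | grep -F -- "$dev"
}
```

Running fsck_on_device /dev/sdc prints one line per matching process and returns a non-zero status when nothing is found, which is the signal that it is safe to retry the repair.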

The host was not in emergency mode (xe host-is-in-emergency-mode returned false) and, apparently, there is no way to remove an installed XenServer update without reinstalling XenServer from scratch and restoring from backup.

Since I wanted to avoid doing this, it was time to get my hands dirty with the CLI once more…

The solution

Since the fsck process had completed, and the drive was showing as unplugged, I tried to manually plug the drive again from the CLI.

  1. Find the UUID of the unplugged SR (you can do so easily from XenCenter)
  2. Run xe pbd-list sr-uuid=UUID_from_step_1 params=all. The uuid line in the output is the UUID of the PBD, which you are going to need in the last step
  3. Run xe pbd-create host-uuid=HOST_UUID sr-uuid=SR_UUID (all values you need to run this command can be seen in the output of the previous command)
  4. Run xe pbd-plug uuid=pbd_UUID

You should now see your SR plugged again to your XenServer host. If you reboot now, things should be working as expected and the host should not enter maintenance mode anymore.
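The lookup and plug steps above can be strung together roughly as follows. This is a sketch, not a tested tool: it assumes the PBD record for the SR still exists (which the pbd-list output in step 2 suggests), so it deliberately skips the pbd-create step. The function name replug_sr is hypothetical; the xe subcommands and the --minimal flag are the standard ones.

```shell
#!/bin/sh
# Sketch only: re-plug every unplugged PBD that belongs to one SR.
# Assumption: the PBD record still exists, so pbd-create (step 3) is skipped.
replug_sr() {
    sr_uuid="$1"   # placeholder: the SR UUID from step 1

    # --minimal prints only the PBD UUIDs, comma-separated
    pbds=$(xe pbd-list sr-uuid="$sr_uuid" --minimal) || return 1
    [ -n "$pbds" ] || { echo "no PBD found for SR $sr_uuid" >&2; return 1; }

    for pbd in $(echo "$pbds" | tr ',' ' '); do
        attached=$(xe pbd-param-get uuid="$pbd" param-name=currently-attached)
        if [ "$attached" = "false" ]; then
            echo "plugging PBD $pbd"
            xe pbd-plug uuid="$pbd" || return 1
        fi
    done
}
```

You would call it as replug_sr SR_UUID. PBDs that are already attached are left alone, so running it twice should be harmless.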

14 Comments

  1. tks to GOD and for you too, its worked!!!.

  2. the command time delay it?

  3. Why do you think this happened?

    • Without any other info, I am only left to assume that something went wrong during the installation of one of the updates. Perhaps one of them was not installed in a clean way, or something broke during the reboot.

      At this point in time, it would be difficult to find out more, unless this happens again in the future.

      • I just encountered something very similar while applying updates to 2 server (master and slave) in a pool. The master rebooted, all was fine. The slave did NOT come back up happy and is displaying symptoms quite similar to yours. I’m working my way through the steps in tutorial, it’s currently having a very long think about the xe pbd-list command.


© 2017 Daniel's TechBlog
