
21 February 2012

Recovering an Unbootable Kernel Image

This is a quick how-to on rebuilding an Arch Linux initramfs image when both the main and fallback images are unbootable. This can happen if something goes wrong during an update to the "linux" package and the problem isn't caught and fixed before the next reboot. This post draws on the Arch Wiki articles Change Root and mkinitcpio, as well as an Arch Forum post discussing a path problem that causes an unbootable image. Everything needed to recover is in those three sources, but I felt that a cookbook would be helpful, especially in the stressful moment when a vital computer is sitting at the limited emergency shell instead of booting into the installed system.

This is intended to solve the specific problem where Arch's early boot stage claims it can't find the boot drive and a real hardware failure is very unlikely. In my case, I hit this regularly on a virtual machine whose virtual hardware was not failing. It turned out to be exactly the problem described in the forum post mentioned above, but first I needed to recover the system.

Step 1 - Get It Booted
Your system isn't going to boot on its own: both the primary and fallback boot images are refusing to behave. So go get yourself an Arch ISO and burn it to a CD or write it to a USB stick. If you've already installed Arch, you know how to do this; if you've forgotten, the Arch Wiki has an article about it. Use the download and burn process as an opportunity to take a deep breath - that will help with the remaining steps.

Boot from the ISO, and choose the first option from the ISO's boot menu. But don't start Arch setup. Instead get the network running so you can update with pacman.

# aif -p partial-configuration-network

Answer the prompts. If this first step doesn't go well, don't sweat it - it just means you won't have internet access, which probably isn't required anyway.

Step 2 - Take Stock
We need to manually mount all the necessary partitions, and to do that we need to know what they are. If you remember how you partitioned your disk, that's great. But if you don't remember exactly which /dev/sdaX goes where, you'll need to do a little guessing. Luckily, fdisk can help. Note that I've trimmed the output some for brevity. I've also cheated a little and typed in some partitions from gparted because my /dev/sdb uses GPT.

# fdisk -l

Disk /dev/sda: 100.0 GB, 100030242816 bytes
   Device Boot     Start         End      Blocks   Id  System
/dev/sda1   *         63     3903794     1951866   83  Linux
/dev/sda2        3903795   195371567    95733886+  83  Linux


Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
   Device Boot     Start         End      Blocks   Id  System
/dev/sdb1             31 3871748080   3871748047   83 Linux
/dev/sdb2     3871748081 3907029134     35281054   xx Linux-Swap

On my system I have a roughly 2GB bootable ext2 partition on /dev/sda1, and the remainder of the 100GB SSD (/dev/sda2) is ext4. I have a 1.8TB data partition at /dev/sdb1, and my swap partition is /dev/sdb2. This is enough information for me to remember that sda2 mounts to / and that sdb1 mounts to /caviar and holds the real /var directory, which /var on sda2 symlinks to.

If fdisk isn't enough to jog your memory, you may need to test-mount and explore a little.
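
If you do end up test-mounting, a small helper can speed up the guessing. The sketch below is only an illustration: guess_role is a name I made up, and the marker files it looks for (etc/fstab for a root partition, vmlinuz-linux or initramfs-linux.img for a separate boot partition) are assumptions based on a stock Arch layout.

```shell
#!/bin/sh
# Sketch only: guess_role is a hypothetical helper, not part of any Arch tool.
# Given a directory where a partition has been test-mounted, guess its role
# from well-known marker files (assumed stock Arch file names).
guess_role() {
    dir=$1
    if [ -f "$dir/etc/fstab" ]; then
        echo root
    elif [ -f "$dir/vmlinuz-linux" ] || [ -f "$dir/initramfs-linux.img" ]; then
        echo boot
    else
        echo unknown
    fi
}

# Possible use from the live ISO, as root (device name is an example):
#   mkdir -p /mnt/probe
#   mount -o ro /dev/sda2 /mnt/probe
#   guess_role /mnt/probe
#   umount /mnt/probe
```

Once the root partition turns up, its etc/fstab should list where everything else is supposed to be mounted, which is usually quicker than probing the remaining partitions one by one.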

Step 3 - Mount and Prep
Once you know which partitions you need to mount where, get it all mounted under /mnt/arch. Also mount the proc, sys, and dev directories so they'll be available to your chrooted sandbox.

# mount /dev/sda2 /mnt/arch
# mount /dev/sda1 /mnt/arch/boot
# mount /dev/sdb1 /mnt/arch/caviar
# mount -t proc proc /mnt/arch/proc
# mount -t sysfs sys /mnt/arch/sys
# mount -o bind /dev /mnt/arch/dev
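
If you'd rather review the mount sequence before running it, the commands above can be wrapped in a dry-run helper. This is a minimal sketch, not something from the Arch tooling: run, mount_chroot_tree, and the DRY_RUN variable are all names invented here, and extra data partitions (like my /caviar) would need their own lines.

```shell
#!/bin/sh
# Sketch only: run, mount_chroot_tree and DRY_RUN are hypothetical names.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount a root and boot partition, then the pseudo-filesystems the chroot needs.
# Stops at the first command that fails.
mount_chroot_tree() {
    root_dev=$1 boot_dev=$2 target=$3
    run mount "$root_dev" "$target" &&
    run mount "$boot_dev" "$target/boot" &&
    run mount -t proc proc "$target/proc" &&
    run mount -t sysfs sys "$target/sys" &&
    run mount -o bind /dev "$target/dev"
}

# Review first, then repeat as root with DRY_RUN=0 to really mount:
#   mount_chroot_tree /dev/sda2 /dev/sda1 /mnt/arch
```

Chaining with && means a failed root mount stops the rest, so nothing gets mounted onto the live ISO's own empty directories by accident.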

In case you need to update with pacman, you'll want network access. If you got the network running in Step 1, copy the resolv.conf down into the chroot world.

# cp -L /etc/resolv.conf /mnt/arch/etc/resolv.conf

Step 4 - Chroot and Fix
Next, jump into your sandbox.

# chroot /mnt/arch /bin/bash

Now that you're here, feel free to poke around in logs to see what might have gone wrong. In my case, I had stupidly run pacman -Syu --noconfirm from a cron job without setting the PATH to include /sbin. As a result, the update script failed to call depmod but then blindly ran mkinitcpio on the incomplete map files, rendering the image stillborn. The depmod result should really be considered during a linux upgrade so that mkinitcpio doesn't trash the boot image, IMHO, but what do I know.
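
To avoid repeating my mistake, an unattended update could set PATH itself and refuse to run if the tools it depends on can't be found. The following is a rough sketch of that idea under my own assumptions: missing_tools and safe_update are hypothetical names, and the PATH value is just a typical one that includes the sbin directories.

```shell
#!/bin/sh
# Sketch only: missing_tools and safe_update are hypothetical helpers.
# Cron starts jobs with a minimal PATH, so set it explicitly (assumed value).
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

# Print each named tool that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Run the unattended update only if everything the kernel upgrade needs is reachable.
safe_update() {
    missing=$(missing_tools depmod mkinitcpio pacman)
    if [ -n "$missing" ]; then
        echo "refusing to update; not on PATH: $missing" >&2
        return 1
    fi
    pacman -Syu --noconfirm
}

# The cron job would source this file and call: safe_update
```

This doesn't make blind --noconfirm updates a good idea, but it would at least have turned my silent failure into a loud one.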

For now, let's assume you have the same problem I did. To resolve it, all I needed to do was the following:

# /sbin/depmod
# mkinitcpio -p linux
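
One thing to watch: with no arguments, depmod works on the kernel version reported by uname -r, and inside a chroot that is the live ISO's kernel, which may not match the kernel installed on disk. If the two differ, the installed version probably needs to be passed to depmod explicitly. The sketch below shows one way to find module trees that look incomplete; stale_module_trees is a name invented for this post, and treating a missing or empty modules.dep as "depmod never ran" is my assumption.

```shell
#!/bin/sh
# Sketch only: stale_module_trees is a hypothetical helper, not part of any package.
# Prints each kernel version under the modules directory whose modules.dep
# is missing or empty, which suggests depmod did not complete for it.
stale_module_trees() {
    modules_root=${1:-/lib/modules}
    for dir in "$modules_root"/*/; do
        # Skip anything that is not a kernel module tree.
        [ -d "${dir}kernel" ] || continue
        [ -s "${dir}modules.dep" ] || basename "$dir"
    done
}

# Inside the chroot, as root, one might then run:
#   for ver in $(stale_module_trees); do depmod "$ver"; done
#   mkinitcpio -p linux
```

If the helper prints nothing, every module tree already has a non-empty modules.dep and the cause is more likely something other than a skipped depmod.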

As a precaution, I also did a full update with pacman and made sure that everything went smoothly.

# pacman -Syu

All was well, and I was able to reboot into my system again. 
