When Rendering Your Laptop Unbootable Is a Learning Experience

Introduction. What happened?

I was at a customer site assisting with performance testing of Linux hosts connected to NetApp storage via NFS. The customer was using sio_ntap, a NetApp-provided load generator, and I was experimenting with the same tool on my laptop. I wasn’t paying close attention, and instead of pointing the tool at a file I pointed it at /dev/sda2, which is /boot on my system. I did this dumb thing as root, of course. I didn’t realize what I’d done until a popup warned me a few minutes later that /boot was nearly full.

I looked at it and found /boot had no content. I didn’t panic or even react as the realization hit. I knew I could suspend the laptop normally and finish the day with the customer and decide how to repair it. I knew that the laptop would no longer boot off of the hard drive.

Now whacking /boot is not necessarily a big deal. What made this more complex, though, is that my / and swap are in an LVM container and encrypted.

So the question is sometimes asked: why add the complexity of encrypting the whole system, including swap? The easier answer is that my employer requires it, and this system was sent to me encrypted with a Windows OS protected with BitLocker.

The more complete answer is that it just makes sense and really all portable systems should be shipping this way by default. I’ve been encrypting the root fs and swap on my systems since 2009. Apple and Google are encrypting their mobile OSs by default now. Apple and Microsoft offer tools to do whole drive encryption and it’s now built into the Ubuntu installer as well. If this laptop is lost or stolen, my private and company private data on it is reasonably safe.

By encrypting the root fs and swap I can protect the system when it is suspended to disk as well. With swap encrypted, I have to enter the passphrase at power up to decrypt and the suspended image is loaded back up and resumed normally. The only unencrypted bits on the system are /boot and /boot/efi which of course have no user data.

Initial fixes: reinstall a kernel and initramfs-tools

There were a number of paths to recovery including, of course, a full restore from a recent backup. But since so much of the system was intact and only the kernel packages required reinstallation, I chose to work through that approach. The first thing to do was resume from suspend to RAM and figure some things out:

sharney@zenarcade:~/scottharney.com$ df -h
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-root  223G   60G  152G  29% /
none                         4.0K     0  4.0K   0% /sys/fs/cgroup
udev                         5.8G   12K  5.8G   1% /dev
tmpfs                        1.2G  1.6M  1.2G   1% /run
none                         5.0M     0  5.0M   0% /run/lock
none                         5.9G   90M  5.8G   2% /run/shm
none                         100M   56K  100M   1% /run/user
/dev/sda2                    237M   97M  128M  44% /boot
/dev/sda1                    511M   69M  443M  14% /boot/efi
sharney@zenarcade:~/scottharney.com$ dpkg -l | grep kernel |grep linux| grep ^ii
ii  linux-firmware              1.127.11           all                Firmware for Linux kernel drivers
ii  linux-generic               3.13.0.46.53       amd64              Complete Generic Linux kernel and headers
ii  linux-headers-3.13.0-45     3.13.0-45.74       all                Header files related to Linux kernel version 3.13.0
ii  linux-headers-3.13.0-45-gen 3.13.0-45.74       amd64              Linux kernel headers for version 3.13.0 on 64 bit x86 SMP
ii  linux-headers-3.13.0-46     3.13.0-46.79       all                Header files related to Linux kernel version 3.13.0
ii  linux-headers-3.13.0-46-gen 3.13.0-46.79       amd64              Linux kernel headers for version 3.13.0 on 64 bit x86 SMP
ii  linux-headers-generic       3.13.0.46.53       amd64              Generic Linux kernel headers
ii  linux-image-3.13.0-45-gener 3.13.0-45.74       amd64              Linux kernel image for version 3.13.0 on 64 bit x86 SMP
ii  linux-image-3.13.0-46-gener 3.13.0-46.79       amd64              Linux kernel image for version 3.13.0 on 64 bit x86 SMP
ii  linux-image-extra-3.13.0-45 3.13.0-45.74       amd64              Linux kernel extra modules for version 3.13.0 on 64 bit x86
ii  linux-image-extra-3.13.0-46 3.13.0-46.79       amd64              Linux kernel extra modules for version 3.13.0 on 64 bit x86
ii  linux-image-generic         3.13.0.46.53       amd64              Generic Linux kernel image
ii  linux-signed-generic        3.13.0.46.53       amd64              Complete Signed Generic Linux kernel and headers
ii  linux-signed-image-3.13.0-4 3.13.0-46.79       amd64              Signed kernel image generic.
ii  linux-signed-image-generic  3.13.0.46.53       amd64              Signed Generic Linux kernel image
sharney@zenarcade:~/scottharney.com$
sharney@zenarcade:~/scottharney.com$ mount | grep sda2
/dev/sda2 on /boot type ext2 (rw)
sharney@zenarcade:~/scottharney.com$ cat /etc/fstab
## /etc/fstab: static file system information.
##
## Use 'blkid' to print the universally unique identifier for a
## device; this may be used with UUID= as a more robust way to name devices
## that works even if disks are added and removed. See fstab(5).
##
## <file system> <mount point>   <type>  <options>       <dump>  <pass>
UUID=02f39ef7-d4d5-4c8a-a1b9-e5eb434750ab /               ext4    errors=remount-ro 0       1
## /boot was on /dev/sda2 during installation
##UUID=a947feb0-1bef-4e9f-9af2-d87018206a5a /boot           ext2    defaults        0       2
## /boot/efi was on /dev/sda1 during installation
##UUID=FAB4-6BF5  /boot/efi       vfat    defaults        0       1
/dev/mapper/ubuntu--vg-swap_1 none            swap    sw              0       0
UUID=FAB4-6BF5  /boot/efi       vfat    defaults        0       1
UUID=e91b3f49-d7ab-403d-9287-db82ef2bc5fc       /boot   ext2    defaults        0       2
sharney@zenarcade:~/scottharney.com$ 

The output shows me where things are and what kernel images I have – or rather had – installed. I can see where /boot is mounted and that it’s an ext2 filesystem which I knew but it’s worth verifying.

The other good thing to do, since my filesystem was largely intact, was to attach a USB drive and get a fresh full backup just in case: tar -cvpzf /media/mydrive/fullbackup.tar.gz --directory=/ --exclude=proc --exclude=sys --exclude=dev/pts --exclude=media . Note: The period at the end of that command matters; it tells tar to archive everything under the directory given by --directory.
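As a minimal sketch of the same kind of backup with a sanity check afterwards: the `backup_tree` helper and its arguments are my own illustration, not something from the original session. It archives a directory with gzip, preserves permissions, skips the same directory names as above, and then lists the archive to confirm it is readable.

```shell
#!/bin/sh
# Hypothetical helper: archive SRC into ARCHIVE (gzip), preserving
# permissions and skipping pseudo-filesystems, then verify the result.
backup_tree() {
    src=$1
    archive=$2
    # Excludes come before the trailing "." so they apply to it.
    tar -cpzf "$archive" --directory="$src" \
        --exclude=proc --exclude=sys \
        --exclude=dev/pts --exclude=media . || return 1
    # A readable listing is a cheap check that the archive is not truncated.
    tar -tzf "$archive" >/dev/null
}

# Usage (as root, with the USB drive mounted under /media/mydrive):
# backup_tree / /media/mydrive/fullbackup.tar.gz
```

Note that a bare pattern such as `proc` excludes any path component with that name, not only the top-level directory, so treat the exclude list as something to review for your own system.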

I happened to have an Ubuntu 14.04 USB drive handy; otherwise the next step would have been to make one so I could boot off of it. Once booted off of USB, I needed to run mke2fs /dev/sda2 to get an actual filesystem on which to mount a /boot that I could start working with. After that I needed to mount my encrypted root filesystem and chroot into it so I could install kernel packages and such. I followed this procedure:

root@ubuntu # blkid 
/dev/sda3: UUID="3ca5d400-822c-4e58-ada6-3528c6fcb7bb" TYPE="crypto_LUKS" 
/dev/sda1: UUID="FAB4-6BF5" TYPE="vfat" 
/dev/sda2: UUID="e91b3f49-d7ab-403d-9287-db82ef2bc5fc" TYPE="ext2" 
/dev/mapper/sda3_crypt: UUID="A34D3T-o1r1-OhMI-Gd5M-mgyi-GOHn-hdAGXk" TYPE="LVM2_member" 
/dev/mapper/ubuntu--vg-root: UUID="02f39ef7-d4d5-4c8a-a1b9-e5eb434750ab" TYPE="ext4" 
/dev/mapper/ubuntu--vg-swap_1: UUID="1f75756d-644c-423b-a7bf-23338f5122d4" TYPE="swap" 
root@ubuntu # cryptsetup luksOpen /dev/sda3 mycryptvol
root@ubuntu # mkdir /media/mycryptvol
root@ubuntu # vgscan
root@ubuntu # vgchange -ay
root@ubuntu # lvdisplay
  --- Logical volume ---
  LV Path                /dev/ubuntu-vg/root
  LV Name                root
  VG Name                ubuntu-vg
  LV UUID                Ufdirg-27m0-wSN2-xU8C-VL8m-mAL3-TYE0ba
  LV Write Access        read/write
  LV Creation host, time ubuntu, 2014-08-04 09:58:58 -0500
  LV Status              available
  # open                 1
  LV Size                225.85 GiB
  Current LE             57817
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

  --- Logical volume ---
  LV Path                /dev/ubuntu-vg/swap_1
  LV Name                swap_1
  VG Name                ubuntu-vg
  LV UUID                OKvz8H-hhug-FnU6-eskM-X5ar-hjlB-5Whs3m
  LV Write Access        read/write
  LV Creation host, time ubuntu, 2014-08-04 09:58:59 -0500
  LV Status              available
  # open                 2
  LV Size                11.88 GiB
  Current LE             3042
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:2
root@ubuntu # mount /dev/ubuntu-vg/root /mnt
root@ubuntu # mount --bind /dev /mnt/dev
root@ubuntu # mount --bind /dev/pts /mnt/dev/pts
root@ubuntu # mount --bind /proc /mnt/proc
root@ubuntu # mount --bind /sys /mnt/sys
root@ubuntu # mount /dev/sda2 /mnt/boot
root@ubuntu # mount /dev/sda1 /mnt/boot/efi
root@ubuntu # chroot /mnt

As you can see from above, I used blkid to confirm that the partition containing my LVM root and swap was /dev/sda3. I followed this post to remind myself how to mount everything up and get into the chroot appropriately.
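The mount sequence, and the reverse-order unmount that comes later, can be sketched as a pair of shell functions. This is an illustration under assumptions rather than the exact script I ran: the function names and the DRY_RUN switch are hypothetical, the device names are the ones from this laptop, and the EFI partition is assumed to belong under boot/efi inside the chroot. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch of the rescue mounts from a live USB. Device names are the ones
# from this laptop (sda1=EFI, sda2=/boot, sda3=LUKS); adjust for yours.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mount_rescue_chroot() {
    root=${1:-/mnt}
    run cryptsetup luksOpen /dev/sda3 mycryptvol
    run vgchange -ay
    run mount /dev/ubuntu-vg/root "$root"
    for fs in dev dev/pts proc sys; do
        run mount --bind "/$fs" "$root/$fs"
    done
    run mount /dev/sda2 "$root/boot"
    run mount /dev/sda1 "$root/boot/efi"
}

umount_rescue_chroot() {
    root=${1:-/mnt}
    # Reverse order of the mounts above.
    for fs in boot/efi boot sys proc dev/pts dev; do
        run umount "$root/$fs"
    done
    run umount "$root"
}
```

Calling `mount_rescue_chroot /mnt` with the default settings prints the nine commands for review; setting DRY_RUN=0 as root would actually run them, after which `chroot /mnt` drops you into the installed system.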

The next step was to reinstall the packages related to the kernel, initrd, and grub for booting. I ran apt-get install --reinstall for all of the linux-image* and linux-signed-image* packages as well as grub, grub-efi, initramfs and initramfs-tools. I followed this grub EFI fix post as well.
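Picking out the kernel image packages by hand is error-prone, so here is a small sketch of how the list could be built. The `installed_kernel_pkgs` function is hypothetical; it filters "name status" lines in the format that `dpkg-query -W -f='${Package} ${Status}\n'` prints, keeping only packages that are fully installed.

```shell
#!/bin/sh
# Read "package status..." lines on stdin, as produced by:
#   dpkg-query -W -f='${Package} ${Status}\n'
# and print the installed linux-image* and linux-signed-image* packages.
installed_kernel_pkgs() {
    awk '$NF == "installed" && $1 ~ /^linux-(signed-)?image/ { print $1 }'
}

# Inside the chroot (as root), something along these lines:
#   pkgs=$(dpkg-query -W -f='${Package} ${Status}\n' | installed_kernel_pkgs)
#   apt-get install --reinstall $pkgs initramfs-tools
```

Using dpkg-query here rather than dpkg -l avoids the truncated package names that show up in the column-formatted listing earlier in this post.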

At this point I had a populated /boot and the EFI partition looked good. I knew it wouldn’t boot yet but I did want to see how far I had gotten. I unmounted everything in reverse order, rebooted off the hard drive and, it turns out, I’d gotten pretty far. Grub did start but couldn’t find the kernel.

Boot-Repair

So the next thing I did was try Boot-Repair, which I had come across in my searches. As the post describes, you can quickly install it directly onto a booted Ubuntu USB environment and take a crack at it. I figured that since disk encryption was in fact built into the Ubuntu installer, it might work.

I ended up trying a couple of iterations and reboots, with my root filesystem both completely unmounted and mounted within the USB environment. The one result I did find was that the system got a little farther at boot, reaching the stage where the initrd was trying to mount root, but of course it hadn’t done the decryption bits. So now I largely knew that I had to get a proper kernel boot line in /boot/grub/grub.cfg and I had to get an initrd image built with the decryption capabilities built into it.

My method for finding out the above, of course, was simple Google searches based on what I was learning as I went, with search strings such as “grub encrypted lvm”, which led me to posts like this that helped me get further down the path. I also took a little side trip to do an LVM config restore with vgcfgrestore.

The GitHub post that worked

One of those Google searches on “grub.cfg cryptsetup luksOpen” led me to a GitHub gist that got me through.

There were a few important bits in that post that allowed me to construct the chain. The first was /etc/crypttab, which I verified as correct. Some posts I read along the way imply that the presence of this file should trigger update-initramfs to build the right bits into the initrd, but I found that I also needed /etc/initramfs-tools/conf.d/cryptroot and CRYPTSETUP=y in /etc/initramfs-tools/initramfs.conf:

root@zenarcade:~/Downloads# cat /etc/initramfs-tools/conf.d/cryptroot 
CRYPTROOT=target=sda3_crypt,source=/dev/disk/by-uuid/3ca5d400-822c-4e58-ada6-3528c6fcb7bb,lvm=ubuntu-vg
root@zenarcade:~/Downloads# cat /etc/crypttab 
sda3_crypt UUID=3ca5d400-822c-4e58-ada6-3528c6fcb7bb none luks,retry=1,lvm=ubuntu-vg
root@zenarcade:~/Downloads# tail /etc/initramfs-tools/initramfs.conf 

DEVICE=

##
## NFSROOT: [ auto | HOST:MOUNT ]
##

NFSROOT=auto

CRYPTSETUP=y
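Since the cryptroot line is easy to get wrong by hand, here is a minimal sketch that builds it from its three parts. The `cryptroot_line` function is a hypothetical helper, not part of initramfs-tools; it only prints the line in the format shown above so you can review it before writing it to the file.

```shell
#!/bin/sh
# Build the conf.d/cryptroot line from its three parts so the variable
# name and field order do not have to be typed by hand.
cryptroot_line() {
    target=$1   # device-mapper name, e.g. sda3_crypt
    uuid=$2     # UUID of the LUKS partition, as reported by blkid
    vg=$3       # LVM volume group inside the encrypted container
    printf 'CRYPTROOT=target=%s,source=/dev/disk/by-uuid/%s,lvm=%s\n' \
        "$target" "$uuid" "$vg"
}

# Example (review the output before redirecting it anywhere):
# cryptroot_line sda3_crypt "$(blkid -s UUID -o value /dev/sda3)" ubuntu-vg
```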

I should note at this point that I actually misspelled the variable as “CRYPTOROOT” instead of “CRYPTROOT”. Booting with a grub command line that dropped quiet and splash, so I could watch the verbose boot, led me to search and find this “cryptsetup not found on boot” post, which helped me realize my error. I also used the valuable lsinitramfs to inspect the initrd image contents, which helped me find that /sbin/cryptsetup was missing from my initrd builds.
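The lsinitramfs check is worth turning into a one-liner you can repeat after every rebuild. A minimal sketch, assuming the listing has one path per line as lsinitramfs prints it; the `has_cryptsetup` name is my own and the initrd filename in the usage comment is just the kernel version from this system.

```shell
#!/bin/sh
# Read an initramfs file listing on stdin (one path per line) and succeed
# only if a cryptsetup binary under some sbin directory is in it.
has_cryptsetup() {
    grep -Eq '(^|/)sbin/cryptsetup$'
}

# Usage inside the chroot:
# lsinitramfs /boot/initrd.img-3.13.0-46-generic | has_cryptsetup \
#     && echo "cryptsetup present" || echo "cryptsetup MISSING"
```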

Once I got past my spelling error, I had an /etc/crypttab, /etc/initramfs-tools/conf.d/cryptroot, /etc/initramfs-tools/initramfs.conf and /etc/default/grub which included the command-line bits to decrypt on boot. I could run update-initramfs -k all -c and update-grub inside my chroot, confirm the initrd content with lsinitramfs, exit again and reboot….

Success. I was asked for the decryption passphrase in the boot splash screen as usual and booted right up.

Wrap-Up: Lessons learned

Of course I could have gone through a full system recovery from backup, but that would likely have been just as lengthy. The process would have been to scrap the system, build a plain Ubuntu install and then restore over it. Alternately, I could have booted off the USB stick and restored to mounted drives. I likely would have had some additional cleanup to do as well.

  • Don’t panic. You can fix this
  • You will make additional mistakes as you go. See the first bullet.
  • Take breaks. Step away for a bit. Go do something else to clear your head.
  • Backup backup backup. I did use my tarball and I did use the LVM configs (which are automatically backed up when you change the LVM configuration).
  • Document as you go. This is, of course, how I was able to create this post. You never know when you might have to do something like this again.
  • Have boot USB drives handy.

Update 3/16/2016

I had to perform this same process on a different laptop with the same layout: encrypted root and swap under LVM. I followed the process above after booting off of an Ubuntu 14.04 USB stick. However, I wasn’t getting a /dev/mapper/ubuntu--vg-root visible to mount. In the example above I was booting into an initramfs from the system boot drive. I did a little searching and found this post, which reminded me that I needed to have udev update the device mapper tree via udevadm trigger. Once I did that, the /dev entry was created.
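A small sketch of that step: trigger udev, then wait briefly for the node to show up before trying to mount it. The `wait_for_node` helper and its timeout argument are hypothetical; the device path in the usage comment is the one from my layout.

```shell
#!/bin/sh
# Wait up to TRIES seconds for a device node (or any path) to appear.
# Returns 0 as soon as the path exists, non-zero if it never does.
wait_for_node() {
    node=$1
    tries=${2:-5}
    while [ "$tries" -gt 0 ]; do
        [ -e "$node" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    [ -e "$node" ]
}

# From the live USB, after cryptsetup luksOpen and vgchange -ay:
#   udevadm trigger
#   wait_for_node /dev/mapper/ubuntu--vg-root 10 \
#       && mount /dev/mapper/ubuntu--vg-root /mnt
```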

This kind of low-level rescue work is something you don’t do very often, but here I have done it twice in a few months. The high-level steps and layers are familiar to me; when the system boots normally, it’s all orchestrated for you. In this case you’re doing it by hand. And it’s not all that dissimilar on other systems. I’ve done many rebuilds of device trees in the past, with mknod on ancient Linux systems or cfgadm and later variants on Solaris. So the overall patterns are familiar even if the specific steps and syntax details change.

I also fixed a typo in the example command execution list, updating vchange to vgchange.
