Introduction: What happened?
I was at a customer site assisting with some performance testing of Linux hosts
connected to NetApp via NFS. The customer was using sio_ntap, a NetApp-provided
load generator, and I was experimenting with the same tool on my laptop. I
wasn’t paying close attention, and instead of pointing the tool at a file I
pointed it at /dev/sda2, which is /boot on my system. I did this dumb thing as
root, of course. I didn’t realize what I’d done until a few minutes later, when
a popup warned me that /boot was nearly full.
I looked at it and found /boot had no content. I didn’t panic or even react
as the realization hit. I knew I could suspend the laptop normally, finish the
day with the customer and then decide how to repair it. I also knew that the
laptop would no longer boot off of the hard drive.
Now, whacking /boot is not necessarily a big deal. What made this more complex
is that my / and swap are in an LVM container and encrypted.
Sidebar: why encrypt your system?
The question is sometimes asked: why add the complexity of encrypting the whole system, including swap? The easy answer is that my employer requires it, and this system was sent to me encrypted, with a Windows OS protected by BitLocker.
The more complete answer is that it just makes sense and really all portable systems should be shipping this way by default. I’ve been encrypting the root fs and swap on my systems since 2009. Apple and Google are encrypting their mobile OSs by default now. Apple and Microsoft offer tools to do whole drive encryption and it’s now built into the Ubuntu installer as well. If this laptop is lost or stolen, my private and company private data on it is reasonably safe.
By encrypting the root fs and swap I can protect the system when it is
suspended to disk as well. With swap encrypted, I enter the passphrase at
power-up, the swap is decrypted, and the suspended image is loaded back and
resumed normally. The only unencrypted bits on the system are /boot and
/boot/efi, which of course hold no user data.
Initial fixes: reinstall a kernel and initramfs-tools
There were a number of paths to recovery, including, of course, a full restore from a recent backup. But since so much of the system was intact and only the kernel packages needed reinstalling, I chose to work through that approach. The first thing to do was resume from suspend to RAM and figure some things out:
sharney@zenarcade:~/scottharney.com$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-root 223G 60G 152G 29% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 5.8G 12K 5.8G 1% /dev
tmpfs 1.2G 1.6M 1.2G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 5.9G 90M 5.8G 2% /run/shm
none 100M 56K 100M 1% /run/user
/dev/sda2 237M 97M 128M 44% /boot
/dev/sda1 511M 69M 443M 14% /boot/efi
sharney@zenarcade:~/scottharney.com$ dpkg -l | grep kernel |grep linux| grep ^ii
ii linux-firmware 1.127.11 all Firmware for Linux kernel drivers
ii linux-generic 3.13.0.46.53 amd64 Complete Generic Linux kernel and headers
ii linux-headers-3.13.0-45 3.13.0-45.74 all Header files related to Linux kernel version 3.13.0
ii linux-headers-3.13.0-45-gen 3.13.0-45.74 amd64 Linux kernel headers for version 3.13.0 on 64 bit x86 SMP
ii linux-headers-3.13.0-46 3.13.0-46.79 all Header files related to Linux kernel version 3.13.0
ii linux-headers-3.13.0-46-gen 3.13.0-46.79 amd64 Linux kernel headers for version 3.13.0 on 64 bit x86 SMP
ii linux-headers-generic 3.13.0.46.53 amd64 Generic Linux kernel headers
ii linux-image-3.13.0-45-gener 3.13.0-45.74 amd64 Linux kernel image for version 3.13.0 on 64 bit x86 SMP
ii linux-image-3.13.0-46-gener 3.13.0-46.79 amd64 Linux kernel image for version 3.13.0 on 64 bit x86 SMP
ii linux-image-extra-3.13.0-45 3.13.0-45.74 amd64 Linux kernel extra modules for version 3.13.0 on 64 bit x86
ii linux-image-extra-3.13.0-46 3.13.0-46.79 amd64 Linux kernel extra modules for version 3.13.0 on 64 bit x86
ii linux-image-generic 3.13.0.46.53 amd64 Generic Linux kernel image
ii linux-signed-generic 3.13.0.46.53 amd64 Complete Signed Generic Linux kernel and headers
ii linux-signed-image-3.13.0-4 3.13.0-46.79 amd64 Signed kernel image generic.
ii linux-signed-image-generic 3.13.0.46.53 amd64 Signed Generic Linux kernel image
sharney@zenarcade:~/scottharney.com$
sharney@zenarcade:~/scottharney.com$ mount | grep sda2
/dev/sda2 on /boot type ext2 (rw)
sharney@zenarcade:~/scottharney.com$ cat /etc/fstab
## /etc/fstab: static file system information.
##
## Use 'blkid' to print the universally unique identifier for a
## device; this may be used with UUID= as a more robust way to name devices
## that works even if disks are added and removed. See fstab(5).
##
## <file system> <mount point> <type> <options> <dump> <pass>
UUID=02f39ef7-d4d5-4c8a-a1b9-e5eb434750ab / ext4 errors=remount-ro 0 1
## /boot was on /dev/sda2 during installation
##UUID=a947feb0-1bef-4e9f-9af2-d87018206a5a /boot ext2 defaults 0 2
## /boot/efi was on /dev/sda1 during installation
##UUID=FAB4-6BF5 /boot/efi vfat defaults 0 1
/dev/mapper/ubuntu--vg-swap_1 none swap sw 0 0
UUID=FAB4-6BF5 /boot/efi vfat defaults 0 1
UUID=e91b3f49-d7ab-403d-9287-db82ef2bc5fc /boot ext2 defaults 0 2
sharney@zenarcade:~/scottharney.com$
The output shows me where things are and which kernel images I have (or
rather had) installed. I can see where /boot is mounted and that it’s an ext2
filesystem, which I knew, but it’s worth verifying.
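To confirm that fstab points at the right /boot filesystem (its UUID changes whenever the filesystem is re-created with mke2fs), a small helper can pull the active UUID= entry for a mount point out of fstab-formatted input. This is a hypothetical sketch; the function name fstab_uuid_for is made up for illustration, and it only handles UUID= entries, skipping comment lines.

```shell
#!/bin/sh
# Hypothetical helper: print the UUID that an fstab-formatted stream
# (read on stdin) assigns to mount point $1. Commented-out lines such as
# "##UUID=... /boot ..." are ignored; only UUID= sources are reported.
fstab_uuid_for() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp && $1 ~ /^UUID=/ {
            sub(/^UUID=/, "", $1)   # strip the UUID= prefix
            print $1
        }'
}
```

On the live system it could be compared with what blkid reports for the partition, e.g. `[ "$(fstab_uuid_for /boot < /etc/fstab)" = "$(blkid -s UUID -o value /dev/sda2)" ] && echo match`.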
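The other good thing to do, since my filesystem was largely intact, was to
attach a USB drive and take a fresh full backup just in case:
tar -cvpjf /media/mydrive/fullbackup.tar.bz2 --directory=/ --exclude=./proc --exclude=./sys --exclude=./dev/pts --exclude=./media .
Note: the period at the end of that command matters. It tells tar to archive
the current directory, which --directory=/ has just set to the root. The -j
flag selects bzip2 compression, hence the .tar.bz2 name. The ./ prefix anchors
each exclude at the top of the tree: GNU tar matches a bare pattern like sys
against any path component, which would also skip unrelated sys directories
under /usr/include, and a leading-slash pattern like /media does not match the
./media member name that tar records.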
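Before archiving the real root, it can be worth checking what the --exclude patterns actually skip. The sketch below is a hedged dry run against a throwaway tree with made-up contents, comparing anchored patterns (./sys) with bare ones (sys). It assumes GNU tar, whose exclude patterns are unanchored by default; compression is left off because it does not change which members are excluded.

```shell
#!/bin/sh
# Dry run of --exclude patterns against a miniature, made-up root tree.
set -e
fake_root=$(mktemp -d)
out_dir=$(mktemp -d)
trap 'rm -rf "$fake_root" "$out_dir"' EXIT

mkdir -p "$fake_root/proc/1" "$fake_root/sys/kernel" "$fake_root/dev/pts" \
         "$fake_root/media/mydrive" "$fake_root/usr/include/sys" "$fake_root/etc"
echo zenarcade > "$fake_root/etc/hostname"
echo '/* header */' > "$fake_root/usr/include/sys/types.h"
echo old > "$fake_root/media/mydrive/old.tar"

# Anchored patterns: only the top-level directories are skipped.
tar -cf "$out_dir/anchored.tar" --directory="$fake_root" \
    --exclude=./proc --exclude=./sys --exclude=./dev/pts --exclude=./media .

# Bare patterns: any directory named "sys" or "proc" anywhere is skipped too.
tar -cf "$out_dir/bare.tar" --directory="$fake_root" \
    --exclude=proc --exclude=sys --exclude=dev/pts --exclude=./media .

echo "--- anchored ---"; tar -tf "$out_dir/anchored.tar" | sort
echo "--- bare ---";     tar -tf "$out_dir/bare.tar" | sort
```

In the anchored listing ./usr/include/sys/types.h survives while ./proc, ./sys, ./dev/pts and ./media are gone; in the bare listing the nested sys directory disappears as well.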
I happened to have an Ubuntu 14.04 USB drive handy; otherwise the next step
would have been to make one so I could boot off of it. Once booted off of USB,
I ran mke2fs /dev/sda2 to get an actual filesystem that I could mount as /boot
and start working with. Note that mke2fs gives the filesystem a new UUID, so
the /boot entry in /etc/fstab has to be updated to match; that is why the fstab
above has the original entry commented out. After that I needed to mount my
encrypted root filesystem and chroot into it so I could install kernel
packages and such. I followed this procedure:
root@ubuntu # blkid
/dev/sda3: UUID="3ca5d400-822c-4e58-ada6-3528c6fcb7bb" TYPE="crypto_LUKS"
/dev/sda1: UUID="FAB4-6BF5" TYPE="vfat"
/dev/sda2: UUID="e91b3f49-d7ab-403d-9287-db82ef2bc5fc" TYPE="ext2"
/dev/mapper/sda3_crypt: UUID="A34D3T-o1r1-OhMI-Gd5M-mgyi-GOHn-hdAGXk" TYPE="LVM2_member"
/dev/mapper/ubuntu--vg-root: UUID="02f39ef7-d4d5-4c8a-a1b9-e5eb434750ab" TYPE="ext4"
/dev/mapper/ubuntu--vg-swap_1: UUID="1f75756d-644c-423b-a7bf-23338f5122d4" TYPE="swap"
root@ubuntu # cryptsetup luksOpen /dev/sda3 mycryptvol
root@ubuntu # mkdir /media/mycryptvol
root@ubuntu # vgscan
root@ubuntu # vgchange -ay
root@ubuntu # lvdisplay
--- Logical volume ---
LV Path /dev/ubuntu-vg/root
LV Name root
VG Name ubuntu-vg
LV UUID Ufdirg-27m0-wSN2-xU8C-VL8m-mAL3-TYE0ba
LV Write Access read/write
LV Creation host, time ubuntu, 2014-08-04 09:58:58 -0500
LV Status available
# open 1
LV Size 225.85 GiB
Current LE 57817
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:1
--- Logical volume ---
LV Path /dev/ubuntu-vg/swap_1
LV Name swap_1
VG Name ubuntu-vg
LV UUID OKvz8H-hhug-FnU6-eskM-X5ar-hjlB-5Whs3m
LV Write Access read/write
LV Creation host, time ubuntu, 2014-08-04 09:58:59 -0500
LV Status available
# open 2
LV Size 11.88 GiB
Current LE 3042
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:2
root@ubuntu # mount /dev/ubuntu-vg/root /mnt
root@ubuntu # mount --bind /dev /mnt/dev
root@ubuntu # mount --bind /dev/pts /mnt/dev/pts
root@ubuntu # mount --bind /proc /mnt/proc
root@ubuntu # mount --bind /sys /mnt/sys
root@ubuntu # mount /dev/sda2 /mnt/boot
root@ubuntu # mount /dev/sda1 /mnt/boot/efi
As you can see above, I used blkid to confirm that /dev/sda3 is the partition
containing my LVM root and swap. I followed this post to remind myself how to
mount everything and get into the chroot (chroot /mnt) properly.
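The mount sequence above can be replayed as a small script. This is a hypothetical sketch, not the procedure from the post I followed: it assumes the layout shown by blkid (/dev/sda3 is the LUKS container holding volume group ubuntu-vg, /dev/sda2 is /boot, /dev/sda1 is the EFI system partition) and uses the mapper name sda3_crypt to match /etc/crypttab, where the transcript used mycryptvol. Setting DRY_RUN=1 makes it print each command instead of running it.

```shell
#!/bin/sh
# Hypothetical rescue-chroot setup for the disk layout described in this post.
set -e

MNT=/mnt
CRYPT_NAME=sda3_crypt   # assumption: reuse the name from /etc/crypttab

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_chroot() {
    run cryptsetup luksOpen /dev/sda3 "$CRYPT_NAME"
    run vgscan
    run vgchange -ay
    run mount /dev/ubuntu-vg/root "$MNT"
    # /boot must be mounted before /boot/efi, which lives inside it
    run mount /dev/sda2 "$MNT/boot"
    run mount /dev/sda1 "$MNT/boot/efi"
    # Kernel filesystems the package scripts expect inside the chroot
    for fs in /dev /dev/pts /proc /sys; do
        run mount --bind "$fs" "$MNT$fs"
    done
}
```

Sourced into the live USB shell, `DRY_RUN=1 setup_chroot` previews the ten commands; `setup_chroot` followed by `chroot /mnt` does the real work.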
The next step was to reinstall the packages related to the kernel, the initrd
and grub. I did an apt-get install --reinstall for all of the linux-image* and
linux-signed-image* packages as well as grub, grub-efi, initramfs and
initramfs-tools. I followed this grub EFI fix post as well.
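Building the list of kernel packages to reinstall can be scripted. The sketch below is hypothetical (the function name installed_kernel_pkgs is made up). It reads the "Package Status" lines printed by dpkg-query -W -f='${Package} ${Status}\n' and keeps the installed linux-image* and linux-signed-image* packages. It uses dpkg-query rather than dpkg -l because dpkg -l can truncate long package names, as the listing earlier in this post shows.

```shell
#!/bin/sh
# Hypothetical filter: from "Package Status" lines on stdin, print the names
# of installed linux-image* and linux-signed-image* packages, one per line.
installed_kernel_pkgs() {
    awk '$2 == "install" && $3 == "ok" && $4 == "installed" &&
         $1 ~ /^linux-(signed-)?image/ { print $1 }'
}
```

Inside the chroot, something like `apt-get install --reinstall $(dpkg-query -W -f='${Package} ${Status}\n' | installed_kernel_pkgs) initramfs-tools` would reinstall them in one go, with the grub packages added to the list as needed.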
At this point I had a populated /boot and the EFI partition looked good. I knew
it wouldn’t boot yet, but I wanted to see how far I’d get. I unmounted
everything in reverse order and rebooted off the hard drive, and it turned out
I’d gotten pretty far: grub did start but couldn’t find the kernel.
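Unmounting in reverse order matters because a filesystem cannot be unmounted while something is still mounted inside it. A hedged sketch of computing that order (umount_order is a made-up name): it reads /proc/mounts-style lines, keeps the mount points at or below a given root, and prints them last-mounted first, relying on /proc/mounts normally listing mounts in the order they were made.

```shell
#!/bin/sh
# Hypothetical helper: given a chroot root ($1, e.g. /mnt) and /proc/mounts
# lines on stdin, print the mount points at or below that root in reverse
# mount order, so children such as /mnt/boot/efi come before /mnt/boot.
umount_order() {
    awk -v root="$1" '
        $2 == root || index($2, root "/") == 1 { m[n++] = $2 }
        END { for (i = n - 1; i >= 0; i--) print m[i] }'
}
```

Usage could be `umount_order /mnt < /proc/mounts | while read -r m; do umount "$m"; done`, followed by `vgchange -an ubuntu-vg` and `cryptsetup luksClose` with whatever mapper name was opened. Mount points containing spaces appear escaped (as \040) in /proc/mounts, which this sketch does not decode.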
Boot-Repair
So the next thing I did was try Boot-Repair, which I had come across in my searches. As the post describes, you can quickly install it directly onto a booted Ubuntu USB stick and take a crack at it. I figured that since disk encryption was in fact built into the Ubuntu installer, it might work.
I ended up trying a couple of iterations and reboots, with my root filesystem
both completely unmounted and mounted within the USB environment. The one
useful result was that the system got a little farther at boot, reaching the
stage where the initrd was trying to mount root, but of course it hadn’t done
the decryption. So now I largely knew that I had to get a proper kernel boot
line into /boot/grub/grub.cfg, and I had to get an initrd image built with
decryption support.
My method for finding out the above was, of course, simple Google searches
based on what I was learning as I went, with search strings such as “grub
encrypted lvm”, which led me to posts like this one that helped me get further
down the path. I also took a little side trip to do an LVM config restore with
vgcfgrestore.
The GitHub post that worked
One of those Google searches, on “grub.cfg cryptsetup luksOpen”, led me to a GitHub gist that got me through.
There were a few important bits in that post that allowed me to construct the
chain. The first was /etc/crypttab, which I verified as correct. Some posts I
read along the way imply that the presence of this file should trigger
update-initramfs to build the right bits into the initrd, but I found that I
also needed /etc/initramfs-tools/conf.d/cryptroot as well as CRYPTSETUP=y in
/etc/initramfs-tools/initramfs.conf:
root@zenarcade:~/Downloads# cat /etc/initramfs-tools/conf.d/cryptroot
CRYPTROOT=target=sda3_crypt,source=/dev/disk/by-uuid/3ca5d400-822c-4e58-ada6-3528c6fcb7bb,lvm=ubuntu-vg
root@zenarcade:~/Downloads# cat /etc/crypttab
sda3_crypt UUID=3ca5d400-822c-4e58-ada6-3528c6fcb7bb none luks,retry=1,lvm=ubuntu-vg
root@zenarcade:~/Downloads# tail /etc/initramfs-tools/initramfs.conf
DEVICE=
##
## NFSROOT: [ auto | HOST:MOUNT ]
##
NFSROOT=auto
CRYPTSETUP=y
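Since /etc/crypttab and /etc/initramfs-tools/conf.d/cryptroot repeat the same three values (mapper name, LUKS partition UUID and volume group), generating both lines from one set of inputs keeps them consistent and avoids hand-typed variable names. A hypothetical sketch that reproduces the two lines shown above; the function names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers. Arguments: mapper name, LUKS partition UUID
# (as reported by blkid), LVM volume group name.

# Line in the format of this system's /etc/crypttab
crypttab_line() {
    printf '%s UUID=%s none luks,retry=1,lvm=%s\n' "$1" "$2" "$3"
}

# Line in the format of /etc/initramfs-tools/conf.d/cryptroot above
cryptroot_conf_line() {
    printf 'CRYPTROOT=target=%s,source=/dev/disk/by-uuid/%s,lvm=%s\n' "$1" "$2" "$3"
}
```

For example, `uuid=$(blkid -s UUID -o value /dev/sda3)` followed by `cryptroot_conf_line sda3_crypt "$uuid" ubuntu-vg > /etc/initramfs-tools/conf.d/cryptroot`. The crypttab_line output is better compared against /etc/crypttab by eye, since that file may hold other entries.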
I should note at this point that I actually misspelled the variable as
“CRYPTOROOT” instead of “CRYPTROOT”. Booting with a grub command line that
dropped quiet and splash, so I could watch the verbose boot, led me to search
and find this “cryptsetup not found on boot” post, which helped me realize my
error. I also used the valuable lsinitramfs to inspect the initrd image
contents, which helped me find that /sbin/cryptsetup was missing from my
initrd builds.
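Both of those problems can be caught before rebooting. The sketch below is hypothetical (function names made up): one check reads an initrd file listing on stdin, as produced by lsinitramfs, and succeeds only if a cryptsetup binary is present, accepting both sbin/cryptsetup and usr/sbin/cryptsetup layouts; the other verifies that the conf.d file really sets CRYPTROOT.

```shell
#!/bin/sh
# Hypothetical pre-reboot checks.

# Succeed if the initrd listing on stdin contains a cryptsetup binary.
initrd_has_cryptsetup() {
    grep -Eq '(^|/)sbin/cryptsetup$'
}

# Succeed if file $1 sets CRYPTROOT (catches a CRYPTOROOT-style typo).
cryptroot_conf_ok() {
    grep -q '^CRYPTROOT=' "$1"
}
```

Inside the chroot, for instance: `lsinitramfs /boot/initrd.img-3.13.0-46-generic | initrd_has_cryptsetup && cryptroot_conf_ok /etc/initramfs-tools/conf.d/cryptroot && echo ok`.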
Once I got past my spelling error, I had an /etc/crypttab,
/etc/initramfs-tools/conf.d/cryptroot, /etc/initramfs-tools/initramfs.conf and
/etc/default/grub, which included the command-line bits to decrypt on boot. I
could then run update-initramfs -k all -c and update-grub inside my chroot,
confirm the initrd contents with lsinitramfs, exit again and reboot….
Success. I was asked for the decryption passphrase in the boot splash screen as usual and booted right up.
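The exact /etc/default/grub line isn’t shown here. As a hedged sketch only, assuming the cryptopts= kernel parameter that the Ubuntu 14.04-era cryptroot initramfs script reads from /proc/cmdline, it might look like the excerpt below, reusing the values from /etc/crypttab. This is an assumption, not a copy of my actual file.

```shell
# /etc/default/grub (excerpt): hypothetical sketch, not the file used here.
# Assumes the cryptopts= parameter parsed by the cryptroot initramfs script;
# target, source and lvm mirror the /etc/crypttab entry shown above.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="cryptopts=target=sda3_crypt,source=/dev/disk/by-uuid/3ca5d400-822c-4e58-ada6-3528c6fcb7bb,lvm=ubuntu-vg"
```

After editing, update-grub regenerates /boot/grub/grub.cfg with the new kernel line. Removing quiet and splash from GRUB_CMDLINE_LINUX_DEFAULT gives the verbose boot mentioned above.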
Wrap-Up: Lessons learned
Of course, I could have gone through a full system recovery from backup, but that would likely have taken just as long. The process would have been to scrap the system, build a plain Ubuntu install and then restore over it. Alternatively, I could have booted off the USB stick and restored to the mounted drives. Either way, I likely would have had some additional cleanup to do as well.
- Don’t panic. You can fix this
- You will make additional mistakes as you go. See the first bullet.
- Take breaks. Step away for a bit. Go do something else to clear your head.
- Back up, back up, back up. I did use my tarball, and I did use LVM configs (which are automatically backed up when you change the LVM config).
- Document as you go. This is, of course, how I was able to create this post. You never know when you might have to do something like this again.
- Have boot USB drives handy.
Update 3/16/2016
I had to perform this same process on a different laptop with the same layout:
encrypted root and swap under LVM. I followed the process above after booting
off of an Ubuntu 14.04 USB stick. However, I wasn’t getting a
/dev/mapper/ubuntu--vg-root device to mount. In the earlier example I had been
booting into an initramfs from the system boot drive. A little searching turned
up this post, which reminded me that I needed to have udev update the device
mapper tree via udevadm trigger. Once I did that, the /dev entry was created.
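Because udev creates device nodes asynchronously, a rescue script may need to wait for the node after triggering it. A hypothetical sketch (wait_for_node is a made-up name) that polls for a path with a timeout; the intended call sequence, including udevadm settle, is shown in the comments as an assumption about how it would be used.

```shell
#!/bin/sh
# Hypothetical helper: poll until a device node exists, giving udev time to
# create it. Returns non-zero if it has not appeared after the timeout.
# Intended use on the live USB, after unlocking the LUKS container:
#   vgchange -ay && udevadm trigger && udevadm settle
#   wait_for_node /dev/mapper/ubuntu--vg-root 10 && mount /dev/mapper/ubuntu--vg-root /mnt
wait_for_node() {
    node=$1
    tries=${2:-10}   # about one second per try
    while [ "$tries" -gt 0 ]; do
        if [ -e "$node" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    [ -e "$node" ]
}
```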
This kind of low-level rescue work is something you don’t do very often, but
here I have done it twice in a few months. The high-level steps and layers are
familiar to me, and I understand that when the system boots normally it’s all
orchestrated for you; in this case you’re doing it by hand. And it’s not all
that dissimilar on other systems. I’ve rebuilt device trees many times in the
past, with mknod on ancient Linux systems or with cfgadm and later variants on
Solaris. So the overall patterns are familiar even if the specific steps and
syntax details change.
I also fixed a typo in the example command list above, changing vchange to
vgchange.