Debugging openSUSE emergency mode - superblock error with Btrfs

Thomas

I have been running openSUSE Tumbleweed with the Btrfs file system as my only OS for a few months now, coming from Ubuntu. While the system generally feels faster, I experienced sporadic freezes that did not correlate with any specific action. It could happen while working in the IDE, browsing, or writing. Annoying, but manageable. Until yesterday.

The screen froze once again, but this time the system only booted into emergency mode, something I thankfully had no prior experience with.

After consulting journalctl, I found a hint that the root file system could no longer be mounted at /sysroot. I now understood why my system would not start, but the underlying cause remained unclear.
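For anyone landing in the same emergency shell, here is a minimal sketch of how the journal can be narrowed down to the interesting lines. The function name boot_errors is made up for this post and was not part of my original session; the flags are standard journalctl options, and sysroot.mount is the unit I would expect to be responsible for /sysroot on a systemd-based initrd.

```shell
# Hypothetical helper: show only error-level (and worse) messages
# from the current boot, without opening the pager.
boot_errors() {
    journalctl -b -p err --no-pager "$@"
}

# Usage in the emergency shell:
#   boot_errors                     # all errors from this boot
#   boot_errors -u sysroot.mount    # assumed unit name for the /sysroot mount
```

Extra arguments are passed straight through to journalctl, so the output can be restricted further without changing the helper.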

What I tried to solve the problem

  1. First, I naively thought of snapper. Restarted, selected the last “functional” snapshot, and oops - back in emergency mode. No luck there.

  2. Then I plugged in my installation USB (not a live medium). Through an options menu, I was somehow able to boot into an obviously older snapshot of my system. Interestingly, running sudo snapper list showed that this booted system only knew snapshots from 3 months ago, while my normal boot menu listed the newer ones.

  3. I used this wobbly system to create a Tumbleweed Live image on another USB stick for further analysis.

System partitions

My system has the following partitions:

  • nvme0n1p1 - EFI System Partition
  • nvme0n1p3 - Btrfs system partition with /
  • nvme0n1p4 - Btrfs partition that apparently only contains older snapshots
  • nvme0n1p5 - Swap

I was able to determine that partition nvme0n1p3, which contains my root file system, could not be mounted. The attempt failed with the message "can't read superblock". Ouch, that sounded painful.
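Reproducing the failure from the live system can be sketched roughly as follows. The helper name try_ro_mount and the default mount point /mnt are assumptions for illustration; only the device name comes from my setup.

```shell
# Hypothetical helper: attempt a read-only mount and report the outcome.
# $1 = partition (e.g. /dev/nvme0n1p3), $2 = mount point (assumed default: /mnt)
try_ro_mount() {
    if sudo mount -o ro "$1" "${2:-/mnt}"; then
        echo "mounted $1 read-only"
    else
        echo "mount of $1 failed, check dmesg for details"
        return 1
    fi
}

# Usage:
#   lsblk -f                       # list partitions and their file system types
#   try_ro_mount /dev/nvme0n1p3
```

The mount command itself only reports a short error, so the kernel log is usually the better place to look for the actual reason.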

The debugging process

Using the live system, I first tried to check if my partition was marked as damaged:

sudo parted -l

Then I installed btrfs-progs and attempted:

sudo btrfs rescue super-recover -v /dev/nvme0n1p3

But I got the response: "All supers are valid. No need to recover"

Okay, what now? The breakthrough came from:

sudo dmesg

This revealed that my log tree was faulty. What exactly that means - well, not entirely sure.
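The full kernel log is noisy, so a small filter helps to find the relevant lines right after a failed mount attempt. This is only a sketch; the function name btrfs_messages is invented for this post.

```shell
# Hypothetical helper: keep only Btrfs-related kernel messages.
# It reads from stdin, so it works on live dmesg output or on a saved log.
btrfs_messages() {
    grep -i 'btrfs'
}

# Usage:
#   sudo dmesg | btrfs_messages
```

Matching case-insensitively is a deliberate choice here, since I did not want to rely on how the kernel capitalizes its message prefix.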

Finally, I executed:

sudo btrfs rescue zero-log /dev/nvme0n1p3

After that, I was thankfully able to mount my system partition again and boot normally.
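As far as I understand it, the log tree holds changes that were flushed with fsync but not yet committed to the main trees, and zero-log simply discards it, so the most recent writes before the freeze may be lost. Before trusting the partition again, a check along these lines seems sensible. This is a sketch I did not run in exactly this form; the helper name verify_btrfs and the default mount point /mnt are assumptions.

```shell
# Hypothetical helper: check the repaired file system before booting from it.
# $1 = partition (e.g. /dev/nvme0n1p3), $2 = mount point (assumed default: /mnt)
verify_btrfs() {
    dev="$1"
    mnt="${2:-/mnt}"
    # Offline consistency check; the partition must not be mounted here.
    sudo btrfs check --readonly "$dev" || return 1
    sudo mount "$dev" "$mnt" || return 1
    # Verify checksums of all data and metadata; -B waits for completion.
    sudo btrfs scrub start -B "$mnt"
}

# Usage from the live system:
#   verify_btrfs /dev/nvme0n1p3
```

The helper stops at the first failing step, so a scrub is only started on a file system that passed the check and mounted cleanly.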

Conclusion

In the end, it appeared to be a kernel bug related to Btrfs log-tree replay. The freezes basically went away on their own after a kernel update. It was a fun evening in front of my laptop anyway ;)
