Btrfs with Snapper Saves Arch System After IBT Conflict with NVIDIA Driver Kernel Module

April 9, 2023, 9 p.m.

The combination of the Btrfs filesystem and Snapper, which allows system rollbacks without an external live system, was pioneered by openSUSE almost a decade ago. This configuration creates Btrfs snapshots periodically and whenever package management transactions, such as updates, are performed. When a rollback is required, it is performed by booting into one of the read-only snapshots and issuing a Snapper command, so that on the next boot the system is in the same state as the chosen snapshot.

This post describes the rollback of an Arch system with the openSUSE Btrfs/Snapper configuration, which had been installed by following An Arch Linux Installation on a Btrfs Filesystem with Snapper for System Snapshots and Rollbacks. The issue that necessitated the rollback, as well as an update to the rollback process described in that article, is also discussed.

Introduction

A previous article on this site described a process to install Arch with the Btrfs filesystem and Snapper to manage Btrfs snapshots and system rollbacks. The layout of the Btrfs partition was modeled after the one openSUSE uses by default. While the subvolume layout is complex and nested to multiple levels, it allows the system to be rolled back simply by booting into a read-only snapshot selected from the GRUB menu and issuing a single snapper command. Some Arch users, in YouTube videos and on the Arch wiki itself, cite the complexity of the subvolume layout and the handling of the default root subvolume in /etc/fstab and the GRUB configuration as reasons not to configure an Arch Btrfs system in the openSUSE manner. They instead document a simple single-level Btrfs subvolume layout, which requires an external live ISO to run a series of Btrfs commands that perform a rollback by replacing the current system subvolume with one of the Btrfs snapshots.

For me, the ease of the rollbacks outweighs the complexity of the subvolume layout, which has no practical drawbacks because it is completely transparent in normal use. Also, the issue of setting the default subvolume is handled during the installation process by simple modifications to /etc/fstab and to the GRUB configuration files /etc/grub.d/10_linux and /etc/grub.d/20_linux_xen (addressed in An Arch Linux Installation on a Btrfs Filesystem with Snapper for System Snapshots and Rollbacks).

To aid the discussion of the rollback, the following image from the installation article is reproduced below. The most important characteristics of the configuration are:

  • The initial filesystem creation during installation automatically creates the top-level subvolume (Subvolume ID 5, subvolid=5, subvol=/). A subvolume is manually created to contain all other subvolumes (Subvolume ID 256, subvolid=256, subvol=/@). Within this subvolume, another subvolume (Subvolume ID 257) is created to contain all of the snapshots, each of which is held in its own subvolume. A subvolume is created within Subvolume ID 257 to contain the initial installation snapshot, which is assigned Subvolume ID 258 (/@/.snapshots/1/snapshot). Also within Subvolume ID 256 are other subvolumes (Subvolume IDs 259 - 268) for each filesystem hierarchy location that is to be excluded from system snapshot subvolumes.
  • Subvolume ID 258 is set as the default subvolume (or snapshot, since a snapshot is actually just another subvolume) during installation. In the absence of an explicitly identified subvolume in /etc/fstab, the default subvolume (snapshot) is mounted at / during system startup and includes all filesystem hierarchy locations that have not been excluded by assigning them to their own subvolumes. This default subvolume remains the system root, even while other snapshots are created. Note that for the default subvolume to be mounted automatically, it was necessary during installation to modify the initially generated /etc/fstab to remove the explicitly specified subvol=/. This was described in An Arch Linux Installation on a Btrfs Filesystem with Snapper for System Snapshots and Rollbacks.
  • When a rollback is performed, Snapper creates two new snapshots: a read-only snapshot that preserves the default subvolume as it was before the rollback, and a read-write snapshot of the rollback target (the currently booted snapshot when the command is run without a snapshot number). The newly created read-write snapshot is then set as the new default subvolume by Snapper, replacing the previous default subvolume. After a reboot, the new read-write snapshot is automatically mounted as the system root, because /etc/fstab does not name a subvolume explicitly.
  • While the subvolume that contains the filesystem hierarchy root is mounted automatically at system startup (two points above), the subvolume where GRUB looks for the kernel during boot is contained in the GRUB configuration, which sets the default subvolume as the system root by writing the subvolume path as the value of the root= kernel command-line parameter. For this to be set correctly and automatically, it was necessary during installation to modify some GRUB configuration files sourced by grub-mkconfig, as described in An Arch Linux Installation on a Btrfs Filesystem with Snapper for System Snapshots and Rollbacks.
  • Subvolume IDs 259 through 268 are created to exclude filesystem hierarchy locations from being snapshotted as part of the default subvolume, which includes the root of the filesystem hierarchy and everything under it except the excluded locations. The most important of the excluded locations is /boot/grub. Its exclusion makes /boot/grub/grub.cfg persistent across rollbacks. Another important excluded path is /opt. Its exclusion prevents applications installed in this location from being rolled back with the system, which may or may not be desirable.

The Snapper Compatible Btrfs Subvolume Layout
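
The dependence on the default subvolume described above can be checked with a small script. The following is a minimal sketch and not part of the installation guide: the function name root_has_explicit_subvol is hypothetical, and it assumes a conventional fstab with whitespace-separated fields. It reports whether the root entry of an fstab file names a subvolume explicitly, which would override the default subvolume that Snapper changes on rollback.

```shell
# Hypothetical helper: succeeds (exit status 0) if the fstab file given as $1
# has a root (/) entry whose mount options contain subvol= or subvolid=.
root_has_explicit_subvol() {
    awk '
        # Skip comment lines; look only at the entry mounted at /.
        $1 !~ /^#/ && $2 == "/" && $4 ~ /(^|,)subvol(id)?=/ { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example use on a live system (read-only checks):
#   root_has_explicit_subvol /etc/fstab && echo "root entry pins a subvolume"
#   sudo btrfs subvolume get-default /   # the subvolume mounted when none is pinned
```

With the layout described in this article, the function would be expected to fail for /etc/fstab, since the root entry must not name a subvolume.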

The Issue Requiring a Rollback

After a year of using the Arch installation with the Btrfs/Snapper configuration described above and in the installation guide, I had to roll back the system. The immediately apparent issue was that, after an update, Optimus Manager would not load the NVIDIA driver kernel module. I use Optimus Manager on the Arch system to set the graphics mode automatically on the Lenovo Legion 5i Pro: discrete graphics if the external power supply is plugged in, hybrid graphics if running on battery. (See Nvidia Optimus on Linux for a description of NVIDIA Optimus, the NVIDIA driver kernel module, its options, and Optimus Manager.) The issue is illustrated in the following image, which shows two Konsole windows in which the problem is evident. The window on the bottom shows the output of

sudo optimus-manager --print-mode

which indicates that Optimus Manager was not able to set the graphics mode. In the same window, an attempt to run nvidia-smi produces output indicating that the NVIDIA driver is not running. The reason is apparent in Optimus Manager's log of the current graphics switching attempt, shown in the Konsole window on top. Two points in this log give clues to the problem. The first indicates that an attempt by an Optimus Manager subprocess to load the NVIDIA module with

modprobe nvidia NVreg_UsePageAttributesTable=1 NVreg_DynamicPowerManagement=0x02

fails with a SIGSEGV; at the second point, Optimus Manager reiterates that an attempt to load the NVIDIA module with modprobe resulted in an error.

Optimus Manager Errors Due to the NVIDIA Kernel Module Not Being Able to Be Loaded

After some DuckDuckGo-ing, I discovered the root cause: a new security feature for Intel processors, implemented in the Linux kernel, had been incorporated into a new version of the Arch LTS kernel. (I only use the LTS kernel because it is the default set by GRUB and I never bothered to change it.) The image below shows the Reddit post that mentions the issue and the solution.

Reddit Post Mentioning Indirect Branch Tracking Prevents Loading of NVIDIA Driver Kernel Module

The post states that setting the kernel command line parameter ibt=off stops IBT from blocking the loading of the NVIDIA driver kernel module.
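
Whether the running kernel was booted with this parameter can be confirmed from /proc/cmdline. The following is a minimal sketch; the helper name has_kernel_param is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: succeeds if the kernel command line in $1 contains
# the exact whitespace-delimited parameter given in $2.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use on a live system:
#   has_kernel_param "$(cat /proc/cmdline)" ibt=off && echo "booted with ibt=off"
# Assumption: on kernels with IBT support, a CPU that implements IBT lists
# an "ibt" flag in /proc/cpuinfo, so this shows whether the CPU is affected:
#   grep -qw ibt /proc/cpuinfo && echo "CPU advertises IBT"
```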

The security feature in question is Indirect Branch Tracking, described in the Phoronix article pictured below as follows:

Indirect Branch Tracking is part of Intel Control-Flow Enforcement Technology (CET) with Tigerlake CPUs and newer. IBT provides indirect branch protection to defend against JOP/COP attacks by ensuring indirect calls land on an ENDBR instruction.

A Phoronix Article Describing Indirect Branch Tracking and Its Imminent Inclusion in the Linux Kernel (5 November 2022)

The article mentions this security feature's inclusion in the 6.2 Linux kernel, but by March 2023 it was included in the Arch Linux LTS kernel.

The solution provided in the Reddit post was to add the parameter ibt=off to the kernel command line, a solution that I was able to verify worked on my Arch installation using the LTS kernel.
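
To make the parameter persistent, it can be added to the GRUB defaults file before regenerating the GRUB configuration. The following is a minimal sketch under stated assumptions: the function name add_kernel_param is hypothetical, GNU sed is assumed, and the file is assumed to define GRUB_CMDLINE_LINUX_DEFAULT on a single double-quoted line. Editing the file by hand achieves the same result.

```shell
# Hypothetical helper: append kernel parameter $2 to the
# GRUB_CMDLINE_LINUX_DEFAULT line of file $1 unless it is already present.
add_kernel_param() {
    local file="$1" param="$2"
    # Nothing to do if the parameter is already on the line.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote (GNU sed, in place).
    sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" "$file"
}

# Example use (try it on a copy first, then apply as root):
#   cp /etc/default/grub /tmp/grub.test && add_kernel_param /tmp/grub.test ibt=off
#   sudo grub-mkconfig -o /boot/grub/grub.cfg   # regenerate after editing /etc/default/grub
```

The check before the substitution keeps the function idempotent, so running it twice does not duplicate the parameter.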

The Rollback

Two methods of rollback were described in the installation guide article. In the first, the rollback is initiated from the current read-write system, even if it has the issues that motivate the rollback. In the second, the method documented by openSUSE, the rollback is initiated from a read-only snapshot of the system, booted by selecting it from the GRUB menu. Both were demonstrated to work in that article. However, when I attempted the first method, an issue that I did not diagnose prevented the rollback from completing successfully. In the rush to roll back the system immediately, I used the second method, from a read-only snapshot (the SUSE way), which worked as it should.

Performing the rollback allowed me to go back to a perfectly working system and determine the reason for the issue. I went through several cycles of rollback and update before I discovered the actual solution to the problem. After the first rollback I thought that, although unlikely, the problem with the NVIDIA module not loading might have been related to a recent intervention required on Arch due to the replacement of the base-devel package group by a meta-package. When the issue persisted after updating the reverted system, I performed a second rollback, still working on the assumption that the problem was related to base-devel, and this time also reinstalled the base-devel meta-package. When the issue was still present, I investigated further and found the errors reported by Optimus Manager and nvidia-smi. The Optimus Manager error specifically led me to the issue as described in the Reddit post pictured above, in which the poster stated that a working solution would be to disable the new IBT feature with the kernel command line parameter ibt=off.

After adding ibt=off to the GRUB command line in /etc/default/grub and updating the GRUB configuration with grub-mkconfig, I performed a third rollback. The state of the system after the rollback is depicted in the following image, in which three Konsole windows are shown.

  • The window at the top right of the image displays the output of
    snapper list
    in which all of the snapshots currently on the system are listed. This command was not executed immediately after the rollback but after subsequent package management transactions were performed following the rollback. But it does show the relevant information such as the two snapshots created for each rollback -- snapshots 814 and 815 for the first rollback (snapshot number is the leftmost column), snapshots 817 and 818 for the second rollback, and snapshots 832 and 833 for the third rollback. Note that the asterisk next to snapshot number 833 indicates that this snapshot is the currently mounted subvolume and that it is the one that will be mounted at the next boot. (See man snapper and An Arch Linux Installation on a Btrfs Filesystem with Snapper for System Snapshots and Rollbacks).
  • The leftmost window shows the output of
    btrfs subvolume list /
    in which all of the Btrfs subvolumes on the partition within the top-level subvolume (Subvolume ID 5) are listed. The output lists each subvolume by subvolume ID and also indicates the parent subvolume by ID and the subvolume's path. Snapshot 833, the subvolume currently mounted at /, is near the bottom of the output as
    ID 1102 gen 38026 top level 257 path @/.snapshots/833/snapshot
  • The third window shows the current GRUB configuration, /boot/grub/grub.cfg, as generated by grub-mkconfig. Notable in this file is the kernel command line, in which the path of the kernel is specified as
    /@/.snapshots/833/snapshot/boot/vmlinuz-linux-lts
    and the mount option
    rootflags=subvol=@/.snapshots/833/snapshot
    The initial ram disk paths also reflect the current snapshot path. As with more conventional filesystems, the partition that contains the installation is specified by UUID (root=UUID= ... ).

1) Snapper Listing of Snapshots 2) Btrfs Subvolumes and 3) GRUB Configuration After the Third Rollback
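
The consistency between the default subvolume and the GRUB configuration shown in the three windows can also be checked from a terminal. The following is a minimal sketch; the function names snapshot_number and grub_rootflags are hypothetical, and the sketch assumes the subvolume path format shown in the btrfs output above (@/.snapshots/N/snapshot).

```shell
# Hypothetical helper: print the Snapper snapshot number found in one line of
# `btrfs subvolume list` or `btrfs subvolume get-default` output ($1).
# Prints nothing if the line does not describe a snapshot subvolume.
snapshot_number() {
    printf '%s\n' "$1" | sed -n -E 's|.* path @/\.snapshots/([0-9]+)/snapshot$|\1|p'
}

# Hypothetical helper: print the distinct rootflags=subvol=... values found
# in the GRUB configuration file given as $1.
grub_rootflags() {
    grep -o 'rootflags=subvol=[^ ]*' "$1" | sort -u
}

# Example use on a live system:
#   snapshot_number "$(sudo btrfs subvolume get-default /)"
#   sudo sh -c 'grep -o "rootflags=subvol=[^ ]*" /boot/grub/grub.cfg | sort -u'
```

After the third rollback described above, both checks would be expected to point at snapshot 833. Entries in the snapshot submenus of grub.cfg, if present, refer to other snapshots, so only the main menu entries are relevant for the comparison.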

The actual process is summarized below.

  1. Reboot. In the GRUB menu select the item Arch Linux Snapshots. An example of the GRUB menu item selection sequence is shown in the following set of images from an earlier installation of Arch with Btrfs/Snapper on the Dell G5.
    The Additional GRUB Menu Items Generated by grub-btrfs (and snap-pac-btrfs when Snapshots Are Created as a Result of pacman Transactions).
  2. In the resulting list of available snapshots, select the latest snapshot known to be of a good system state.
  3. After logging in to the graphical environment, open a terminal window and execute the snapper rollback command. The command used to create the third rollback was
    sudo snapper -c root -v rollback -d "Third rollback due to instability after Arch packaging changes to 'base-devel' and more likely due to NVIDIA change <...>"
    In this command, the global option -c specifies the name of the snapper configuration, in this case the default configuration named root. The -v option requests verbose output. The -d option specifies a comment or description of the rollback transaction, which is displayed in the output of snapper list (as shown in the image earlier in the article depicting the state of the system after the rollback).

Conclusion

One of the advanced capabilities of a Linux system with Btrfs is the filesystem's support for system snapshots, made possible by its copy-on-write design. When this is paired with Snapper and SUSE/openSUSE's Btrfs subvolume layout, the system can be rolled back with a single snapper rollback command.

References