Building secure images with NixOS
Image-based Linux distributions have recently seen increasing popularity. They promise reliability and security, but pose packaging problems for existing distributions. Ryan Lahfa and Niklas Sturm spoke about the work that NixOS has done to enable an image-based workflow at this year's All Systems Go! conference in Berlin. Unfortunately, LWN was not able to cover the conference for scheduling reasons, but the videos of the event are available for anyone interested in watching the talks. Lahfa and Sturm explained that it is currently possible to create a NixOS system that cryptographically verifies the kernel, initrd, and Nix store on boot, although doing so still has some rough edges. Making an image-based NixOS installation is similarly possible.
Lahfa started by giving a brief overview of NixOS for those attendees who were
unfamiliar with it. He described the distribution as a "standard systemd-based Linux",
but with some differences mostly centered around the
fact that it does not follow the
filesystem hierarchy standard. In NixOS, all of the binaries on
the system live in /nix/store, and are configured to use a path and
library path that are tightly scoped to only their declared dependencies. This
has a lot of benefits, Lahfa said, including NixOS's ability to run multiple
versions of the same software. But it also has consequences for
secure boot.
Lahfa explained that secure boot "controls who is allowed to run software on your computer".
It relies on using signed binaries; the computer will only boot
into the provided kernel if the signature on it is valid. On systemd systems, it
is possible to use
unified kernel images (UKIs), which package a unified extensible firmware
interface (UEFI) boot stub, the kernel, and
its initrd together. This has security benefits, because it means that
secure boot validates the initrd as well as the kernel. But it causes problems
for NixOS, which needs to present many more options in the bootloader than most
other distributions in order to support its efficient rollback features.
NixOS's separation of binaries into individual paths under /nix/store — and ability to share libraries between different versions — allows the distribution to keep a large number of previous configurations around. Every time a NixOS system has its configuration changed, from a software update, for example, the complete state of the installed programs is saved as a "generation". In the bootloader, the user can select any previous generation they would like (at least until the old generations are cleaned up to reclaim their storage space), and the kernel will load the appropriate initrd for that generation, which in turn sets up all of the configuration files from that generation. This allows for fearless upgrades, since the previous configuration is available in the boot menu — a value proposition quite similar to image-based distributions. Unfortunately, this ability doesn't work well if the initrd needs to be bundled with the kernel, because that increases both the size of each kernel image, and the number of different kernel images that must be stored. Doing so will quickly fill up the EFI (Extensible Firmware Interface) system partition (ESP).
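To make the space argument concrete, here is a rough Python sketch comparing the two storage strategies on the ESP. The image sizes and the generation history are invented for illustration, and the function names are hypothetical; only the shape of the arithmetic matters.

```python
def esp_usage_bundled(generations, kernel_mib, initrd_mib):
    """Every generation carries its own combined kernel+initrd image."""
    return len(generations) * (kernel_mib + initrd_mib)


def esp_usage_shared(generations, kernel_mib, initrd_mib):
    """Each distinct kernel and each distinct initrd is stored only once."""
    kernels = {kernel for kernel, _ in generations}
    initrds = {initrd for _, initrd in generations}
    return len(kernels) * kernel_mib + len(initrds) * initrd_mib


# Hypothetical history: one (kernel id, initrd id) pair per generation.
generations = [
    ("k1", "i1"),
    ("k1", "i1"),  # configuration change that touched neither component
    ("k1", "i2"),  # initrd changed, kernel did not
    ("k2", "i3"),  # kernel update
    ("k2", "i3"),
]

# Assumed sizes in MiB, chosen only to make the numbers easy to follow.
bundled = esp_usage_bundled(generations, kernel_mib=12, initrd_mib=40)
shared = esp_usage_shared(generations, kernel_mib=12, initrd_mib=40)
print(bundled, shared)
```

With these made-up numbers, five generations cost 260 MiB when every generation carries its own bundle, but 144 MiB when identical kernels and initrds are stored once.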
So, to work around this, Lahfa, Sturm, and other contributors wrote
Lanzaboote, a NixOS-aware
reimplementation of
systemd-stub, the UEFI secure-boot stub included in the systemd project.
Lanzaboote is signed for secure boot, and it
then separately verifies the hashes of the kernel and initrd that it loads. Now,
generations can share kernels and initrds when possible, making it possible to
have more generations available. Unfortunately, Lanzaboote is hard to upstream.
The systemd project suggested using systemd-boot addons instead — binaries
that are verified by secure boot but not directly run. If initrds and kernels
could be made into addons, then Lanzaboote's functionality might not be needed,
and NixOS could just use systemd's
loader.conf to mix-and-match initrds and
kernels.
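The idea of a signed stub that separately checks the hashes of what it loads can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real stub is a UEFI program, SHA-256 is assumed here rather than taken from the talk, and the function names and the way the expected digests are stored are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_component(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of a loaded file against the expected digest."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)


def may_boot(kernel: bytes, initrd: bytes, expected: dict) -> bool:
    """Allow the boot only if both components match their recorded hashes."""
    return (verify_component(kernel, expected["kernel"])
            and verify_component(initrd, expected["initrd"]))


# Stand-in file contents; a real stub would read these from the ESP.
kernel = b"example kernel image"
initrd = b"example initrd"

# Digests recorded at build time and covered by the signature (assumption).
expected = {"kernel": sha256_hex(kernel), "initrd": sha256_hex(initrd)}

untampered_ok = may_boot(kernel, initrd, expected)
tampered_ok = may_boot(kernel, initrd + b"!", expected)
print(untampered_ok, tampered_ok)
```

Because only the small stub needs a signature, the kernel and initrd files themselves can stay unsigned and be shared between generations, as long as their digests match what the stub expects.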
In the future, efforts to "denormalize" UKIs into their component parts like this
without losing their attendant security benefits could potentially benefit
image-based distributions as well, by enabling more sharing between images.
Building images
For the second half of the talk, Sturm spoke about how NixOS could actually be
used as an image-based distribution. He started by describing why — when NixOS
already supports atomic rollbacks — users might want to use an image-based
model. Ultimately, it comes down to security, he said. An image-based workflow
enables more security features: "things we really want, but cannot offer by default to everyone".
The core benefit is cryptographically enforced
immutability, which makes a computer more tamper-resistant.
Sturm then explained how to use NixOS to build images. Nix (NixOS's package manager and build system) can produce ISO files directly as an output from the build process, but he instead suggested an approach based on changing how NixOS stores generations. Normally, generations are stored as files inside /nix/store. Any data that is shared between generations (such as a program that was not updated) can be referenced by multiple generations. To produce images based on NixOS, he suggested having a separate partition for each generation. This loses NixOS's ability to share data between generations, but it also brings the distribution closer to what existing image-based distributions do, meaning that existing tooling can be reused. Sturm specifically called out systemd-repart, ukify, and systemd-sysupdate as existing tools that fit naturally into this approach.
Not every tool can be directly adapted, though. Sturm said that
mkosi, systemd's OS image builder, would "break the developer flow" on NixOS, and that it was
better to build everything directly with Nix. Eventually, the folks working on
image-based NixOS may want mkosi
support to enable more comprehensive testing by the upstream systemd
project, but for now it is not a goal.
Sturm then gave an example of the partition layout that a computer using this approach might use: two separate sets of partitions for generations, so that the one not in use can be updated, each protected by dm-verity. Each set of partitions has a partition for the dm-verity metadata, and one for the store itself. The second set of store partitions can be created upon first boot, so a system using this approach can be installed just by copying the EFI system partition (ESP) and first store partition into place. Finally, a persistent partition for user data takes up the rest of the disk.
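The two-slot layout can be modeled with a small Python sketch, assuming an update always targets whichever slot is not currently booted. The slot names and partition labels below are hypothetical, chosen for illustration; they are not taken from the talk or from any NixOS tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One generation slot: a store partition plus its dm-verity metadata."""
    name: str
    store_partition: str
    verity_partition: str


# Hypothetical labels for the two sets of partitions.
SLOTS = (
    Slot("A", "store-a", "store-a-verity"),
    Slot("B", "store-b", "store-b-verity"),
)


def inactive_slot(active_name: str) -> Slot:
    """Return the slot that is safe to overwrite during an update."""
    candidates = [slot for slot in SLOTS if slot.name != active_name]
    if len(candidates) != 1:
        raise ValueError(f"unknown active slot: {active_name!r}")
    return candidates[0]


# While running from slot A, an update is written to slot B.
target = inactive_slot("A")
print(target.store_partition, target.verity_partition)
```

The running system is never modified in place under this scheme; the update only becomes active once the machine boots into the other slot.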
In a computer set up like this, the Nix store could be mounted using
nix-store-veritysetup-generator, or it could be mounted under /usr
with the normal
systemd-veritysetup-generator and bind-mounted into place. In
either case, the Nix store would be protected by dm-verity. None of this work is
enabled by default on NixOS, Sturm cautioned. A user needs to opt in to get it.
But it "lets you kind of build your own distro", in that this method can
potentially use Nix to produce system images that don't rely on any NixOS tools at
run time.
One audience member asked whether it was possible to have the Nix store both
verified by dm-verity and encrypted. Sturm said that it was "not supported by repart but doable".
Specifically, the user would need to create an encrypted
block device and then layer dm-verity over the top. Another audience member asked
whether users would really have to give up on sharing data between generations
to use this approach — wouldn't it be possible to have a shared baseline that
multiple generations could reference? Sturm agreed that would be possible as
well, but not while simultaneously using dm-verity. The tradeoff between
efficiency and security is something that each user will need to consider when
deciding which approach is right for them.
NixOS breaks a lot of assumptions about Linux systems, starting with the filesystem hierarchy standard and going from there. Despite that, a lot of the same tools that are being used with other image-based distributions can be used to create an image-based NixOS installation. Users who want the benefits of an image-based distribution at the same time as NixOS's unique advantages may have to do a bit of tweaking, since support is still a work in progress, but should nonetheless find that combining the two is possible.
Posted Nov 6, 2024 17:14 UTC (Wed)
by Karellen (subscriber, #67644)
[Link] (8 responses)
> Sturm then gave an example of the partition layout that a computer using this approach might use: two separate sets of partitions for generations, so that the one not in use can be updated [...] Finally, a persistent partition for user data takes up the rest of the disk.
With such a system, how does one determine partition sizes that are a) big enough to be future-proof for whatever system software you might want to install later, while b) not taking up unneeded space that might be later wanted for user data?
I've tried a bunch of different partition strategies over the years, including separate /usr, /home, /boot, /srv and /var in various combinations (although not all at once, I think). Given the difficulties of resizing and repartitioning disks and filesystems while some of them are in use, and running out of space in various partitions I'd not given enough room for despite having plenty of room on the disk overall, I've gone with a unified encrypted root with only a separate /boot partition for probably about a decade now, and not had to worry about it again.
Is the solution just "have an order of magnitude more disk space than you're ever going to use"? Or what?
Posted Nov 6, 2024 17:39 UTC (Wed)
by RaitoBezarius (subscriber, #106052)
[Link]
> With such a system, how does one determine partition sizes that are a) big enough to be future-proof for whatever system software you might want to install later, while b) not taking up unneeded space that might be later wanted for user data?
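Usually, there are well-known sizes which are parameters of your deployment:
- ESP, fixed size (1GB or so)
- Nix store (so the code, the config, etc.), we put some reasonable limits here, let's say 200GB
- Verity partition for Nix store, answer: https://github.com/systemd/systemd/pull/34636 (another contributor of this whole project made this PR)
- User data, minimized size for initial startup and then automatic grow
(add the extra salt for A/B schema)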
If that's not enough, I kind of asked this question during the image-based summit; see https://lwn.net/Articles/994704/ under "ESP resizing". The suggested solution is dm-linear.
So theoretically, if you exceed your 200GB budget, and you really need to do fancy things, I suppose a dm-linear based solution can do smart things if you can safely downsize the user partition.
Nonetheless, for an image-based system: `/srv`, `/home` don't really exist, I would map everything to `/var`, the only mutable partition in the setup. And bind mount `/home` to `/var/home`, etc. So it may look like your unified encrypted root setup, except that the Nix store doesn't require confidentiality as long as you don't put secrets in your code or configuration and use systemd-credentials (or similar concepts).
Posted Nov 6, 2024 18:00 UTC (Wed)
by geert (subscriber, #98403)
[Link] (1 responses)
Same here. But I tend to run into /boot becoming too small after a while :-(
Posted Nov 7, 2024 10:08 UTC (Thu)
by k3ninho (subscriber, #50375)
[Link]
> Same here. But I tend to run into /boot becoming too small after a while :-(
Same same. I'm greedy about the firmwares that go into the initrd, so that's on me.
K3n.
Posted Nov 6, 2024 18:21 UTC (Wed)
by ballombe (subscriber, #9523)
[Link] (1 responses)
Posted Nov 6, 2024 18:38 UTC (Wed)
by arsen (subscriber, #161285)
[Link]
Posted Nov 6, 2024 18:54 UTC (Wed)
by walters (subscriber, #7396)
[Link] (2 responses)
We're working on https://github.com/containers/composefs/ which doesn't have this issue; its tagline is "The reliability of disk images, the flexibility of files".
Posted Nov 7, 2024 6:40 UTC (Thu)
by bof (subscriber, #110741)
[Link] (1 responses)
I'm confused though about the separate content-addressed store. That gets populated only from mkcomposefs runs? And the EROFS images can only work if the content store in place, happens to contain all the content that was there when the image was created in the first place?
Won't that mean that any newer image is going to break if the content store ever is restored from older backup? And how would garbage collection in the content store work?
I feel like I'm missing something there.
Posted Nov 7, 2024 18:28 UTC (Thu)
by walters (subscriber, #7396)
[Link]
Not necessarily.
Basically the composefs git repository is a low level unopinionated generic tool. Yes, `mkcomposefs` can write files into the object store, but so can any other tool and in practice we expect sophisticated higher level software to do exactly that.
> And the EROFS images can only work if the content store in place, happens to contain all the content that was there when the image was created in the first place?
Well yes, you are responsible for ensuring the object store is populated. But for example you could totally have higher level software, before invoking "mount.composefs" quickly check that the expected objects are there, and if not fetch them from a network object store on-demand. There's CLI tools and a C library that lets you read and write the composefs EROFS for doing things like this.
> And how would garbage collection in the content store work?
That is again up to higher level tooling; I'll try to fill out the docs more but the simple baseline is:
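- You maintain a set of "GC roots" (images) that are composefs blobs
- Iterate over those composefs blobs, parse them to find which objects they reference
- Iterate over the object store and remove unreferenced objects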
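A baseline like this amounts to a mark-and-sweep pass over the object store: collect every object referenced by a GC root, then remove whatever is left. The Python below is a minimal in-memory sketch of that idea, not composefs code: the function name is hypothetical, and a plain mapping from image name to object IDs stands in for actually parsing composefs blobs.

```python
def collect_garbage(roots, object_store):
    """Mark-and-sweep over an in-memory model of a content store.

    roots: mapping of image name -> iterable of object IDs it references
           (a stand-in for parsing each composefs blob).
    object_store: set of object IDs currently present in the store.
    Returns a (kept, removed) pair of sets.
    """
    # Mark: gather every object referenced by any GC root.
    live = set()
    for referenced in roots.values():
        live.update(referenced)

    # Sweep: anything in the store that no root references can go.
    removed = {obj for obj in object_store if obj not in live}
    kept = set(object_store) - removed
    return kept, removed


# Hypothetical data: two images sharing one object, plus one orphan.
roots = {
    "image-1": ["obj-a", "obj-b"],
    "image-2": ["obj-b", "obj-c"],
}
store = {"obj-a", "obj-b", "obj-c", "obj-orphan"}

kept, removed = collect_garbage(roots, store)
print(sorted(kept), sorted(removed))
```

Dropping an image from the set of roots is what eventually frees its objects, provided no other root still references them.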
We are working on a higher level Rust project that will be a much more opinionated interface to things like this, including direct integration with e.g. OCI.
But there are already others who have integrated composefs, e.g. there was a lightning talk at All Systems Go from https://rauc.io/ about using composefs.
Posted Nov 7, 2024 16:00 UTC (Thu)
by Lennie (subscriber, #49641)
[Link]
How do image-based systems figure out what partition sizes they need?
- Nix store (so the code, the config, etc.), we put some reasonable limits here, let's say 200GB
- Verity partition for Nix store, answer: https://github.com/systemd/systemd/pull/34636 (another contributor of this whole project made this PR)
- User data, minimized size for initial startup and then automatic grow
LVM allow you to extend mounted partitions without any hassle.
- Iterate over those composefs blobs, parse them to find which objects they reference
- Iterate over the object store and remove unreferenced objects
Overlayfs ?