A report from the 2022 Image-Based Linux Summit
One of the motivations for the summit was the simple fact that much of the wider ecosystem has been thinking about the same set of problems. For example, our employer, Microsoft, has made use of a lot of the concepts covered by the summit in the recently announced ARM64-based Azure offload SoC, which is running a custom, security-hardened Linux distribution. While we were thinking, tinkering, and writing about new ways to improve the current state of the art, it became obvious to us that many vendors are working, more or less, in the same space, doing similar work with varying degrees of overlap. However, little to no collaboration was happening. The summit was meant to identify and agree on common concepts and come up with a set of initial specifications. Some of them already have reference implementations.
So we invited technical representatives from the engineering groups of various vendors and distributions that have been known to work on related topics. The summit was intentionally kept small, as it was meant to be a series of conversations and brainstorming sessions in the style of a BoF event, with no fixed agenda or presentations. The 30 participants met in the Microsoft office in Berlin and discussed a range of topics from a list that the authors and participants had put together in advance. The topics centered on shipping Linux via images with enhanced security features.
Participants were affiliated with numerous companies or projects, including Canonical, Ubuntu Core, Debian, Gnome OS, Fedora CoreOS, Red Hat, Endless OS, Arch Linux, openSUSE, Flatcar, Microsoft, Amazon/AWS, Meta, System Transparency, systemd, image-builder/osbuild, mkosi, and rpm-ostree. As a result of this summit, the Linux Userspace API ("UAPI") Group was founded; it is a community for people with an interest in innovating how we build, deploy, and run modern Linux operating systems. It serves as a central gathering place for specs, documentation, and ideas. The associated uapi-group.org website contains the meeting minutes from the summit and links to current specifications and ideas.
Over the next couple of months the group hopes to create various specifications in this repository in the form of technical deep dives. The first one, centered around Unified Kernel Images (UKI), a concept introduced below, is already available on Lennart Poettering's blog.
Building blocks for images
Several concepts (and, of course, acronyms) that are supported by the systemd project were at the center of many discussions:
- A UKI (Unified Kernel Image) is a secure-boot-signed UEFI executable file that wraps a kernel image, an initrd image, a kernel command line, and more. A UKI can boot on EFI and integrates nicely with systemd-boot.
- Discoverable Disk Images (DDIs) are self-described filesystem images, heavily inspired by Canonical's Snaps, that have been enhanced to follow the Discoverable Partitions Specification (DPS). DDIs are wrapped in a GPT partition table that may contain root (or /usr/) filesystems, system extensions, system configurations, portable services, containers, and more, all of which are protected by dm-verity and combined into one image.
- Credentials pass secure bundles of data across components (hypervisor, firmware, system manager, container manager) and into system services.
- A system extension (sysext) is a DDI that can be overlaid on top of /usr/ (or /opt/) in a secure and atomic manner using read-only OverlayFS. A sysext can be used to extend a base filesystem to, for example, allow modular but pre-built initrds.
- A system configuration (syscfg) is a DDI that can be overlaid on top of /etc/ and provide a way to securely extend the configuration files of a system. This idea is still being developed, and will support the initrd, rootfs, system services, portable services, or nspawn images.
These concepts have all been supported by systemd and its collection of tools for some time, with the exception of syscfg, which is being developed now. They all support signed dm-verity for online and offline, kernel-enforced, integrity protection.
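As a rough illustration of why dm-verity protection only needs one small trusted value, the sketch below builds a hash tree over the blocks of an image and compares its root against a trusted root hash. It is a simplified model in Python, not dm-verity's on-disk format: the block size, the function names, and the way odd levels are padded are all assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        blocks = [b""]
    blocks[-1] = blocks[-1].ljust(block_size, b"\0")
    return blocks


def root_hash(data: bytes) -> bytes:
    """Compute the root of a binary hash tree over the blocks of data."""
    level = [hashlib.sha256(block).digest() for block in split_blocks(data)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def verify(data: bytes, trusted_root: bytes) -> bool:
    """Return True if data still matches the trusted root hash."""
    return root_hash(data) == trusted_root
```

In this model, flipping a single bit anywhere in the image changes the root, so signing the root hash is enough to cover the whole image.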
There is a plan to propose the use of sysext for Fedora 39's initrd logic; it should close a major gap in the security story of Linux. So far, initrds have always been built locally and used in an unprotected manner, neither signed nor measured; thus they lack any verification and are at the mercy of any attacker who gains (online or offline) write access to the local disk. By switching to UKIs, we can provide a base, shared, vendor-built initrd that provides common components and a series of sysext DDIs that are added when needed, providing support for less commonly needed hardware or storage subsystems such as iSCSI or NFS. A specification for UKIs is available. For a deeper look, readers should refer to the aforementioned blog post.
Configuration, building, and deployment
We discussed how to handle local configuration at length. Various distributions, such as openSUSE, are pushing developers to use libeconf, which follows the same configuration scheme used by systemd. With this mechanism, the vendor's default configuration resides in /usr/, the ephemeral override is in /run/, and the persistent override is in /etc/. Others, like those based on OSTree, ship defaults in /usr/etc/ and then copy them over to /etc/ on instantiation. At Microsoft, we are working on a different approach in the form of syscfg, which is similar to sysext, but for configuration. The same principles are followed: configuration overlays are protected with dm-verity, signed, stackable, and applicable to the whole system or individual services/containers.
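The layered lookup described above can be sketched as a search across directories in priority order. This is a minimal Python model, not libeconf's or systemd's implementation: the function name is hypothetical, and the order (persistent /etc/ first, then ephemeral /run/, then the vendor default in /usr/) is an assumption based on the description in the text. Real tools look in more specific subdirectories.

```python
from pathlib import Path
from typing import Optional, Sequence

# Highest priority first: persistent override, ephemeral override, vendor default.
# The exact directories are assumptions for this sketch.
DEFAULT_SEARCH_PATH = ("/etc", "/run", "/usr")


def resolve_config(
    name: str, search_path: Sequence[str] = DEFAULT_SEARCH_PATH
) -> Optional[Path]:
    """Return the highest-priority copy of a configuration file, if any."""
    for directory in search_path:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None
```

A vendor image would then only ever write below /usr/, leaving /etc/ empty unless the administrator adds an override.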
A large variety of options came up when discussing how to build images. Every single distribution has its own build system (as expected). The systemd project provides the systemd-repart tool, which understands DDIs and can run on each system (for initial partitioning or provisioning, or for factory reset), so there is some hope that it will be adopted for other uses too. The mkosi image-building tool is gaining some traction as it supports building layered sysexts and UKIs. But, by and large, every vendor is on its own with its own custom solution, and this situation is unlikely to change. One area where we hope to standardize is the production of a software bill of materials (SBOM), with some tentative agreement among the participants that the SPDX standard is the way to go. Some distributions, including SUSE and Flatcar, already go beyond this and provide full SLSA provenance.
After building an image, the next obvious step is getting it onto a system. Again, each vendor has its own methods here. Systemd recently introduced systemd-sysupdate as the run-time tool to pull down images from a configurable source. It is likely that the server side will remain unique to each vendor, but the hope is that, at least, the local client could be shared. Systemd-sysupdate can integrate nicely with systemd-boot and the rest of the ecosystem. Also discussed was resurrecting casync and integrating it with systemd-sysupdate to replace the use of curl seen in systemd-sysupdate now.
Updates and rollback
On the topic of upgrades, participants briefly discussed how to minimize the associated disruption. This is of great interest to Amazon and Microsoft, as image upgrades cause service interruptions. There are two competing approaches used in production. One is to use CRIU, which works best for single-process containers. The other is to use persistent memory and teach individual services how to make use of it, allowing restarting from fast memory or, even, keeping state intact across a kexec reboot. A standard user-space solution is desirable for the latter but is currently missing; some work is ongoing in that direction by Microsoft, but it is still early days. A suggestion was made to implement an "exitrd" in systemd that would be similar to a kexec, allowing the system to skip the hardware/firmware stage of a reboot and saving some time when only the operating-system DDI is updated and the kernel doesn't change. In this case, the system could simply shut down user space to an "exitrd", swap the mount point of the DDI for the new one, and start user space again. This would allow skipping the kernel phases of shutdown and booting.
Almost all of the participants have implemented a form of operating-system rollback or factory-reset mechanism. This is highly desirable when distributing an image-based Linux system; one of the advantages is being able to roll back to a known-working state. There are important differences among the distributions though. Those using Btrfs (SUSE) rely on snapshots, while Ubuntu Core provides a more traditional recovery system that is stored in the EFI system partition.
But factory reset can mean two different things: either to restore the operating system to its original version, or to restore the entire system to the factory state, wiping all local data. Tools provided by systemd have long supported the latter, and will be enhanced to let the factory reset be triggered by a UEFI variable and/or a special UUID being set on the OS partition. A new target unit will be introduced as a synchronization point, so that any custom action that needs to happen on factory reset can be pulled in automatically. Image builders can then choose to provide a boot menu entry to let users trigger this functionality.
Other vendors, including Android, are using the well-known A/B pattern for rollbacks and updates. One of the next action items is to implement secure rollback protection in systemd components such as systemd-cryptsetup using TPM counters, as this is more scalable than relying on denylists that eventually become too big to be manageable. The response to the BootHole vulnerability alone, for example, used about one-third of the revocation space available on UEFI systems; see this page for details.
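To illustrate why a counter scales better than a denylist, the following Python sketch models rollback protection with a value that can only increase. It is not how systemd or a TPM implements this: the `MonotonicCounter` class is a hypothetical stand-in for a hardware counter, and the idea that each image carries an integer version is an assumption for the example.

```python
class MonotonicCounter:
    """Hypothetical stand-in for a hardware counter that can only increase."""

    def __init__(self, value: int = 0) -> None:
        self._value = value

    @property
    def value(self) -> int:
        return self._value

    def advance_to(self, new_value: int) -> None:
        """Move the counter forward; moving backwards is refused."""
        if new_value < self._value:
            raise ValueError("counter cannot go backwards")
        self._value = new_value


def may_boot(image_version: int, counter: MonotonicCounter) -> bool:
    """An image is acceptable only if it is not older than the counter."""
    return image_version >= counter.value


def commit_boot(image_version: int, counter: MonotonicCounter) -> None:
    """After a successful boot, rule out every older image at once."""
    if not may_boot(image_version, counter):
        raise PermissionError("image is older than the rollback counter")
    counter.advance_to(image_version)
```

The stored state stays a single integer no matter how many old images are ruled out, whereas a denylist needs one entry per revoked image.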
One of the areas where there was immediate agreement was boot assessment; when an image-based system uses an A/B scheme, a way to signal when a boot is "good" or "bad" is needed. The "boot-complete target" will fulfill this purpose. It is already available, and it sounded like more distributions will start using it; services doing any kind of local assessment will be able to use it as the synchronization point. Systemd will integrate this mechanism with a timer, so that if a good state is not reached within the configured time limit, a rollback and reboot will be triggered. Additionally, update success or failure can be reported when using advanced update protocols such as Omaha, for instance via the open-source Nebraska server.
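The interplay between A/B slots, a limited number of boot attempts, and a time limit for reaching a good state can be modeled in a few lines. This Python sketch is only a toy model under stated assumptions; the `Slot` type, the attempt counter, and the function names are hypothetical and do not describe systemd's actual mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """One bootable OS version in an A/B scheme (hypothetical model)."""
    name: str
    tries_left: int
    good: bool = False


def pick_slot(preferred: Slot, fallback: Slot) -> Slot:
    """Use the preferred slot unless it has used up its attempts."""
    if preferred.good or preferred.tries_left > 0:
        return preferred
    return fallback


def start_boot(slot: Slot) -> None:
    """Record a boot attempt before handing control to the slot."""
    if not slot.good and slot.tries_left > 0:
        slot.tries_left -= 1


def assess(slot: Slot, seconds_to_healthy: Optional[float], time_limit: float) -> bool:
    """Mark the slot good if a healthy state was reached within the limit.

    seconds_to_healthy is None when the system never became healthy.
    """
    ok = seconds_to_healthy is not None and seconds_to_healthy <= time_limit
    if ok:
        slot.good = True
    return ok
```

A slot that never passes its assessment runs out of attempts, and the next call to `pick_slot` returns the previous version.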
Security
Enabling TPM-based security by default was also the subject of a lot of discussions. Right now, such security on generic Linux is pretty much opt-in. By switching to UKIs with embedded and signed PCR policies, we can make the TPM measurements predictable. This means that automatically enabling TPM-backed disk encryption by default becomes possible because there will no longer be a need to re-seal secrets every time a component changes. A document on signed TPM PCR policies will be available soon. It was agreed to set up a "registry" for PCRs, with the intention of helping developers and vendors avoid conflicts and overlaps.
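The re-sealing problem follows from how a PCR is updated: each measurement is folded into the previous value with a hash, so the final value depends on every component and on their order. The Python sketch below models the extend operation for a SHA-256 bank; the function names and the component strings are assumptions made for illustration, and it does not talk to a real TPM.

```python
import hashlib

PCR_SIZE = 32  # size of a PCR in a SHA-256 bank


def extend(pcr: bytes, data: bytes) -> bytes:
    """Return the PCR value after measuring data: H(old || H(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def measure_chain(components: list[bytes]) -> bytes:
    """Measure components in order, starting from an all-zero PCR."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr
```

A secret sealed against one literal PCR value stops being released as soon as any measured component changes. Binding the secret to a signature over the expected values, rather than to the values themselves, is what removes the need to re-seal on every update.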
There is a lot of interest in remote attestation in the context of confidential computing, and there are various solutions and competing standards. On a local node, some solutions rely on the IMA log, some on the TPM log, and some on completely custom registries. There is a lot of ongoing change at this stage and, while the work on image-based systems is largely a prerequisite for remote attestation, no clear action item or feature request came out of this discussion.
Conclusions and future work
The projects represented at the summit can be divided into three camps: image-based deployments, ostree-based deployments, and Btrfs-based deployments. So while it is natural that there was no complete overlap covering all participant projects, there was consensus that, on a significant number of topics, collaboration and standardization are indeed possible.
Finally, given that participants felt that the summit was productive and useful, it was agreed to meet again in the future, perhaps in co-location with another conference to facilitate travel. While there is no perfect agreement between all vendors on all topics, there are enough similarities that we came out of this with a series of action items and a plan. The UAPI Group, with all participants as members, will let us collect and collaborate on documentation and specifications. The initial specifications for DDIs, DPS, and a PCR registry are already in place and published on the UAPI website, with more to come.
Index entries for this article | |
---|---|
GuestArticles | Boccassi, Luca |
Posted Nov 3, 2022 20:18 UTC (Thu)
by walters (subscriber, #7396)
[Link] (7 responses)
Thanks so much for starting this and writing it up! Looking forward to collaborating with others in this space. And hopefully I'll be able to make the next one of these in person! Some comments: Finally, one thing I'd really hoped that could be discussed, but I'm not seeing here is a common push for filesystem compatibility changes across both distribution and 3rd party software. Specifically around things like installing files to /opt and best practices around building image systems and where state is stored - look at all the stuff in e.g. https://documentation.suse.com/sles/15-SP1/html/SLES-all/cha-transactional-updates.html and e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1900691
Posted Nov 3, 2022 21:26 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (3 responses)
The key differentiator is support for full integrity protection at the block layer via dm-verity. Locally mutable systems like os-tree cannot provide that by definition. In many settings this is not a minor thing, but a fundamental design goal if not table stakes.
> Finally, one thing I'd really hoped that could be discussed, but I'm not seeing here is a common push for filesystem compatibility changes across both distribution and 3rd party software. Specifically around things like installing files to /opt and best practices around building image systems and where state is stored - look at all the stuff in e.g. https://documentation.suse.com/sles/15-SP1/html/SLES-all/... and e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1900691
The specs repo is on Github and contributions are welcome. There is already a fork of the XDG-Base spec for handling of configuration files (that I'm trying to get merged upstream). In general we do not want to deviate excessively from the FHS, but enhancements are good.
Posted Nov 3, 2022 21:56 UTC (Thu)
by walters (subscriber, #7396)
[Link] (2 responses)
IMO, the definition of "locally mutable" gets fuzzy here when one adds things like 3rd party sysexts into the mix. There's also the opposite case of using IMA or fs-verity underneath file-based systems (like ostree, but not exclusively - the old OLPC model of "let's use rsync to hardlinked filesystem trees" was another variant). Sure, it's not as strong as dm-verity, but it absolutely will (particularly in combination with other mechanisms like dm-crypt and other LSMs) stop many attacks. My classic example is the runc exploit. https://lwn.net/Articles/842164/
But I think we're in agreement here to say that the cases are "dm-verity", "file based", and "btrfs snapshot" right? The original phrasing of excluding ostree from "image based" seems contradictory from having us under a shared discussion group. Not to mention, while rauc supports dm-verity it also seems to support non-dm-verity (e.g. plain squashfs) and I hope you'd agree that it's still an image based update system, it just doesn't have the dm-verity properties (in that setup) either.
Posted Nov 3, 2022 23:35 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (1 responses)
Of course this does not mean that other systems like ostree and btrfs are bad or that they should not be used, it simply means that they sit in a different place on the wide spectrum of image-based Linux, with significant enough differences to be meaningfully distinguishable.
Posted Nov 4, 2022 13:42 UTC (Fri)
by walters (subscriber, #7396)
[Link]
(Yes, I know this - and I'm pretty sure you know I know too ;) But, I know we're also conversing with an audience so some repetition is needed unfortunately)
> it simply means that they sit in a different place on the wide spectrum of image-based Linux, with significant enough differences to be meaningfully distinguishable.
Yes! We're in agreement. There's lots of nuances between different points on that spectrum. And a lot of space to share knowledge and approaches across them.
Security is also (as you know) not a binary thing. Personally, I think one of the biggest problems in our industry in general is people *not updating the operating system at all*. Switching to "the operating system auto-updates by default" was one of the big changes from the original Container Linux - and we've carried that forward in both Flatcar and Fedora CoreOS. It's actually a pretty profound difference in practice - I blogged about this but basically I think there's a strong sense of "responsibility" for updates that shifts from "I typed apt|dnf|whatever update and it broke, so it's my fault" to "the system just fell over, it's $OS's fault".
On this topic, I think "unlocked" image based systems (like pulling btrfs snapshots, snap and ostree, etc.) having the properties of e.g. transactionality and offline updates (not disrupting the running system) provide a very meaningful improvement to security from *this* aspect. Not to mention the "we tested the update image server side and you are bit for bit reproducing it".
Another way to say it is, these properties are also very meaningfully distinguishable from traditional package based systems.
What I'm arguing at the core is: s/image based/dm-verity based/ in the original sentence. I also would tend to use the term "locked" or "sealed" when talking about this because honestly "image" means too many things already.
Posted Nov 4, 2022 10:23 UTC (Fri)
by pothos (subscriber, #116075)
[Link] (1 responses)
Posted Nov 4, 2022 10:26 UTC (Fri)
by bluca (subscriber, #118303)
[Link]
Posted Nov 5, 2022 2:06 UTC (Sat)
by champtar (subscriber, #128673)
[Link]
I would love to see such doc, so I have something to send to vendors so they might improve their RPMs (even if I don't have high hopes).
Posted Nov 4, 2022 1:26 UTC (Fri)
by xecycle (subscriber, #140261)
[Link] (4 responses)
Posted Nov 4, 2022 1:29 UTC (Fri)
by jake (editor, #205)
[Link] (3 responses)
heh, that's an unfortunate typo, isn't it? :)
fixed now ...
jake
Posted Nov 4, 2022 9:53 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (2 responses)
Posted Nov 4, 2022 6:06 UTC (Fri)
by NHO (guest, #104320)
[Link] (5 responses)
Posted Nov 4, 2022 6:50 UTC (Fri)
by zdzichu (subscriber, #17118)
[Link]
Posted Nov 4, 2022 7:51 UTC (Fri)
by smurf (subscriber, #17840)
[Link]
Not to mention your encrypted backups.
Posted Nov 4, 2022 9:52 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (2 responses)
There is no such "legal problem", this scare story has been around for 20 years since UEFI first arrived, and guess what, it never happened, because it does not make any sense. The UEFI spec mandates that the machine owner, with verified physical presence at the keyboard, can swap the keys.
Posted Nov 4, 2022 11:34 UTC (Fri)
by aragilar (subscriber, #122569)
[Link] (1 responses)
Posted Nov 4, 2022 11:42 UTC (Fri)
by bluca (subscriber, #118303)
[Link]
Posted Nov 4, 2022 15:57 UTC (Fri)
by aszs (subscriber, #50252)
[Link]
Imagine a certificate authority like Let's Encrypt but the challenge it requires for signing the request isn't only to prove the server answers to the cert's domain but also a remote attestation that the server is running the reproducible workload whose digest is in the cert. Then as a user I can have some assurance that the server I'm connecting to is running the code I'm expecting it to just by checking that digest.
Sounds like we're getting closer but how far away is the state-of-the-art from having the necessary building blocks to enabling something like this? What is missing?
Posted Nov 4, 2022 16:38 UTC (Fri)
by ale2018 (guest, #128727)
[Link] (1 responses)
Posted Nov 4, 2022 18:17 UTC (Fri)
by bluca (subscriber, #118303)
[Link]
Posted Nov 7, 2022 6:39 UTC (Mon)
by hsiangkao (guest, #123981)
[Link]
Posted Nov 9, 2022 17:39 UTC (Wed)
by calumapplepie (guest, #143655)
[Link] (2 responses)
What does "BoF" stand for in this context?
Posted Nov 9, 2022 17:54 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
Cheers,
Posted Nov 9, 2022 17:56 UTC (Wed)
by amacater (subscriber, #790)
[Link]
See, for example Debconf which has large plenary sessions, individual talks and then semi-formal BoFs to get people together on a small topic.
Posted Nov 10, 2022 4:55 UTC (Thu)
by pabs (subscriber, #43278)
[Link] (2 responses)
Posted Nov 10, 2022 5:28 UTC (Thu)
by mjg59 (subscriber, #23239)
[Link]
Posted Nov 10, 2022 14:41 UTC (Thu)
by bluca (subscriber, #118303)
[Link]
Posted Nov 10, 2022 14:05 UTC (Thu)
by smitty_one_each (subscriber, #28989)
[Link]
> ... our employer, Microsoft ...
I remember seeing neither the TPS Report, nor hearing the conspiracy theory covering this event.
Nonetheless: cheers. I think your work has been a big win for the world in general, Lennart.
Posted Nov 11, 2022 8:50 UTC (Fri)
by pabs (subscriber, #43278)
[Link] (1 responses)
Posted Nov 11, 2022 15:00 UTC (Fri)
by bluca (subscriber, #118303)
[Link]
Others, like those based on OSTree, ship defaults in /usr/etc/ and then copy them over to /etc/ on instantiation.
No, actually the configuration is merged across updates too. It means you get new default config files. To me, this is a key aspect that keeps ostree based systems feeling like a Unix system. First, running vi /etc/fstab will continue to Just Work. If you only instantiate /etc on firstboot/install time, then your system has hysteresis - its state depends on the installation version. Such a model I think can work at a small scale for custom, targeted operating systems - but for general purpose scale, people who maintain software that installs files in /etc are just going to say "it's your problem" if it doesn't work for them to add new files in /etc over time.
So far, initrds have always been built locally
I think you mean for "package native" systems - several other image based update systems (including rpm-ostree and, I'm pretty sure, rauc) have been doing it for a long time. You are right about the integrity aspect though.
The projects represented at the summit can be divided in three camps: image-based deployments, ostree-based deployments, and Btrfs-based deployments.
Hmm, I still think of ostree as an image system. I'd tend to say "block device image" perhaps versus "file based"? I would imagine that the people doing btrfs snapshots also think of them as image systems? In the end, the filesystem tree you're booting into is exactly reproducing the server, plus support for offline updates. To me those are two key differentiators versus the opposite end of the spectrum - traditional package based systems.
I can maybe see about trying to write some sort of docs for this there also in the uapi group? I think this is relevant for e.g. systemd-sysext too. Ah, though interesting I see sysext merges /opt too. In the ostree model that's /var/opt to be always part of machine-local state.
When adding the IPE LSM on top (https://microsoft.github.io/ipe/) this allows a pretty neat and hardened system with full code integrity (well, it's at least getting there: still working on enlightening interpreters for scripts).
The outcome of this discussion was that we want to push applications to install their default configs to /usr instead and allow drop in files under /etc, which is written down in https://uapi-group.org/specifications/specs/base_directory_specification/
What happens if motherboard in my laptop died from coffee-related accident and I need to extract data from my SSD that was automatically encrypted without asking me on installation, with keys stored in dead motherboard?
Proposal also does nothing to even address the fairly important legal problem: what if vendor decides that I am also third party, not permitted to meddle with software on my personal computer, and their ownership of root keys entitles them to extract wealth from me for the right to use hardware I own with software I own, until they decide that my hardware is not worth supporting and disable all software on it by the means of short-lived crypto cert?
Of course the user experience given by the default settings sucks, and it is being worked on. But it has nothing to do with this.
Wol