
A security model for systemd

[LWN subscriber-only content]

By Joe Brockmeier
November 5, 2025

All Systems Go!

Linux has many security features and tools that have evolved over the years to address threats as they emerge and security gaps as they are discovered. As Lennart Poettering observed at the All Systems Go! conference held in Berlin, Linux security is all somewhat random and not a "clean" design. To many observers, that may also appear to be the case for systemd; however, Poettering said that he does have a vision for how all of the security-related pieces of systemd are meant to fit together. He wanted to use his talk to explain "how the individual security-related parts of systemd actually fit together and why they exist in the first place".

I did not have a chance to attend the All Systems Go! conference this year, but watched the recording of the talk after it was published. The slides are also available.

What is a security model?

Poettering said that when he started drafting his slides it dawned on him that he had used the phrase "security model" frequently, but without knowing its formal definition. So he turned to Wikipedia's definition, which states:

A computer security model is a scheme for specifying and enforcing security policies. A security model may be founded upon a formal model of access rights, a model of computation, a model of distributed computing, or no particular theoretical grounding at all.

That definition was pleasing, he said, because he could just "pull something out of my hair and it's a security model." Of course, he wanted to be a bit more formal than that. Considering the threats in the world we actually live in was the place to begin.

Thinking about threats

Today's systems are always exposed, he said. They are always connected; even systems that people do not think about, such as those in cars, are effectively always online waiting for updates. And systems are often in physically untrusted environments. Many systems are hosted by cloud providers and outside the physical control of their users. Users also carry around digital devices, such as phones, tablets, and laptops: "So it is absolutely essential that we talk about security to protect them both from attacks on the network and also locally and physically."


The next thing is to think about what is actually being attacked. Poettering described some of the possible scenarios; one type of attack might take advantage of a vulnerability in unprivileged code, while another might try to exploit privileged code to make it execute something it was not supposed to. It could be an attack on the kernel from user space. "We need to know what's being attacked in order to defend those parts from whomever is attacking them."

Attacks also have different goals, he said. Some attacks may target user data, others may attempt to backdoor a system, and still others may be focused on using a system's resources, or conducting a denial-of-service (DoS) attack. The type of attack determines the type of protections to be used. Encryption, he said, is useful if one is worried about data exfiltration, but not so much for a DoS.

Poettering said that he also thought about where attacks are coming from. For example, does an attacker have physical access to a system, is the attack coming over a network, or is the attack coming from inside the system? Maybe a user has a compromised Emacs package, or something escapes a web browser's sandbox. Not all of these attack sources are relevant to systemd, of course, but thinking about security means understanding that attacks can come from everywhere.

FLOUTing security

The bottom line is that the approach to defending against attacks depends on where they come from and what the intention of the attack is. Poettering put up a new slide, which he said was the most important of all the slides in his presentation. It included his acronym for systemd's security model, "FLOUT":

Frustrate attacks
Limit exposure after successful attacks
Observe attacks
Undo attacks
Track vulnerabilities

"I call this 'FLOUT': frustrate, limit, observe, undo, and track. And I think in systemd we need to do something about all five of them".

The first step is to "frustrate" attackers; to make attacks impossible. "Don't even allow the attack to happen and all will be good." But, it does not always work that way; software is vulnerable, and exploits are inevitable. That is why limiting exposure with sandboxing is important, he said. If a system is exploited, "they might do bad stuff inside of that sandbox, but hopefully not outside of it."

Since exploits are inevitable, it is also necessary to be able to observe the system and know not only that an attack happened, but how it happened as well. And, once an attack has happened and been detected, it must be undone. With containers and virtual machines, it is less important to have a reset function, Poettering said: "Just delete the VM or container, if you have the suspicion that it was exploited, and create a new one". But that approach does not work so well with physical devices. "We need to always have something like a factory reset that we can return to a well-defined state" and know that it is no longer exploited. Finally, there is tracking vulnerabilities. Ideally, he said, you want to know in advance if something is vulnerable.

Poettering returned to a theme from the beginning of the talk; the fact that Linux, and its security features, were not designed "in a smooth, elegant way". There are so many different security components, he complained, ranging from the original Unix model with UIDs and GIDs, to user namespaces. "And if you want to use them together, it's your problem". Too much complexity means less security.

He said that he preferred universal security mechanisms to fine-grained ones. This means finding general rules that always apply and implementing security policies that match those rules, rather than trying to apply policies for specific projects or use cases. He gave the example that device nodes should only be allowed in /dev. That is a very simple security policy that is not tied to any specific hardware.

But that is not how many of Linux's security mechanisms are built. SELinux, for instance, requires a specific policy for each daemon. Then, one might write a policy that forbids that daemon from creating device nodes. But that is much more fragile and difficult to maintain, he said. "It's much easier figuring out universal truths and enforcing them system-wide". To do that, components should be isolated into separate worlds.

Worlds apart

Poettering said that he liked to use the word "worlds" because it's not used much in the Linux community, so far. The term "worlds" could be replaced with "containers", "sandboxes", "namespaces", and so on. The important concept is that something in a separate world is not only restricted from accessing resources that are outside of that world, it should not see those resources at all.

So to keep the complexity of these sandboxes small, it's good if all these objects are not even visible, not even something you have to think about controlling access to, because they are not there, right?

Security rules should be that way, he said, and deal with isolation and visibility. That is different than the way SELinux works; everything still runs in the same world. An application may be locked down, but it still sees everything else.

The next fundamental thing to think about, he said, is to figure out what an application is in the first place and how to model it for security. It is not just an executable binary, but a combination of libraries, runtimes, data resources, configuration files, and more, all put together. To have a security model, "we need to model apps so that we know how to apply the security" to them.

Ideally, an app would be something like an Open Container Initiative (OCI) image or Flatpak container that has all of its resources shipped in an "atomic" combination; that is, all of the components are shipped together and updated together. In this way, he said, each application is its own world. Here, Poettering seemed to be comparing the update model for Docker-type containers and Flatpak containers to package-based application updates, where an application's dependencies might be updated independently; he said that "non-atomic behavior" is a security vulnerability because different components may not be tested together.

Another piece of a security model is delegation; components need to be able to talk to one another and delegate tasks. On the server side, the database and web server must be able to talk to one another. On the desktop, the application that needs a Bluetooth device needs to be able to talk to the application that manages Bluetooth devices.

Security boundaries

Poettering also talked about different types of security boundaries. Security sandboxes are one type of boundary that most people already think about; boundaries between user identities (UIDs) are another. A system's different boot phases are yet another type of boundary; for example, during certain parts of the boot process there are values that are measured into the TPM. After that phase of the boot process is finished it "kind of blows a fuse" and the system can no longer modify those values, which provides a security boundary.

He said that there are also distinctions that are important between code, configuration, and state. Code is executable, but the configuration is not. The resources should be kept separate; state and configuration should be mutable, but code should not be mutable "because that's an immediate exploit, basically, if some app or user manages to change the code".

Along with the security boundaries are the technologies that enforce those boundaries; for example, Linux namespaces, SELinux security labels, CPU rings, and others.

Distributions

The Linux distribution code-review model is supposed to be a security feature, he said. It means that users do not have to download software from 500 different sources they "cannot possibly understand if they are trustworthy or not". Instead, users rely on distributions to do some vetting of the code.

However, Poettering said that there are problems with this model: namely that it does not scale and it is too slow. Distributions cannot package everything, and they cannot keep up with how quickly developers release software. Plus, code reviews are hard, even harder than programming. "So do we really trust all the packagers and the distributions to do this comprehensively? I can tell you I'm not." This is not to disrespect distribution packagers, he said: "I'm just saying that because I know I'm struggling with code reviews, and so I assume that other people are not necessarily much better than me".

One never knows, he said, if distribution packagers are actually reviewing the code they package, and "sometimes it becomes visible that they don't; let's hope that those are the exceptions". Sandboxing and compartmentalizing, Poettering said, is essential to ensure that users do not have to rely solely on code review for protection.

Rules

Having examined all the things that one has to think about when creating a security model, Poettering wanted to share the rules that he has come up with. The first is that kernel objects should be authenticated before they are instantiated. "We should minimize any interaction with data, with objects, with stuff that hasn't been authenticated yet because that is always where the risk is."

Poettering also said that security should focus on images, not files; look at the security of an entire app image, rather than trying to examine individual files (or "inodes" as he put it). "We should measure everything in combination before we use it". He brought up sandboxing again, and said that it was necessary to "isolate everywhere".

Another rule is that a comprehensive factory reset is a must, he said. This cannot be an afterthought, but something that needs to be in the system right away. And, finally, "we need to really protect our security boundaries".

But, he said, a security model still has to be useful. And, "as most of us here are hackers" there needs to be a break-glass mode that allows for making temporary changes and debugging. A break-glass mode should be a measured and logged event, though: "Even if you are allowed to do this, there needs to be a trace of it afterward". Such a mode should not allow a developer to exfiltrate data from a system, and it should possibly even invalidate data in some way.

Linux misdesigns

Next, Poettering identified some of the things he felt were misdesigns in the Linux and Unix security models that he does not want to rely on. His first gripe was with the SUID (or "setuid") bit on files. This is not a new topic for him; Poettering said in 2023, in response to a GNU C library (glibc) vulnerability, that general-purpose Linux distributions should get rid of SUID binaries. Instead, he suggested using interprocess communication (IPC) to manage executing a privileged operation on behalf of an unprivileged user.

He also felt that the Linux capabilities implementation is a terrible thing. The feature is "kind of necessary", but a design mistake. For example, CAP_SYS_ADMIN is "this grab bag of privileges of the super user". He complained that it is a privilege "so much bigger than all the other ones that it's a useless separation" of privileges. However, complaints about CAP_SYS_ADMIN are neither new nor rare; Michael Kerrisk, for example, enumerated several in his LWN article about it in 2012.

In any case, Poettering did acknowledge that capabilities are "not entirely useless", and that systemd makes heavy use of capabilities. However, "we only make use of it because it's there, and it's really basic, and you cannot even turn it off in the kernel".

One of the core Unix designs that Linux has inherited is "everything is a file". That is, he said, not actually true. There are certain kinds of objects that are not inodes, such as System V semaphores and System V shared memory. That is a problem, because they are objects with a different type of access control than inodes where "at least we know how security works".

Implementation in systemd

"Now, let's be concrete", Poettering said. It was time to explain how systemd implements the security model that he had discussed, and where its components fit into the FLOUT framework. The first was to sandbox services, to limit exposure; systemd has a number of features for putting services into their own sandbox.
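As a rough illustration (the specific settings shown here are a sketch of what is available, not taken from the talk), a service can opt into much of this sandboxing with a handful of directives in its unit file:

```ini
[Service]
# Mount /usr, /boot, and /etc read-only for this service
ProtectSystem=strict
# Hide users' home directories entirely
ProtectHome=yes
# Give the service private /tmp and /dev instances
PrivateTmp=yes
PrivateDevices=yes
# Forbid gaining privileges via SUID/SGID execution
NoNewPrivileges=yes
# Drop all capabilities and restrict system calls to a common service set
CapabilityBoundingSet=
SystemCallFilter=@system-service
```

Running "systemd-analyze security" against a unit reports how many of these kinds of protections it actually uses.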

Another is using dm-verity and signatures for discoverable disk images (DDIs) that are inspected to ensure they meet image policies. Verifying disk images would frustrate attackers, as well as provide observability; if a disk image does not match the signature, that is a sign of tampering. Systemd's factory reset features provide the "undo" part of the FLOUT framework; in systemd v258 the project added the ability to reset the TPM as well as disk partitions. LWN covered that in August 2025.
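As a sketch of how an image policy might be pinned for a service that runs from a disk image (the service and policy string here are illustrative assumptions, not from the talk), a unit can require signed dm-verity protection for the partitions it uses:

```ini
[Service]
# Hypothetical service running from a signed disk image
RootImage=/var/lib/images/myapp.raw
# Require signed Verity for the root and /usr partitions;
# anything else in the image must be unused or absent
RootImagePolicy=root=signed:usr=signed:=unused+absent
```

The same kind of policy string can be passed to tools such as systemd-dissect via --image-policy to inspect an image before use.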

Poettering said that we should also "try really hard to do writable XOR executable mounts". A filesystem can be mounted writable so that its contents can be modified, or it can be mounted executable so that binaries can be run from it. But a filesystem should never be both. If that were implemented through the whole system, he said, it would be much more secure. Systemd provides tools to do this, in part, with its system extension features. Systemd can mount system extension images (sysext) for /usr and /opt, and configuration extension images (confext) for /etc. The default is to mount these extensions read-only, though it is possible to make them writable.
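As a sketch of how such an extension is put together (the extension name is made up for illustration), a sysext image ships its payload under /usr along with a release file that systemd-sysext checks before merging the image over the host's /usr:

```ini
# usr/lib/extension-release.d/extension-release.example
# (inside the extension image; "example" is a hypothetical name)
# Match any host distribution:
ID=_any
SYSEXT_LEVEL=1
```

Running "systemd-sysext merge" then overlays the extension's /usr onto the running system, read-only by default, which fits the writable-XOR-executable model: the merged tree is executable but not writable.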

Systemd also uses the TPM a lot, "for fundamental key material" to decrypt disks (systemd-cryptsetup) and service credentials (systemd-creds). That, he said, helped to frustrate attackers and limit access. Finally, he quickly mentioned using the varlink IPC model for delegating and making requests to services, which also helped as a way to limit access.
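For example (the credential and file names here are hypothetical), a secret can be sealed against the local TPM with systemd-creds and handed to a service at startup:

```ini
# Encrypt a secret, bound to the local TPM:
#   systemd-creds encrypt --name=db-password plaintext.txt \
#       /etc/credstore.encrypted/db-password.cred
# Then reference it from the service unit:
[Service]
LoadCredentialEncrypted=db-password:/etc/credstore.encrypted/db-password.cred
# The decrypted secret appears at $CREDENTIALS_DIRECTORY/db-password,
# visible only to this service.
```

Because the credential is sealed to the TPM, copying the encrypted file to another machine does not expose the secret.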

Questions

One member of the audience wanted to know how Poettering would replace capabilities if he had a magic wand capable of doing so. "If you don't like it, what would you like to see instead?" Poettering responded that his issue was not with the capability model per se, but with the actual implementation in Linux. He said that he liked FreeBSD's Capsicum: "if they would implement that, that would be lovely".

Another attendee asked when systemd would enable the no-new-privileges flag. Poettering said that it was already possible to use that flag with systemd because it does not have SUID binaries. "We do not allow that". But, he said, it does not mean that the rest of the system is free of SUID binaries. It should be the goal, "at least in well-defined systems", to just get rid of SUID binaries.


Index entries for this article
Conference: All Systems Go!/2025




OpenBSD pledge & unveil are also nice

Posted Nov 5, 2025 16:15 UTC (Wed) by rbranco (subscriber, #129813) [Link]

There are implementations for Linux using seccomp & Landlock:
- https://github.com/jart/pledge/
- https://github.com/marty1885/landlock-unveil


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds