A security model for systemd
Linux has many security features and tools that have evolved over the years to address threats as they emerge and security gaps as they are discovered. Linux security is, as Lennart Poettering observed at the All Systems Go! conference held in Berlin, somewhat random and not a "clean" design. To many observers, that may also appear to be the case for systemd; however, Poettering said that he does have a vision for how all of the security-related pieces of systemd are meant to fit together. He wanted to use his talk to explain "how the individual security-related parts of systemd actually fit together and why they exist in the first place".
I did not have a chance to attend the All Systems Go! conference this year, but watched the recording of the talk after it was published. The slides are also available.
What is a security model?
Poettering said that when he started drafting his slides it dawned on him that he had used the phrase "security model" frequently, but without knowing its formal definition. So he turned to Wikipedia's definition, which states:
A computer security model is a scheme for specifying and enforcing security policies. A security model may be founded upon a formal model of access rights, a model of computation, a model of distributed computing, or no particular theoretical grounding at all.
That definition was pleasing, he said, because he could just "pull something out of my hair and it's a security model." Of course, he wanted to be a bit more formal than that. Considering the threats in the world we actually live in was the place to begin.
Thinking about threats
Today's systems are always exposed, he said. They are always connected; even systems that people do not think about, such as those in cars, are effectively always online waiting for updates. And systems are often in physically untrusted environments. Many systems are hosted by cloud providers and outside the physical control of their users. Users also carry around digital devices, such as phones, tablets, and laptops: "So it is absolutely essential that we talk about security to protect them both from attacks on the network and also locally and physically."
The next thing is to think about what is actually being attacked. Poettering described some of the possible scenarios; one type of attack might take advantage of a vulnerability in unprivileged code, while another might try to exploit privileged code to make it execute something it was not supposed to. It could be an attack on the kernel from user space. "We need to know what's being attacked in order to defend those parts from whomever is attacking them."
Attacks also have different goals, he said. Some attacks may target user data, others may attempt to backdoor a system, and still others may be focused on using a system's resources, or conducting a denial-of-service (DoS) attack. The type of attack determines the type of protections to be used. Encryption, he said, is useful if one is worried about data exfiltration, but not so much for a DoS.
Poettering said that he also thought about where attacks are coming from. For example, does an attacker have physical access to a system, is the attack coming over a network, or is the attack coming from inside the system? Maybe a user has a compromised Emacs package, or something escapes a web browser's sandbox. Not all of these attack sources are relevant to systemd, of course, but thinking about security means understanding that attacks can come from everywhere.
FLOUTing security
The bottom line is that the approach to defending against attacks depends on where they come from and what the intention of the attack is. Poettering put up a new slide, which he said was the most important of all the slides in his presentation. It included his acronym for systemd's security model, "FLOUT":
Frustrate attacks
Limit exposure after successful attacks
Observe attacks
Undo attacks
Track vulnerabilities
"I call this 'FLOUT': frustrate, limit, observe, undo, and
track. And I think in systemd we need to do something about all five
of them
".
The first step is to "frustrate" attackers; to make attacks impossible. "Don't even allow the attack to happen and all will be good." But, it does not always work that way; software is vulnerable, and exploits are inevitable. That is why limiting exposure with sandboxing is important, he said. If a system is exploited, "they might do bad stuff inside of that sandbox, but hopefully not outside of it."
Since exploits are inevitable, it is also necessary to be able to observe the system and know not only that an attack happened, but how it happened as well. And, once an attack has happened and been detected, it must be undone. With containers and virtual machines, it is less important to have a reset function, Poettering said: "Just delete the VM or container, if you have the suspicion that it was exploited, and create a new one". But that approach does not work so well with physical devices. "We need to always have something like a factory reset that we can return to a well-defined state" and know that it is no longer exploited. Finally, there is tracking vulnerabilities. Ideally, he said, you want to know in advance if something is vulnerable.
Poettering returned to a theme from the beginning of the talk: the fact that Linux, and its security features, were not designed "in a smooth, elegant way". There are so many different security components, he complained, ranging from the original Unix model with UIDs and GIDs, to user namespaces. "And if you want to use them together, it's your problem". Too much complexity means less security.
He said that he preferred universal security mechanisms to fine-grained ones. This means finding general rules that always apply and implementing security policies that match those rules, rather than trying to apply policies for specific projects or use cases. He gave the example that device nodes should only be allowed in /dev. That is a very simple security policy that is not tied to any specific hardware.
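As a rough sketch of what such a universal rule looks like in practice (my illustration, not from the talk), the mount layer can already enforce something close to it: every filesystem other than /dev can be mounted with the nodev option, which makes the kernel ignore any device nodes found on it:

    # Sketch: approximate "device nodes only in /dev" via mount options.
    # /dev is a kernel-managed devtmpfs; everything else gets nodev,
    # so a device node created elsewhere is simply not honored.
    mount -o remount,nodev /home
    mount -o remount,nodev /var
    # The persistent form, in /etc/fstab:
    #   /dev/sda3  /home  ext4  defaults,nodev  0 2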
But that is not how many of Linux's security mechanisms are built. SELinux, for instance, requires a specific policy for each daemon. Then, one might write a policy that forbids that daemon from creating device nodes. But that is much more fragile and difficult to maintain, he said. "It's much easier figuring out universal truths and enforcing them system-wide". To do that, components should be isolated into separate worlds.
Worlds apart
Poettering said that he liked to use the word "worlds" because it's not used much in the Linux community, so far. The term "worlds" could be replaced with "containers", "sandboxes", "namespaces", and so on. The important concept is that something in a separate world is not only restricted from accessing resources that are outside of that world, it should not see those resources at all.
So to keep the complexity of these sandboxes small, it's good if all these objects are not even visible, not even something you have to think about controlling access to, because they are not there, right?
Security rules should be that way, he said, and deal with isolation and visibility. That is different than the way SELinux works; everything still runs in the same world. An application may be locked down, but it still sees everything else.
The next fundamental thing to think about, he said, is to figure out what an application is in the first place and how to model it for security. It is not just an executable binary, but a combination of libraries, runtimes, data resources, configuration files, and more, all put together. To have a security model, "we need to model apps so that we know how to apply the security" to them.
Ideally, an app would be something like an Open Container Initiative (OCI) image or Flatpak container that has all of its resources shipped in an "atomic" combination; that is, all of the components are shipped together and updated together. In this way, he said, each application is its own world. Here, Poettering seemed to be comparing the update model for Docker-type containers and Flatpak containers to package-based application updates, where an application's dependencies might be updated independently; he said that "non-atomic behavior" is a security vulnerability because different components may not be tested together.
Another piece of a security model is delegation; components need to be able to talk to one another and delegate tasks. On the server side, the database and web server must be able to talk to one another. On the desktop, the application that needs a Bluetooth device needs to be able to talk to the application that manages Bluetooth devices.
Security boundaries
Poettering also talked about different types of security boundaries. Security sandboxes are one type of boundary that most people already think about, as are boundaries between user identities (UIDs). A system's different boot phases are yet another type of boundary; for example, during certain parts of the boot process there are values that are measured into the TPM. After that phase of the boot process is finished, it "kind of blows a fuse" and the system can no longer modify those values, which provides a security boundary.
He said that there are also important distinctions between code, configuration, and state. Code is executable, but the configuration is not. The resources should be kept separate; state and configuration should be mutable, but code should not be, "because that's an immediate exploit, basically, if some app or user manages to change the code".
Along with the security boundaries are the technologies that enforce those boundaries; for example, Linux namespaces, SELinux security labels, CPU rings, and others.
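As a small illustration (mine, not from the talk), the namespace entry on that list can be tried from a shell with util-linux's unshare, which starts a process in fresh mount and PID namespaces:

    # Sketch: spawn a shell in new mount and PID namespaces.
    # --fork makes the shell PID 1 of the new PID namespace;
    # --mount-proc remounts /proc so 'ps' reflects the new world.
    # (Run as root, or add --user --map-root-user.)
    unshare --mount --pid --fork --mount-proc /bin/sh
    # Inside, 'ps ax' shows only this world's processes, and mounts
    # made here are invisible to the rest of the system.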
Distributions
The Linux distribution code-review model is supposed to be a security feature, he said. It means that users do not have to download software from 500 different sources they "cannot possibly understand if they are trustworthy or not". Instead, users rely on distributions to do some vetting of the code.
However, Poettering said that there are problems with this model: namely that it does not scale and it is too slow. Distributions cannot package everything, and they cannot keep up with how quickly developers release software. Plus, code reviews are hard, even harder than programming. "So do we really trust all the packagers and the distributions to do this comprehensively? I can tell you I'm not." This is not to disrespect distribution packagers, he said: "I'm just saying that because I know I'm struggling with code reviews, and so I assume that other people are not necessarily much better than me".
One never knows, he said, if distribution packagers are actually reviewing the code they package, and "sometimes it becomes visible that they don't; let's hope that those are the exceptions". Sandboxing and compartmentalizing, Poettering said, are essential to ensure that users do not have to rely solely on code review for protection.
Rules
Having examined all the things that one has to think about when creating a security model, Poettering wanted to share the rules that he has come up with. The first is that kernel objects should be authenticated before they are instantiated. "We should minimize any interaction with data, with objects, with stuff that hasn't been authenticated yet because that is always where the risk is."
Poettering also said that security should focus on images, not files; look at the security of an entire app image, rather than trying to examine individual files (or "inodes" as he put it). "We should measure everything in combination before we use it". He brought up sandboxing again, and said that it was necessary to "isolate everywhere".
Another rule is that a comprehensive factory reset is a must, he said. This cannot be an afterthought, but something that needs to be in the system right away. And, finally, "we need to really protect our security boundaries".
But, he said, a security model still has to be useful. And, "as most of us here are hackers", there needs to be a break-glass mode that allows for making temporary changes and debugging. A break-glass mode should be a measured and logged event, though: "Even if you are allowed to do this, there needs to be a trace of it afterward". Such a mode should not allow a developer to exfiltrate data from a system, and should possibly even invalidate data in some way.
Linux misdesigns
Next, Poettering identified some of the things he felt were misdesigns in the Linux and Unix security models that he does not want to rely on. His first gripe was with the SUID (or "setuid") bit on files. This is not a new topic for him; in 2023, in response to a GNU C library (glibc) vulnerability, Poettering said that general-purpose Linux distributions should get rid of SUID binaries. Instead, he suggested using interprocess communication (IPC) to manage executing a privileged operation on behalf of an unprivileged user.
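One concrete example of that approach is systemd's own run0 tool (added in systemd v256): unlike sudo, the binary carries no SUID bit; it asks the service manager, over IPC and after polkit authentication, to start the privileged process on the user's behalf:

    # Sketch: privilege via IPC rather than SUID (systemd v256+).
    run0 whoami                   # runs as root after authentication
    ls -l "$(command -v run0)"    # note: no setuid bit, unlike sudo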
He also felt that the Linux capabilities implementation is a terrible thing. The feature is "kind of necessary", but a design mistake. For example, CAP_SYS_ADMIN is "this grab bag of privileges of the super user". He complained that it is a privilege "so much bigger than all the other ones that it's a useless separation" of privileges. However, complaints about CAP_SYS_ADMIN are neither new nor rare; Michael Kerrisk, for example, enumerated several in his LWN article about it in 2012.
In any case, Poettering did acknowledge that capabilities are "not entirely useless", and that systemd makes heavy use of them. However, "we only make use of it because it's there, and it's really basic, and you cannot even turn it off in the kernel".
One of the core Unix designs that Linux has inherited is "everything is a file". That is, he said, not actually true. There are certain kinds of objects that are not inodes, such as System V semaphores and System V shared memory. That is a problem, because they are objects with a different type of access control than inodes, where "at least we know how security works".
Implementation in systemd
"Now, let's be concrete
", Poettering said. It was time to
explain how systemd implements the security model that he had
discussed, and where its components fit into the FLOUT framework. The first was
to sandbox services, to limit exposure; systemd has a number of
features for putting services into their own sandbox.
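As a sketch of what that looks like in practice, the directives below are real systemd options (see systemd.exec(5)), applied here to a hypothetical mydaemon.service via a drop-in:

    # Sketch: hardening drop-in for a hypothetical "mydaemon.service".
    mkdir -p /etc/systemd/system/mydaemon.service.d
    cat > /etc/systemd/system/mydaemon.service.d/hardening.conf <<'EOF'
    [Service]
    ProtectSystem=strict       # /usr, /boot, /etc read-only to the service
    ProtectHome=yes            # /home, /root, /run/user appear empty
    PrivateTmp=yes             # private /tmp and /var/tmp
    PrivateDevices=yes         # minimal /dev with no physical device nodes
    NoNewPrivileges=yes        # execve() cannot gain privileges via SUID
    RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
    SystemCallFilter=@system-service   # seccomp allow-list of system calls
    EOF
    systemctl daemon-reload

The systemd-analyze security command scores how much of this sandboxing a given unit actually uses.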
Another is using dm-verity and signatures for discoverable disk images (DDIs) that are inspected to ensure they meet image policies. Verifying disk images would frustrate attackers, as well as provide observability; if a disk image does not match the signature, that is a sign of tampering. Systemd's factory reset features provide the "undo" part of the FLOUT framework; in systemd v258 the project added the ability to reset the TPM as well as disk partitions. LWN covered that in August 2025.
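From the command line, systemd-dissect can inspect a DDI and check it against such a policy; the image name below is hypothetical, and the policy string is a sketch of the syntax documented in systemd.image-policy(7):

    # Sketch: inspect a hypothetical DDI and require signed dm-verity
    # data for the root and /usr partitions before it may be used.
    systemd-dissect myapp.raw
    systemd-dissect --validate --image-policy='root=signed:usr=signed' myapp.raw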
Poettering said that we should also "try really hard to do writable XOR executable mounts". A filesystem should be mounted writable so that its contents can be modified, or it should be mounted as executable so that binaries can be run from it. But a filesystem should never be both. If that were implemented through the whole system, he said, it would be much more secure. Systemd provides tools to do this, in part, with its system extension features. Systemd can mount system extension images (sysext) for /usr and /opt, and configuration extension images (confext) for /etc. The default is to mount these extensions read-only, though it is possible to make them writable.
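A minimal sysext can be sketched as a directory tree under /run/extensions (the extension name here is hypothetical); systemd-sysext then overlays it read-only on top of /usr:

    # Sketch: a minimal, read-only system extension (hypothetical name).
    mkdir -p /run/extensions/mytool/usr/bin
    mkdir -p /run/extensions/mytool/usr/lib/extension-release.d
    cp ./mytool /run/extensions/mytool/usr/bin/
    # The extension-release file must be named after the extension;
    # ID=_any disables OS-release matching for this sketch.
    echo 'ID=_any' > \
        /run/extensions/mytool/usr/lib/extension-release.d/extension-release.mytool
    systemd-sysext merge     # overlay onto /usr (read-only by default)
    systemd-sysext status
    systemd-sysext unmerge   # undo the overlay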
Systemd also uses the TPM a lot, "for fundamental key material" to decrypt disks (systemd-cryptsetup) and service credentials (systemd-creds). That, he said, helped to frustrate attackers and limit access. Finally, he quickly mentioned using the varlink IPC model for delegating and making requests to services, which also helped as a way to limit access.
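As an illustration (the credential name and paths are mine), systemd-creds can seal a secret to the local TPM, and a unit can receive the decrypted value at runtime without it ever living in plain text on disk; systemd-cryptenroll similarly binds a LUKS volume to TPM state:

    # Sketch: TPM-sealed service credential (illustrative name/paths).
    systemd-creds encrypt --with-key=tpm2 --name=dbpass \
        /tmp/dbpass.txt /etc/creds/dbpass.cred
    # In the consuming unit:
    #   [Service]
    #   LoadCredentialEncrypted=dbpass:/etc/creds/dbpass.cred
    # The service then reads $CREDENTIALS_DIRECTORY/dbpass at runtime.
    # Bind an encrypted volume to the TPM (PCR 7: Secure Boot state):
    systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/vda2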
Questions
One member of the audience wanted to know how Poettering would replace capabilities if he had a magic wand capable of doing so: "If you don't like it, what would you like to see instead?" Poettering responded that his issue was not with the capability model per se, but with the actual implementation in Linux. He said that he liked FreeBSD's Capsicum: "if they would implement that, that would be lovely".
Another attendee asked when systemd would enable the no-new-privileges flag. Poettering said that it was already possible to use that flag with systemd because it does not have SUID binaries: "We do not allow that". But, he said, that does not mean that the rest of the system is free of SUID binaries. It should be the goal, "at least in well-defined systems", to just get rid of SUID binaries.
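Auditing a system for remaining SUID binaries is a one-liner, and a sketch of the system-wide goal, assuming a system that has actually shed its SUID tools, is the NoNewPrivileges= manager option documented in systemd-system.conf(5):

    # Sketch: list SUID binaries on the root filesystem...
    find / -xdev -perm -4000 -type f 2>/dev/null
    # ...and, on a system that no longer needs them, opt everything
    # into no_new_privs via /etc/systemd/system.conf:
    #   [Manager]
    #   NoNewPrivileges=yes
    # Note: this breaks sudo/su-style SUID tools, hence "well-defined
    # systems" only.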
| Index entries for this article | |
|---|---|
| Conference | All Systems Go!/2025 |