A security model for systemd
Linux has many security features and tools that have evolved over the years to address threats as they emerge and security gaps as they are discovered. Linux security is, as Lennart Poettering observed at the All Systems Go! conference held in Berlin, all somewhat random and not a "clean" design. To many observers, that may also appear to be the case for systemd; however, Poettering said that he does have a vision for how all of the security-related pieces of systemd are meant to fit together. He wanted to use his talk to explain "how the individual security-related parts of systemd actually fit together and why they exist in the first place".
I did not have a chance to attend the All Systems Go! conference this year, but watched the recording of the talk after it was published. The slides are also available.
What is a security model?
Poettering said that when he started drafting his slides it dawned on him that he had used the phrase "security model" frequently, but without knowing its formal definition. So he turned to Wikipedia's definition, which states:
A computer security model is a scheme for specifying and enforcing security policies. A security model may be founded upon a formal model of access rights, a model of computation, a model of distributed computing, or no particular theoretical grounding at all.
That definition was pleasing, he said, because he could just "pull something out of my hair and it's a security model." Of course, he wanted to be a bit more formal than that. Considering the threats in the world we actually live in was the place to begin.
Thinking about threats
Today's systems are always exposed, he said. They are always
connected; even systems that people do not think about, such as those
in cars, are effectively always online waiting for updates. And
systems are often in physically untrusted environments. Many systems
are hosted by cloud providers and outside the physical control of
their users. Users also carry around digital devices, such as phones,
tablets, and laptops: "So it is absolutely essential that we talk
about security to protect them both from attacks on the network and
also locally and physically."
The next thing is to think about what is actually being
attacked. Poettering described some of the possible scenarios; one
type of attack might take advantage of a vulnerability in unprivileged
code, while another might try to exploit privileged code to make it
execute something it was not supposed to. It could be
an attack on the kernel from user space. "We need to know what's
being attacked in order to defend those parts from whomever is
attacking them."
Attacks also have different goals, he said. Some attacks may target user data, others may attempt to backdoor a system, and still others may be focused on using a system's resources, or conducting a denial-of-service (DoS) attack. The type of attack determines the type of protections to be used. Encryption, he said, is useful if one is worried about data exfiltration, but not so much for a DoS.
Poettering said that he also thought about where attacks are coming from. For example, does an attacker have physical access to a system, is the attack coming over a network, or is the attack coming from inside the system? Maybe a user has a compromised Emacs package, or something escapes a web browser's sandbox. Not all of these attack sources are relevant to systemd, of course, but thinking about security means understanding that attacks can come from everywhere.
FLOUTing security
The bottom line is that the approach to defending against attacks depends on where they come from and what the intention of the attack is. Poettering put up a new slide, which he said was the most important of all the slides in his presentation. It included his acronym for systemd's security model, "FLOUT":
Frustrate attacks
Limit exposure after successful attacks
Observe attacks
Undo attacks
Track vulnerabilities
"I call this 'FLOUT': frustrate, limit, observe, undo, and
track. And I think in systemd we need to do something about all five
of them
".
The first step is to "frustrate
" attackers; to make attacks
impossible. "Don't even allow the attack to happen and all will be
good.
" But, it does not always work that way; software
is vulnerable, and exploits are inevitable. That is why limiting
exposure with sandboxing is important, he said. If a system is
exploited, "they might do bad stuff inside of that sandbox, but
hopefully not outside of it
."
Since exploits are inevitable, it is also necessary to be able to observe the system and know not only that an attack happened, but how it happened as well. And, once an attack has happened and been detected, it must be undone. With containers and virtual machines, it is less important to have a reset function, Poettering said: "Just delete the VM or container, if you have the suspicion that it was exploited, and create a new one". But that approach does not work so well with physical devices. "We need to always have something like a factory reset that we can return to a well-defined state" and know that it is no longer exploited. Finally, there is tracking vulnerabilities. Ideally, he said, you want to know in advance if something is vulnerable.
Poettering returned to a theme from the beginning of the talk; the fact that Linux, and its security features, were not designed "in a smooth, elegant way". There are so many different security components, he complained, ranging from the original Unix model with UIDs and GIDs, to user namespaces. "And if you want to use them together, it's your problem". Too much complexity means less security.
He said that he preferred universal security mechanisms to fine-grained ones. This means finding general rules that always apply and implementing security policies that match those rules, rather than trying to apply policies for specific projects or use cases. He gave the example that device nodes should only be allowed in /dev. That is a very simple security policy that is not tied to any specific hardware.
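One way such a universal rule gets enforced in practice is at mount time: every filesystem other than /dev is mounted with the nodev flag, so a stray device node is simply inert. Here is a rough sketch of the idea in C (mine, not from the talk); systemd applies the same principle declaratively, for example with its PrivateDevices= service option.

    /* Sketch: enforce "device nodes only in /dev" as a universal rule by
     * mounting other filesystems with MS_NODEV; a device node created there
     * cannot actually be opened as a device. Needs root to mount. */
    #include <stdio.h>
    #include <sys/mount.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>

    int main(void) {
        /* A scratch tmpfs that is writable, but where device nodes are inert. */
        if (mount("tmpfs", "/mnt", "tmpfs", MS_NOSUID | MS_NODEV, "mode=0755") < 0) {
            perror("mount");
            return 1;
        }

        /* Creating a node still succeeds (as root)... */
        if (mknod("/mnt/fake-sda", S_IFBLK | 0600, makedev(8, 0)) < 0)
            perror("mknod");

        /* ...but opening it as a device is refused because of MS_NODEV. */
        FILE *f = fopen("/mnt/fake-sda", "r");
        if (!f)
            perror("fopen (expected to fail on a nodev mount)");
        else
            fclose(f);

        umount("/mnt");
        return 0;
    }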
But that is not how many of Linux's security mechanisms are
built. SELinux, for instance, requires a specific policy for each
daemon. Then, one might write the policy that forbids that daemon from
creating device nodes. But that is much more fragile and difficult to
maintain, he said. "It's much easier figuring out universal truths and
enforcing them system-wide". To do that, components should be
isolated into separate worlds.
Worlds apart
Poettering said that he liked to use the word "worlds"
because it's not used much in the Linux community, so far. The term
"worlds" could be replaced with "containers", "sandboxes", "namespaces",
and so on. The important concept is that something in a separate world
is not only restricted from accessing resources that are outside of
that world, it should not see those resources at all.
So to keep the complexity of these sandboxes small, it's good if all these objects are not even visible, not even something you have to think about controlling access to, because they are not there, right?
Security rules should be that way, he said, and deal with isolation and visibility. That is different than the way SELinux works; everything still runs in the same world. An application may be locked down, but it still sees everything else.
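To make the "not even visible" idea concrete, here is a small illustration (mine, not Poettering's) using Linux PID namespaces: a process in its own world sees itself as PID 1 and cannot even address processes outside of it, never mind attack them.

    /* Sketch: a process in its own PID "world" cannot see or signal processes
     * outside of it; they are not merely protected, they are invisible.
     * Uses an unprivileged user namespace, so no root is required (assuming
     * the kernel allows unprivileged user namespaces). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t outside = getpid();   /* a PID that exists in the host's world */

        /* New user + PID namespace; only children enter the new PID world. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) < 0) {
            perror("unshare");
            return 1;
        }

        pid_t child = fork();
        if (child == 0) {
            printf("inside:  my pid is %d\n", getpid());         /* prints 1 */
            if (kill(outside, 0) < 0)
                perror("inside:  signaling the outside world");  /* ESRCH */
            return 0;
        }

        waitpid(child, NULL, 0);
        printf("outside: that child was pid %d here\n", child);
        return 0;
    }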
The next fundamental thing to think about, he said, is to figure
out what an application is in the first place and how to model it for
security. It is not just an executable binary, but a combination of
libraries, runtimes, data resources, configuration files, and more,
all put together. To have a security model, "we need to model apps
so that we know how to apply the security" to them.
Ideally, an app would be something like an Open Container Initiative (OCI) image or Flatpak container that has all of its resources shipped in an "atomic" combination; that is, all of the components are shipped together and updated together. In this way, he said, each application is its own world. Here, Poettering seemed to be comparing the update model for Docker-type containers and Flatpak containers to package-based application updates, where an application's dependencies might be updated independently; he said that "non-atomic behavior" is a security vulnerability because different components may not be tested together.
Another piece of a security model is delegation; components need to be able to talk to one another and delegate tasks. On the server side, the database and web server must be able to talk to one another. On the desktop, the application that needs a Bluetooth device needs to be able to talk to the application that manages Bluetooth devices.
Security boundaries
Poettering also talked about different types of security boundaries. Security sandboxes are one type of boundary that most people already think about, as are boundaries between user identities (UIDs). A system's different boot phases are yet another type of boundary; for example, during certain parts of the boot process there are values that are measured into the TPM. After that phase of the boot process is finished it "kind of blows a fuse" and the system can no longer modify those values, which provides a security boundary.
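The "fuse" comes from how TPM platform configuration registers (PCRs) work: a PCR can never be set directly, only extended, so every new value is a hash of the old value combined with the new measurement. A small sketch of that construction, using OpenSSL's SHA-256 purely for illustration:

    /* Sketch of the TPM PCR "extend" operation: PCR' = SHA256(PCR || digest),
     * so earlier boot phases cannot be un-measured by later code.
     * Build with -lcrypto. */
    #include <openssl/sha.h>
    #include <stdio.h>
    #include <string.h>

    static void pcr_extend(unsigned char pcr[SHA256_DIGEST_LENGTH],
                           const void *event, size_t len) {
        unsigned char digest[SHA256_DIGEST_LENGTH];
        unsigned char buf[2 * SHA256_DIGEST_LENGTH];

        SHA256(event, len, digest);               /* hash of what was loaded */
        memcpy(buf, pcr, SHA256_DIGEST_LENGTH);   /* old PCR value ... */
        memcpy(buf + SHA256_DIGEST_LENGTH, digest, SHA256_DIGEST_LENGTH);
        SHA256(buf, sizeof(buf), pcr);            /* ... folded with the event */
    }

    int main(void) {
        unsigned char pcr[SHA256_DIGEST_LENGTH] = { 0 };   /* reset at power-on */

        pcr_extend(pcr, "firmware", 8);
        pcr_extend(pcr, "boot loader", 11);
        pcr_extend(pcr, "kernel+initrd", 13);
        /* Nothing returns pcr to an earlier state short of a reboot; that is
         * the "fuse" that later boot phases cannot undo. */

        for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
            printf("%02x", pcr[i]);
        printf("\n");
        return 0;
    }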
He said that there are also distinctions that are important between
code, configuration, and state. Code is executable, but the
configuration is not. The resources should be kept separate; state and
configuration should be mutable, but code should not be mutable
"because that's an immediate exploit, basically, if some app or
user manages to change the code".
Along with the security boundaries are the technologies that enforce those boundaries; for example, Linux namespaces, SELinux security labels, CPU rings, and others.
Distributions
The Linux distribution code-review model is supposed to be a
security feature, he said. It means that users do not have to download
software from 500 different sources they "cannot possibly
understand if they are trustworthy or not". Instead, users rely on
distributions to do some vetting of the code.
However, Poettering said that there are problems with this model:
namely that it does not scale and it is too slow. Distributions cannot
package everything, and they cannot keep up with how quickly
developers release software. Plus, code reviews are hard, even harder
than programming. "So do we really trust all the packagers and the
distributions to do this comprehensively? I can tell you I'm not."
This is not to disrespect distribution packagers, he said: "I'm
just saying that because I know I'm struggling with code reviews, and
so I assume that other people are not necessarily much better than
me".
One never knows, he said, if distribution packagers are
actually reviewing the code they package, and "sometimes
it becomes visible that they don't; let's hope that those are the
exceptions". Sandboxing and compartmentalizing, Poettering said, are essential to ensure that users do not have to rely solely on code review for protection.
Rules
Having examined all the things that one has to think about when
creating a security model, Poettering wanted to share the rules that
he has come up with. The first is that kernel objects should be
authenticated before they are instantiated. "We should minimize any
interaction with data, with objects, with stuff that hasn't been
authenticated yet because that is always where the risk is."
Poettering also said that security should focus on images, not files; look at the security of an entire app image, rather than trying to examine individual files (or "inodes" as he put it). "We should measure everything in combination before we use it". He brought up sandboxing again, and said that it was necessary to "isolate everywhere".
Another rule is that a comprehensive factory reset is a must, he
said. This cannot be an afterthought, but something that needs to be
in the system right away. And, finally, "we need to really protect
our security boundaries".
But, he said, a security model still has to be useful. And, "as most of us here are hackers" there needs to be a break-glass mode that allows for making temporary changes and debugging. A break-glass mode should be a measured and logged event, though: "Even if you are allowed to do this, there needs to be a trace of it afterward". Such a mode should not allow a developer to exfiltrate data from a system, and it might even invalidate data in some way.
Linux misdesigns
Next, Poettering identified some of the things he felt were misdesigns in the Linux and Unix security models that he does not want to rely on. His first gripe was with the SUID (or "setuid") bit on files. This is not a new topic for him; Poettering argued in 2023, in response to a GNU C library (glibc) vulnerability, that general-purpose Linux distributions should get rid of SUID binaries. Instead, he suggested using interprocess communication (IPC) to have privileged operations executed on behalf of unprivileged users.
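As a rough sketch of the IPC approach (mine, not code from systemd), the privileged side receives the request over a Unix socket, asks the kernel who is calling, and makes the policy decision itself, rather than relying on a SUID bit on the client's binary; the request string here is made up:

    /* Sketch: privilege by IPC rather than SUID. The unprivileged side asks;
     * the privileged side checks the caller's identity with SO_PEERCRED and
     * decides whether to act. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return 1;

        if (fork() == 0) {                       /* unprivileged "client" */
            close(sv[0]);
            const char *req = "set-hostname worker1";
            write(sv[1], req, strlen(req));
            close(sv[1]);
            _exit(0);
        }

        close(sv[1]);                            /* privileged "service" */
        struct ucred peer = { 0 };
        socklen_t len = sizeof(peer);
        getsockopt(sv[0], SOL_SOCKET, SO_PEERCRED, &peer, &len);

        char buf[128];
        ssize_t n = read(sv[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            /* The decision lives here, in privileged code, instead of being
             * implied by a SUID bit on the caller's binary. */
            printf("request '%s' from pid=%d uid=%d: %s\n",
                   buf, (int)peer.pid, (int)peer.uid,
                   peer.uid == 0 ? "allowed" : "would check policy");
        }
        close(sv[0]);
        wait(NULL);
        return 0;
    }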
He also felt that the Linux capabilities implementation is a terrible thing. The feature is "kind of necessary", but a design mistake. For example, CAP_SYS_ADMIN is "this grab bag of privileges of the super user". He complained that it is a privilege "so much bigger than all the other ones that it's a useless separation" of privileges. However, complaints about CAP_SYS_ADMIN are neither new nor rare; Michael Kerrisk, for example, enumerated several in his LWN article about it in 2012.
In any case, Poettering did acknowledge that capabilities are "not entirely useless", and that systemd makes heavy use of capabilities. However, "we only make use of it because it's there, and it's really basic, and you cannot even turn it off in the kernel".
One of the core Unix designs that Linux has inherited is
"everything is a file". That is, he said, not actually true. There are
certain kinds of objects that are not inodes, such as System V
semaphores and System V shared memory. That is a problem, because
they are objects with a different type of access control than inodes
where "at least we know how security works
".
Implementation in systemd
"Now, let's be concrete
", Poettering said. It was time to
explain how systemd implements the security model that he had
discussed, and where its components fit into the FLOUT framework. The first was
to sandbox services, to limit exposure; systemd has a number of
features for putting services into their own sandbox.
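To give a flavor of what that looks like in practice, here is a drop-in with a selection of real hardening directives; which of them make sense depends entirely on the service, so treat it as a sketch rather than a recipe:

    # /etc/systemd/system/example.service.d/hardening.conf (illustrative drop-in)
    [Service]
    # Run as a transient, unprivileged user; make the OS read-only for the
    # service and hide other users' data from it.
    DynamicUser=yes
    ProtectSystem=strict
    ProtectHome=yes
    # Give the service its own private /tmp and a minimal /dev.
    PrivateTmp=yes
    PrivateDevices=yes
    # No privilege escalation, no capabilities, writable-XOR-executable memory.
    NoNewPrivileges=yes
    CapabilityBoundingSet=
    MemoryDenyWriteExecute=yes
    # Only the address families and system calls a typical service needs.
    RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
    SystemCallFilter=@system-service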
Another is using dm-verity and signatures for discoverable disk images (DDIs) that are inspected to ensure they meet image policies. Verifying disk images would frustrate attackers, as well as provide observability; if a disk image does not match the signature, that is a sign of tampering. Systemd's factory reset features provide the "undo" part of the FLOUT framework; in systemd v258 the project added the ability to reset the TPM as well as disk partitions. LWN covered that in August 2025.
Poettering said that we should also "try really hard to do writable XOR executable mounts". A filesystem should either be mounted writable so that its contents can be modified, or it should be mounted as executable so that binaries can be run from it. But a filesystem should never be both. If that were implemented through the whole system, he said, it would be much more secure. Systemd provides tools to do this, in part, with its system extension features. Systemd can mount system extension images (sysext) for /usr and /opt, and configuration extension images (confext) for /etc. The default is to mount these extensions read-only, though it is possible to make them writable.
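A quick way to see how far an existing system is from that ideal is to check, for each mount point, whether it is both writable and executable. A rough sketch (a handful of paths hard-coded for brevity; a real audit would walk /proc/self/mountinfo):

    /* Sketch: report mounts that violate "writable XOR executable". */
    #define _GNU_SOURCE        /* for ST_NOEXEC in <sys/statvfs.h> on glibc */
    #include <stdio.h>
    #include <sys/statvfs.h>

    int main(void) {
        const char *paths[] = { "/", "/usr", "/etc", "/var", "/tmp", "/home" };

        for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
            struct statvfs st;
            if (statvfs(paths[i], &st) < 0)
                continue;

            int writable   = !(st.f_flag & ST_RDONLY);
            int executable = !(st.f_flag & ST_NOEXEC);

            printf("%-6s %s,%s -> %s\n", paths[i],
                   writable ? "rw" : "ro",
                   executable ? "exec" : "noexec",
                   (writable && executable) ? "violates W^X" : "ok");
        }
        return 0;
    }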
Systemd also uses the TPM a lot, "for fundamental key material" to decrypt disks (systemd-cryptsetup) and service credentials (systemd-creds). That, he said, helped to frustrate attackers and limit access. Finally, he quickly mentioned using the varlink IPC model for delegating and making requests to services, which also helped as a way to limit access.
Questions
One member of the audience wanted to know how Poettering would replace capabilities if he had a magic wand capable of doing so: "If you don't like it, what would you like to see instead?" Poettering responded that his issue was not with the capability model per se, but with the actual implementation in Linux. He said that he liked FreeBSD's Capsicum: "if they would implement that, that would be lovely".
Another attendee asked when systemd would enable the no-new-privileges flag. Poettering said that it was already possible to use that flag with systemd because it does not have SUID binaries. "We do not allow that". But, he said, that does not mean that the rest of the system is free of SUID binaries. It should be the goal, "at least in well-defined systems", to just get rid of SUID binaries.
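The flag itself is a one-way kernel setting that any process can enable for itself and its descendants; a minimal sketch:

    /* Sketch: turn on no_new_privs. Once set it cannot be cleared, and
     * executing SUID/SGID or file-capability binaries no longer grants this
     * process or its children any extra privileges. */
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    int main(void) {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0) {
            perror("prctl");
            return 1;
        }
        printf("no_new_privs is now %d\n",
               prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0));

        /* From here on, running something like /usr/bin/passwd would not get
         * the elevated privileges its SUID bit normally provides. */
        execlp("id", "id", (char *)NULL);
        perror("execlp");
        return 1;
    }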
Posted Nov 5, 2025 16:15 UTC (Wed)
by rbranco (subscriber, #129813)
Posted Nov 5, 2025 17:49 UTC (Wed)
by SLi (subscriber, #53131)
Now it has a bit of a feeling of waterfall development with agreed responsibilities and "you stay there, I stay here".
I'm not even saying this is bad. It's actually very good that the userspace/kernel API gets defined well and narrowly. Rather, do you see this as a hindrance?
Posted Nov 5, 2025 18:17 UTC (Wed)
by bluca (subscriber, #118303)
However, there are tons of _existing_ interfaces/systems/whatnot that can't really change, as it would be a massive compat break to do so, and an humongous task on top of that, so it is true that we are resigned to e.g. file caps being what they are.
Adding new things is much much easier than changing existing, entrenched subsystems.
So as always it's nuanced, and there's a bit of both at play.
Posted Nov 5, 2025 18:16 UTC (Wed)
by smcv (subscriber, #53363)
... so, he'd like to replace the thing that is named "capabilities" (but, confusingly, is not a capability-based security model with the meaning used in e.g. https://en.wikipedia.org/wiki/Capability-based_security) with a capability-based security model. Terminology is hard!
Posted Nov 5, 2025 18:32 UTC (Wed)
by nim-nim (subscriber, #34454)
The disconnect is expecting too much of package validation and too little of image validation, because we wish for things to be simpler than they are.
That being said containment of app systems is clearly worthwhile however you assemble those app systems.
Posted Nov 5, 2025 20:28 UTC (Wed)
by ebee_matteo (subscriber, #165284)
Of course, you are right that this does solve just the integrity problem and does not prove authenticity.
The trust boundary however is pushed a bit further: at the point where you can validate an image was signed by the right people with the right keys.
Compare with a Debian package, whose files can be modified on disk after installation by a malicious user. And of course, an image also often implies a reproducible environment (e.g. controlled env variables, etc.) which makes it a bit harder to exploit.
Posted Nov 5, 2025 21:05 UTC (Wed)
by bluca (subscriber, #118303)
It provides authenticity too, as the point being made was about signed dm-verity images. The signature is verified by the kernel keyring, so both authenticity and integrity are covered.
Of course this is not the case when using more antiquated image formats such as Docker, where it's just a bunch of tarballs in a trenchcoat, but systemd has been supporting better image formats for a long time now.
Posted Nov 5, 2025 22:06 UTC (Wed)
by nim-nim (subscriber, #34454)
We’ve known for quite a long time that it is useless to install genuine foo if genuine foo cannot resist exploitation as soon as it is put online (as companies deploying golden Windows images discovered as soon as networking became common), and we’ve known for quite a long time that attackers rarely bother altering components in flight; they typo-squat and trick you into installing genuine authenticated malware (not different from the counterfeits that flood marketplaces and that Amazon or Alibaba will happily ship you in their genuine state).
Security comes from the ability to rebuild and redistribute stuff when it has a hole (honest mistakes) and from poking inside stuff that will be redistributed to check it actually is what it pretends to be (deliberate poisoning). And then you can sign the result and argue if your signing is solid or not, but signing is only worthwhile if the two previous steps have been done properly.
Posted Nov 6, 2025 1:22 UTC (Thu)
by Nahor (subscriber, #51583)
If what you built and distributed can easily be replaced without you knowing, then those two steps are of limited value too.
And continuing your line of thought, if signing/immutability/rebuild/distribution are all done right, they are useless if you don't verify the source code you're using.
And even if the source code verification is done right, it is useless if the person doing the verification and signing of the code can be corrupted or coerced with a $5 wrench.
And even if [...]
TL;DR: what you're arguing is that security is pointless and of limited value because there will always be a point where you have to trust something or someone. There will always be a weak link. All we can do is ensure that most links are safe to reduce the attack surface. Using images is one step in that direction.
Posted Nov 6, 2025 7:32 UTC (Thu)
by nim-nim (subscriber, #34454)
We trust a system, where maximum transparency, accountability and absence of lockdown keep vendors honest. We trust the regulator, that forces vendors to provide a minimum of information on their pretty boxes, we trust consumer associations, that check the regulator is not captured by vendors, we trust people that perform reviews, tests and disassembly of products, the more so they are diverse and independent and unable to collude with one another, we trust competition and the regulations that enforce this competition and prevent vendors from cornering and locking down some part of the market.
And then you can add a certificate of genuine authenticity to the mix but most of the things you'll buy in real life don’t come with those because that’s icing on the cake no more. Trusting the vendor produces 737 maxes. This is not a fluke but human nature. You're usually better served by checking other things such as the quality of materials and assembly.
Performing third party checks is hard, and long, and those checks are usually incomplete, be it by distributions or in the real world, while printing authenticity certificates is easy. It is very tempting to slap shiny certificates on an opaque box and declare mission accomplished but it is not. Moreso if the result is reducing vendor accountability and dis-incentivize doing things right (I’m not saying that’s the case here but it is the usual ending of let’s trust the vendor initiatives).
Posted Nov 5, 2025 22:17 UTC (Wed)
by ebee_matteo (subscriber, #165284)
Yes, authenticity against a digital signature.
But trust has to start somewhere. You need to trust the signing keys, or somebody that transitively approved the key, e.g. as a CA.
In other words, you can prove an image was signed against a key, but if I manage to convince you to trust my public key, I can still run malicious software on your machine.
I still haven't seen the problem of supply-chain attacks being solved (by anybody, regardless of the technology employed).
Posted Nov 5, 2025 22:23 UTC (Wed)
by bluca (subscriber, #118303)
Yes, and this is a solved problem on x86-64: you trust the vendor who sold you your CPU. That CPU verifies the firmware signature, which verifies the bootloader signature, which verifies the UKI signature, which verifies the dm-verity signature. You have to anyway, since it's your CPU, and it's silly to pretend otherwise.
Posted Nov 6, 2025 1:32 UTC (Thu)
by Nahor (subscriber, #51583)
Not really. It's more like it is an unsolvable problem (or at least impractical to do so) so we choose to stop there.
> you trust the vendor who sold you your CPU
Plenty of people will argue you can't ("blabla manufacturing blabla China blabla" and "blabla NSA blabla backdoor blabla")
Posted Nov 6, 2025 2:46 UTC (Thu)
by intelfx (subscriber, #130118)
That's the point of the GP, which I believe you have missed.
If you don't trust your CPU vendor enough to believe that their root of trust implementation is not subverted by your malicious actor of choice, then why would you trust *anything* that comes out of that CPU against the same malicious actor? The only logical choice of action would be to throw the CPU away immediately.
And if you haven't done that, then it necessarily follows that you *do* trust the CPU vendor, so it's fine if they implement a root of trust too.
Posted Nov 6, 2025 8:10 UTC (Thu)
by taladar (subscriber, #68407)
Images have quite frankly left me totally unconvinced that those who build them do actually care about security issues enough to even check for open issues, much less rebuild them every single time one gets fixed.
What good is having the authentic image if the image contains a mere few hundred open security holes of various (but not just low) severity?
Posted Nov 6, 2025 7:54 UTC (Thu)
by tomf (subscriber, #113110)
I'm reminded of maybe a Dan Walsh quote in which he says SELinux policy complexity reflects an underlying oversharing, and that the SELinux policy files for containers are simple precisely because containers define separate worlds.
Meanwhile, SELinux has prevented some container escapes. So I think of containers and SELinux as being complementary.
I see the author also wrote https://www.redhat.com/en/blog/selinux-mitigates-containe... :)
Posted Nov 6, 2025 7:59 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
But then it turns out that you need to be able to label everything. And propagate the labels. And then have escape hatches from all of that. So pretty much every complex Linux installation ends up with disabled SELinux, some vendors don't even bother with it. Amazon ships their Amazon Linux with SELinux disabled by default.
AppArmor offers comparable security but much simpler policies. Yet it has never gained any traction because it's not complex enough, apparently.
OpenBSD pledge & unveil are also nice
- https://github.com/jart/pledge/
- https://github.com/marty1885/landlock-unveil