14 years of systemd
It is a standard practice to use milestones to reflect on the achievements of a project, such as the anniversary of its first release or first commit. Usually, these are observed at five and ten‑year increments: the tenth anniversary of the 1.0 release, 25 years since the first public announcement, and so on. Lennart Poettering, however, took a different approach at FOSDEM 2025 with a keynote commemorating 14 years of systemd, and a brief look ahead at his goals and systemd's challenges for the future.
He started the talk by reminding the audience what systemd
is, to "bring everybody up to speed
", using the definition
straight from the systemd home
page:
Systemd is a suite of basic building blocks for building a Linux OS. It provides a system and service manager that runs as PID 1 and starts the rest of the system.
Prehistory
Linux has had several predecessors to systemd, beginning with System V init, whose design traces back to System V Unix in 1983. The System V‑style init implementation still found in some Linux distributions dates back to 1992, Poettering said, and he described it as old and cryptic.
Then came Upstart in 2006. It was an event‑based init daemon developed for Ubuntu by Scott James Remnant, a Canonical employee at the time. Upstart was designed to take the place of the traditional System V init daemon. And it did, for a while. It was adopted by Ubuntu, Fedora, Red Hat Enterprise Linux (RHEL), and others. Ultimately, it fell by the wayside; it was last updated in 2014, for a lifespan of just eight years.
The question of why distributions didn't stick with Upstart
deserves an answer, he said. One reason is that Upstart didn't
really solve the problem that a service manager, an init system,
should solve. It required "an administrator/developer type
" to
figure out all of the things that should happen on the system "and
then glue these events and these actions together
". It was too
manual, he said. Systemd, on the other hand, allowed users to simply
specify the goal, "and the computer figures out the rest
".
There were other problems with Upstart, Poettering said. He cited
slow development and political barriers due to Canonical's copyright
assignment policy. (LWN covered Canonical's
copyright assignment policy in 2009.)
Babykit
Poettering and Kay Sievers hashed out the basic ideas for what was
initially called "Babykit" on a flight back from the Linux Plumbers
Conference in 2009. He said the original name reflected the trend in
that era of naming daemons something‑kit. "I think it's a thing
that we imported from Apple.
" Most of those have died out,
he said, though we still have Polkit and PackageKit. He and
Sievers wanted to do a proper open‑source project with development in
the open, no copyright assignment, using the LGPL.
From the start it was more than just an init system, more than just
PID 1, he said. For example, part of the goal was to handle booting on
modern Linux systems, which is "a series of processes that need to
happen
". Right from the beginning, it was more than one
binary. People complain that systemd suffers from not‑invented‑here
(NIH) syndrome, "and sure, to some degree, everyone is victim to
this
," he said. "But, I really try to do my homework
."
By "homework
", he meant studying what other projects
do and what the status quo is before systemd implements
something. "We want to have good reasons why we do it
differently
". System V init and Upstart were influences on systemd,
because that was what Linux distributions were actually using. But
Apple's launchd
was much more interesting, he said. One feature that they loved, in
particular, was its socket
activation concept. A similar concept existed in the internet service daemon
(inetd), which was a standard component of most Linux and Unix‑type
systems at the time. But, he said, Apple pushed it to its limits, which got rid of "to some degree, explicit dependency configuration
".
Solaris's service
management facility (SMF) was another major influence "because
it has all this enterprisey stuff, and we wanted to go for
enterprise
". Systemd also has some original thoughts—but just a
couple of them.
I mean, I wouldn't claim that the concepts that Systemd is built from are purely ours. They are not. We looked at what's there and then tried to do maybe a little bit better at least.
Another major influence, of course, is Unix. But, Poettering said,
he didn't really know what Unix is and went on a bit of a
philosophical tangent about whether Linux and systemd were
"Unix". Ultimately, he concluded, "maybe in some ways. In other
ways, probably not
".
"The world runs on it"
Fedora was the first major Linux distribution to switch from
Upstart to systemd, in the Fedora 15 release back in
2011. Poettering said that it was a big win for systemd to be used by
default, and that it was the goal from the beginning to make systemd
something that would have mainstream use. Arch Linux and openSUSE
followed Fedora's lead a year later, and then RHEL 7 included
systemd when it was released in 2014. "So that was when it started,
like the whole world started running on systemd
". The whole world,
excepting Debian and Ubuntu. Those distributions moved to systemd in
2015, which he said was "the most complex win
" for the
project.
![systemd logo](https://static.lwn.net/images/2025/systemd-dark.png)
For all its success, systemd did not have a logo until Tobias
Bernard designed one and released it in 2019. Now systemd has its own
brand page, and a color scheme
that includes "systemd green". The project started its own conference
in 2015, originally called systemd.conf, which expanded its focus
beyond systemd and became All Systems Go! in
2017. Poettering put in a plug for this year's
All Systems Go!, and suggested "you should totally go
there if you're interested in low‑level operating system kind of
stuff. User space only, though.
"
Poettering asked: so, where are we today? All major Linux
distributions use systemd, "in particular, the commercial ones all
default to it
", and this basically means "the world runs on
it
".
The project has a vibrant community, he said. It consists of six
core contributors, including Poettering, and 60 people with commit
access. More than 2,600 people have contributed to systemd over the
years. One thing that the project could do better, he said, is to
release more often. Systemd has a six‑month release cycle, which is
"actually not that great, we should be doing better
", but said
that it is difficult doing release management for such a large
project.
Systemd also has "a little bit of funding
" through donations
to Software in the
Public Interest (SPI), and from grants by the Sovereign Tech Agency (formerly
the Sovereign Tech Fund). Poettering said that the project has used
the funding for things that don't interest the core developers, for
example reworking the systemd web site.
How big is systemd?
Today, systemd is a suite of about 150 separate
binaries. Poettering said that the project sometimes gets complaints
that it is too monolithic, but he argued that the project was not
monolithic and was in fact "quite modular
". That doesn't mean
that everything within the project was modular or could be used in any
other context, but "it's a suite of several different things
"
with a central component that keeps everything together. "You can
turn a lot of it off. Not all of it, but a lot
." It is primarily a
C project, Poettering said, with a few exceptions. Some components are
written in Python, and the project has experimented with Rust, but it
is ultimately a C project.
The systemd core team doesn't share the same exact view of things,
he said, but they do agree that systemd is "the common core of what
a Linux‑based OS needs
". It covers the basic functionality, from
login, network name resolution, networking, time synchronization, as
well as user-home-directory management. Every component has found adoption in
some Linux distributions, though each distribution chooses different
parts of systemd to adopt.
He also discussed the footprint of systemd in terms of
lines of code and dependencies. He said that the project comprises
690,000 lines of code, whereas wpa_supplicant
is about 460,000 lines of code, and the GNU C library (glibc) is
more than 1.4 million lines. "Is that a lot, or a little? I don't
know.
" He used to like to compare systemd to wpa_supplicant
because it was roughly the same size as systemd, but in the past three
years "we apparently accelerated
" to outgrow it. But, systemd is
still about half the size of glibc. As far as size on disk, Poettering
said that a full‑blown install of systemd on Fedora was about 36MB,
whereas GNU Bash is about 8MB. If the shell alone is 8MB, he said,
"then it's not that bad
" for systemd to be 36MB.
Poettering said that the project has always been conservative about dependencies, because if it pulls in a library as a dependency then it effectively impacts all Linux users. About two years ago, the project switched to using dlopen() for all of the dependencies that are not necessary to run the basic system. For those who don't know, he said, dlopen() loads a shared library at runtime, so the library is not loaded until it is actually required.
He gave an example of using a FIDO key with
full‑disk encryption. Many people use FIDO but many others
don't. By using dlopen() users can still have full‑disk
encryption bound to something else without having to have the FIDO
stack installed. "We pushed the dlopen() thing so far that
there are only three really required dependencies nowadays
",
Poettering said: glibc, libmount, and libcap. Systemd has three
build‑time optional dependencies as well: libselinux, libaudit, and
libseccomp.
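As a sketch of the dlopen() pattern being described (the library name and symbol here are hypothetical stand-ins, not systemd code): the optional library is only loaded when the feature that needs it is actually used, and its absence is handled gracefully instead of preventing the binary from starting.

```c
/* Sketch of a dlopen()-based weak dependency; "libexample.so.1" and
 * example_do() are hypothetical, not real systemd dependencies. */
#include <dlfcn.h>
#include <stdio.h>

static int call_example_if_available(void) {
    /* Load the library only now, when the optional feature is used;
     * a hard DT_NEEDED dependency would be resolved at startup instead. */
    void *handle = dlopen("libexample.so.1", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "optional feature unavailable: %s\n", dlerror());
        return -1;
    }

    int (*example_do)(void) = (int (*)(void)) dlsym(handle, "example_do");
    if (!example_do) {
        dlclose(handle);
        return -1;
    }

    int r = example_do();
    dlclose(handle);
    return r;
}

int main(void) {
    if (call_example_if_available() < 0)
        puts("continuing without the optional dependency");
    return 0;
}
```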
In the wake of the XZ
backdoor, Poettering started pushing
others to take the dlopen() approach too. Systemd was not
the target for the backdoor, but because some distributions linked
the SSH daemon against libsystemd, which then pulled in liblzma, it
was used to prop open that door. "So,
back then, this was not a dlopen() dependency. That's why
this happened.
" Systemd was not at fault, he said, but "maybe
we can do something about it
." (LWN covered this topic in
2024.)
In summary, he said, systemd is not that big, and not tiny
either. It is suitable for inclusion in initial ramdisks (initrds) and
containers. "Systemd is not going to hurt you very much,
size‑wise
".
What belongs in systemd
Poettering admitted there is some scope creep in systemd, but said that the project does have requirements that make clear what belongs (and what does not) in systemd. The first of those is that it needs to solve a generic problem, not just a problem for one user. It needs to be foundational and solve a problem for a lot of users.
Another rule of thumb is that "it needs to have a future
"
and he said that the project is not going to add support for
legacy technologies. The implementation needs to be clean and follow a
common style, as well. "You can deliver products quickly if you cut
a lot of corners, but that's not where we want to be with
systemd.
" Even if something checks all the boxes, he said, it
doesn't mean that it needs to be in systemd. It can still be
maintained elsewhere. A technology also needs to fit into systemd's
core concepts.
There is not a single list with the project's core concepts written
down, Poettering said. It is possible to distill them from systemd's
man pages and specifications, but "I can't give you the whole
list
". He did, however, provide a number of examples. The first of
those is the clear separation of /etc, /run, and
/usr; /etc is for configuration, /run is
for "runtime stuff that is not persistent
", and /usr
is for "the stuff that comes from the vendor
". This is a
separation that was not traditionally followed so closely on Linux, he
said.
Hermetic
/usr is another concept that Poettering said that systemd
is trying to push. In a nutshell, that means that /usr has a
sufficient description of a system to allow it to boot, even without
/etc, /var, and so forth. "It basically means
that you can package up the whole of /usr, drop it on another
machine, we'll boot up
" and it will just work. He did not want to
go into great detail on each concept, he said, but to give an example
of how concepts "spill into everything else we do
".
Another concept Poettering mentioned is that everything systemd
does needs to have declarative behavior. "You just write down where
you want to go. You don't write down code
". For example, he said,
boot should not involve running a shell script, because shell scripts
are inherently imperative. That is not the way things should be
configured. There are more concepts that systemd has, but the point is
that having these concepts in place permits systemd to do things that
were hard to do before, such as the options
ProtectSystem= and ProtectHome= which provide
"high‑level knobs
" for sandboxing.
Systems also need standards, he said, and the project tries to set
standards by writing down specifications of how it does things "in
a generic way
". And systemd consumes a lot of standards, too, such
as /etc/os-release
which is now used by most Linux distributions and BSD‑based
operating systems. Poettering said that the project has even created a
web site for standards, the Linux
Userspace API (UAPI) group, where systemd people "and people
close to us
" are invited to put specifications. The
discoverable
disk image (DDI) specification, which provides a method for
self‑describing file system images that may contain root or
/usr filesystems for operating-system images, system
extensions, containers, and more, is one example.
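As a small example of what consuming one of these standards looks like, here is a sketch of a C program that prints a couple of fields from os-release. Per the specification, /etc/os-release is read first with /usr/lib/os-release as a fallback; the quoting and escaping rules are simplified here.

```c
/* Sketch of an os-release consumer: print the ID and PRETTY_NAME
 * fields. Quoting/escaping rules from the spec are not handled. */
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *f = fopen("/etc/os-release", "r");
    if (!f)
        f = fopen("/usr/lib/os-release", "r");   /* documented fallback */
    if (!f) {
        perror("os-release");
        return 1;
    }

    char line[512];
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "ID=", 3) == 0 ||
            strncmp(line, "PRETTY_NAME=", 12) == 0)
            fputs(line, stdout);
    }

    fclose(f);
    return 0;
}
```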
The future
Poettering was running out of time when he reached the slides for systemd's goals and challenges for the future. He was careful to note that the goals and challenges he outlined were from his point of view and that others on the systemd team may have different priorities. For his part, he sees four goals and challenges for systemd.
The first goal is to implement boot and system integrity, so that
it is harder to backdoor the system. Not impossible, but
harder. Basically locking down the system to keep attackers out, and
having a well-known state for a system to return to where "you know
there's not going to be anyone inside there because you can prove
it
". If someone attacks a server, they can be removed from it
because the server can be returned to a defined state and it can be
updated in a safe way. "In other words, you don't have to always
sleep with your laptop under your pillow
" because someone might
modify the bootloader.
Poettering said that "all big OSes
" have boot and system
integrity addressed in one way or another. But none of the "generic
distributions
" have adopted it by default. It is a sad situation
that matters a lot, he said. One obstacle to implementing boot and
system integrity is that "it makes things more complex because you
need to think about cryptography and all these kinds of
things
".
Cultural issues are another obstacle, and those are in part founded
on FUD,
he said, such as the idea that the Trusted Platform Module (TPM) is
all about digital rights management (DRM) that "takes away your
computers
". On the contrary, the way that TPMs are designed
"are actually very much compatible with our
goals
". Package‑based systems also make things more
complicated than they need to be, but "we live in a
package‑based world
" since all major Linux distributions use
packages by default.
Goal number two is rethinking systemd's interprocess communication (IPC),
specifically moving away from D-Bus toward varlink. (LWN covered varlink in
systemd v257 in December.) While D-Bus is "never going away
",
Poettering said that varlink allows processing IPC requests in one
service instance per connection, which makes it easier to use.
Writing D-Bus daemons is hard, he said, but it is easy
to bind a command to a Unix stream socket using systemd's
socket activation to turn it into a varlink IPC service.
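As a rough sketch of what talking to such a service looks like from the client side (the socket path below is hypothetical; real services publish theirs under /run): a varlink call is a single JSON object written to the stream socket and terminated by a NUL byte, and every varlink service is expected to implement the org.varlink.service.GetInfo method.

```c
/* Sketch of a varlink client call over an AF_UNIX stream socket.
 * The socket path is a made-up placeholder. */
#include <sys/socket.h>
#include <sys/un.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strncpy(addr.sun_path, "/run/example/io.example.Service",
            sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* One call: a JSON object followed by a terminating NUL byte
     * (sizeof of the string literal includes that NUL). */
    const char call[] = "{\"method\":\"org.varlink.service.GetInfo\"}";
    write(fd, call, sizeof(call));

    char reply[4096];
    ssize_t n = read(fd, reply, sizeof(reply) - 1);
    if (n > 0) {
        reply[n] = '\0';
        printf("reply: %s\n", reply);
    }

    close(fd);
    return 0;
}
```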
The third thing on Poettering's list was a challenge, which is
Rust, sort of—he made the case that systemd is doing pretty well
with C, at least vulnerability-wise. If one accepts CVEs as a metric,
anyway. There were three CVEs against systemd in 2023, and none in
2024, and its CVEs were "mostly not memory-related
". Even so,
he said, "we do think that the future probably speaks
Rust
".
But, systemd has a complex build and many binaries and tests. It is
not a fit for Cargo, and the Meson build system currently used by
systemd did not like Rust,
though it has gained some Rust functionality recently. And systemd is
sensitive to footprint issues, which is why it is heavily reliant on
shared libraries. Static linking for 150 binaries is not an option,
but "dynamic libraries in Rust are not there
". Shared libraries
need to be first-class citizens. Ultimately, Poettering said that he
is "happy to play ball
" but does not want systemd to be the
one to solve the problems that need to be solved in order to use
Rust. He mused that there might be some competition between Rust and
Zig to deliver a memory-safe language that can provide stable shared
libraries, including dlopen(), support for hybrid code bases,
and more.
Fourth and finally, but far too briefly, Poettering said that the
last challenge for systemd was about image-based operating systems and
"let's leave it at that
". The slide for the presentation went
slightly further and included a call for pushing the Linux ecosystem
away from package-based deployments to image-based
deployments. It also recommended mkosi for building bespoke images
using package managers.
Poettering had time for a few questions. The first question was
whether systemd would eventually replace GRUB. Poettering said that he
was "not a believer in GRUB, as you might guess
", and that all
the pieces were there to replace it. The remaining problems were
political. GRUB tries to do too much, and most of those things are a
mistake, he said. Most distributions could switch to systemd-boot if
the focus is EFI only.
Another audience member asked what the plans were to "fix the
issues with resolved
", systemd's DNS resolution daemon. Poettering
said that it works fine for him, but he was aware some people had
problems with the "fancier features
" such as DNSSEC "and
that's flaky
" because DNSSEC is "really hard in real life
because servers are shit
". He suggested that the audience member
file a bug.
The video and slides for the keynote are now available from the talk page on the FOSDEM web site. The keynote was one of four talks Poettering gave during FOSDEM, and all of the talks are now available online for those who would like a deeper dive into specific systemd features.
[I was unable to attend FOSDEM in person this year, but watched the talk as it live‑streamed. Many thanks to the video team for their work in live‑streaming all the FOSDEM sessions and making the recordings available.]
| Index entries for this article | |
|---|---|
| Conference | FOSDEM/2025 |
Posted Feb 17, 2025 17:06 UTC (Mon)
by josh (subscriber, #17465)
[Link]
Before systemd, different distributions put the hostname in different places. Now, they all use /etc/hostname. Likewise for various other things.
Posted Feb 17, 2025 17:11 UTC (Mon)
by bluca (subscriber, #118303)
[Link] (6 responses)
Posted Feb 17, 2025 17:23 UTC (Mon)
by jzb (editor, #7867)
[Link] (4 responses)
Posted Feb 17, 2025 23:22 UTC (Mon)
by k3ninho (subscriber, #50375)
[Link] (2 responses)
To Mr Bocassini, I apologise, Sir, for neither attending nor protesting, it's my aim in 2025 to be more inconvenient -- as protest should be -- but for now I'm only commenting online.
K3n.
Posted Feb 18, 2025 0:27 UTC (Tue)
by koverstreet (✭ supporter ✭, #4296)
[Link] (1 responses)
Posted Feb 18, 2025 0:58 UTC (Tue)
by jengelh (guest, #33263)
[Link]
Posted Feb 19, 2025 10:29 UTC (Wed)
by IanKelling (subscriber, #89418)
[Link]
Posted Feb 18, 2025 6:44 UTC (Tue)
by zdzichu (subscriber, #17118)
[Link]
While during the early years I've eagerly read each new paragraph of NEWS, after ~2015 I found myself caring less and less. None of the later features around images, TPMs, sealed systems, homed etc. are relevant to me.
Coincidentally, a decade ago Kubernetes appeared, although it was still 2-3 years from being universally usable. I find K8s appearance curious coincidence, but not related to systemd getting bland.
FTR, I'm one of the 2,600 crowd.
Posted Feb 17, 2025 17:19 UTC (Mon)
by alogghe (subscriber, #6661)
[Link] (69 responses)
Rebooting an entire system image because of an update to some minor set of lib* that 99% of the system isn't using seems like a bad outcome and something the big vendors just do because reasons.
I realize systemd might not want to get into the "package" world but squashing or interfering with the developments that are and need to occur in this world, by being whole system image focused, seems like a poor path.
We could instead work toward a world of directed graphs of individually signed components, where updates could trigger targeted restarts or refresh signals only where needed.
Generally this image focus seems like its solving 2012's problems.
Posted Feb 17, 2025 17:43 UTC (Mon)
by DemiMarie (subscriber, #164188)
[Link] (17 responses)
Rebooting needs to be cheap, because one needs to reboot weekly for kernel updates. If there is a service that must be kept running, the solution is a high-availability cluster, not a single machine with high uptime.
Posted Feb 17, 2025 18:02 UTC (Mon)
by NightMonkey (subscriber, #23051)
[Link] (7 responses)
Also, just a +1 on creating a HA cluster rather than relying on single OS instances.
Posted Feb 17, 2025 18:08 UTC (Mon)
by bluca (subscriber, #118303)
[Link]
Posted Feb 17, 2025 20:21 UTC (Mon)
by ferringb (subscriber, #20752)
[Link] (4 responses)
Basically CoreOS underlying image updates, just done w/out a reboot, and the new FS should be able to have integrity controls leveled based on keys in the kernel already.
Posted Feb 17, 2025 20:58 UTC (Mon)
by bluca (subscriber, #118303)
[Link] (3 responses)
Posted Feb 17, 2025 21:05 UTC (Mon)
by ferringb (subscriber, #20752)
[Link]
Posted Feb 20, 2025 8:47 UTC (Thu)
by MKesper (subscriber, #38539)
[Link] (1 responses)
Posted Feb 20, 2025 9:57 UTC (Thu)
by bluca (subscriber, #118303)
[Link]
Posted Feb 19, 2025 16:53 UTC (Wed)
by Mook (subscriber, #71173)
[Link]
Posted Feb 17, 2025 18:05 UTC (Mon)
by alogghe (subscriber, #6661)
[Link] (3 responses)
So the cloud build systems go off and make a pile of stuff that you "image sign".
Can the user reproduce the contents or is it just "yes this is the same pile of stuff that some cloud something made"? Answer is no, the user cannot reproduce it.
So everyone needs to run a cluster to keep up with this image based rebooting idea? What a world.
We need systems at the fs-verity and overlayfs layer for this.
Posted Feb 17, 2025 18:51 UTC (Mon)
by mbunkus (subscriber, #87248)
[Link]
If your business case can tolerate a 2min downtime for reboots each week, you do not need a cluster.
If, on the other hand, your business case cannot tolerate that small a downtime, then you do need a cluster, but not just for the reboots, but for regular operation 'cause applications & hardware can & do fail outside of scheduled maintenance windows, too.
Posted Feb 17, 2025 21:57 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (1 responses)
Most "mkfs" tools nowadays accept file system trees as input that shall be placed into the fresh file system in a reproducible way.
"mkosi" built disk images are reproducible, by most definitions of the term (though not by all).
Lennart
Posted Feb 19, 2025 7:50 UTC (Wed)
by marcH (subscriber, #57642)
[Link]
I've fixed many build reproducibility issues and I hate the wrong "boolean" impression made by the term "reproducible". Build reproducibility issues are exactly like regular bugs: they do not always trigger; it depends on what you do. It depends on what you're trying to build and how, which corner cases you hit, etc. So saying "building X is reproducible" makes as little sense as saying "software X has zero bug". Some users of software X will indeed observe zero bug while other people using it differently will find a lot more. It's the same with build reproducibility: some people trying to reproduce the same binary will succeed, while others trying to reproduce another binary from the very same sources configured and used differently will hit some reproducibility issue(s) and fail.
I don't think this is a matter of "definition". It's the misunderstanding that build reproducibility is either true or false when it's neither. It's a "just" a type of bugs.
Posted Feb 18, 2025 12:43 UTC (Tue)
by spacefrogg (subscriber, #119608)
[Link] (4 responses)
While signing images sounds cool, it falls apart immediately when you realise that your trust in the contents of said image stems from mere hearsay. There is usually no cryptographic connection between the image signature and the package signatures that this image was made from. Just the belief that somebody might have used them, only them, and used them correctly, in the right order etc.
With nix and guix you know exactly what inputs were used to construct a system X and you can verify it. You can have hourly updates if you want and can even predict if a reboot is necessary. And of course every package is cryptographically signed.
You don't have that in image world. You're not even close to it.
Posted Feb 18, 2025 17:27 UTC (Tue)
by bluca (subscriber, #118303)
[Link] (3 responses)
Of course you can have that. Once again, runtime integrity and build supply chain security are orthogonal problems with different and independent solutions.
Just because all your packages are verified it doesn't stop an attacker with execution privileges from running their own code.
Posted Feb 19, 2025 9:07 UTC (Wed)
by taladar (subscriber, #68407)
[Link] (2 responses)
No PDF or Postscript, no browser with Javascript, no Office documents, no plugins or extensions,...
Posted Feb 19, 2025 9:49 UTC (Wed)
by bluca (subscriber, #118303)
[Link]
Posted Feb 19, 2025 11:56 UTC (Wed)
by pizza (subscriber, #46)
[Link]
...no vector fonts...
Posted Feb 17, 2025 18:00 UTC (Mon)
by bluca (subscriber, #118303)
[Link] (23 responses)
Posted Feb 17, 2025 18:16 UTC (Mon)
by alogghe (subscriber, #6661)
[Link] (11 responses)
This image based focus just punts these problems.
Posted Feb 17, 2025 18:19 UTC (Mon)
by bluca (subscriber, #118303)
[Link] (10 responses)
Posted Feb 17, 2025 18:29 UTC (Mon)
by alogghe (subscriber, #6661)
[Link] (9 responses)
They aren't separate concerns whatsoever.
https://reproducible-builds.org/
Posted Feb 17, 2025 19:20 UTC (Mon)
by bluca (subscriber, #118303)
[Link] (8 responses)
Once again, these are separate things, solving different problems.
Posted Feb 18, 2025 5:47 UTC (Tue)
by PeeWee (guest, #175777)
[Link] (7 responses)
But doesn't this then come dangerously close to the DRM FUD? IIRC that is basically the main concern with the "trusted computing" paradigm and TPM: the vendor can prevent users from running their own non-signed code; who holds the signing key(s)? And if users can sign their own code, wouldn't that give an attacker the same capabilities? Or is this what Lennart meant by saying that TPM makes such attacks harder but not impossible?
Posted Feb 18, 2025 7:48 UTC (Tue)
by mjg59 (subscriber, #23239)
[Link] (3 responses)
Posted Feb 18, 2025 7:59 UTC (Tue)
by PeeWee (guest, #175777)
[Link] (2 responses)
Posted Feb 18, 2025 10:44 UTC (Tue)
by ferringb (subscriber, #20752)
[Link]
This is a good thing.
I think folks are confusing TPM with TEE. See https://docs.bunny.net/docs/widevine-drm-security-levels ; the first two levels, the end user can fully control their OS since hardware decoding + TEE keeps the decrypted content out of the users reach. Level 3- software decode, meaning the host OS *could* grab content, that requires that the OS be something signed and trusted by the DRM vendor (windows, for example). That requires secureboot (TPM) + boot chain validation. The user doesn't control the the critical parts of the OS in that scenario since they don't control the UEFI keys.
Linux wise, end users control the UEFI keys (exempting crappy laptop vendors like razer, where you have to use the shim). If you control those keys, you control the OS/userland- in full. Your distro cuts a new kernel, you validate it; if you wish to boot it, then (for UKI) you sign the UKI using the UEFI keys you control, thus allowing the hardware to boot it. Again- the keys *you* control.
Avoiding TPM has no relevance to DRM for everything but level3, and level3 is pretty much never going to happen in the linux world in my view; no content owner would trust a distro to manage the OS and kernel to their satisfaction, IMO.
TPM usage for the OS is a good thing. Any boot chain that isn't secureboot validated means someone else can swap in their own bootloader putting the kernel and OS under their control. Disk unlock that isn't based on hardware and system measurement has similar problems; "enter a password to unlock" is only as safe as the attestation of the software leading up to that point. No validation, no safety against "evil maid" type crap.
Note, hardware level attacks are a whole other beast, but if a 3 letter agency is after you, they're going to get in if they consider the effort worth it. :)
I could be wrong on a few particulars, but the broad strokes, that should be accurate.
Posted Feb 18, 2025 16:27 UTC (Tue)
by mjg59 (subscriber, #23239)
[Link]
Posted Feb 18, 2025 10:17 UTC (Tue)
by gioele (subscriber, #61675)
[Link]
It is, unfortunately. Especially things like remote attestation. I love the idea of having the cryptographic-baked certainty that I'm SSHing into my own box, running my own software, unmodified. At the same time a browser could use remote attestation to let sites know that I'm not running ublock.
It's a situation similar to the HTTPS/TLS one: protects your data, but makes it hard to spot malware and data exfiltration.
Posted Feb 18, 2025 11:19 UTC (Tue)
by bluca (subscriber, #118303)
[Link]
Very much not. The _owner_ of the machine is in control of what software runs, as the chain of trust of the integrity checks goes back to the DB and MOK UEFI lists of certs. So the owner is in control.
Posted Feb 18, 2025 14:02 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link]
So, it's really about policies you the user defines, with keys you the user owns, and it prohibits nothing.
The DRM thing FSF keeps mentioning is made up, noone would bother with the TPM for that (there are better ways for people who care about DRM to enforce DRM [e.g. your video card], really no need to involve a TPM with that).
Posted Feb 17, 2025 19:11 UTC (Mon)
by champtar (subscriber, #128673)
[Link] (10 responses)
Posted Feb 17, 2025 19:18 UTC (Mon)
by bluca (subscriber, #118303)
[Link] (9 responses)
Posted Feb 17, 2025 23:37 UTC (Mon)
by champtar (subscriber, #128673)
[Link] (8 responses)
Regarding the verification, my understanding of composefs is that you have a file with an erofs partition containing all the metadata, and this file is also fs-verity protected, so once you have verified it and mounted it you should be good.
If you have time to detail why it's wide open, or have extra reading for me I would really appreciate (I'm a simple user but I enjoy learning on those subjects)
The big advantage of the file based approach is that you don't have to right size your A/B partitions, and you can keep more than 2 versions for cheap.
Posted Feb 18, 2025 0:57 UTC (Tue)
by bluca (subscriber, #118303)
[Link] (7 responses)
Sure but that's orthogonal - any writable data storage needs encryption at rest anyway as that's just table stakes, and LUKS2 can cover that just fine
> Regarding the verification, my understanding of composefs is that you have a file with an erofs partition containing all the metadata, and this file is also fs-verity protected, so once you have verified it and mounted it you should be good.
The first problem with that is that it's already too late once the partition is opened, as filesystem drivers are not resistant to malicious superblocks. The second problem is that again, that's just a one-time setup integrity check. There's nothing stopping anything with privileges in the same mount namespace from replacing that mount with something else entirely, and then it's game over. ostree is entirely implemented in userspace, so it's defenceless against such attacks, because it's designed with different use cases and different threat models in mind (which is fine - nothing can do everything at once). dm-verity on the other hand has its cryptographic integrity enforced by the kernel, and when combined with the IPE LSM that was added in v6.12 it can enforce that this doesn't happen, and everything that is executed really comes from a signed dm-verity volume.
Posted Feb 18, 2025 4:43 UTC (Tue)
by champtar (subscriber, #128673)
[Link] (6 responses)
I don't see a difference between having the dm-verity signature checked by the kernel, and having the composefs erofs file checked from user space in the initramfs (assuming the FS is LUKS protected), it's just different pieces of code in a UKI no ?
Using LUKS + fs-verity you might have more overhead than just dm-verity but that's a trade-off to have more flexibility around partition sizing / file deduplication I guess.
I see you can use IPE with fs-verity, don't know with composefs. As I use containers not sure I'll be able to use IPE anytime soon anyway.
Posted Feb 18, 2025 10:55 UTC (Tue)
by bluca (subscriber, #118303)
[Link] (5 responses)
That's for offline protection. LUKS doesn't help you against online attacks, which apply to images downloaded and executed. This is of course not a problem for the data partition, as you don't download that from the internet, it's created locally on provisioning. Every image that gets downloaded from the internet and used for payloads need to have its integrity checked and enforced before being used, as drivers are not resilient against intentionally malformed superblocks.
> I don't see a difference between having the dm-verity signature checked by the kernel, and having the composefs erofs file checked from user space in the initramfs (assuming the FS is LUKS protected), it's just different pieces of code in a UKI no ?
The difference is huge: in the former case integrity is enforced at _all times_ by the kernel, at runtime, so it's resilient against online attacks. In the latter case, integrity is only checked _once_ during boot, and never again. Again, the use cases are different: in the first case security checks are done always, in the latter case they are there for offline protection of data at rest. The second model is strictly weaker. Which might be fine, mind you - as always it depends on the threat model.
Posted Feb 20, 2025 15:37 UTC (Thu)
by alexl (subscriber, #19068)
[Link] (4 responses)
>The difference is huge: in the former case integrity is enforced at _all times_ by the kernel, at runtime, so it's resilient against online attacks. In the latter case, integrity is only checked _once_ during boot, and never again. Again, the use cases are different: in the first case security checks are done always, in the latter case they are there for offline protection of data at rest. The second model is strictly weaker. Which might be fine, mind you - as always it depends on the threat model.
This is completely wrong. Composefs checks at initrd that the fs-verity digest of the composefs image is the expected value, similar to how you would verify the root dm-verity digest of the rootfs block device.
After that, each file access is validated by fs-verity at runtime. This includes reads of the EROFS image that has the metadata, as well as the backing files. And the expected fs-verity digests for the backing files are recorded in the EROFS image, and validated each time a backing file is opened.
The only difference between dm-verity and composefs is that dm-verity validates at the block level, and composefs verifies at the file level. This means that an attacker could modify the filesystem at the block level to "attack" the kernel filesystem parser. However, this difference is very small in practice. Only in a system using dm-verity for a read-only root, and *no* other filesystems does it make a difference. The moment you mount a filesystem for e.g. /var, then an attacker could just as well attack that. Any "solution" to that, such as dm-crypt would also apply to the composefs usecase.
Posted Feb 20, 2025 16:03 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (3 responses)
No, that is completely wrong. With signed dm-verity + IPE the enforcement is done by the kernel on every single binary or library being executed, at any time, at runtime, forever. That's the point of code integrity.
You cannot do that with composefs, because the security model is different and fully trusts userspace. So a userspace process that has escalated privileges can simply replace the composefs mounts with anything of their choosing, and run whatever they want. A userspace attacker that has escalated privileges on a dm-verity system cannot do that, as they would need a signed volume and they do not have access to a private key trusted by the kernel keyring, so taking control of userspace is not enough, you also need to take control of the kernel, which is much harder.
Posted Feb 21, 2025 15:17 UTC (Fri)
by alexl (subscriber, #19068)
[Link] (2 responses)
However, I think IPE is only useful for setups that are extremely locked down. For example, the second you have some kind of interpreter (like bash or python) available you can run essentially arbitrary code anyway. For any kind of more generic system that can run code more dynamically it will not be applicable. For example, you could never use it on a personal laptop or a generic server.
Posted Feb 21, 2025 16:16 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (1 responses)
No, that is not possible, not even theoretically, because the chain of trust MUST go up into the kernel for this to work at all. composefs mounts do not do this, by design, trust is verified in userspace. It's not a matter of implementing it, it needs to be redesigned, and mutability and integrity are essentially at opposite ends of the spectrum. Pick one or the other according to the use case, but you can't have both.
> However, I think IPE is only useful for setups that are extremely locked down.
Yes, for sure, code integrity is for cases where security requirements are stringent, and not for a generic laptop or so.
> For example, the second you have some kind of interpreter (like bash or python) available you can run essentially arbitrary code anyway.
This is being worked on, the brand new AT_EXECVE_CHECK option in 6.14 is exactly intended to allow interpreters to plug those use cases. It's absolutely true that interpreters are not covered right now, but (fingers crossed) there should be something usable for at least a couple of interesting interpreters this year if all goes according to plans.
Posted Feb 24, 2025 12:57 UTC (Mon)
by alexl (subscriber, #19068)
[Link]
I don't see exactly how this would be impossible. I mean, obviously it would require some design and implementation work in the kernel to do the kernel-side signature validation, but nothing fundamentally impossible. The main blocker is that I don't currently consider this a critical feature.
Posted Feb 17, 2025 18:38 UTC (Mon)
by epa (subscriber, #39769)
[Link] (21 responses)
Suppose the new version of libfoo fixes an important security hole. It is not enough to install it and make sure any newly started processes get the new code. Somehow you need to track down and restart existing processes using the vulnerable library, or, even trickier, arrange for them to unload the old library and dynamically link in the new one. I am sure such a scheme is possible in principle. I just don’t think it exists right now.
We scoff at the Windows users with their reboots, when a Linux system can keep running smoothly even after upgrading something as fundamental as libc. But when you stop to consider what the upgrade is for, perhaps the stupid approach is the right one after all.
Posted Feb 17, 2025 19:30 UTC (Mon)
by mussell (subscriber, #170320)
[Link] (7 responses)
Posted Feb 17, 2025 19:32 UTC (Mon)
by bluca (subscriber, #118303)
[Link] (4 responses)
Posted Feb 18, 2025 5:19 UTC (Tue)
by mirabilos (subscriber, #84359)
[Link]
Posted Feb 18, 2025 8:33 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (2 responses)
So if I have a problem with my graphical session, it's okay to boot my wife or daughter unexpectedly off the system? I thought a graphical session was user-space, certainly conceptually, and there's no reason why there shouldn't be multiple real people with multiple different graphical sessions all using the same computer ???
Cheers,
Posted Feb 18, 2025 11:00 UTC (Tue)
by bluca (subscriber, #118303)
[Link] (1 responses)
Posted Feb 18, 2025 11:05 UTC (Tue)
by geert (subscriber, #98403)
[Link]
Posted Feb 17, 2025 19:36 UTC (Mon)
by epa (subscriber, #39769)
[Link] (1 responses)
Posted Feb 18, 2025 15:52 UTC (Tue)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 17, 2025 20:01 UTC (Mon)
by wtarreau (subscriber, #51152)
[Link] (9 responses)
Yes and no at the same time. For many users, the cost of a (quick) reboot is nothing compared to doubting. Images allow you to deliver a version of a complete system, and users update from one version to another. This also means less variations for support teams and easier to track regressions.
We've been doing this at haproxy in our appliances for the last 20 years now and it's much appreciated from users, support and developers because everyone knows what you're running.
Those who don't like it are experienced admins who would like to tweak the system a bit more, install their own tools etc. Then it becomes more complicated, even if there are R/W locations making this possible. For example installing a tool that depends on libraries not on the base system is harder to use (need to play with LD_LIBRARY_PATH etc). There are possible hacks consisting in using mount --bind to replace some binaries on the fly etc, so you're not completely locked down by default, but clearly for the vast majority of end users, knowing that they're only running the versions they're expected to run is quite satisfying. Also you end up with an extremely stable system that never fails to boot, because you don't install random stuff on them nor can you break package dependencies. It even permits smooth upgrades because you can enumerate everything that has to be handled and can afford to automatically migrate configurations. I tend to like this approach, I'm even using it for infrastructure components at home (firewall, proxies etc), because there's no degraded situation, it's either the good previous image or the good next one.
The important thing is to have a fast booting hardware, and with heavy UEFI bioses that decompress an entire operating system in RAM these days, we're slowly going away from this. 10 years ago, our system was fully up in 12s from power up. Nowadays it's more like 30-40s. In any case that only counts in case of double power outage (extremely rare), because normally each device is supposed to be backed up by another one which instantly takes over. And with some mainstream servers it can count in minutes and then I sense how it can be too long for many users for just a library update!
Posted Feb 17, 2025 20:15 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (2 responses)
Out of interest, is kexec at a point where it saves you time? Or is it not yet good enough for your needs when swapping an image out?
Posted Feb 18, 2025 22:24 UTC (Tue)
by bearstech (subscriber, #160755)
[Link] (1 responses)
Linux+systemd boots (and most importantly shuts down) faster and faster, while shitty firmwares boot slower and slower. Those kexec and soft-reboot features are sorely needed.
Posted Feb 19, 2025 16:31 UTC (Wed)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 17, 2025 20:24 UTC (Mon)
by alogghe (subscriber, #6661)
[Link]
Real Users have complex workflow and program needs.
They have large trees of libraries and binaries not in the default image and they deserve full support and working systems that support that use.
This systemd image idea is rooted in appliance based thinking and isn't appropriate for large numbers of real users.
Posted Feb 17, 2025 22:34 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (4 responses)
Moreover, note that there are things like soft-reboot these days, which are really fast. And there's a perspective for even more cool stuff: for example I think we should make systemd-nspawn + portable services systematically ready to just stay running over soft reboot. Since they run off their own disk images anyway this should be relatively straightforward to implement.
Posted Feb 18, 2025 0:12 UTC (Tue)
by fraetor (subscriber, #161147)
[Link] (3 responses)
Runtime introspection, the ability of an admin to poke at the state of the system while it is running to gain a better understanding of its behaviour, meshes nicely with being able to reset an image-based OS back to a known good state, but I'm not sure how one would go about it.
Local admin modifications are changes to the vendor supplied files that fix site-specific issues, such as adjusting hard coded timeouts, including additional codecs, that kind of thing. I think sysexts are meant to be the solution here, but I'm not clear how easy they are for a local admin to actually create on the fly.
How are these use cases meant to be addressed in a systemd image-based OS? Am I missing something from these various extension/portable services?
Posted Feb 18, 2025 14:09 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (2 responses)
And regarding introspection: if you want a debug shell to debug things, then nothing is stopping you from getting one with the usual mechanisms. just because your /usr/ is immutable doesn't mean you cannot get a root shell after you authenticated.
Posted Feb 18, 2025 17:44 UTC (Tue)
by fraetor (subscriber, #161147)
[Link] (1 responses)
Posted Feb 18, 2025 17:54 UTC (Tue)
by bluca (subscriber, #118303)
[Link]
Posted Feb 18, 2025 6:20 UTC (Tue)
by PeeWee (guest, #175777)
[Link]
I believe what you are looking for are tools like `needrestart` and `checkrestart`. Both exist in the Debian derived distros, e.g. Ubuntu. IIRC `checkrestart` is the predecessor of `needrestart`. The former lives in the debian-goodies package, which is optional, at least in Ubuntu, and the latter comes in its own package, which should be installed automatically, if I am not mistaken. And as a last step of an upgrade run (`apt upgrade`), `needrestart` gets executed, finds everything that is still using old, now deleted, libs and can even identify the services that need to be restarted and does so for most but not all of them, as there are some that would have surprising consequences when being restarted, e.g. systemd-logind. Also any programs that do not belong to a service will be identified, but the restarting must be done manually. This will also inform you, if a reboot is necessary, i.e. newer kernel version. And if there are services that cannot easily be restarted a `systemctl soft-reboot` may be the quicker option, provided no kernel reboot is necessary anyway.
I believe all this is done by leveraging `lsof` and checking which of the libs cannot be found in the filesystem. IIRC that was the approach when `checkrestart` was still in its infancy. And I cannot imagine that this kind of tooling is only available in the Debian world, even though I think it was pioneered there.
Posted Feb 18, 2025 11:03 UTC (Tue)
by pabs (subscriber, #43278)
[Link]
Posted Feb 19, 2025 10:43 UTC (Wed)
by IanKelling (subscriber, #89418)
[Link]
But it isn't a guarantee in the face of malicious code, so to be more accurate, it prevents a certain kind of bug, and so, if you've been updating your debian based system without being aware of this kind of bug for the last 15+ years, why exactly is it worth rebooting a whole lot more? I'm sympathetic to the parent poster.
Posted Feb 17, 2025 21:53 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (4 responses)
1. I am pretty sure it's essential to learn from memory management that W^X is a *good* thing, and apply it to file systems as a whole: a file system should *either* be writable, *or* allow executable inodes on it. The combination is always risky business. An image-based system can provide that: you cryptographically guarantee the integrity of the whole FS with everything on it via dm-verity and signing, which systems such as IPE then can hook into. A system such as nix/guix/ostree inherently does not, they maintain repositories of random inodes, both writable and (often) executable.
2. There must always be a cryptographically enforced way to return to a known good system state, i.e. flushing out local state, returning to vendor state. nix/guix/ostree generally don't support that, they cannot put the fs back into a known good state, because they modify it on the fs level, not on the block level, and they cannot just flush out the local fs modifications wholesale.
So, I think we can do better (and must do better) than nix/guix/ostree for building a secure OS, in particular in a networked world, where you know that sooner or later any software will be exploited, and hence it's so important to make it impossible to sneak executable code in, and if it happens anyway you know how to get it out of there again.
Now, of course, you can have another security model in mind than I do, but you know the magic "reproducibility" thing is just *one* facet of security, and of course DDIs (i.e. the dm-verity disk images we like to think in) are as reproducible as anything done by guix/nix/ostree if done properly. It's not an exclusive property of guix/nix/ostree, not at all.
Posted Feb 18, 2025 13:01 UTC (Tue)
by aragilar (subscriber, #122569)
[Link] (3 responses)
Posted Feb 18, 2025 14:13 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (2 responses)
Posted Feb 20, 2025 8:18 UTC (Thu)
by aragilar (subscriber, #122569)
[Link] (1 responses)
The issue is who makes the decision as to whether users should be allowed to do this task. If this is imposed on users (e.g. their phones), then we have a problem, but if this is effectively an appliance that is provided for a specific task (and even then, the question of should things be automatable be asked), then it does make sense.
Posted Feb 20, 2025 10:40 UTC (Thu)
by Rigrig (subscriber, #105346)
[Link]
Posted Feb 17, 2025 18:06 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (21 responses)
This is absolutely a short-sighted decision that will cause huge problems down the line. In particular, for static binaries and for the future lockdown features that are going to be hamstrung by not having the ability to load the whole dependency graph.
Posted Feb 17, 2025 18:13 UTC (Mon)
by bluca (subscriber, #118303)
[Link] (1 responses)
Posted Feb 17, 2025 18:18 UTC (Mon)
by corbet (editor, #1)
[Link]
Posted Feb 17, 2025 18:37 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (17 responses)
Lennart
Posted Feb 17, 2025 19:18 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (16 responses)
With dlopen() optionals the only way is to load all of them, in case the application needs them later. Then there are issues of disk access, optionals are resolved at an arbitrary point.
This all will have consequences in the future.
Posted Feb 17, 2025 19:26 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (15 responses)
In fact the way we do it makes it a lot *easier* to handle things like this, since you could parse the mentioned elf note within your program early on and load all modules listed therein, gracefully handling those which cannot be fulfilled because not installed and *then* seal off memory. After all the data is there, programmatically accessible from inside the elf programs and from outside too.
Posted Feb 17, 2025 22:19 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (13 responses)
musl exists, and it can be fully static. With nscd we don't need NSS modules, and the only remaining bogosity is in PAM.
We're actually pretty close to a fully static system, that can be sealed in RAM.
> In fact the way we do it makes it a lot *easier* to handle things like this, since you could parse the mentioned elf note within your program ealy on and load all modules listed therein, gracefully handling those whoch cannot be fulfilled because not installed and *then* seal off memory.
"Mandatory optionals, it has a nice ring to it!"
Posted Feb 17, 2025 22:46 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (12 responses)
Oh christ, static linking. and musl. not sure where to start. I mean, sure whatever, I think we have very different pov on operating systems. Good luck!
> "Mandatory optionals, it has a nice ring to it!"
Hmm? what's mandatory? not grokking what you are saying? even if you load dlopen() weak deps during process initialization early on they don't become mandatory?
Posted Feb 17, 2025 23:54 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
Yep. I prefer systems to be as static as possible, with cryptographic verification from the image down to individual pages in RAM. Adding _more_ dlopens is not helping.
> Hmm? what's mandatory? not grokking what you are saying?
I believe so?
> In fact the way we do it makes it a lot *easier* to handle things like this, since you could parse the mentioned elf note within your program early on and load all modules listed therein, gracefully handling those which cannot be fulfilled because not installed and *then* seal off memory.
This will result in xz loaded unconditionally everywhere, since there are no primitives that can generically say "/bin/true needs libsystemd but _only_ for the readiness protocol".
Posted Feb 19, 2025 15:15 UTC (Wed)
by surajm (subscriber, #135863)
[Link] (4 responses)
If you really want to go the static route you might end up with more binaries where you try and split out what could have been an optional shared lib dependency into its own process which you talk to over ipc. In some cases that might be okay but certainly not all.
Posted Feb 19, 2025 23:17 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Eagerly loading everything will undo that.
The _correct_ way to handle the screwup with compressors was to split libsystemd into libmeaculpa that only has the readiness protocol and other simple compact functionality, and into libjournald that has all the heavy journald-related stuff. Then projects like ssh can slowly migrate from the full libsystemd to libmeaculpa.
Posted Feb 19, 2025 23:51 UTC (Wed)
by bluca (subscriber, #118303)
[Link] (2 responses)
Yeah exactly, the purpose of it is to allow using the exact same build in multiple contexts, from a fully-featured large system to a minimal one, without having to do bespoke recompilations, which are a pain in the backside. This way the exact same package can be used for all purposes, and who assembles it decides how much functionality to take in or leave out, by simply changing which packages gets pulled in. It works really well and we can now build tiny initrds and large desktop images from the same set of packages.
Posted Feb 20, 2025 9:31 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Feb 20, 2025 9:58 UTC (Thu)
by bluca (subscriber, #118303)
[Link]
Posted Feb 19, 2025 10:11 UTC (Wed)
by tlamp (subscriber, #108540)
[Link] (5 responses)
Do you have perchance some of those thoughts spelled out somewhere already? I'd be honestly interested to read about them, especially w.r.t. statically linking in general.
One thing that I'm also curious is you being in favor (FWICT) of image based distros, which IMO is basically a sort of higher-level statically linking over multiple binaries, vs. not liking statically linked binaries (distributed on a package level) so much.
As with some eye squinting those do not seem _that_ different in practice to me – e.g., things like A/B updates might come slightly more naturally with image based distros, but are certainly possible without them.
Posted Feb 19, 2025 10:59 UTC (Wed)
by mezcalero (subscriber, #45103)
[Link] (4 responses)
Posted Feb 19, 2025 19:58 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Posted Feb 19, 2025 20:09 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Feb 19, 2025 20:27 UTC (Wed)
by raven667 (subscriber, #5198)
[Link] (1 responses)
Posted Feb 20, 2025 10:23 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Which one is right for you is a decision that depends on the project goals.
Posted Feb 18, 2025 19:49 UTC (Tue)
by ferringb (subscriber, #20752)
[Link]
Posted Feb 17, 2025 18:50 UTC (Mon)
by brenns10 (subscriber, #112114)
[Link]
Posted Feb 17, 2025 18:31 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (65 responses)
This permeated even into the core of systemd. For example, systemd supports restarts and watchdogs for regular services, but not for mount units. So you can't do the most trivial thing: wait until the NFS server becomes accessible during the system startup. Why? No idea, there's really no technical justification for it.
Some other components are also at least "strange", like the DNS resolver (resolved). It can, apparently, change the host name. This is a behavior borrowed from macOS, but it's totally unexpected in Linux.
Posted Feb 17, 2025 19:51 UTC (Mon)
by ringerc (subscriber, #3071)
[Link] (1 responses)
People scratching itches who don't want to implement a 70%-similar feature for the other parts they don't use or care about. Trying to make them often means nobody gets any of it and progress halts; allowing them to do the 70% often means someone else, who wouldn't have been able to do all 100%, can tackle the remaining 30%. I've been on all the different sides of this many times.
Sometimes others who are deep in the weeds of a focus area don't look for or see the parallel in the first place.
Sometimes what looks from the outside like a closely related feature turns out to be almost entirely dissimilar in use and implementation. So it wasn't done, for good but non-obvious reasons.
The Kubernetes project is packed full of weird gaps, quirks and omissions like this. Largely because of its organic user driven growth. Kubectl port-forward's lack of a json output for the port mapping, the behavior of statefulset scaling when the pods are failing, etc.
Posted Feb 17, 2025 22:20 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Feb 17, 2025 21:08 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (61 responses)
I am not sure i see the connection of automatic service restarts and watchdogs on one hand and NFS on the other?
We do have quite nice infrastructure for waiting for network online state though, see systemd-networkd-wait-online.service. It's quite configurable, since what you think is "trivial" is actually mindbogglingly complex. Deciding when a network is "online" means something different to everyone. It could mean an IP address is configured and/or link beat is seen, or DNS works, or the server can be reached, and then there are multiple axes because of multiple NICs, IPv4 and IPv6, DHCP vs. IPv4LL, and so on. What's right for people heavily depends on your use case.
So, I think we actually cover what you are asking for really nicely, I am not sure what you are missing. Is it possible that you are not actually using systemd's tools though for networking? Maybe you are barking up the wrong tree then? That said, we actually do provide hook points so that other stacks can plug their own wait logic in too. Maybe yours didn't? We are hardly to blame for that though?
Or are you suggesting we should retry establishing NFS mounts if that doesn't work the first time because networking is borked? Sorry, but that's really something the NFS folks need to implement and deal with. Only they understand what is actually going wrong and whether it's worth retrying. Frankly, any network protocol should retry a couple of times before giving up, and DNS and TCP both do. Maybe your NFS stack lacks that feature, but why do you think that'd be systemd's problem?
And now if you ask me why I made automatic server restart + watchdog logic our problem and refuse to make NFS ours too: well, I think service management is inherently our thing, but some specific network protocol that is only used by a relatively restricted subset of people really is not.
(Also, please understand that NFS is really not at the center of attention for any of the core systemd developers though, it's not really where we focus our primary love… It appears you simply have a different focus than us, but that doesn't make our stuff an "agglomeration of random features", it just makes us have a different focus.)
Lennart
Posted Feb 17, 2025 21:46 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (60 responses)
It's not a network online state. A very simple example: a compute server and its NFS storage server are starting at the same time after a power cycle. The compute server boots quickly, but the storage server takes a bit longer.
The compute server gets to the network-online target and tries to mount the NFS filesystem, and immediately receives an error. The error is definite, because the NFS server is not yet running and the server returns ECONNREFUSED. Mount fails. There are no retries, and there's no way to express this with mount units.
Ironically enough, it's easy to do that with _regular_ units that just wrap the `mount` utility.
> Maybe your NFS stack lacks that feature, but why do you think that'd be systemd's problem?
It's a bog-standard Linux. The same problem can happen with SMB and other protocols.
> Or are you suggesting we should retry establishing NFS mounts if that doesn't work for the first time because networking is borked?
Yep. For the same reason regular units have retries. Why is mounting special?
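(To make the "regular unit that just wraps the mount utility" approach mentioned above concrete: a rough sketch, with a hypothetical server name and mount point, and with the retry loop living in the command rather than in any Restart= setting, might look something like this.)
[Unit]
Description=Mount the NAS export, retrying until the storage server answers
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Give up after ten minutes; until then, retry the mount every ten seconds.
TimeoutStartSec=10min
ExecStart=/bin/sh -c 'until mount -t nfs storage.example.com:/export /mnt/storage; do sleep 10; done'
ExecStop=/bin/umount /mnt/storage

[Install]
WantedBy=multi-user.target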
Posted Feb 17, 2025 22:50 UTC (Mon)
by mezcalero (subscriber, #45103)
[Link] (19 responses)
Have you requested support for this from the NFS folks?
Posted Feb 17, 2025 23:28 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (17 responses)
Let's remove the retry logic from other units, then. And ask their developers to make sure they retry on failure themselves.
> This *really* sounds like somethings nfs and smb should just cover natively, only they know the reason why something failed
In this case the failure is fully expected. It's positively indicated by the NFS utilities: the server is not available.
You _can_ add retry logic to NFS, SMB, Ceph, and other filesystem mounting utilities. But why?!? This is literally what service supervision should do.
Posted Feb 18, 2025 1:11 UTC (Tue)
by tim-day-387 (subscriber, #171751)
[Link] (1 responses)
Posted Feb 18, 2025 5:04 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link]
But that's not an option in case of systemd.
Posted Feb 18, 2025 6:11 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (14 responses)
Also I think nfs has exactly what I suggested already with the retry= option? So why can't you just use that?
(Let's also not forget: there are about 2 relevant network file systems around, and if you count the more exotic ones maybe 7, and they tend to be well maintained and are implemented in a layer *below* systemd (i.e. the kernel), which is systematically different from service mgmt, where we have a bazillion of services, most of them are terrible and hence really need to be supervised, and they are implemented in a stack above systemd, conceptually. Supervising stuff that is conceptually below you is kinda conceptual nonsense)
Frankly, just let it rest, we are not going to make nfs our problem. We are not going to go full static binaries either. I am sorry that your priorities are not ours but I nonetheless hope you can accept that.
Posted Feb 18, 2025 6:43 UTC (Tue)
by mb (subscriber, #50428)
[Link] (4 responses)
That is just an implementation design detail rather than an architectural or hierarchical necessity.
Posted Feb 18, 2025 14:00 UTC (Tue)
by bertschingert (subscriber, #160729)
[Link] (3 responses)
> That is just an implementation design detail rather than an architectural or hierarchical necessity.
I believe he's referring to the client rather than the server, in which case it being in the kernel is more than an implementation detail. An NFS client could be in userspace, too, but I imagine it would require an application to be specifically coded to use a library that implements the protocol, rather than going through the mounted filesystem.
Posted Feb 18, 2025 17:27 UTC (Tue)
by mb (subscriber, #50428)
[Link] (2 responses)
There's no fundamental reason why nfs-client mounting part must be in the kernel. nfs-client can be thought of as a *local* service that sits between the network and whatever gives it access to the filesystem mounting mechanism.
For me NFS is not properly integrated into the system and it has never been. At least on Debian. Also in pre-systemd days.
If I hit the system shutdown buttons it should first umount NFS and then tear down networking.
Posted Feb 18, 2025 19:48 UTC (Tue)
by bertschingert (subscriber, #160729)
[Link] (1 responses)
I considered a FUSE NFS client but dismissed the idea because I didn't see what benefits it would provide over the kernel client. If I'm missing something though, I'd love to learn what the use case is.
Posted Feb 18, 2025 20:32 UTC (Tue)
by mb (subscriber, #50428)
[Link]
I didn't say that.
>It seems like that issue is orthogonal to whether the FS client implementation is in kernel or userspace?
Exactly.
>If I'm missing something though
The root of this discussion was that it has been said that systemd was not supposed to manage NFS mount retries, because they are "below" systemd in architecture.
That's only an implementation detail, though. If there's a way to easily implement NFS above systemd, one can hardly argue that it is architecturally below by its nature.
It doesn't matter where a service is implemented. If the service fails to do what it is supposed to do, the service manager should retry.
Posted Feb 18, 2025 6:45 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
But it is. It literally is a service, just with an additional filesystem interface. There is no logical reason why socket and automount activation should be treated any differently.
> Also I think nfs has exactly what I suggested already with the retry= option? So why can't you just use that?
NFS retries are too basic, CIFS doesn't have them: https://linux.die.net/man/8/mount.cifs and FUSE-based filesystems are similar.
> Frankly, just let it rest, we are not going to make nfs our problem. We are not going to go full static binaries either. I am sorry that your priorities are not ours but I nonetheless hope you can accept that.
I mean, yes. I'm accepting that systemd is a poorly-run project that produces inconsistent software. This inconsistency is not limited to mount units, nspawn units are similarly special. And they are _literal_ services.
After seeing the development flow and the lack of coherency in it, I'll be steering away from using it more deeply than for the basic process supervision.
There are far too many interactions in systemd to make it reliable as a _system_.
Posted Feb 18, 2025 11:31 UTC (Tue)
by beagnach (guest, #32987)
[Link] (2 responses)
OK, now you're stooping to ad-hominem attacks thinly disguised as technical critiques. It's starting to look like some personal grievance is underlying the poorly-thought-out arguments. You're just ruining your own credibility here.
> I'll be steering away from using it more deeply than for the basic process supervision.
Please do.
Posted Feb 18, 2025 18:49 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
I'm sorry, what? I have no problems whatsoever with systemd's authors, and I like the idea of systemd itself. I'm saying that systemd as a project is poorly run and lacks focus and consistency.
Posted Feb 18, 2025 19:22 UTC (Tue)
by corbet (editor, #1)
[Link]
Thanks.
Posted Feb 18, 2025 7:49 UTC (Tue)
by PeeWee (guest, #175777)
[Link] (4 responses)
Posted Feb 18, 2025 11:10 UTC (Tue)
by bluca (subscriber, #118303)
[Link]
Yes, mount units are different, because they are not just statically defined, but come from the kernel too. And the /proc/mountinfo interface was so gnarly and full of races that handling lifecycle of these is _really_ hard to do right with all the billions of corner cases. Granted we now have new syscalls that should make it better, but we haven't got around to using them yet.
But as always it's much easier to DEMAND that open source projects implement the one thing you really care about and nobody else does, and loudly whinge that they are "poorly run" if they don't do that pronto and for free and with a ribbon on top, I guess (not referring to you).
Posted Feb 18, 2025 11:45 UTC (Tue)
by taladar (subscriber, #68407)
[Link]
Posted Feb 18, 2025 14:17 UTC (Tue)
by mezcalero (subscriber, #45103)
[Link] (1 responses)
(they don't for mount units, but as mentioned I am pretty sure that is not our job to add, fs developers should add retries if desired to their mount tools/file system drivers)
Posted Feb 18, 2025 14:47 UTC (Tue)
by PeeWee (guest, #175777)
[Link]
So I stand corrected: *I* have missed them in the docs. Thanks, both of you, for clearing things up.
Posted Feb 18, 2025 0:07 UTC (Tue)
by Wol (subscriber, #4433)
[Link]
I'm not going to try and place blame - I don't have a clue how Pr1menet did it, but I was used to putting "ADDDISK DISKNAME ON COMPUTERNAME" into my Pr1me's bootscript back in the 80s. The state of COMPUTERNAME when I booted my system was irrelevant - when that computer came up, that resource (disk) appeared on my system.
It's always irked me that *nix'en can't declare resources that just appear once they are available. Although systemd does have automount units that mount when you try to access the resource - so I wonder why the GP doesn't try that? Or doesn't that work with NFS (it does with CIFS, I use it).
Cheers,
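(For what it's worth, turning a network mount into an on-demand automount is a one-line change in /etc/fstab; the server name and mount point here are hypothetical.)
storage.example.com:/export  /mnt/storage  nfs  _netdev,x-systemd.automount,x-systemd.idle-timeout=60  0  0
systemd-fstab-generator then creates both the mount unit and a matching automount unit, so the share is only mounted on first access and unmounted again after a minute of idleness.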
Posted Feb 17, 2025 23:13 UTC (Mon)
by MrWim (subscriber, #47432)
[Link] (3 responses)
Posted Feb 17, 2025 23:29 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Feb 18, 2025 8:40 UTC (Tue)
by mbunkus (subscriber, #87248)
[Link] (1 responses)
Yes, this is ugly.
Posted Feb 18, 2025 14:50 UTC (Tue)
by hmh (subscriber, #3838)
[Link]
I like it.
And to me, what it describes does feel like a higher-level functionality that makes sense to have on mount units, yes.
Posted Feb 17, 2025 23:45 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (5 responses)
(Yes, I know, k8s is overkill for this particular use case, so much so that its documentation will outright tell you to throw away NFS and replace it with their bespoke solution. Unfortunately, pets are harder than cattle. Sometimes, you just have to choose between using the massively overengineered solution anyway, or putting up with the friction of not using it.)
Posted Feb 17, 2025 23:50 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link]
Correction: This is wrong. They will tell you that NFS is one of many, many things that can be slotted into their bespoke solution. So yes, this is fully supported under k8s. That doesn't change the fact that k8s is probably overkill for a two-node setup.
See for example https://github.com/kubernetes/examples/tree/master/stagin...
Posted Feb 18, 2025 0:38 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
To add more spice, the compute server uses iSCSI to mount the volume that has the configuration for the containers it runs. So K8s needs first to have the iSCSI mounted, which also has the same "retry" problem, btw.
Posted Feb 19, 2025 0:32 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
Strictly speaking, I believe k8s can accommodate that as a one-node system. Which is even more absurdly overkill, but at least it should work.
I would still call this a form of orchestration, because you care about the state of more than one node even if you only control one of them. But I suppose that's a matter of semantics.
> To add more spice, the compute server uses iSCSI to mount the volume that has the configuration for the containers it runs. So K8s needs first to have the iSCSI mounted, which also has the same "retry" problem, btw.
k8s has an API because you are meant to write code that calls into it (or use code that others have written). Said code can, itself, be run by k8s - there is nothing preventing you from having a Pod that spawns new Pods or updates existing ones. This pattern is normally used for release automation (and in that use case, it preferably updates Deployments or StatefulSets rather than directly manipulating individual Pods), but it can also solve the "I need this volume mounted before I can figure out what I want to run" use case, and it can even be self-hosting once you start it for the first time (assuming it is smart enough, and of course you will want to have a reasonable story for doing a black start if needed).
Posted Feb 19, 2025 4:22 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Feb 19, 2025 9:06 UTC (Wed)
by stijn (subscriber, #570)
[Link]
> Unit startup is executed as a non-idempotent parallel dataflow with weak ordering guarantees on the job level, mostly independent of the active state of dependent units.
Which superficially sounds exactly like Nextflow's 'functional dataflow model' where 'Instead of running in a fixed sequence, a process starts executing when all its input requirements are fulfilled' (lazily copied from Wikipedia - I'm well acquainted with Nextflow, not so with systemd). In both cases there is a 'singleton object called Manager, responsible for launching jobs' (copied from the V.R. piece).
So .... does systemd resemble a workflow engine that dispatches jobs via a dataflow 'requirements fulfilled' model?
Posted Feb 18, 2025 9:07 UTC (Tue)
by neilbrown (subscriber, #359)
[Link] (26 responses)
Posted Feb 18, 2025 9:41 UTC (Tue)
by Wol (subscriber, #4433)
[Link]
Being naive, but I would have expected any dependency to say "NFS after network" (although I'm probably thinking of client rather than server...)
Cheers,
Posted Feb 18, 2025 18:45 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (24 responses)
Posted Feb 18, 2025 20:48 UTC (Tue)
by ferringb (subscriber, #20752)
[Link] (2 responses)
Posted Feb 18, 2025 21:10 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Feb 18, 2025 21:26 UTC (Tue)
by ferringb (subscriber, #20752)
[Link]
I reiterate: no network mount + init has ever been more than "stab my eyes out" in my experience. That said, the requires injection looks a helluva lot simpler than the old route of writing custom shell for mounting, and then trying to sequence init levels (thus delaying the entire boot).
Posted Feb 19, 2025 5:46 UTC (Wed)
by neilbrown (subscriber, #359)
[Link] (20 responses)
So the nfs server could be run in a container that provides a network namespace with an IP address which isn't configured until nfsd has started.
However a quick look at the mount.nfs code suggests that ECONNREFUSED is treated by nfs_is_permanent_error() as not being a permanent error.
As you say, cifs doesn't have a retry option. "automount" might be the correct tool. Or you could report a bug to cifs-utils.
Can you use systemd to overcome this weakness in cifs? Probably. Local mounts often have a dependency on a device. cifs seems to have a dependency on a remote TCP port being responsive. Is there a tool that can test if port 445 is open on a given IP address? nmap can do it but doesn't return an exit status. "nc -N IP 445 < /dev/null" returns 0 when the port accepts a connection and 1 otherwise. You could create a service which execs this command and restarts on failure. Then make the mount depend on the service.
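(A sketch of that construction, with a hypothetical address and unit names; instead of Restart=on-failure, which older systemd versions reject for Type=oneshot units, the polling loop here simply lives in the command itself.)
# wait-smb.service: block until port 445 accepts a connection
[Unit]
Description=Wait for the SMB server to accept connections
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'until nc -N 192.0.2.10 445 </dev/null; do sleep 10; done'

# drop-in for the mount unit, e.g. /etc/systemd/system/mnt-share.mount.d/wait.conf
[Unit]
Requires=wait-smb.service
After=wait-smb.service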
Posted Feb 19, 2025 23:11 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (19 responses)
However, having systemd stay consistent and offer retries would have helped to remove that bogosity.
And another meta-observation: systemd docs are a freaking mess. There are tons of options, and some are very powerful, but they are completely undiscoverable. E.g. the ability to transfer secrets between daemon restarts.
I believe, this stems from the very same problem: the lack of the overall vision. Systemd is being developed like a giant ball of Lego components, that slowly accretes additional pieces as it rolls through the landscape.
If I were designing something like systemd now, I would have defined a central "Service" entity with its lifecycle. Then I would implement this "Service" for runnable processes, mount units, devices, etc. Some of them might not have all the state transitions, but this can be documented explicitly.
Posted Feb 19, 2025 23:42 UTC (Wed)
by bluca (subscriber, #118303)
[Link] (3 responses)
And what have you done to make it better, given it's open source? Oh that's right, the square root of fuck all. Guess whining about stuff you get for free is easier than doing something useful.
Posted Feb 19, 2025 23:45 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
As for systemd, I don't think the docs can be fixed. They are a symptom, not the cause.
Posted Feb 19, 2025 23:52 UTC (Wed)
by corbet (editor, #1)
[Link] (1 responses)
How about if everybody in this subthread calms down, please?
Posted Feb 22, 2025 17:54 UTC (Sat)
by dmv (subscriber, #168800)
[Link]
Posted Feb 20, 2025 7:52 UTC (Thu)
by donald.buczek (subscriber, #112892)
[Link] (14 responses)
There are other ways to look at it. In my opinion, the documentation of systemd is exemplary and far above the usual level. The man pages are a complete and correct reference: they cover all levels of abstraction, from basic working models to the smallest formatting details, completely, concisely, and precisely. They are well structured and refer to each other in a meaningful way.
Posted Feb 20, 2025 9:32 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (11 responses)
Posted Feb 20, 2025 9:56 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (10 responses)
Posted Feb 20, 2025 23:14 UTC (Thu)
by Klaasjan (subscriber, #4951)
[Link] (4 responses)
Posted Feb 20, 2025 23:18 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (3 responses)
Posted Feb 20, 2025 23:36 UTC (Thu)
by Klaasjan (subscriber, #4951)
[Link] (2 responses)
Posted Feb 23, 2025 4:09 UTC (Sun)
by pabs (subscriber, #43278)
[Link] (1 responses)
https://gitlab.com/redhat/centos-stream/docs/enterprise-d...
Posted Feb 24, 2025 6:58 UTC (Mon)
by Klaasjan (subscriber, #4951)
[Link]
Posted Feb 20, 2025 23:33 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
To give an example of a good doc: https://upload.wikimedia.org/wikipedia/commons/3/37/Netfi... from https://wiki.linuxfoundation.org/networking/kernel_flow. A very clear but detailed overview.
I actually looked, and I can't find anything similar for systemd. I've been following its development for years, but I can't outright tell how symlinks, overrides, and other machinery work together.
Posted Feb 20, 2025 23:54 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (3 responses)
Posted Feb 21, 2025 0:23 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
And speaking of old, systemd.io links to the series of Lennart's blog posts from 2010-2012. They're also still mostly accurate, fwiw, but do miss stuff like practical uses of journalctl.
Posted Feb 21, 2025 6:56 UTC (Fri)
by zdzichu (subscriber, #17118)
[Link] (1 responses)
Posted Feb 21, 2025 7:08 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
FWIW, I have a personal wrapper around systemd (`scl`). It started as an alias to `systemctl` because I didn't want to keep breaking my fingers typing it all the time. Then I added some obviously missing functionality like starting a service and viewing its logs in one command (like `docker compose up` does, for example). I think a lot of hostility to systemd could have been avoided, if something like this existed.
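(The "start a service and follow its logs" convenience is a couple of lines of shell; scl itself is the commenter's private tool, so this is only a guess at its shape.)
# hypothetical shell function: restart a unit and tail its journal, roughly like "docker compose up"
up() {
    systemctl restart "$1" && journalctl -f -u "$1"
}
# usage: up nginx.service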
Posted Feb 20, 2025 9:34 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (1 responses)
Most developer documentation is great for reminding you of what you already know. It is absolutely hopeless at teaching you how to use the system. That's why beginners make the same mistakes over and over again - from their point of view Double Dutch would probably make more sense.
I've had exactly that with systemd - I don't know where to start looking, so I can't find anything, and what I do find I can't make sense of. That is where so much stuff (not just computing) is severely lacking nowadays. We've just bought a new modern tv, which thinks that the main purpose of the user manual is to tell you where to plug the leads in. Of course, we don't have a clue how to use it, everything is trial and error, and it's so damn complicated that we can't find anything! We just want a tv we can turn on and watch!!!
And of course, most documentation FOR beginners is written BY beginners, so it's full of beginner grade errors :-(
Cheers,
Posted Feb 21, 2025 7:38 UTC (Fri)
by donald.buczek (subscriber, #112892)
[Link]
Occasionally reading Lennart's "PID Eins" blog is more of a leisure activity, but it sometimes gives you ideas about whether one or the other idea or a new systemd feature could be useful for us as well.
Posted Feb 18, 2025 11:51 UTC (Tue)
by WolfWings (subscriber, #56790)
[Link] (2 responses)
The correct solution here is to add an additional oneshot unit plus a small monitoring.sh (using libnfs-utils for simplicity), roughly along the lines sketched below, which would approximately wait for up to one hour, testing once a minute for the storage server to be online before allowing things to continue, and properly error out with a failure state after that hour. This isn't fully fleshed out as a .unit file, etc., but it's the gist: you just need to insert your extra dependency (wait for the storage server) as exactly that, a dependency in the chain.
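(The unit and script snippets referred to in this comment did not survive intact; a sketch consistent with the description, assuming a hypothetical server storage.example.com and the nfs-ls tool shipped with libnfs-utils, might look like this.)
# wait-for-storage.service: oneshot dependency that blocks until the export answers
[Unit]
Description=Wait for the storage server to export its filesystems
Before=mnt-storage.mount

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/monitoring.sh

# monitoring.sh: probe once a minute for up to an hour, then fail
#!/bin/sh
i=0
while [ "$i" -lt 60 ]; do
    nfs-ls nfs://storage.example.com/export >/dev/null 2>&1 && exit 0
    sleep 60
    i=$((i + 1))
done
exit 1
The mount unit (or an x-systemd.requires= option in fstab) then pulls in wait-for-storage.service, which is the "dependency in the chain" the comment describes.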
Posted Feb 18, 2025 18:47 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Feb 19, 2025 22:59 UTC (Wed)
by neggles (subscriber, #153254)
[Link]
It's a completely valid use case; for example, wekafs mounts require the LXC container in which the DPDK-enabled frontend service runs to be started and operational before the mount will succeed; but you may have just started a converged cluster and not have a quorum of nodes up and running yet, so you want to retry a few times over (say) 10 minutes before giving up. On shutdown, I also need that mount unit to be unmounted before the service units that back it are shut down, and before the network goes down.
This is stuff a service manager should be handling, right? Service and mount dependencies and ordering? As a bonus, this is probably the filesystem that /home lives on, so only your break-glass accounts can sign in if it's not mounted.
What is so harmful about allowing a failed mount to retry just like a failed service startup?
systemd often doesn't know why a given service failed to start any more than it knows why a mount failed to mount, so why is it OK to blindly restart services but not retry mounts?
why are automount units (which retry every time someone tries to open() a path beneath them) acceptable when an auto-retrying mount isn't?
The functionality is utterly trivial for systemd to add, it's logic that already exists for other units (and IIRC it would Just Work if mounts weren't explicitly excluded), and this is a significant pain point for users. Refusing to allow it on what seems to be ideological grounds ("you shouldn't need this" / "we can't be perfect so we won't try") is just shitty; I can't think of a better way to put that.
Posted Feb 20, 2025 10:20 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Posted Feb 18, 2025 13:23 UTC (Tue)
by dankamongmen (subscriber, #35141)
[Link] (8 responses)
Posted Feb 18, 2025 17:42 UTC (Tue)
by Tobu (subscriber, #24111)
[Link] (3 responses)
Oh this is very good, it goes into the historical context and the social dynamics! The technical dive that follows didn't seem relevant immediately, but it does explain the issue with some unit types being more ad-hoc (and might help with finding workarounds for the NFS mount unit people were discussing). And it does make a relevant point about some dubious job engine stuff getting frozen after a period of fast growth in adoption. And about the non-stop expansion of the focus, with new features being more tractable than clean-ups that will break things for someone.
Posted Feb 18, 2025 17:55 UTC (Tue)
by bluca (subscriber, #118303)
[Link] (2 responses)
It's really not though, the chip on the shoulder is extremely evident, and it's very inaccurate. For example there have been tons of changes to the engine, the recent PIDFD-ization is one example of sweeping across-the-board changes affecting process management.
Posted Feb 18, 2025 18:52 UTC (Tue)
by Tobu (subscriber, #24111)
[Link]
I can see the author's position and I don't mind it. Because of systemd finding success, outside viewpoints are useful. As is the retrospective showing how things happened and might have happened differently. As far as components being more or less coupled as convenient for forcing adoption early on, it's also close to how I remember it. Now that systemd is well established, it could in fact stand to decouple its components more, and the adoption of varlink as an alternative to dbus that doesn't require global bus instances and tricky bootstrapping is a good move in that direction.
Posted Feb 19, 2025 16:14 UTC (Wed)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 19, 2025 4:02 UTC (Wed)
by motk (guest, #51120)
[Link] (3 responses)
Posted Feb 19, 2025 16:07 UTC (Wed)
by intelfx (subscriber, #130118)
[Link] (2 responses)
What's that? (in words suitable for those unfamiliar with...presumably, Final Fantasy lore?)
Posted Feb 19, 2025 16:13 UTC (Wed)
by bluca (subscriber, #118303)
[Link] (1 responses)
Posted Feb 19, 2025 16:38 UTC (Wed)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 18, 2025 20:08 UTC (Tue)
by dmoulding (subscriber, #95171)
[Link] (5 responses)
I know, I can already hear the cries of, "But the kernel code isn't secret, why do you want to encrypt your kernel binary?!" Well, there's more than just the kernel binary on my boot volume. It also contains an initramfs image which may contain all manner of things that I might not want easily divulged to a determined adversary.
It also contains the kernel command line arguments which I also don't necessarily want to be openly visible to nefarious eyes (things like file system labels or UUIDs, and other bits of system configuration information may be found therein).
Posted Feb 18, 2025 20:48 UTC (Tue)
by bluca (subscriber, #118303)
[Link]
If for any reason you have data that _must_ be stored in the ESP and _must_ be encrypted, you can use credentials and seal them against the local TPM: https://systemd.io/CREDENTIALS/
Posted Feb 18, 2025 21:04 UTC (Tue)
by ferringb (subscriber, #20752)
[Link] (3 responses)
This is absolutely talking out my backside, but if you really have stuff in the initrd that must be there but that you wish to protect, I suspect you *could* use an intermediate initrd. Basically a layer that holds decryption tooling for mounting, and pivots across to that, which is what a normal/proper setup should be doing (keeping the 'secret' crap behind encryption). For the 'decrypt', TPM or secureboot validation of the UKI, and then require some external validation (ick).
https://lwn.net/Articles/1010466/ (systemd-soft-reboot) also seems like a hacky way to shove the same in, just mounting the decrypted inner initrd to /run/nextroot. That seems like pain, but you do you. :)
Still, I'm curious exactly what you're concerned about, and if it's any form of 'secrets', why that isn't either buried in the encrypted drives, or utilizing systemd-creds?
Posted Feb 18, 2025 22:59 UTC (Tue)
by dmoulding (subscriber, #95171)
[Link]
Where I'm from, being anal and paranoid is what encryption is all about! :)
>What sort of things are in your initrd that you consider sensitive?
Well, just taking a quick look at one from one of my machines, without looking too hard, I see it's got /etc/machine-id in it. According to the docs, "This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments".
>About my only real concern w/ initrd/ramfs is using tooling like dracut
And yes, that is exactly the issue. Nobody I know builds their initramfs by hand. There's no easy way to verify on every single machine that every initramfs is being assembled (by whatever tool happens to be in use to generate it) without anything that is confidential. Or could be used by an attacker to glean information that might enable further attacks down the line (even just seeing what kernel version is in use might aid an attack). And even if I were to go through the trouble to validate all of them on all of my machines today, doesn't mean two weeks from now that will still be the case. Not to mention, nobody I know has time for checking that.
The simplest solution is to just encrypt the boot volume and not worry about it.
> you *could* use an intermediate initrd. Basically a layer that holds decryption tooling for mounting
But why should I bother cooking up some rickety workaround like that, when there's already a perfectly good solution that does exactly what I need, today?
I think maybe it's a mistake to think that nothing on the boot volume requires encrypting. I hope that the systemd project isn't baking that mistake into their systemd-boot architecture. If they do, it's likely a big step backwards from what we already have with GRUB.
Posted Feb 19, 2025 11:42 UTC (Wed)
by taladar (subscriber, #68407)
[Link] (1 responses)
Posted Feb 19, 2025 13:36 UTC (Wed)
by ferringb (subscriber, #20752)
[Link]
I'm both curious and pre-emptively a bit terrified about that setup; you're trying to do this as a way to centralize encryption keys? Specifically the ability to get into a disk if it gets pulled? Or is this intended as a form of remote attestation to get the keys?
If the former, is there any reason you're not just using a scheme that either adds a common key across all disks (ick), or alternatively mixes hardware identification into a central key, so that if you have access to the 'central' privkey and know particulars of the hardware, you can recreate the per-host disk keys?
> However for those you can't really encrypt the initrd because you need it to decrypt the rest.
I'd argue you should be using https://www.freedesktop.org/software/systemd/man/latest/s... , specifically read the encrypt section for an explanation of the two modes. It's basically the same sort of trick as automatic unlocking of a disk.
Specifically, I'm proposing you store the systemd-creds encrypted key into the initrd, and during init you use creds to decrypt the ssh key. That key can only be recovered if the system measurements match. Change the bootloader, measurements change; etc. See https://wiki.archlinux.org/title/Trusted_Platform_Module#... for the various measurements you can make it sensitive to; if you're particularly aggressive, you can include firmware measurements.
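(A sketch of that flow with systemd-creds; the file names are hypothetical, and the exact flags should be checked against systemd-creds(1) for the systemd version in use.)
# Seal the key to the local TPM, bound to PCR 7 (the Secure Boot policy measurement).
systemd-creds encrypt --with-key=tpm2 --tpm2-pcrs=7 --name=sshkey id_ed25519 sshkey.cred
# Ship sshkey.cred inside the initrd; at boot, a custom unit can recover the
# plaintext only while the measurements still match.
systemd-creds decrypt --name=sshkey sshkey.cred /run/sshkey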
You'd have to do some custom units on that, but I assume you're already doing that for how you're pulling the luks keys.
If you've already explored this and rejected it, I'd be curious of your reasons.
Posted Feb 18, 2025 21:19 UTC (Tue)
by wahern (subscriber, #37304)
[Link] (1 responses)
Lazy linking is typical on Linux systems, but the library is still loaded at start. Have there been any experiments to extend ELF and/or the linker (e.g. glibc's rtld) to lazily load also? It would require the compile time linker to associate a shared library name with each symbol, which I don't think is possible today.
Today with lazy linking you can approximate this by not declaring the library dependency, and dlopen'ing the library before invoking any functions. But you still have to explicitly dlopen before invocation, otherwise the linker throws a fatal error (the symbol has no definition). I've used this in an OpenSSL test harness to be able to test different OpenSSL builds--you pass on the command-line the desired library path(s) to load, which is processed before any OpenSSL API is invoked. It actually works for both ELF and Mach-O. And it proved more convenient and less brittle than LD_PRELOAD--it was too easy to accidentally mismatch libssl and libcrypto or otherwise quietly use (without any apparent error) the wrong library, for example.
AFAICT systemd uses a macro, DLSYM_PROTOTYPE, to generate and declare pointers to used symbols. And before dereferencing any of those symbols a utility routine is run, like dlopen_many_sym_or_warn, that takes a list of pointers to those pointers to update. Though I suppose one could make a more sophisticated macro that generated wrapper routines that called dlopen. But you still have to, effectively, redeclare every symbol. Maybe some projects, like systemd, would prefer to keep things more explicit, but for widespread adoption[1] I could easily see a more automatic mechanism preferred so you don't need to superfluously declare things that are already declared in the library's header, and otherwise leverage the existing infrastructure (e.g. cross dependencies, lazy loading of recursive dependencies, etc).
[1] I'm not sure I'd want to adopt this in the general case, but from an engineering standpoint it's an enticing problem.
Posted Feb 22, 2025 5:57 UTC (Sat)
by pabs (subscriber, #43278)
[Link]
Posted Feb 18, 2025 23:52 UTC (Tue)
by MrWim (subscriber, #47432)
[Link]
Centralised projects naturally have internal conflict, priorities to be managed, etc. - something that has to be managed in any project of sufficient scale. In other fields of endeavour this is known as politics. Many open-source advocates and developers deny its existence or necessity, and as a result are unable to scale sufficiently to solve problems that require that scale[^1]. Not so systemd (nor Linux), where the project is still able to function without complete consensus.
By being organised and centralised the project opens itself to criticism, because there is a clear target to be criticised. systemd claims to solve a problem, rather than being just another developer scratching a personal itch where complaints can be dismissed with "you can't tell me what to do". Systemd is sometimes accused of buck passing - dismissing problems as needing to be solved elsewhere - but I think that it demonstrates a strength - people disagree over the scope of the project and that disagreement is met head on, rather than dismissed with excuses.
Another aspect: systemd went out of its way to be adopted by the major distributions, doing the hard work of advocating for adoption. The act of advocating is difficult - it opens you up to criticism, and involves putting yourself in a vulnerable position where you risk rejection. At the same time it requires empathy, and a willingness to adapt the project to another project's needs. Not easy.
Funnily enough I rarely see these aspects of the project discussed - except by systemd's detractors, where they are all brought up as negatives. Sometimes complaints are dressed up in technical terms like "monolithic", when really it's about people organisation, not code organisation. Advocacy is treated as cheating, when it's harder than writing code. Project organisation is hard, thankless work that few have the skills to do, and many don't even recognise that it exists at all, let alone are thankful towards those who actually step up and do it. systemd is all the more impressive for having achieved what it has, how it has.
[^1]: See also https://www.bassi.io/articles/2019/01/17/history-of-gnome... and the discussion of Conway's law. As an aside I wonder if it's one reason that micro-kernels have not been successful at any scale - Conway's law makes it possible to keep avoiding conflict, splintering teams all the way until the project fails.
Posted Feb 19, 2025 10:26 UTC (Wed)
by walex (guest, #69836)
[Link] (3 responses)
«Then came Upstart in 2006. It was an event‑based init daemon [...] The question of why distributions didn't stick with Upstart deserves an answer, he said. [...] It required "an administrator/developer type" to figure out all of the things that should happen on the system "and then glue these events and these actions together".»
This is just handwaving, as both Upstart and systemd are event based and both require a systems engineer to write a lot of configuration. The real reason was that Upstart was "push" (eager) based, and that is less convenient than a "pull" (lazy) based design like systemd's.
I am not surprised that Poettering does not understand even the positive side of systemd, as he seems to me a very intelligent person who however does stupid things because of shallowness; in particular, after 14 years systemd still had two fundamental and related issues. If some thought had been given to those fundamental issues instead of hacking piecemeal new complicated features and wrappers over the past 14 years, a much smaller and simpler structure might have been the result.
Posted Feb 19, 2025 14:36 UTC (Wed)
by corbet (editor, #1)
[Link] (1 responses)
Posted Feb 20, 2025 11:39 UTC (Thu)
by walex (guest, #69836)
[Link]
I will strive to never call Poettering "a very intelligent person" again on LWN. That is the only comment I made as to his person.
Posted Mar 5, 2025 12:53 UTC (Wed)
by dtardon (subscriber, #53317)
[Link]
So maybe you could enlighten those among us who don't know what service states are? And why they are better than the status quo?
> Since systemd aims to manage system states and the UNIX architecture does not have good facilities for connecting unrelated (by forking) processes
UNIX might not, but Linux does. It's called cgroups and systemd makes heavy use of it.
> and there is no model of service states systemd has become a huge and effectively monolithic (despite being technically split into 150 closely related executables) general service state multiplexer
IOW, because systemd doesn't have a model of service states, it's become a service state multiplexer... Sorry, but this sentence makes no sense. (And the bit about the split to separate executables being just technical is pure bullshit. It just shows that you've no idea what you're talking about.)
> where it must eventually manage all system aspects itself using a wild and complex variety of service state wrappers,
It doesn't.
> network interfaces,
It doesn't either. networkd does, but that's completely unrelated to PID1 and there's no information sharing between them.
> filesystem mounts itself etc. to ensure that their state is well defined before starting services that depend on those.
I'm eager to learn how service states help to manage dependencies between services and mounts without tracking mounts.
Posted Feb 25, 2025 22:12 UTC (Tue)
by lee_duncan (subscriber, #84128)
[Link] (8 responses)
Systemd had a chance to fix that, but they didn't. There's still no way to start an iSCSI session in initrd then pivot and have the full root understand that. But since it's not a preferred use case, it doesn't matter.
Posted Feb 25, 2025 22:36 UTC (Tue)
by bluca (subscriber, #118303)
[Link] (7 responses)
Posted Feb 25, 2025 23:49 UTC (Tue)
by lee_duncan (subscriber, #84128)
[Link] (6 responses)
We get connected in initrd, thus establishing the root disc. But then when it's time to switch root, the daemon is killed. A new daemon is started in user space. In the time between the pre-pivot daemon stopping and the post-pivot daemon starting, the connection cannot handle any errors, such as network hiccups or target hiccups. This also means that the post-pivot daemon needs to rediscover everything about the existing connection.
I would like the ability to have the daemon continue running post-pivot. Evidently this is possible, but the daemon can't see the post-pivot filesystem, so it can't read config files, create or read database entries, etc. That means it can't really be interacted with. So that's not a solution.
BTW, the daemon ran uninterrupted, from boot through multi-user, pre-systemd, for historical reference.
I believe the systemd solution to this is to redesign our system so the daemon isn't needed during pivot, which also will not happen.
I'm not aware of any remote-boot solution that works well with systemd, but I'm not sure how nvme handles this.
Posted Feb 26, 2025 0:28 UTC (Wed)
by bluca (subscriber, #118303)
[Link] (5 responses)
Then the daemon is not implemented correctly. This problem has been solved for a decade at least. See:
Posted Feb 28, 2025 21:56 UTC (Fri)
by lee_duncan (subscriber, #84128)
[Link] (4 responses)
I have a daemon that handles error connections, that needs to be running for the root disc to work correctly. And it does great at that. But it's somewhat complicated.
So to be able to use it the "systemd way", I would need to redesign it so that it can (1) run two at a time (not currently possible), and (2) create a protocol for the initrd version to migrate its state to the post-pivot daemon, and not miss a beat if the root disc connection has an issue (or, even worse, have two daemons trying to fix the problem).
Or perhaps I'm supposed to keep my daemon from being killed? But then it's stuck in the pre-pivot initrd root forever, and can't see any local filesystems. So calling it "solved for a decade" is off by about 10 years in my opinion.
Perhaps there's a working implementation of this "root storage daemon" policy? I'd love to see it if so, since none is referenced in the link you supplied. Perhaps I can learn something.
Posted Mar 1, 2025 12:25 UTC (Sat)
by bluca (subscriber, #118303)
[Link] (3 responses)
This is not true, and even a cursory search on GH shows plenty of real world examples:
https://github.com/search?q=%22argv%5B0%5D%5B0%5D+%3D+%27...
Posted Mar 5, 2025 19:43 UTC (Wed)
by lee_duncan (subscriber, #84128)
[Link] (2 responses)
Open-iscsi is unlike any of these implementations, in that its daemon needs to access post-pivot sysfs, a post-pivot database, and post-pivot configuration files. The pre-pivot daemon has its own copies of these things.
This kind of push-back is why, in my experience, some see systemd as less than cooperative. I just see it as an immovable object I have to go around.
Posted Mar 5, 2025 21:40 UTC (Wed)
by bluca (subscriber, #118303)
[Link] (1 responses)
And? That's trivial to get
Posted Mar 6, 2025 10:22 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Posted Feb 27, 2025 22:36 UTC (Thu)
by sombragris (guest, #65430)
[Link] (1 responses)
Not all of them. Slackware (the oldest continuously-maintained Linux distribution, which arguably fits the criteria for a "major" distro) does not use it, never has, and has no plan of including it in the foreseeable future. The article was very informative and educational. Thanks for the reporting.
Posted Feb 27, 2025 23:18 UTC (Thu)
by rahulsundaram (subscriber, #21946)
[Link]
As with many who started off with Linux early on, I too was a Slackware user at some point but I haven't seen it be considered a major distro in a really long time. It has a special place as one of the oldest Linux distros still actively maintained with a small but passionate Linux user base and over the years it has become a pretty niche distro. YMMV.
Images are required for verified boot
The OS update remains incomplete, as the kernel is not reset and continues running.
Kernel settings (such as /proc/sys/ settings, a.k.a. "sysctl", or /sys/ settings) are not reset.
My understanding is that live updates are usually there to tide you over until a more convenient time to reboot. Say there's an important vulnerability; you apply the live patches to shut it down immediately. However you're still expected to reboot soon, say in the middle of the night or over the weekend when there are fewer users.
Cryptographically verified OSes and DRM
> But doesn't this then come dangerously close to the DRM FUD?
But if your threat model does include online attacks by malicious privileged processes (and for our case in Azure it very much does), the first model can mitigate it, the second one cannot.
Detected updated shared libraries
When a shared library (or any mmapped file) gets updated on disk, the related entry in /proc/$PID/maps will indicate that it is deleted. For example:
7f34474e0000-7f344763a000 r-xp 00024000 08:02 52661479 /usr/lib64/libc.so.6 (deleted)
Recent versions of Htop will highlight the process in yellow if there is such a deleted mapping. It is possible to have a program that scans every map file and restarts the associated systemd unit when its shared libraries are updated.
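(A rough sketch of such a scanner in shell; mapping a PID back to its unit uses systemctl whoami, which needs a reasonably recent systemd.)
# List processes that still map a shared object which has been replaced on disk.
for maps in /proc/[0-9]*/maps; do
    pid=${maps%/maps}; pid=${pid#/proc/}
    if grep -q '\.so.* (deleted)$' "$maps" 2>/dev/null; then
        unit=$(systemctl whoami "$pid" 2>/dev/null) || unit="(no unit)"
        echo "PID $pid has stale library mappings -> $unit"
    fi
done
A timer or cron job could feed the resulting unit names to systemctl restart, which is essentially the program the comment describes.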
If this is imposed on users (e.g. their phones), then we have a problem
I run shell (and other) scripts on my phone using Termux, and W^X is indeed a problem (apparently also for BOINC); the workaround is exec(), though this has various drawbacks.
DLL hell, version 2
If it is working well for you, please say so. That is fine. But please stop insulting the people you might disagree with on unrelated topics. Seriously.
> Glibc nss is a dlopen() based system, hence sealing things off entirely during early init of a process is just not realistic on Linux in general, and systemd's use of dlopen() is not making it worse.
Hopefully more projects that form part of the core OS will switch to this model, as it's a win-win with no downsides whatsoever. I've already seen a bunch of people interested in the elf notes spec to annotate dlopen deps, so this is very promising.
You can't have optional dependencies if they are not optional...
That said, while I do have some experience with packaging both dynamically and statically linked executables, and was not (yet) bitten by either kind of linking – albeit I felt the disadvantages sometimes – I certainly did not spend that much time on image-based distros to have a strong opinion here, and might easily overlook something – potentially even trivial stuff.
There's three ways to deal with dynamic linking in Rust:
systemd - no rhyme or reason
Network services are typically implemented in user space. (So can NFS.) And if these "normal" network services panic, you can retry with systemd units.
And if it were implemented in that way, it could use systemd's restart mechanisms directly.
A problem that hits me daily (if I don't manually avoid it) is that a machine with an NFS-mounted volume hangs during shutdown, if I forget to manually unmount all NFS before shutdown.
That happens because systemd tears down the network before it umounts the NFS.
That leads to possible data loss and long shutdown phases, because it has to time out.
This was just another real world example where "the system" as-is falls apart. It's not clear to me which part of "the system" is at fault, but the behavior clearly is bad. But as systemd wants to be "the system manager", I tend to assume that systemd is at fault.
https://lwn.net/Articles/1010520/
Such a statement can certainly be seen as an attack on the people who are running the project. It also doesn't really help the discussion, so maybe we don't need to do that?
But in general this area is so fraught with risk that none of us are going to spend any time to add workarounds for corner cases that should really be handled by the kernel implementation or its userspace tools at best, as none of us has any use case involving networked file systems, and we have other things to do.
If you're not telling systemd "the boot is ungodly long, wait till eternity, also it signals that the mount isn't possible due to the other end being down", I'm not sure what you can do there. Just pulling man systemd.mount, offhand some of the configurables here sound like what you're after:
The NFS mount option bg for NFS background mounts as documented in
nfs(5) is detected by systemd-fstab-generator and the options are
transformed so that systemd fulfills the job-control implications of
that option. Specifically systemd-fstab-generator acts as though
"x-systemd.mount-timeout=infinity,retry=10000" was prepended to the
option list, and "fg,nofail" was appended. Depending on specific
requirements, it may be appropriate to provide some of these options
explicitly, or to make use of the "x-systemd.automount" option described
below instead of using "bg".
I mean, hanging for a stupidly long time on a mount/unit trying to bring it up seems... not great to me... but if that's the setup you've got, and the behavior you want, that's what you've got. More specifically, that's what you've got with the non-systemd components available, and what systemd has abstracted around it. If mount cuts out all retries due to an ECONNREFUSED, that's a kernel/mount complaint. You could try asking the systemd folks to add an option like "MaxEconnRefused=int", but that sounds very much like a hack around something outside their scope that should be fixed elsewhere.
I strongly suspect you could sidestep this anyway by adding to the mount an explicit Requires= targeting a oneshot unit that confirms the NFS server is up. It's not like folks don't occasionally have to tweak the depgraph for weird setups, after all.
Also, it's not like NFS mounts and init haven't been a colossal pain in the ass for decades. I'm not suffering from your issue, but at least with this I can see a way to hack around it if I can't configure mount options to suppress it.
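A rough sketch of that suggestion, with entirely made-up names (a mount on /mnt/data, a hypothetical wait-for-nfs-server.service, and a placeholder script path), would be a drop-in for the mount unit plus a oneshot that polls the server:

# drop-in, e.g. created with: systemctl edit mnt-data.mount
[Unit]
Requires=wait-for-nfs-server.service
After=wait-for-nfs-server.service

# /etc/systemd/system/wait-for-nfs-server.service
[Unit]
Description=Block until the NFS server answers

[Service]
Type=oneshot
# a script that polls the server and exits 0 once it responds
ExecStart=/usr/local/bin/wait-for-nfs-server.sh

For fstab-generated mounts, putting x-systemd.requires=wait-for-nfs-server.service in the mount options achieves the same Requires=/After= pairing without a drop-in.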
systemd - no rhyme or reason
Or if that is too hard you could use iptables to add a rule to drop any packets to port 2049 until the NFS server is up.
mount.nfs will certainly keep trying while it gets no response - not even ICMP - from the server.
So a foreground mount will, by default, time out after two minutes (maybe your Synology takes longer than that to boot). You can change this with "retry=10000".
Remember: systemd provides a programming language for describing system management - a declarative language. You can provide your own primitives and program systemd to do whatever you want. You shouldn't expect it to already be able to do everything you could possibly want without needing to do any programming yourself.
systemd - no rhyme or reason
Enough. Honestly, if you don't like this project, maybe you should be running something else, but slinging insults here will not help anybody. Please stop.
Second request
systemd - no rhyme or reason
Now, what if I run systemd on a Debian system?
(Honest question, since indeed I do)
systemd - no rhyme or reason
It would be nice if Red Hat's documentation were available under a free-enough license so that it could be hosted on Debian.org as well.
Cheers
systemd - no rhyme or reason
Wol
systemd - no rhyme or reason
# Oneshot unit that blocks the NFS mount until the server responds
[Unit]
Requires=systemd-networkd-wait-online@NFS-interface.service
After=systemd-networkd-wait-online@NFS-interface.service
Before=mnt-storage-server.mount

[Service]
Type=oneshot
# ExecStart= needs an absolute path; the directory here is illustrative
ExecStart=/usr/local/bin/monitoring.sh

#!/bin/bash
# Poll the NFS server once a minute; exit 0 as soon as it answers,
# give up (and fail the unit) after 60 attempts.
ATTEMPTS=0
while true
do
    nfs-ls -D nfs://storage-server 2> /dev/null && exit 0
    ATTEMPTS=$((ATTEMPTS + 1))
    if [ "${ATTEMPTS}" -ge 60 ]
    then
        exit 255
    fi
    sleep 60
done
systemd - no rhyme or reason
Out of interest, why can't you use an automount unit for your NFS mount? This has the properties you would want for a network FS, of remounting automatically if the NFS server is missing, and of not blocking anything until the first access to a file on the mount point.
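For reference, one way to get that behavior (server, export, and mount point are placeholders) is the x-systemd.automount option in fstab, which has systemd generate the .automount unit automatically:

storage-server:/export  /mnt/data  nfs  _netdev,x-systemd.automount,x-systemd.idle-timeout=60  0  0

The path appears immediately at boot, the actual NFS mount is only attempted on first access, and the share is unmounted again after sitting idle, which can also reduce shutdown-ordering problems.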
NFS and systemd automount units
I am personally a tremendous fan of systemd, but still, no retrospective of systemd is complete without mention of "V.R."'s essay, systemd, 10 years later (yes, the https is broken). This remains one of the finest pieces of technical writing I've ever come across.
systemd, 10 years later
Can systemd safely replace GRUB?
None of what you mentioned needs to be encrypted: not the kernel, nor the initrd, nor the kernel command line.
Can systemd safely replace GRUB?
Lazy linking -> Lazy loading
Governance
Lack of understanding of fundamentals even after 14 years
If somebody has a different understanding of "fundamentals" than you do, perhaps we can talk about that. But leave the personal insults out of it; they degrade both LWN and your argument.
Lack of understanding of fundamentals even after 14 years
> so for example it must manage package installs,
Still don't handle remote boot correctly
Can you describe how? It's clearly not obvious to lee_duncan how to get at the post-pivot filesystem contents from a daemon started pre-pivot, and documentation on how to do that might clear up their confusion.
Still don't handle remote boot correctly
"All major Linux distributions..."
All major Linux distributions use systemd
"All major Linux distributions..."