LWN.net Weekly Edition for January 19, 2023
Welcome to the LWN.net Weekly Edition for January 19, 2023
This edition contains the following feature content:
- A survey of the Python packaging landscape: there are a number of ways to manage Python modules on a system, none of which are entirely satisfactory.
- Six years with the 4.9 kernel: the 4.9 kernel has reached the end of its support life; it has been through a lot of change over the last six years.
- Support for Intel's LASS: a hardware-based feature to mitigate speculative-execution vulnerabilities.
- Fedora's tempest in a stack frame: as the deadline for system-wide changes approaches, the Fedora community debates the setting of a relatively obscure compiler option used to build the distribution.
- Changing Fedora's shutdown timeouts: how long should the system wait for a process that refuses to shut down?
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
A survey of the Python packaging landscape
Over the past several months, there have been wide-ranging discussions in the Python community about difficulties users have with installing packages for the language. There is a bewildering array of options for package-installation tools and Python distributions focused on particular use cases (e.g. scientific computing); many of those options do not interoperate well—or at all—so they step on each other's toes. The discussions have focused on where solutions might be found to make it easier on users, but lots of history and entrenched use cases need to be overcome in order to get there—or even to make progress in that direction.
In order to follow along on these lengthy discussions, though, an overview of Python's packaging situation and the challenges it presents may be helpful. Linux users typically start by installing whichever Python version is supplied by their distribution, then installing various other Python packages and applications that come from their distribution's repositories. That works fine so long as the versions of all of those pieces are sufficient for the needs of the user. Eventually, though, users may encounter some package they want to use that is not provided by their distribution, so they need to install it from somewhere else.
PyPI and pip
The Python Package Index (PyPI) contains a huge number of useful packages that can be installed in a system running Python. That is typically done using the pip package installer, which will either install the package in a site-wide location or somewhere user-specific depending on whether it was invoked with privileges. pip will also download any needed dependencies, but it only looks for those dependencies at PyPI, since it has no knowledge of the distribution package manager. That can lead to pip installing a dependency that actually is available in the distribution's repository, which is just one of the ways pip and the distribution package manager (e.g. DNF, Apt, etc.) can get crosswise.
Beyond that, there can be conflicting dependency needs between different packages or applications. If application A needs version 1 of a dependency, but application B needs version 2, only one can be satisfied because only a single version of a package can be active for a particular Python instance. It is not possible to specify that the import statement in A picks up a different version than the one that B picks up. Linux distributions solve those conflicting-version problems in various ways, which sometimes results in applications not being available because another, more important package required something that conflicted. The Linux-distribution path is not a panacea, especially for those who want bleeding-edge Python applications and modules. For those not following that path, this is where the Python virtual environment (venv) comes into play.
Virtual environments
A virtual environment is a lightweight way to create a Python instance with its own set of packages that are precisely tuned to the needs of the application or user. They were added to Python itself with PEP 405 ("Python Virtual Environments") in 2011, but they had already become popular via the virtualenv module on PyPI. At this point, it has become almost an expectation that developers are using virtual environments to house and manage their dependencies; it has reached a point where there is talk of forcing pip and other tools to only install into them.
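As a concrete illustration of that isolation, here is a minimal sketch that uses the standard library's venv module to give two applications their own environments, each with its own pinned version of a dependency; the package name "somepkg" and the env-a/env-b paths are purely illustrative:

    # Sketch: give two applications their own virtual environments so that
    # each can use its own pinned version of a dependency.
    import subprocess
    import venv
    from pathlib import Path

    def make_env(path: Path, requirement: str) -> Path:
        """Create a virtual environment and install one pinned requirement."""
        venv.create(path, with_pip=True)        # standard-library venv
        python = path / "bin" / "python"        # "Scripts/python.exe" on Windows
        subprocess.run([python, "-m", "pip", "install", requirement], check=True)
        return python

    # Each environment has its own site-packages, so the pins never collide.
    py_a = make_env(Path("env-a"), "somepkg==1.0")
    py_b = make_env(Path("env-b"), "somepkg==2.0")
    subprocess.run([py_a, "-c", "import somepkg; print(somepkg.__version__)"])
    subprocess.run([py_b, "-c", "import somepkg; print(somepkg.__version__)"])

Each environment has its own site-packages directory and its own pip, which is exactly how the conflicting-version problem described above gets sidestepped in practice.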
When the module to be installed is pure Python, installation with pip is fairly straightforward, but Python modules can also have pieces that are written to the C API, thus they need to be built for the target system from source code in C, C++, Rust, or other languages. That requires the proper toolchain to be available on that system, which is typically easy to ensure on Linux, but is less so on other operating systems. So projects can provide pre-built binary "wheels" in addition to source distributions on PyPI.
But wheels are highly specialized for the operating system, architecture, C library, and other characteristics of the environment, which leads to a huge matrix of possibilities. PyPI relies on all of the individual projects to build "all" of the wheels that users might need, which distributes the burden, but also means that there are gaps for projects that do not have the resources of a large build farm to create wheels. Beyond that, some Python applications and libraries, especially in the scientific-computing world, depend on external libraries of various sorts, which are also needed on target systems.
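The tag matching that decides whether a given wheel can be used on a system can be inspected directly; the short sketch below relies on the third-party packaging library (which implements the same tag logic that pip vendors) to list the tags the running interpreter will accept:

    # Sketch: list the interpreter/ABI/platform tags this Python will accept,
    # which is what determines whether a pre-built wheel from PyPI is usable
    # here at all.  Requires the third-party "packaging" distribution.
    from packaging.tags import sys_tags

    for tag in list(sys_tags())[:10]:
        # e.g. cp311-cp311-manylinux_2_17_x86_64 on a recent x86-64 distribution
        print(f"{tag.interpreter}-{tag.abi}-{tag.platform}")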
Distributions
This is where Python distributions pick up the slack. For Linux users, their regular distribution may well provide what is needed for, say, NumPy. But if the version that the distribution provides is insufficient for some reason, or if the user is running some other operating system that lacks a system-wide package manager, it probably makes sense to seek out Anaconda, or its underlying conda package manager.
The NumPy installation page demonstrates some of the complexities with Python packaging. It has various recommendations for ways to install NumPy; for beginners on any operating system, it suggests Anaconda. For more advanced users, Miniforge, which is a version of conda that defaults to using the conda-forge package repository, seems to be the preferred solution, but pip and PyPI are mentioned as an alternate path.
There are a number of differences between pip and conda that are described in the "Python package management" section of the NumPy installation page. The biggest difference is that conda manages external, non-Python dependencies, compilers, GPU compute libraries, languages, and so on, including Python itself. On the other hand, pip only works with some version of Python that has already been installed from, say, python.org or as part of a Linux distribution. Beyond that, conda is an integrated solution that handles packages, dependencies, and virtual environments, "while with pip you may need another tool (there are many!) for dealing with environments or complex dependencies".
In fact, the "pip" recommendation for NumPy is not to actually use
that tool, but to use Poetry instead, because it
"provides a dependency resolver and environment management capabilities
in a similar fashion as conda does
". So a conda-like approach is what
NumPy suggests and the difference is that Poetry/pip use PyPI,
while conda normally uses conda-forge. The split is bigger than that, though,
because conda does not use binary wheels, but instead uses its own format
that is different from (and, in some cases, predates) the packaging
standards that pip and much of the rest of the Python packaging
world use.
PyPA
The Python Packaging Authority (PyPA) is a working group in the community that maintains pip, PyPI, and other tools; it also approves packaging-related Python Enhancement Proposals (PEPs) as a sort of permanent PEP-delegate from the steering council (which was inherited from former benevolent dictator Guido van Rossum). How the PEP process works is described on its "PyPA Specifications" page. Despite its name, though, the PyPA has no real authority in the community; it leads by example and its recommendations (even in the form of PEPs) are simply that—tool authors can and do ignore or skirt them as desired.
The PyPA maintains multiple tools, the "Python Packaging User Guide", and more. The organization's goals are specified on its site, but they are necessarily rather conservative because the Python software-distribution ecosystem "has a foundation that is almost 15 years old, which poses a variety of challenges to successful evolution".
In a lengthy (and opinionated) mid-January blog post, Chris Warrick looked at the proliferation of tools, noting that there are 14 that he found, most of which are actually maintained by the PyPA, but it is not at all clear from that organization's documentation which of those tools should be preferred. Meanwhile, the tools that check most of the boxes in Warrick's comparison chart, Poetry and PDM, are not maintained by the working group, but instead by others who are not participating in the PyPA, he said.
The situation is, obviously, messy; the PyPA is well aware of that and has been trying to wrangle various solutions for quite some time. The discussions of the problems have seemingly become more widespread—or more visible—over the past few months, in part because of an off-hand comment in Brett Cannon's (successful) self-re-nomination to the steering council for 2023. He surely did not know how much discussion would be spawned from a note tucked into the bottom of that message:
"(I'm also still working towards lock/pinned dependencies files on the packaging side and doing stuff with the Python Launcher for Unix, but that's outside of Python core)."
Several commented in that thread on their hopes that the council (or someone) could come up with some kind of unifying vision for Python packaging. Those responses were split off into a separate "Wanting a singular packaging tool/vision" thread, which grew from there. That discussion led to other threads, several of which are still quite active as this is being written. Digging into those discussions is a subject for next week—and likely beyond.
Readers who want to get a jump-start on the discussions will want to read Warrick's analysis and consult the pypackaging-native site that was announced by Ralf Gommers in late December. Also of interest are the results of the Python packaging survey, which further sets the stage for much of the recent discussion and work. Packaging woes have been a long-simmering (and seriously multi-faceted) problem for Python, so it is nice to see some efforts toward fixing, or at least improving, the situation in the (relatively) near term. But there is still a long way to go. Stay tuned ...
Six years with the 4.9 kernel
The release of the 4.9.337 stable kernel update on January 7 marked the end of an era: after just over six years of maintenance, the 4.9.x series will receive no more updates. This kernel saw a lot of change after Linus Torvalds made the "final" release and left the building; it's time for a look at the "stable" portion of this kernel's life to see what can be learned.
The development cycle that led up to the 4.9 release saw the addition of 16,214 non-merge changesets contributed by 1,719 developers (a record at the time) working for (at least) 228 companies. In the six years between 4.9 and 4.9.337, instead, it gained 23,391 non-merge changesets from 4,037 developers working for at least 503 companies. The 4.9.337 release contains 114,000 more lines of code than 4.9 did. Rather than being the end of a kernel's development life, the final release from Torvalds is really just the beginning of a new and longer phase — at least, for long-term-support kernels.
Contributors
The top contributors of fixes to 4.9.x were:
Top bug-fix contributors to 4.9.x

    Developer               Changesets    Pct
    Greg Kroah-Hartman             470   2.0%
    Eric Dumazet                   395   1.7%
    Johan Hovold                   356   1.5%
    Dan Carpenter                  326   1.4%
    Takashi Iwai                   295   1.3%
    Arnd Bergmann                  286   1.2%
    Thomas Gleixner                196   0.8%
    Jason A. Donenfeld             171   0.7%
    Eric Biggers                   159   0.7%
    Colin Ian King                 138   0.6%
    Christophe JAILLET             134   0.6%
    Nathan Chancellor              125   0.5%
    Hans de Goede                  120   0.5%
    Geert Uytterhoeven             117   0.5%
    Xin Long                       113   0.5%
    Yang Yingliang                 108   0.5%
    Jan Kara                       102   0.4%
    Randy Dunlap                   101   0.4%
    Linus Torvalds                  92   0.4%
    Johannes Berg                   92   0.4%
    Peter Zijlstra                  91   0.4%
    Al Viro                         90   0.4%
    Florian Fainelli                89   0.4%
    Theodore Ts'o                   88   0.4%
While Greg Kroah-Hartman shows as the top contributor of changesets, it is worth remembering that 337 of them are simply setting the version number for each stable release. His appearance there is thus an artifact of how the stable kernels are produced — not that he doesn't play a major role in this process, of course, as will be seen below.
The most active employers of contributors to 4.9.x were:
Employers supporting 4.9.x fixes

    Company                 Changesets    Pct
    (Unknown)                     2177   9.3%
    (None)                        2149   9.2%
                                  1940   8.3%
    Red Hat                       1911   8.2%
    Intel                         1553   6.6%
    SUSE                          1181   5.0%
    Huawei Technologies           1050   4.5%
    IBM                            834   3.6%
    (Consultant)                   767   3.3%
    Linux Foundation               697   3.0%
    Linaro                         625   2.7%
    Arm                            434   1.9%
    Oracle                         387   1.7%
    Mellanox                       314   1.3%
    Samsung                        286   1.2%
    Broadcom                       260   1.1%
    Linutronix                     234   1.0%
                                   226   1.0%
    Renesas Electronics            201   0.9%
    NXP Semiconductors             196   0.8%
It can be interesting to compare these numbers to the statistics for the 4.9 release. There are many of the same names there, but the ordering is different. The biggest contributors of work for a mainline release may not be the biggest contributors of fixes after that release is made.
Backports
The stable rules require that changes appear in the mainline before being added to a stable update, so most (or all) of the patches counted above were written for the mainline. Backporting them to 4.9 is a different level of work on top of that. This task can be as simple as applying a patch unmodified to a different tree, or as complex as rewriting it altogether. Either way, there is clearly a lot of work involved in backporting over 23,000 patches to a different kernel.
One way to try to separate out that work was suggested by Srivatsa S. Bhat. A developer who backports a patch to an older kernel is essentially resubmitting it, and so must add a Signed-off-by tag to the patch changelog. Each patch in the stable kernel also contains the commit ID of the original in the mainline. Using that information, one can look at each stable patch and identify any Signed-off-by tags that were added since that patch was merged into the mainline. Those additional signoffs should indicate who backported each one.
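A rough sketch of that analysis might look like the following; the commit range and the patterns used to locate the upstream reference are assumptions about common stable-tree practice, and it needs to run in a tree that contains both the stable and mainline history:

    # Rough sketch of the signoff comparison described above: for each commit
    # in a stable range, find the mainline commit it references and report any
    # Signed-off-by tags that were added during the backport.
    import re
    import subprocess
    from collections import Counter

    SIGNOFF = re.compile(r"^\s*Signed-off-by:\s*(.+)$", re.MULTILINE)
    UPSTREAM = re.compile(
        r"commit ([0-9a-f]{40}) upstream|\[ Upstream commit ([0-9a-f]{40}) \]")

    def body(commit: str) -> str:
        return subprocess.run(["git", "log", "-1", "--format=%B", commit],
                              capture_output=True, text=True, check=True).stdout

    def stable_commits(rng: str) -> list[str]:
        out = subprocess.run(["git", "rev-list", "--no-merges", rng],
                             capture_output=True, text=True, check=True).stdout
        return out.split()

    backporters = Counter()
    for commit in stable_commits("v4.9..v4.9.337"):
        msg = body(commit)
        m = UPSTREAM.search(msg)
        if not m:
            continue            # version bumps, author backports, missing tags
        mainline = m.group(1) or m.group(2)
        added = set(SIGNOFF.findall(msg)) - set(SIGNOFF.findall(body(mainline)))
        backporters.update(added)

    for name, count in backporters.most_common(20):
        print(f"{count:6}  {name}")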
In the 4.9.x series, 21,495 of the commits have added Signed-off-by tags. The remaining ones will include the above-mentioned version-number changes, patches that should have gotten an additional tag but didn't, and (most probably) patches that were backported by their original author. The result is thus a picture that is not perfect, but which is clear enough:
Top 4.9.x backporters

    Developer                        Changesets      Pct
    Greg Kroah-Hartman                    15135   70.41%
    Sasha Levin                            9208   42.84%
    Ben Hutchings                           310    1.44%
    David Woodhouse                         142    0.66%
    Amit Pundir                              90    0.42%
    Sudip Mukherjee                          83    0.39%
    Jason A. Donenfeld                       73    0.34%
    Mark Rutland                             71    0.33%
    Lee Jones                                52    0.24%
    Nathan Chancellor                        44    0.20%
    Florian Fainelli                         42    0.20%
    David A. Long                            40    0.19%
    Nick Desaulniers                         36    0.17%
    Alex Shi                                 27    0.13%
    Thomas Gleixner                          24    0.11%
    James Morse                              24    0.11%
    Giuliano Procida                         24    0.11%
    Nobuhiro Iwamatsu                        23    0.11%
    Thadeu Lima de Souza Cascardo            22    0.10%
    Arnd Bergmann                            15    0.07%
The bulk of the backporting work is clearly being done by the two stable-kernel maintainers: Kroah-Hartman and Sasha Levin. In some cases, they have both added signoffs to the same patch, causing the percentages to add up to more than 100%. The work done by everybody else pales by comparison — especially if one only looks at the patch counts. Often, though, the reason for a developer other than the stable-kernel maintainers to backport a patch is that the backport is not trivial. So, while the other developers backported far fewer patches, many of those patches almost certainly required a lot more work.
Bug reports
In theory, almost every patch in the stable series is a bug fix, implying that somebody must have found and reported a bug. As it happens, only 4,236 of the commits in the 4.9.x series include a Reported-by tag — only 18% of the total. So most of the problems being fixed are either coming to light in some other way, or the report tags are not being included. For the patches that did include such tags, the results look like:
Top bug reporters in 4.9.x

    Reporter                Reports     Pct
    Syzbot                      901   18.8%
    Hulk Robot                  181    3.8%
    kernel test robot           156    3.2%
    Dmitry Vyukov               100    2.1%
    Andrey Konovalov             80    1.7%
    Dan Carpenter                79    1.6%
    Jann Horn                    34    0.7%
    Guenter Roeck                29    0.6%
    Jianlin Shi                  27    0.6%
    Ben Hutchings                26    0.5%
    Fengguang Wu                 26    0.5%
    Al Viro                      21    0.4%
    Arnd Bergmann                19    0.4%
    Lars-Peter Clausen           19    0.4%
    Xu, Wen                      19    0.4%
    Eric Biggers                 18    0.4%
    Igor Zhbanov                 18    0.4%
    TOTE Robot                   18    0.4%
    Tetsuo Handa                 17    0.4%
    Linus Torvalds               16    0.3%
Bug reporting is clearly widely distributed, with the top three reporters (all robots) accounting for just over 25% of the total. Even so, it is clear that the bug-hunting robots are finding a lot of problems, hopefully before our users do.
Bug introduction
Another thing one can look at is the source of the bugs that were fixed in 4.9.x. Some work mapping Fixes tags in 4.9.x commits to the original commits can shine a light on when bugs were introduced; the result is a plot that looks like this:
The 4.9 and 4.8 releases are, unsurprisingly, the source of many of the bugs fixed in the stable updates, with nearly 700 coming from each. After that comes the usual long tail, including every release ever made since the Git era began at 2.6.12. Every pre-4.10 release in the Git history is represented here; the least-fixed release is 2.6.17, which was released in 2006, with "only" 22 fixes.
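One way to approximate that mapping, sketched below under the assumption that both mainline and stable tags are present in the tree being examined, is to resolve each Fixes: tag with git describe --contains:

    # Sketch of the Fixes-tag mapping described above: for each stable commit
    # carrying a Fixes: tag, ask git which release first contained the commit
    # being fixed.
    import re
    import subprocess
    from collections import Counter

    FIXES = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})", re.MULTILINE)

    def run(*args: str) -> str:
        return subprocess.run(["git", *args], capture_output=True, text=True).stdout

    introduced_in = Counter()
    for commit in run("rev-list", "--no-merges", "v4.9..v4.9.337").split():
        for fixed in FIXES.findall(run("log", "-1", "--format=%B", commit)):
            # e.g. "v4.8-rc1~120^2~30" -> "v4.8-rc1"; empty for bogus tags
            desc = run("describe", "--contains", "--match", "v*", fixed).strip()
            if desc:
                introduced_in[desc.split("~")[0].split("^")[0]] += 1

    for release, count in introduced_in.most_common():
        print(f"{count:5}  {release}")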
That plot is not the whole story, though. Each of the 4.9.28, 4.9.34, 4.9.51, 4.9.75, 4.9.77, 4.9.78, 4.9.79, 4.9.94, 4.9.102, 4.9.187, 4.9.194, 4.9.195, 4.9.198, 4.9.207, 4.9.214, 4.9.219, 4.9.228, 4.9.253, 4.9.258, 4.9.259, 4.9.261, 4.9.265, 4.9.298, and 4.9.299 releases included a commit that was identified by a later Fixes tag; 4.9.81 and 4.9.218 had two, and 4.9.310 had three. Each of those, clearly, indicates a regression that was introduced into the stable kernel and later fixed. But even that is not the full picture; consider this:
Every release made after 4.9 also introduced bugs that had to be fixed in the stable updates — over 1,500 fixes in all. That is a lot of buggy commits to have introduced into a "stable" kernel. One should also not take the wrong message from the lower counts for more recent kernel releases. It might be possible that our releases are getting less buggy, but a more plausible explanation is that the empty space in the upper-right half of that plot just represents bugs that have not yet been found and fixed.
The 4.9 stable series was, thus, not perfect — not that anybody ever claimed that it was. It was, however, good enough to be the core of many deployed systems, including an unimaginable number of Android devices. The 4.9 kernel series is a testament to what the development community can accomplish when it sets its mind to it. It was a base that many users could rely on, and has well earned its retirement.
Support for Intel's LASS
Speculative-execution vulnerabilities come about when the CPU, while executing speculatively, is able to access memory that would otherwise be denied to it. Most of these vulnerabilities would go away if the CPU were always constrained by the established memory protections. An obvious way to fix these problems would be to make CPUs behave that way, but doing that without destroying performance is not an easy task. So, instead, Intel has developed a feature called "linear address-space separation" (LASS) to paper over parts of the problem; Yian Chen has posted a patch set adding support for this feature.
Speculative execution happens when the CPU is unable to complete an instruction because it needs data that is not resident in the CPU's caches. Rather than just wait for that data to be fetched from RAM, the CPU will make a guess as to its value and continue running in the speculative mode. If the guess turns out to be correct — which happens surprisingly often — the CPU will have avoided a stall and will be ahead of the game; otherwise, the work that was done speculatively is thrown out and the computation restarts.
This technique is crucial for getting reasonable performance out of current CPUs, but it turns out to have a security cost: speculative execution is allowed to access data that would be denied to code running normally. A CPU will be able to speculatively read data, despite permissions denying that access in the page tables, without generating a fault. That data is never made available to the running process, but accessing it can create state changes (such as loading data into the cache) that can be detected by a hostile program and used to exfiltrate data that should not be readable. In response, kernel developers have adopted a number of techniques, including address-space isolation and preemptive cache clearing, to block these attacks, but those mitigations can have a substantial performance cost.
LASS partially addresses the speculative-execution problem by wiring some address-space-management policy into the hardware. A look at, for example, the Linux x86-64 address-space layout shows that all kernel-space addresses begin with 0xffff. More to the point, they all have the highest-order (sign) bit set, while all user-space addresses have that bit clear. Linux is not the only kernel to partition the 64-bit address space in this way. LASS uses this convention (and, indeed, requires it) to provide some hardware-based address-space isolation.
Specifically, when LASS is enabled, the CPU will intercept any user-mode reference to an address with the sign bit set, or any kernel-mode access with that bit clear. In other words, it prevents either mode from accessing addresses that, according to the sign bit, belong to the other mode. Crucially, this policy is applied early in the execution of an instruction. Normal page protections can only be read (and, thus, enforced) by traversing through the page-table hierarchy, which produces timing and cache artifacts. LASS can trap a forbidden access simply by looking at the address, without any reference to the page tables, yielding constant timing and avoiding any internal state changes. And this test is easily performed during speculative execution as well.
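The check itself can be modeled in a few lines; the following is purely an illustration of the concept (the addresses are examples), not kernel code:

    # Illustration only: LASS decides from bit 63 of the linear address alone,
    # with no page-table walk, whether an access crosses the user/kernel split.
    def lass_blocks(addr: int, user_mode: bool) -> bool:
        in_kernel_half = bool(addr >> 63)       # the "sign" bit of the address
        return in_kernel_half if user_mode else not in_kernel_half

    assert lass_blocks(0xffffffffff600000, user_mode=True)      # vsyscall page
    assert not lass_blocks(0x00007fffffffe000, user_mode=True)  # typical user address
    # A kernel-mode access to a user address is also trapped, unless the kernel
    # temporarily lifts LASS as it must for copy_from_user() and friends.
    assert lass_blocks(0x00007fffffffe000, user_mode=False)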
Of course, adding a new protection mechanism like this requires adaptation in the kernel, which must disable LASS when it legitimately needs to access user-space memory. Most of the infrastructure needed to handle this is already in place, since supervisor-mode access prevention must be handled in a similar way. There is a problem, though, with the vsyscall mechanism, which is a virtual system-call implementation. The vsyscall area is hardwired to be placed between the virtual addresses ffffffffff600000 and ffffffffff601000. Since the sign bit is set in those addresses, LASS will block accesses from user mode, preventing vsyscalls from working. LASS is thus mutually exclusive with vsyscalls; if one is enabled, the other must be disabled. Vsyscalls have long since been replaced by the vDSO, but there may be old versions of the C library out there that still use them. If LASS support is merged, distributors will have to decide which feature to enable by default.
LASS should be able to protect against speculative attacks where user space is attempting to extract information from the kernel — Meltdown-based attacks in particular. It may not directly block most Spectre-based attacks, which generally involve speculative execution entirely in kernel space, but it may still be good enough to block the cache-based covert channels used to get information out of the kernel. The actual degree of protection isn't specified in the patches, though, leading Dave Hansen to ask for more information:
LASS seemed really cool when we were reeling from Meltdown. It would *obviously* have been a godsend five years ago. But, it's less clear what role it plays today and how important it is.
If LASS can allow some of the more expensive Meltdown and Spectre mitigations to be turned off without compromising security, it seems worth having. But, for now, nobody has said publicly which mitigations, if any, are rendered unnecessary by LASS.
In any case, it is not possible to buy a CPU that supports LASS now; it will be necessary to wait until processors from the "Sierra Forest" line become available. Once those CPUs get out to where they can be tested, the value of LASS will, hopefully, become more clear. Until then, the development community will have to do its best to decide whether a partial fix to speculative-execution problems is better than the current state of affairs.
Fedora's tempest in a stack frame
It is rare to see an extensive and unhappy discussion over the selection of compiler options used to build a distribution, but it does happen. A case in point is the debate over whether Fedora should be built with frame pointers or not. It comes down to a tradeoff between a performance loss on current systems and hopes for gains that exceed that loss in the future — and some disagreements over how these decisions should be made within the Fedora community.
A stack frame contains information relevant to a function call in a running program; this includes the return address, local variables, and saved registers. A frame pointer is a CPU register pointing to the base of the current stack frame; it can be useful for properly clearing the stack frame when returning from a function. Compilers, though, are well aware of the space they allocate on the stack and do not actually need a frame pointer to manage stack frames properly. It is, thus, common to build programs without the use of frame pointers.
Other code, though, lacks insights into the compiler's internal state and may struggle to interpret a stack's contents properly. As a result, code built without frame pointers can be harder to profile or to obtain useful crash dumps from. Both debugging and performance-optimization work are made much easier if frame pointers are present.
Back in June 2022, a Fedora system-wide change proposal, then aimed at the Fedora 37 release, called for the enabling of frame pointers for all binaries built for the distribution. While developers can build a specific program with frame pointers relatively easily when the need arises, the proposal stated, it is often necessary to rebuild a long list of libraries as well; that makes the job rather more difficult. Some types of profiling need to be done on a system-wide basis to be useful; that can only be done if the whole system has frame pointers enabled. Simply building the distribution that way to begin with would make life easier for developers and, it was argued, set the stage for many performance improvements in the future.
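The kind of ad-hoc, system-wide profiling the proposal is meant to enable might look something like the sketch below, which simply wraps perf; the sampling frequency, duration, and output path are arbitrary choices, and complete stacks will only appear if the libraries on the system really were built with frame pointers:

    # Sketch: system-wide profiling with frame-pointer call graphs via perf.
    # Requires perf and sufficient privileges.
    import subprocess

    def profile_system(seconds: int = 10, data_file: str = "perf.data") -> None:
        # "--call-graph fp" walks frame pointers; the heavier fallback when
        # they are absent is "--call-graph dwarf", which copies stack chunks.
        subprocess.run(["perf", "record", "-a", "--call-graph", "fp", "-F", "99",
                        "-o", data_file, "--", "sleep", str(seconds)], check=True)
        subprocess.run(["perf", "report", "--stdio", "-i", data_file], check=True)

    if __name__ == "__main__":
        profile_system()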
There is, of course, a cost to enabling frame pointers. Each function call must save the current frame pointer to the stack, slightly increasing the cost of that call and the size of the code. The frame pointer also occupies a general-purpose register, increasing register spills and slowing down code that might put the register to better use. Avoiding these costs is the main reason why distributions are built without frame pointers in the first place.
The proposal resulted in an extensive discussion on both the mailing list and the associated Fedora Engineering Steering Committee (FESCo) ticket. As would be expected, the primary objection was the performance cost, some of which was benchmarked on the Fedora wiki. Compiling the kernel turned out to be 2.4% slower, and a Blender test case regressed by 2%. The worst case appears to be Python programs, which can see as much as a 10% performance hit. To many, these costs were seen as unacceptable.
The immediate reaction was enough to cause the proposed change to be deferred to Fedora 38. But the discussion went on. The proponents of the change were undeterred by any potential performance loss; for example, Andrii Nakryiko argued:
Even if we lose 1-2% of benchmark performance, what we gain instead is lots of systems enthusiasts that now can do ad-hoc profiling and investigations, without the need to recompile the system or application in special configuration. It's extremely underappreciated how big of a barrier it is for people contribution towards performance and efficiency, if even trying to do anything useful in this space takes tons of effort. If we care about the community to contribute, we need to make it simple for that community to observe applications.
He added that Meta builds its internal applications with frame pointers enabled because the cost was seen as being more than justified by the benefits. Brendan Gregg described the benefits seen from frame pointers at Netflix, and Ian Rogers told a similar story about the experience at Google. On the other hand, the developers in Red Hat's platform tools team, represented by Florian Weimer, remained steadfastly opposed to enabling frame pointers. Neal Gompa, instead, supported the change but worried that Fedora would be "roasted" on certain benchmark-oriented web sites for reducing performance across the entire distribution.
The change was discussed at the November 15 FESCo meeting (the IRC log is available) and the proposal was ultimately rejected. That led to some unhappiness among proponents of the change, who were unwilling to let the idea go, despite Kevin Kofler's admonition that "the toolchain people are the most qualified experts on the topic" and that it was time to move on. Michael Catanzaro complained that he could "no longer trust the toolchain developers to make rational decisions regarding real-world performance impact due to their handling of this issue". But even Catanzaro said that it was time to move on.
But that is not what happened. On January 3, FESCo held another meeting in which an entirely new ticket calling for a "revote" on the frame-pointer proposal was discussed; this was the first that most people had heard that the idea was back. The new ticket had been opened six days prior — on December 28 — by Gompa; it was approved by a vote of six to one (with one abstention). So, as of this writing, the plan is to enable frame pointers for the Fedora 38 release, which is currently scheduled for late April.
There appear to be a few factors that brought about FESCo's change of heart, starting with the ongoing requests from the proposal's proponents. While this whole discussion was going on, FESCo approved another build-option change (setting _FORTIFY_SOURCE=3 for increased security). That change also has a performance cost (though how much is not clear); the fact that it won approval while frame pointers did not was seen by some as the result of a double standard. The proposal was also modified to exempt Python — which is where the worst performance costs were seen — from the use of frame pointers. All of that, it seems, was enough to convince most FESCo members to support the idea.
As might be imagined, not all participants in the discussion saw things the same way. There were complaints about the short notice for the new ticket, which was also opened in the middle of the holiday break, and that participants in the discussion on the older ticket were not notified of the new one. Vitaly Zaitsev said that the proposal came back "because big corporations weren't happy with the results" and called it a bad precedent; Kofler called the process "deliberately rigged". Fedora leader Matthew Miller disputed that claim, but did acknowledge that things could have been done better:
I agree with your earlier post that this did not have enough visibility, enough notice, or enough time. I was certainly taken by surprise, and I was trying to keep an eye on this one in particular. [...] BUT, I do not think it was done with malice, as "deliberately rigged" implies. I don't see that at all -- I see excitement and interest in moving forward on something that already has taken a long time, and looming practical deadlines.
The second vote was rushed, it seems, so that a result could be had in time for the imminent mass rebuild. It obviously makes sense to make a change like that before rebuilding the entire distribution from source rather than after. But even some of the participants in the wider discussion who understood that point felt that the process had not worked well.
There is still time for FESCo to revisit (again) the decision, should that seem warranted, but that seems unlikely. As FESCo member Zbigniew Jędrzejewski-Szmek pointed out, much of the discussion has already moved on to the technical details of how to manage the change. Thus, Fedora 38 will probably be a little slower than its predecessors, but hopefully the performance improvements that will follow from this change in future releases will more than make up for that cost.
Changing Fedora's shutdown timeouts
On today's Fedora systems, a reboot cycle—for a kernel update, say—is normally a fairly quick affair, but that is not always true. The system will wait for services to shut down cleanly and will wait for up to two minutes before killing a service and moving on. A recent proposal to change the default timeout to 15 seconds, while still allowing some services to require more time, ran into more opposition than was perhaps anticipated. Not everyone was comfortable shortening the timeout period; the decision has now been made to reduce it, though not as far as was proposed.
Change proposal
The proposal to shorten the timeout for Fedora 38, which is due in late April, was posted to the devel mailing list on December 22. The feature is owned by Michael Catanzaro and Allan Day; it would reduce the "extremely frustrating" delays that can occur when shutting down a Fedora system. The Fedora workstation working group has had an open bug for two years targeting the problem and has made efforts to change the upstream systemd default timeout, but to no avail. Thus, they are proposing that Fedora make a change to benefit its users:
The primary benefit of the change will be to mitigate a very annoying and - frankly - embarrassing bug. Our users shouldn't have to randomly sit waiting for their machine to shutdown.
An informal proposal to change the timeout was made to the Fedora Engineering Steering Committee (FESCo) late in the Fedora 37 cycle, but it was closed because more information (in the form of a Fedora change proposal) was needed. In that discussion and the one on the current proposal, the problem of simply hiding underlying bugs, where services should be shutting down cleanly but are not, was raised. The change proposed this time around—also available on the Fedora wiki—notes that concern:
Although this change will "paper over" bugs in services without fixing them, we emphasize that reducing the timeout is not merely a workaround for buggy services, but also the desired permanent design. Of course it is desirable to fix the underlying bugs as well, but it doesn't make sense to require this before fixing the service timeout to match our needs.
There are mechanisms to inhibit system shutdown when that is needed by a given service. In addition, packages can set a different timeout in their systemd unit files if that is required. But those timeouts can also stack up if multiple hanging service shutdowns are serialized, so the cumulative effect can be more than just one timeout period. The proposal would lower the current default timeouts (for services that do not set their own) to 15 seconds from either two minutes or 90 seconds currently, depending on the type of service.
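For the curious, the per-service values on a given system can be inspected directly; the sketch below (which just wraps systemctl show) lists the stop timeout each loaded service would actually get:

    # Sketch: show the stop timeout every loaded service would actually get,
    # using only "systemctl show"; run on any systemd-based system.
    import subprocess

    def services() -> list[str]:
        out = subprocess.run(
            ["systemctl", "list-units", "--type=service", "--all",
             "--no-legend", "--plain"],
            capture_output=True, text=True, check=True).stdout
        return [line.split()[0] for line in out.splitlines() if line.strip()]

    for unit in services():
        timeout = subprocess.run(
            ["systemctl", "show", "-p", "TimeoutStopUSec", "--value", unit],
            capture_output=True, text=True).stdout.strip()
        print(f"{timeout:>12}  {unit}")

Units that report the distribution default are the ones a change like this would affect; anything that already sets TimeoutStopSec= in its unit file keeps its own value.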
Reaction
Adam Williamson was concerned that the proposal was too aggressive; there may be situations where the system needs to cleanly shut down multiple virtual machines (VMs), which could take longer, so he thought that 30 seconds might be a more reasonable choice. "Going all the way from 90/120 down to 15 seems pretty radical." Chris Murphy wondered if it made sense to make the shorter timeouts opt-in or to provide a way for servers and other types of installations to opt out of the change. A concrete reason to wait longer was provided by "allan2016": "15 seconds will for sure kill the modem on the Pinephones for good." Removing the power without waiting the 20-30 seconds its modem needs to shut down will apparently brick the modem.
Peter Boy was adamant that the timeout remain unchanged, at least for the Fedora server edition. Servers may have a lot of work to do before they can cleanly shut down (e.g. terminate VMs with their own delays, complete in-progress database transactions) and there is no available data on how long that might all take. The current values are generally working for servers; "this proposal brings no advantage at all for servers, only potential problems".
But Neal Gompa sees things differently; if the administrator is shutting the system down, they are doing so for a reason and, if the timeout is hit, it's likely because the service is hung. He suggested that either 15 or 30 seconds would be reasonable, especially in light of how systemd handles the timeout: "It's per service being shut down, rather than a global timeout." Boy disagreed, arguing that the current values "are empirically obviously a safe solution", but Gompa said: "If the end result is the same, it doesn't matter whether it's 30 seconds or 2 minutes."
Debugging
Trying to figure out what is causing a shutdown to time out is another part of the problem. The proposal notes that PackageKit is the most common offender, which is going to be difficult to fix, according to Gompa in the workstation bug entry, but there are others. Steve Grubb thought there should be a way to easily find out which service is holding things up, but Tomasz Torcz said that a message like that already exists. Debugging is still a problem though:
The problem is: at this points it is hardly debuggable. One cannot start a new shell, sshd is off already, journalctl too. No way to gather any information what's wrong with the process holding up shutdown. We only get a name. And usually you cannot reproduce the problem easy on next shutdown.
Grubb was unaware of the "trick" needed to access that information. Typing "Esc" at the stalled graphical console (which only shows "a black screen and a spinning circle") will show the textual messages, but Grubb thought that option was completely hidden by the interface. Fabio Valentini concurred with that:
Even if systemd prints nice diagnostic messages, they're useless if nobody is going to see them. And I doubt that many people know that pressing the Esc key makes plymouth go away.
Would it be possible to print an informative message in Plymouth instead? Something like "Shutdown is taking longer than expected, please do not force off the computer".
In another part of the thread, Catanzaro noted that killing the services with a SIGKILL after the timeout did not really leave any information behind to figure out what went wrong: "Killing things silently makes it real hard to report bugs." He thought it would make sense to change FinalKillSignal for systemd to SIGQUIT so that a core dump would be created. Lennart Poettering suggested a different solution:
Don't use FinalKillSignal=SIGQUIT.
Use TimeoutStopFailureMode=abort instead. (which covers more ground, and sends SIGABRT rather than SIGQUIT on failure, which has the same effect: coredumping).
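Applied to a single service, that suggestion amounts to a small drop-in; the sketch below writes one for a hypothetical unit (the unit name, path, and 15-second value are illustrative):

    # Sketch: apply the suggestion to one (hypothetical) service by writing a
    # systemd drop-in file.
    from pathlib import Path

    UNIT = "example-daemon.service"            # hypothetical unit name
    dropin = Path(f"/etc/systemd/system/{UNIT}.d")
    dropin.mkdir(parents=True, exist_ok=True)
    (dropin / "stop-timeout.conf").write_text(
        "[Service]\n"
        "TimeoutStopSec=15\n"
        "TimeoutStopFailureMode=abort\n"       # SIGABRT, and a core dump, on timeout
    )
    # Follow up with "systemctl daemon-reload" for the drop-in to take effect.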
He also cautioned that dumping core is not without costs, including time to write the core file. "You might end delaying things more than you hope shortening them." But Zbigniew Jędrzejewski-Szmek was not concerned about that particular problem; it would ultimately make the problems more visible:
It'll obviously delay the shutdown, making the whole thing even more painful. I assume that we would treat any such cases as bugs. If we get the coredumps reported though abrt, it'd indeed make it easier to diagnose those cases.
Catanzaro amended the proposal to follow Poettering's advice, but Kevin Fenzi wondered if it made more sense to selectively add shorter timeouts to services that are known to take too long, but that can be safely killed. Jędrzejewski-Szmek said that approach would mean that thousands of packages would need to be updated to get lower timeouts, which is not something that is realistically going to happen.
Instead, the idea is to attack the problem from the other end: reduce the timeout for everyone. Once this happens, we should start getting feedback about what services where this doesn't work. Some services legitimately need a long timeout (databases, etc), and for those the maintainers would usually have a good idea and can extend the timeout easily. Some services are just buggy, and with the additional visibility and tracebacks, it should be much easier to diagnose why they are slow.
Approaching the problem from this side is much more feasible. We'll probably have to touch a dozen files instead of thousands.
The existing timeout values were chosen arbitrarily when they were originally added to systemd, Poettering said. System V init had no timeouts at all, so the systemd developers chose "a conservative (i.e. overly long) value to not upset things too badly", though there were still some who were unhappy that there were timeouts. He is in favor of the change: "lowering the time-outs by default would make sense to me, but of course, people will be upset".
The FESCo issue for the change has more comments along the lines of those in the mailing-list discussion. The committee took up the question at its January 17 meeting. After a lengthy discussion, FESCo approved the proposal with two changes: the new default timeout would be 45 seconds and various Fedora editions (e.g. server) must be able to override the change. The timeout could potentially be lowered again in some future Fedora release.
There are few things more infuriating than waiting for one's computer to finally decide to give up and reboot, so it is nice to see a reduction in just how long that wait might be. Server administrators may have different needs and/or expectations, but even there, an infinite wait is not particularly tenable. Obviously, it would be even better if the services themselves got fixed so that they did not unnecessarily delay the inevitable, but it looks like this change will bring some more tools toward making that a reality.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: Git security releases; 2023 in Libre Arts; Rust in Chromium; Firefox 109; Flent; Quotes; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.