
LWN.net Weekly Edition for January 19, 2023

Welcome to the LWN.net Weekly Edition for January 19, 2023

This edition contains the following feature content:

  • A survey of the Python packaging landscape: an overview of the tools, the problems, and the discussions about unifying them.
  • Six years with the 4.9 kernel: a look back at the 4.9.x stable series as it reaches the end of its supported life.
  • Support for Intel's LASS: a hardware feature meant to block some speculative-execution attacks.
  • Fedora's tempest in a stack frame: the debate over building the distribution with frame pointers.
  • Changing Fedora's shutdown timeouts: reducing how long the system waits for services to stop.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

A survey of the Python packaging landscape

By Jake Edge
January 17, 2023

Python packaging

Over the past several months, there have been wide-ranging discussions in the Python community about difficulties users have with installing packages for the language. There is a bewildering array of options for package-installation tools and Python distributions focused on particular use cases (e.g. scientific computing); many of those options do not interoperate well—or at all—so they step on each other's toes. The discussions have focused on where solutions might be found to make it easier on users, but lots of history and entrenched use cases need to be overcome in order to get there—or even to make progress in that direction.

In order to follow along on these lengthy discussions, though, an overview of Python's packaging situation and the challenges it presents may be helpful. Linux users typically start by installing whichever Python version is supplied by their distribution, then installing various other Python packages and applications that come from their distribution's repositories. That works fine so long as the versions of all of those pieces are sufficient for the needs of the user. Eventually, though, users may encounter some package they want to use that is not provided by their distribution, so they need to install it from somewhere else.

PyPI and pip

The Python Package Index (PyPI) contains a huge number of useful packages that can be installed in a system running Python. That is typically done using the pip package installer, which will either install the package in a site-wide location or somewhere user-specific depending on whether it was invoked with privileges. pip will also download any needed dependencies, but it only looks for those dependencies at PyPI, since it has no knowledge of the distribution package manager. That can lead to pip installing a dependency that actually is available in the distribution's repository, which is just one of the ways pip and the distribution package manager (e.g. DNF, Apt, etc.) can get crosswise.

Beyond that, there can be conflicting dependency needs between different packages or applications. If application A needs version 1 of a dependency, but application B needs version 2, only one can be satisfied because only a single version of a package can be active for a particular Python instance. It is not possible to specify that the import statement in A picks up a different version than the one that B picks up. Linux distributions solve those conflicting-version problems in various ways, which sometimes results in applications not being available because another, more important package required something that conflicted. The Linux-distribution path is not a panacea, especially for those who want bleeding-edge Python applications and modules. For those not following that path, this is where the Python virtual environment (venv) comes into play.

Virtual environments

A virtual environment is a lightweight way to create a Python instance with its own set of packages that are precisely tuned to the needs of the application or user. They were added to Python itself with PEP 405 ("Python Virtual Environments") for Python 3.3, but they had already become popular via the virtualenv module on PyPI. At this point, it is almost an expectation that developers use virtual environments to house and manage their dependencies; there is even talk of forcing pip and other tools to only install into them.
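
As a rough illustration, the standard library's venv module, together with pip, is all that is needed to give an application its own isolated set of dependencies; the directory name and the package installed below are arbitrary choices, not anything mandated by the tools:

    #!/usr/bin/env python3
    # Minimal sketch: create a virtual environment and install a package into
    # it, so that its dependencies cannot collide with another application's.
    import os
    import subprocess
    import venv
    from pathlib import Path

    env_dir = Path(".venv")
    venv.EnvBuilder(with_pip=True).create(env_dir)   # same as "python -m venv .venv"

    # Run pip with the environment's own interpreter so the package lands in
    # the environment rather than in a site-wide or per-user location.
    python = env_dir / ("Scripts" if os.name == "nt" else "bin") / "python"
    subprocess.run([str(python), "-m", "pip", "install", "requests"], check=True)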

When the module to be installed is pure Python, installation with pip is fairly straightforward, but Python modules can also have pieces written to the C API, which need to be built for the target system from source code in C, C++, Rust, or other languages. That requires the proper toolchain to be available on that system, which is typically easy to ensure on Linux, but is less so on other operating systems. So projects can provide pre-built binary "wheels" in addition to source distributions on PyPI.
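
To get a sense of what such a project looks like, here is a minimal setuptools configuration for a hypothetical extension module; "spam" and spam.c are placeholder names, and real projects increasingly put this metadata in pyproject.toml instead:

    # setup.py -- minimal sketch of a package containing a C extension module.
    # Building it requires a C compiler on the installing system, which is why
    # projects also publish pre-built wheels for common platforms.
    from setuptools import Extension, setup

    setup(
        name="spam",
        version="0.1",
        ext_modules=[
            Extension("spam", sources=["spam.c"]),
        ],
    )

Running something like "pip wheel ." on such a package produces a binary wheel tied to the current interpreter and platform, while a source distribution leaves the compilation step to whoever installs it.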

But wheels are highly specialized for the operating system, architecture, C library, and other characteristics of the environment, which leads to a huge matrix of possibilities. PyPI relies on all of the individual projects to build "all" of the wheels that users might need, which distributes the burden, but also means that there are gaps for projects that do not have the resources of a large build farm to create wheels. Beyond that, some Python applications and libraries, especially in the scientific-computing world, depend on external libraries of various sorts, which are also needed on target systems.
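
The size of that matrix is visible from within Python itself; the PyPA's packaging library (a third-party module, installed separately from PyPI) can enumerate the wheel tags that the running interpreter is willing to accept, which is a long list on a typical Linux system:

    # Sketch: list a few of the wheel tags this interpreter can install.
    # Requires the third-party "packaging" distribution (pip install packaging).
    from itertools import islice

    from packaging import tags

    for tag in islice(tags.sys_tags(), 10):
        # e.g. "cp311-cp311-manylinux_2_35_x86_64" on a recent Linux system
        print(f"{tag.interpreter}-{tag.abi}-{tag.platform}")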

Distributions

This is where Python distributions pick up the slack. For Linux users, their regular distribution may well provide what is needed for, say, NumPy. But if the version that the distribution provides is insufficient for some reason, or if the user is running some other operating system that lacks a system-wide package manager, it probably makes sense to seek out Anaconda, or its underlying conda package manager.

The NumPy installation page demonstrates some of the complexities with Python packaging. It has various recommendations for ways to install NumPy; for beginners on any operating system, it suggests Anaconda. For more advanced users, Miniforge, which is a version of conda that defaults to using the conda-forge package repository, seems to be the preferred solution, but pip and PyPI are mentioned as an alternate path.

There are a number of differences between pip and conda that are described in the "Python package management" section of the NumPy installation page. The biggest difference is that conda manages external, non-Python dependencies, compilers, GPU compute libraries, languages, and so on, including Python itself. On the other hand, pip only works with some version of Python that has already been installed from, say, python.org or as part of a Linux distribution. Beyond that, conda is an integrated solution that handles packages, dependencies, and virtual environments, "while with pip you may need another tool (there are many!) for dealing with environments or complex dependencies".

In fact, the "pip" recommendation for NumPy is not to actually use that tool, but to use Poetry instead, because it "provides a dependency resolver and environment management capabilities in a similar fashion as conda does". So a conda-like approach is what NumPy suggests and the difference is that Poetry/pip use PyPI, while conda normally uses conda-forge. The split is bigger than that, though, because conda does not use binary wheels, but instead uses its own format that is different from (and, in some cases, predates) the packaging standards that pip and much of the rest of the Python packaging world use.

PyPA

The Python Packaging Authority (PyPA) is a working group in the community that maintains pip, PyPI, and other tools; it also approves packaging-related Python Enhancement Proposals (PEPs) as a sort of permanent PEP-delegate from the steering council (which was inherited from former benevolent dictator Guido van Rossum). How the PEP process works is described on its "PyPA Specifications" page. Despite its name, though, the PyPA has no real authority in the community; it leads by example and its recommendations (even in the form of PEPs) are simply that—tool authors can and do ignore or skirt them as desired.

The PyPA maintains multiple tools, the "Python Packaging User Guide", and more. The organization's goals are specified on its site, but they are necessarily rather conservative because the Python software-distribution ecosystem "has a foundation that is almost 15 years old, which poses a variety of challenges to successful evolution".

In a lengthy (and opinionated) mid-January blog post, Chris Warrick looked at the proliferation of tools, noting that there are 14 that he found, most of which are actually maintained by the PyPA, but it is not at all clear from that organization's documentation which of those tools should be preferred. Meanwhile, the tools that check most of the boxes in Warrick's comparison chart, Poetry and PDM, are not maintained by the working group, but instead by others who are not participating in the PyPA, he said.

The situation is, obviously, messy; the PyPA is well aware of that and has been trying to wrangle various solutions for quite some time. The discussions of the problems have seemingly become more widespread—or more visible—over the past few months, in part because of an off-hand comment in Brett Cannon's (successful) self-re-nomination to the steering council for 2023. He surely did not know how much discussion would be spawned from a note tucked into the bottom of that message: "(I'm also still working towards lock/pinned dependencies files on the packaging side and doing stuff with the Python Launcher for Unix, but that's outside of Python core)."

Several commented in that thread on their hopes that the council (or someone) could come up with some kind of unifying vision for Python packaging. Those responses were split off into a separate "Wanting a singular packaging tool/vision" thread, which grew from there. That discussion led to other threads, several of which are still quite active as this is being written. Digging into those discussions is a subject for next week—and likely beyond.

Readers who want to get a jump-start on the discussions will want to read Warrick's analysis and consult the pypackaging-native site that was announced by Ralf Gommers in late December. Also of interest are the results of the Python packaging survey, which further set the stage for much of the recent discussion and work. Packaging woes have been a long-simmering (and seriously multi-faceted) problem for Python, so it is nice to see some efforts toward fixing, or at least improving, the situation in the (relatively) near term. But there is still a long way to go. Stay tuned ...

Comments (34 posted)

Six years with the 4.9 kernel

By Jonathan Corbet
January 12, 2023
The release of the 4.9.337 stable kernel update on January 7 marked the end of an era: after just over six years of maintenance, the 4.9.x series will receive no more updates. This kernel saw a lot of change after Linus Torvalds made the "final" release and left the building; it's time for a look at the "stable" portion of this kernel's life to see what can be learned.

The development cycle that led up to the 4.9 release saw the addition of 16,214 non-merge changesets contributed by 1,719 developers (a record at the time) working for (at least) 228 companies. In the six years between 4.9 and 4.9.337, instead, it gained 23,391 non-merge changesets from 4,037 developers working for at least 503 companies. The 4.9.337 release contains 114,000 more lines of code than 4.9 did. Rather than being the end of a kernel's development life, the final release from Torvalds is really just the beginning of a new and longer phase — at least, for long-term-support kernels.

Contributors

The top contributors of fixes to 4.9.x were:

Top bug-fix contributors to 4.9.x
Developer                        Changesets   Pct
Greg Kroah-Hartman                      470   2.0%
Eric Dumazet                            395   1.7%
Johan Hovold                            356   1.5%
Dan Carpenter                           326   1.4%
Takashi Iwai                            295   1.3%
Arnd Bergmann                           286   1.2%
Thomas Gleixner                         196   0.8%
Jason A. Donenfeld                      171   0.7%
Eric Biggers                            159   0.7%
Colin Ian King                          138   0.6%
Christophe JAILLET                      134   0.6%
Nathan Chancellor                       125   0.5%
Hans de Goede                           120   0.5%
Geert Uytterhoeven                      117   0.5%
Xin Long                                113   0.5%
Yang Yingliang                          108   0.5%
Jan Kara                                102   0.4%
Randy Dunlap                            101   0.4%
Linus Torvalds                           92   0.4%
Johannes Berg                            92   0.4%
Peter Zijlstra                           91   0.4%
Al Viro                                  90   0.4%
Florian Fainelli                         89   0.4%
Theodore Ts'o                            88   0.4%

While Greg Kroah-Hartman shows as the top contributor of changesets, it is worth remembering that 337 of them are simply setting the version number for each stable release. His appearance there is thus an artifact of how the stable kernels are produced — not that he doesn't play a major role in this process, of course, as will be seen below.

The most active employers of contributors to 4.9.x were:

Employers supporting 4.9.x fixes
Company                          Changesets   Pct
(Unknown)                              2177   9.3%
(None)                                 2149   9.2%
Google                                 1940   8.3%
Red Hat                                1911   8.2%
Intel                                  1553   6.6%
SUSE                                   1181   5.0%
Huawei Technologies                    1050   4.5%
IBM                                     834   3.6%
(Consultant)                            767   3.3%
Linux Foundation                        697   3.0%
Linaro                                  625   2.7%
Arm                                     434   1.9%
Oracle                                  387   1.7%
Mellanox                                314   1.3%
Samsung                                 286   1.2%
Broadcom                                260   1.1%
Linutronix                              234   1.0%
Facebook                                226   1.0%
Renesas Electronics                     201   0.9%
NXP Semiconductors                      196   0.8%

It can be interesting to compare these numbers to the statistics for the 4.9 release. There are many of the same names there, but the ordering is different. The biggest contributors of work for a mainline release may not be the biggest contributors of fixes after that release is made.

Backports

The stable rules require that changes appear in the mainline before being added to a stable update, so most (or all) of the patches counted above were written for the mainline. Backporting them to 4.9 is a different level of work on top of that. This task can be as simple as applying a patch unmodified to a different tree, or as complex as rewriting it altogether. Either way, there is clearly a lot of work involved in backporting over 23,000 patches to a different kernel.

One way to try to separate out that work was suggested by Srivatsa S. Bhat. A developer who backports a patch to an older kernel is essentially resubmitting it, and so must add a Signed-off-by tag to the patch changelog. Each patch in the stable kernel also contains the commit ID of the original in the mainline. Using that information, one can look at each stable patch and identify any Signed-off-by tags that were added since that patch was merged into the mainline. Those additional signoffs should indicate who backported each one.
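
A rough version of that analysis can be scripted against a linux-stable checkout; the repository path below is a placeholder, and the regular expressions only approximate the ways stable commits reference their mainline counterparts:

    #!/usr/bin/env python3
    # Hedged sketch: attribute backports by finding Signed-off-by tags that are
    # present on a stable commit but absent from the mainline commit it cites.
    import re
    import subprocess
    from collections import Counter

    REPO = "/path/to/linux-stable"   # assumption: a local linux-stable clone

    def message(rev):
        return subprocess.run(["git", "-C", REPO, "log", "-1", "--format=%B", rev],
                              capture_output=True, text=True, check=True).stdout

    def signoffs(msg):
        return set(re.findall(r"^Signed-off-by:\s*(.*?)\s*<", msg, re.M))

    def upstream_id(msg):
        # Stable commits usually say "commit <sha> upstream." or
        # "[ Upstream commit <sha> ]" in their changelogs.
        m = re.search(r"[Uu]pstream commit ([0-9a-f]{40})|commit ([0-9a-f]{40}) upstream", msg)
        return (m.group(1) or m.group(2)) if m else None

    backporters = Counter()
    revs = subprocess.run(["git", "-C", REPO, "rev-list", "--no-merges",
                           "v4.9..v4.9.337"],
                          capture_output=True, text=True, check=True).stdout.split()
    for rev in revs:
        msg = message(rev)
        mainline = upstream_id(msg)
        if not mainline:
            continue
        try:
            backporters.update(signoffs(msg) - signoffs(message(mainline)))
        except subprocess.CalledProcessError:
            continue   # the mainline commit is not in this clone

    for name, count in backporters.most_common(20):
        print(f"{count:6}  {name}")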

In the 4.9.x series, 21,495 of the commits have added Signed-off-by tags. The remaining ones will include the above-mentioned version-number changes, patches that should have gotten an additional tag but didn't, and (most probably) patches that were backported by their original author. The result is thus a picture that is not perfect, but which is clear enough:

Top 4.9.x backporters
Developer                          Changesets     Pct
Greg Kroah-Hartman                      15135  70.41%
Sasha Levin                              9208  42.84%
Ben Hutchings                             310   1.44%
David Woodhouse                           142   0.66%
Amit Pundir                                90   0.42%
Sudip Mukherjee                            83   0.39%
Jason A. Donenfeld                         73   0.34%
Mark Rutland                               71   0.33%
Lee Jones                                  52   0.24%
Nathan Chancellor                          44   0.20%
Florian Fainelli                           42   0.20%
David A. Long                              40   0.19%
Nick Desaulniers                           36   0.17%
Alex Shi                                   27   0.13%
Thomas Gleixner                            24   0.11%
James Morse                                24   0.11%
Giuliano Procida                           24   0.11%
Nobuhiro Iwamatsu                          23   0.11%
Thadeu Lima de Souza Cascardo              22   0.10%
Arnd Bergmann                              15   0.07%

The bulk of the backporting work is clearly being done by the two stable-kernel maintainers: Kroah-Hartman and Sasha Levin. In some cases, they have both added signoffs to the same patch, causing the percentages to add up to more than 100%. The work done by everybody else pales by comparison — especially if one only looks at the patch counts. Often, though, the reason for a developer other than the stable-kernel maintainers to backport a patch is that the backport is not trivial. So, while the other developers backported far fewer patches, many of those patches almost certainly required a lot more work.

Bug reports

In theory, almost every patch in the stable series is a bug fix, implying that somebody must have found and reported a bug. As it happens, only 4,236 of the commits in the 4.9.x series include a Reported-by tag — only 18% of the total. So most of the problems being fixed are either coming to light in some other way, or the report tags are not being included. For the patches that did include such tags, the results look like:

Top bug reporters in 4.9.x
Reporter                 Reports    Pct
Syzbot                       901  18.8%
Hulk Robot                   181   3.8%
kernel test robot            156   3.2%
Dmitry Vyukov                100   2.1%
Andrey Konovalov              80   1.7%
Dan Carpenter                 79   1.6%
Jann Horn                     34   0.7%
Guenter Roeck                 29   0.6%
Jianlin Shi                   27   0.6%
Ben Hutchings                 26   0.5%
Fengguang Wu                  26   0.5%
Al Viro                       21   0.4%
Arnd Bergmann                 19   0.4%
Lars-Peter Clausen            19   0.4%
Xu, Wen                       19   0.4%
Eric Biggers                  18   0.4%
Igor Zhbanov                  18   0.4%
TOTE Robot                    18   0.4%
Tetsuo Handa                  17   0.4%
Linus Torvalds                16   0.3%

Bug reporting is clearly widely distributed, with the top three reporters (all robots) accounting for just over 25% of the total. Even so, it is clear that the bug-hunting robots are finding a lot of problems, hopefully before our users do.

Bug introduction

Another thing one can look at is the source of the bugs that were fixed in 4.9.x. Some work mapping Fixes tags in 4.9.x commits to the original commits can shine a light on when bugs were introduced; the result is a plot that looks like this:

[4.9.x fixes]

The 4.9 and 4.8 releases are, unsurprisingly, the source of many of the bugs fixed in the stable updates, with nearly 700 coming from each. After that comes the usual long tail, including every release ever made since the Git era began at 2.6.12. Every pre-4.10 release in the Git history is represented here; the least-fixed release is 2.6.17, which was released in 2006, with "only" 22 fixes.
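
The mapping itself can be approximated with git alone: for each Fixes tag in the stable series, ask which release tag first contained the commit being fixed. The sketch below assumes the same hypothetical linux-stable clone as above:

    # Hedged sketch: count, per release, the commits that later needed fixing
    # in 4.9.x, based on the Fixes: tags in the stable commits themselves.
    import re
    import subprocess
    from collections import Counter

    REPO = "/path/to/linux-stable"   # placeholder path

    def git(*args):
        return subprocess.run(["git", "-C", REPO, *args],
                              capture_output=True, text=True).stdout

    introduced_in = Counter()
    log = git("log", "--no-merges", "--format=%x01%B", "v4.9..v4.9.337")
    for body in log.split("\x01")[1:]:
        for sha in re.findall(r"^Fixes:\s+([0-9a-f]{8,40})", body, re.M):
            # The first tag containing the buggy commit, e.g. "v4.8" or "v4.9.81".
            tag = git("describe", "--contains", "--match", "v*", sha).strip()
            if tag:
                introduced_in[re.split(r"[~^]", tag)[0]] += 1

    for release, count in introduced_in.most_common(15):
        print(f"{count:5}  {release}")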

That plot is not the whole story, though. Each of the 4.9.28, 4.9.34, 4.9.51, 4.9.75, 4.9.77, 4.9.78, 4.9.79, 4.9.94, 4.9.102, 4.9.187, 4.9.194, 4.9.195, 4.9.198, 4.9.207, 4.9.214, 4.9.219, 4.9.228, 4.9.253, 4.9.258, 4.9.259, 4.9.261, 4.9.265, 4.9.298, and 4.9.299 releases included a commit that was identified by a later Fixes tag; 4.9.81 and 4.9.218 had two, and 4.9.310 had three. Each of those, clearly, indicates a regression that was introduced into the stable kernel and later fixed. But even that is not the full picture; consider this:

[post-4.9.x fixes]

Every release made after 4.9 also introduced bugs that had to be fixed in the stable updates — over 1,500 fixes in all. That is a lot of buggy commits to have introduced into a "stable" kernel. One should also not take the wrong message from the lower counts for more recent kernel releases. It is possible that our releases are getting less buggy, but a more plausible explanation is that the empty space in the upper-right half of that plot just represents bugs that have not yet been found and fixed.

The 4.9 stable series was, thus, not perfect — not that anybody ever claimed that it was. It was, however, good enough to be the core of many deployed systems, including an unimaginable number of Android devices. The 4.9 kernel series is a testament to what the development community can accomplish when it sets its mind to it. It was a base that many users could rely on, and has well earned its retirement.

Comments (12 posted)

Support for Intel's LASS

By Jonathan Corbet
January 13, 2023
Speculative-execution vulnerabilities come about when the CPU, while executing speculatively, is able to access memory that would otherwise be denied to it. Most of these vulnerabilities would go away if the CPU were always constrained by the established memory protections. An obvious way to fix these problems would be to make CPUs behave that way, but doing that without destroying performance is not an easy task. So, instead, Intel has developed a feature called "linear address-space separation" (LASS) to paper over parts of the problem; Yian Chen has posted a patch set adding support for this feature.

Speculative execution happens when the CPU is unable to complete an instruction because it needs data that is not resident in the CPU's caches. Rather than just wait for that data to be fetched from RAM, the CPU will make a guess as to its value and continue running in the speculative mode. If the guess turns out to be correct — which happens surprisingly often — the CPU will have avoided a stall and will be ahead of the game; otherwise, the work that was done speculatively is thrown out and the computation restarts.

This technique is crucial for getting reasonable performance out of current CPUs, but it turns out to have a security cost: speculative execution is allowed to access data that would be denied to code running normally. A CPU will be able to speculatively read data, despite permissions denying that access in the page tables, without generating a fault. That data is never made available to the running process, but accessing it can create state changes (such as loading data into the cache) that can be detected by a hostile program and used to exfiltrate data that should not be readable. In response, kernel developers have adopted a number of techniques, including address-space isolation and preemptive cache clearing, to block these attacks, but those mitigations can have a substantial performance cost.

LASS partially addresses the speculative-execution problem by wiring some address-space-management policy into the hardware. A look at, for example, the Linux x86-64 address-space layout shows that all kernel-space addresses begin with 0xffff. More to the point, they all have the highest-order (sign) bit set, while all user-space addresses have that bit clear. Linux is not the only kernel to partition the 64-bit address space in this way. LASS uses this convention (and, indeed, requires it) to provide some hardware-based address-space isolation.

Specifically, when LASS is enabled, the CPU will intercept any user-mode reference to an address with the sign bit set, or any kernel-mode access with that bit clear. In other words, it prevents either mode from accessing addresses that, according to the sign bit, belong to the other mode. Crucially, this policy is applied early in the execution of an instruction. Normal page protections can only be read (and, thus, enforced) by traversing through the page-table hierarchy, which produces timing and cache artifacts. LASS can trap a forbidden access simply by looking at the address, without any reference to the page tables, yielding constant timing and avoiding any internal state changes. And this test is easily performed during speculative execution as well.
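
LASS itself is implemented in hardware, but the policy it enforces boils down to a single bit test. The snippet below is purely illustrative — it is not kernel code, and the user-space address in it is made up — showing that rule applied to canonical x86-64 addresses:

    # Illustration only: the address-based rule that LASS enforces in hardware.
    # On x86-64, kernel addresses have bit 63 (the "sign" bit) set.
    SIGN_BIT = 1 << 63

    def lass_permits(addr: int, user_mode: bool) -> bool:
        """User mode may only touch user-half addresses; kernel mode may only
        touch kernel-half ones, unless it temporarily lifts the restriction
        (as it already does for SMAP when accessing user memory)."""
        is_kernel_addr = bool(addr & SIGN_BIT)
        return user_mode != is_kernel_addr

    print(lass_permits(0xffffffffff600000, user_mode=True))   # False: the vsyscall page
    print(lass_permits(0x00007f0000001000, user_mode=True))   # True: a made-up user address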

Of course, adding a new protection mechanism like this requires adaptation in the kernel, which must disable LASS when it legitimately needs to access user-space memory. Most of the infrastructure needed to handle this is already in place, since supervisor-mode access prevention must be handled in a similar way. There is a problem, though, with the vsyscall mechanism, which is a virtual system-call implementation. The vsyscall area is hardwired to be placed between the virtual addresses 0xffffffffff600000 and 0xffffffffff601000. Since the sign bit is set in those addresses, LASS will block accesses from user mode, preventing vsyscalls from working. LASS is thus mutually exclusive with vsyscalls; if one is enabled, the other must be disabled. Vsyscalls have long since been replaced by the vDSO, but there may be old versions of the C library out there that still use them. If LASS support is merged, distributors will have to decide which feature to enable by default.

LASS should be able to protect against speculative attacks where user space is attempting to extract information from the kernel — Meltdown-based attacks in particular. It may not directly block most Spectre-based attacks, which generally involve speculative execution entirely in kernel space, but it may still be good enough to block the cache-based covert channels used to get information out of the kernel. The actual degree of protection isn't specified in the patches, though, leading Dave Hansen to ask for more information:

LASS seemed really cool when we were reeling from Meltdown. It would *obviously* have been a godsend five years ago. But, it's less clear what role it plays today and how important it is.

If LASS can allow some of the more expensive Meltdown and Spectre mitigations to be turned off without compromising security, it seems worth having. But, for now, nobody has said publicly which mitigations, if any, are rendered unnecessary by LASS.

In any case, it is not possible to buy a CPU that supports LASS now; it will be necessary to wait until processors from the "Sierra Forest" line become available. Once those CPUs get out to where they can be tested, the value of LASS will, hopefully, become more clear. Until then, the development community will have to do its best to decide whether a partial fix to speculative-execution problems is better than the current state of affairs.

Comments (24 posted)

Fedora's tempest in a stack frame

By Jonathan Corbet
January 16, 2023
It is rare to see an extensive and unhappy discussion over the selection of compiler options used to build a distribution, but it does happen. A case in point is the debate over whether Fedora should be built with frame pointers or not. It comes down to a tradeoff between a performance loss on current systems and hopes for gains that exceed that loss in the future — and some disagreements over how these decisions should be made within the Fedora community.

A stack frame contains information relevant to a function call in a running program; this includes the return address, local variables, and saved registers. A frame pointer is a CPU register pointing to the base of the current stack frame; it can be useful for properly clearing the stack frame when returning from a function. Compilers, though, are well aware of the space they allocate on the stack and do not actually need a frame pointer to manage stack frames properly. It is, thus, common to build programs without the use of frame pointers.

Other code, though, lacks insights into the compiler's internal state and may struggle to interpret a stack's contents properly. As a result, code built without frame pointers can be harder to profile or to obtain useful crash dumps from. Both debugging and performance-optimization work are made much easier if frame pointers are present.

Back in June 2022, a Fedora system-wide change proposal, then aimed at the Fedora 37 release, called for the enabling of frame pointers for all binaries built for the distribution. While developers can build a specific program with frame pointers relatively easily when the need arises, the proposal stated, it is often necessary to rebuild a long list of libraries as well; that makes the job rather more difficult. Some types of profiling need to be done on a system-wide basis to be useful; that can only be done if the whole system has frame pointers enabled. Simply building the distribution that way to begin with would make life easier for developers and, it was argued, set the stage for many performance improvements in the future.

There is, of course, a cost to enabling frame pointers. Each function call must save the current frame pointer to the stack, slightly increasing the cost of that call and the size of the code. The frame pointer also occupies a general-purpose register, increasing register spills and slowing down code that might put the register to better use. Avoiding these costs is the main reason why distributions are built without frame pointers in the first place.

The proposal resulted in an extensive discussion on both the mailing list and the associated Fedora Engineering Steering Committee (FESCo) ticket. As would be expected, the primary objection was the performance cost, some of which was benchmarked on the Fedora wiki. Compiling the kernel turned out to be 2.4% slower, and a Blender test case regressed by 2%. The worst case appears to be Python programs, which can see as much as a 10% performance hit. To many, these costs were seen as unacceptable.

The immediate reaction was enough to cause the proposed change to be deferred to Fedora 38. But the discussion went on. The proponents of the change were undeterred by any potential performance loss; for example, Andrii Nakryiko argued:

Even if we lose 1-2% of benchmark performance, what we gain instead is lots of systems enthusiasts that now can do ad-hoc profiling and investigations, without the need to recompile the system or application in special configuration. It's extremely underappreciated how big of a barrier it is for people contribution towards performance and efficiency, if even trying to do anything useful in this space takes tons of effort. If we care about the community to contribute, we need to make it simple for that community to observe applications.

He added that Meta builds its internal applications with frame pointers enabled because the cost is seen as being more than justified by the benefits. Brendan Gregg described the benefits seen from frame pointers at Netflix, and Ian Rogers told a similar story about the experience at Google. On the other hand, the developers in Red Hat's platform tools team, represented by Florian Weimer, remained steadfastly opposed to enabling frame pointers. Neal Gompa, instead, supported the change but worried that Fedora would be "roasted" on certain benchmark-oriented web sites for reducing performance across the entire distribution.

The change was discussed at the November 15 FESCo meeting (the IRC log is available) and the proposal was ultimately rejected. That led to some unhappiness among proponents of the change, who were unwilling to let the idea go, despite Kevin Kofler's admonition that "the toolchain people are the most qualified experts on the topic" and that it was time to move on. Michael Catanzaro complained that he could "no longer trust the toolchain developers to make rational decisions regarding real-world performance impact due to their handling of this issue". But even Catanzaro said that it was time to move on.

But that is not what happened. On January 3, FESCo held another meeting in which an entirely new ticket calling for a "revote" on the frame-pointer proposal was discussed; this was the first that most people had heard that the idea was back. The new ticket had been opened six days prior — on December 28 — by Gompa; it was approved by a vote of six to one (with one abstention). So, as of this writing, the plan is to enable frame pointers for the Fedora 38 release, which is currently scheduled for a late-April release.

There appear to be a few factors that brought about FESCo's change of heart, starting with the ongoing requests from the proposal's proponents. While this whole discussion was going on, FESCo approved another build-option change (setting _FORTIFY_SOURCE=3 for increased security). That change also has a performance cost (though how much is not clear); the fact that it won approval while frame pointers did not was seen by some as the result of a double standard. The proposal was also modified to exempt Python — which is where the worst performance costs were seen — from the use of frame pointers. All of that, it seems, was enough to convince most FESCo members to support the idea.

As might be imagined, not all participants in the discussion saw things the same way. There were complaints about the short notice for the new ticket, which was also opened in the middle of the holiday break, and that participants in the discussion on the older ticket were not notified of the new one. Vitaly Zaitsev said that the proposal came back "because big corporations weren't happy with the results" and called it a bad precedent; Kofler called the process "deliberately rigged". Fedora leader Matthew Miller disputed that claim, but did acknowledge that things could have been done better:

I agree with your earlier post that this did not have enough visibility, enough notice, or enough time. I was certainly taken by surprise, and I was trying to keep an eye on this one in particular. [...] BUT, I do not think it was done with malice, as "deliberately rigged" implies. I don't see that at all -- I see excitement and interest in moving forward on something that already has taken a long time, and looming practical deadlines.

The rushed timing for the second vote, it seems, was done so that a result could be had in time for the imminent mass rebuild. It obviously makes sense to make a change like that before rebuilding the entire distribution from source rather than after. But even some of the participants in the wider discussion who understand that point felt that the process had not worked well.

There is still time for FESCo to revisit (again) the decision, should that seem warranted, but that seems unlikely. As FESCo member Zbigniew Jędrzejewski-Szmek pointed out, much of the discussion has already moved on to the technical details of how to manage the change. Thus, Fedora 38 will probably be a little slower than its predecessors, but hopefully the performance improvements that will follow from this change in future releases will more than make up for that cost.

Comments (56 posted)

Changing Fedora's shutdown timeouts

By Jake Edge
January 18, 2023

On today's Fedora systems, a reboot cycle—for a kernel update, say—is normally a fairly quick affair, but that is not always true. The system will wait for services to shut down cleanly, holding off for up to two minutes before killing a service and moving on. A recent proposal to change the default timeout to 15 seconds, while still allowing some services to require more time, ran into more opposition than was perhaps anticipated. Not everyone was comfortable with shortening the timeout period; the decision has now been made to reduce it, though not as far as was proposed.

Change proposal

The proposal to shorten the timeout for Fedora 38, which is due in late April, was posted to the devel mailing list on December 22. The feature is owned by Michael Catanzaro and Allan Day; it would reduce the "extremely frustrating" delays that can occur when shutting down a Fedora system. The Fedora workstation working group has had an open bug for two years targeting the problem and has made efforts to change the upstream systemd default timeout, but to no avail. Thus, they are proposing that Fedora make a change to benefit its users:

The primary benefit of the change will be to mitigate a very annoying and - frankly - embarrassing bug. Our users shouldn't have to randomly sit waiting for their machine to shutdown.

An informal proposal to change the timeout was made to the Fedora Engineering Steering Committee (FESCo) late in the Fedora 37 cycle, but it was closed because more information (in the form of a Fedora change proposal) was needed. In that discussion and the one on the current proposal, the problem of simply hiding underlying bugs, where services should be shutting down cleanly but are not, was raised. The change proposed this time around—also available on the Fedora wiki—notes that concern:

Although this change will "paper over" bugs in services without fixing them, we emphasize that reducing the timeout is not merely a workaround for buggy services, but also the desired permanent design. Of course it is desirable to fix the underlying bugs as well, but it doesn't make sense to require this before fixing the service timeout to match our needs.

There are mechanisms to inhibit system shutdown when that is needed by a given service. In addition, packages can set a different timeout in their systemd unit files if that is required. But those timeouts can also stack up if multiple hanging service shutdowns are serialized, so the cumulative effect can be more than just one timeout period. The proposal would lower the current default timeouts (for services that do not set their own) to 15 seconds from either two minutes or 90 seconds currently, depending on the type of service.
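
For a service that genuinely needs more time, the override is a small drop-in file; the unit name below is a placeholder, and the system-wide default that the proposal targets lives in /etc/systemd/system.conf:

    # /etc/systemd/system/foo.service.d/override.conf
    # Hypothetical drop-in giving "foo.service" a longer stop timeout than the
    # distribution default.
    [Service]
    TimeoutStopSec=5min

    # The distribution-wide default, by contrast, is set in
    # /etc/systemd/system.conf (or a drop-in under system.conf.d/):
    #   [Manager]
    #   DefaultTimeoutStopSec=45s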

Reaction

Adam Williamson was concerned that the proposal was too aggressive; there may be situations where the system needs to cleanly shut down multiple virtual machines (VMs), which could take longer, so he thought that 30 seconds might be a more reasonable choice. "Going all the way from 90/120 down to 15 seems pretty radical." Chris Murphy wondered if it made sense to make the shorter timeouts opt-in or to provide a way for servers and other types of installations to opt out of the change. A concrete reason to wait longer was provided by "allan2016": "15 seconds will for sure kill the modem on the Pinephones for good." Removing the power without waiting the 20-30 seconds its modem needs to shut down will apparently brick the modem.

Peter Boy was adamant that the timeout remain unchanged, at least for the Fedora server edition. Servers may have a lot of work to do before they can cleanly shut down (e.g. terminate VMs with their own delays, complete in-progress database transactions) and there is no available data on how long that might all take. The current values are generally working for servers; "this proposal brings no advantage at all for servers, only potential problems".

But Neal Gompa sees things differently; if the administrator is shutting the system down, they are doing so for a reason and, if the timeout is hit, it's likely because the service is hung. He suggested that either 15 or 30 seconds would be reasonable, especially in light of how systemd handles the timeout: "It's per service being shut down, rather than a global timeout." Boy disagreed, arguing that the current values "are empirically obviously a safe solution", but Gompa said: "If the end result is the same, it doesn't matter whether it's 30 seconds or 2 minutes."

Debugging

Trying to figure out what is causing a shutdown to time out is another part of the problem. The proposal notes that PackageKit is the most common offender, which is going to be difficult to fix, according to Gompa in the workstation bug entry, but there are others. Steve Grubb thought there should be a way to easily find out which service is holding things up, but Tomasz Torcz said that a message like that already exists. Debugging is still a problem though:

The problem is: at this points it is hardly debuggable. One cannot start a new shell, sshd is off already, journalctl too. No way to gather any information what's wrong with the process holding up shutdown. We only get a name. And usually you cannot reproduce the problem easy on next shutdown.

Grubb was unaware of the "trick" needed to access that information. Typing "Esc" at the stalled graphical console (which only shows "a black screen and a spinning circle") will show the textual messages, but Grubb thought that option was completely hidden by the interface. Fabio Valentini concurred with that:

Even if systemd prints nice diagnostic messages, they're useless if nobody is going to see them. And I doubt that many people know that pressing the Esc key makes plymouth go away.

Would it be possible to print an informative message in Plymouth instead? Something like "Shutdown is taking longer than expected, please do not force off the computer".

In another part of the thread, Catanzaro noted that killing the services with a SIGKILL after the timeout did not really leave any information behind to figure out what went wrong: "Killing things silently makes it real hard to report bugs." He thought it would make sense to change FinalKillSignal for systemd to SIGQUIT so that a core dump would be created. Lennart Poettering suggested a different solution:

Don't use FinalKillSignal=SIGQUIT.

Use TimeoutStopFailureMode=abort instead. (which covers more ground, and sends SIGABRT rather than SIGQUIT on failure, which has the same effect: coredumping).

He also cautioned that dumping core is not without costs, including time to write the core file. "You might end delaying things more than you hope shortening them." But Zbigniew Jędrzejewski-Szmek was not concerned about that particular problem; it would ultimately make the problems more visible:

It'll obviously delay the shutdown, making the whole thing even more painful. I assume that we would treat any such cases as bugs. If we get the coredumps reported though abrt, it'd indeed make it easier to diagnose those cases.

Catanzaro amended the proposal to follow Poettering's advice, but Kevin Fenzi wondered if it made more sense to selectively add shorter timeouts to services that are known to take too long, but that can be safely killed. Jędrzejewski-Szmek said that approach would mean that thousands of packages would need to be updated to get lower timeouts, which is not something that is realistically going to happen.

Instead, the idea is to attack the problem from the other end: reduce the timeout for everyone. Once this happens, we should start getting feedback about what services where this doesn't work. Some services legitimately need a long timeout (databases, etc), and for those the maintainers would usually have a good idea and can extend the timeout easily. Some services are just buggy, and with the additional visibility and tracebacks, it should be much easier to diagnose why they are slow.

Approaching the problem from this side is much more feasible. We'll probably have to touch a dozen files instead of thousands.

The existing timeout values were chosen arbitrarily when they were originally added to systemd, Poettering said. System V init had no timeouts at all, so the systemd developers chose "a conservative (i.e. overly long) value to not upset things too badly", though there were still some who were unhappy that there were timeouts. He is in favor of the change: "lowering the time-outs by default would make sense to me, but of course, people will be upset".

The FESCo issue for the change has more comments along the lines of those in the mailing-list discussion. The committee took up the question at its January 17 meeting. After a lengthy discussion, FESCo approved the proposal with two changes: the new default timeout would be 45 seconds and various Fedora editions (e.g. server) must be able to override the change. The timeout could potentially be lowered again in some future Fedora release.

There are few things more infuriating than waiting for one's computer to finally decide to give up and reboot, so it is nice to see a reduction in just how long that wait might be. Server administrators may have different needs and/or expectations, but even there, an infinite wait is not particularly tenable. Obviously, it would be even better if the services themselves got fixed so that they did not unnecessarily delay the inevitable, but it looks like this change will bring some more tools toward making that a reality.

Comments (69 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: Git security releases; 2023 in Libre Arts; Rust in Chromium; Firefox 109; Flent; Quotes; ...
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Copyright © 2023, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds