
Leading items

Welcome to the LWN.net Weekly Edition for December 6, 2018

This edition contains the following feature content:

  • Investigating GitLab: the case for moving open-source graphics development, and perhaps eventually the kernel, to a hosted GitLab instance.
  • Taming STIBP: providing Spectre variant 2 protection between user-space processes without slowing down everybody else.
  • Binary portability for BPF programs: working toward "compile once, run everywhere" for BPF tracing tools.
  • Bounded loops in BPF programs: how the verifier might someday accept programs containing loops.
  • Unexpected fallout from /usr merge in Debian: packages built on merged-/usr build daemons can break on non-merged systems.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Investigating GitLab

By Jake Edge
December 5, 2018

Linux Plumbers Conference

Daniel Vetter began his talk in the refereed track of the 2018 Linux Plumbers Conference (LPC) by noting that it would be in a somewhat similar vein to other talks he has given, since it is about tooling and workflows that are outside of the kernel norm. But, unlike those other talks, which concerned changes that had already taken place, this talk was about switching open-source graphics projects to a hosted version of GitLab, something that has not yet happened. In it, he shared his thoughts on why migrating to GitLab makes sense for the kernel graphics community—and maybe the kernel as a whole.

[Daniel Vetter]

The Direct Rendering Manager (DRM) kernel subsystem is a fairly small part of the kernel, he said. It is also a fairly small part of the open-source graphics stack, which is under the X.Org umbrella. DRM sits in the middle between the two, so the project has learned development tools and workflows from both of the larger projects.

The kernel brought DRM into the Git world in 2006, which was just a year after Git came about; it was a "rough ride" back then, Vetter said. With Git came "proper commit messages". Prior to that, the X.Org commit messages might just be a single, unhelpful line; now those messages explain why the change is being made and what it does. The idea of iterating on a patch series on the mailing list came from the kernel side, as did the "benevolent dictator" model of maintainership. DRM, the X server, Wayland, and others all followed that model along the way.

From the X.Org side came things like the committer model; in Mesa, every contributor had commit rights. That model has swept through the graphics community, so now DRM, the X server, and Wayland are all run using that scheme. Testing and continuous integration (CI) also came from X.Org; the kernel does testing as well, but DRM has adopted the X.Org approach, tooling, and test suites. For historical reasons, "almost everything" is under the MIT license, which comes from the X.Org projects as well.

There has been a lot of movement of tools and development strategies in both directions via the DRM subsystem. He thinks that using GitLab may be "the next big wave of changes" coming from the user-space side to kernel graphics, and maybe to the kernel itself eventually. This won't happen this year or next year, Vetter predicted, but over the next few years we will see GitLab being used more extensively.

Pain points

There are some pain points with the current development process, he said. The "git send-email" approach for sending patches is popular, but teaching new people how to get it to work for them is not trivial, which makes it something of a barrier to entry for mailing list patch-based communities. GitLab turns a patch submission into a pull request through a web page. Pushing and pulling Git branches also works well through corporate firewalls using HTTPS, allowing a more familiar browser-based workflow to be used.

On the administration side, supporting the current workflow style requires keeping a "bouquet of services" running: mail servers for SMTP, mailing list servers, Git servers, and so on. On the freedesktop.org (fd.o) site, where most of the open-source graphics work is housed, the administrators would like to move to a single maintained solution that has a lot less duct tape holding it together. Over the past few years, some projects have moved away from fd.o to GitHub in order to switch to that style of workflow. But kernel developers have some experience in building on top of proprietary tools (i.e. BitKeeper). The fd.o administrators would like to stay in control of their own tools and not be at the mercy of a vendor, Vetter said.

Projects like the kernel have turned to Patchwork to fill in some of the gaps. It turns out to not be the solution that it was hoped to be for a variety of reasons. It tries to follow the discussion on patches in a mailing list, but trying to parse that kind of thread is tricky and Patchwork gets confused regularly; humans are really needed to be able to make sense of a complex patch review thread.

Patchwork also loses semantic information that would be useful to maintain. For example, Git maintains a log of the previous versions of a patch, but that history is lost when the patch is posted to a mailing list, so it is lost to Patchwork as well. It is also not clear in Patchwork which branch a patch is aimed at. For the DRM subsystem, there is a hard rule that patches submitted to the list target the integration tree, but sometimes patches that backport something for a stable tree are posted, which confuses things.

In addition, Patchwork is only a side channel. Reviewers can't comment or indicate that the patch has been pulled. The source of truth for that style of workflow is the mailing list; Patchwork only provides a read-only view of it. It is difficult for a maintainer to see which patches are actually pending and which are in other states (older versions, already merged, experimental, rejected, etc.). It is also hard to keep Patchwork in sync with multiple inboxes, so it is not well suited to group maintainership.

Another area with pain points is CI. The right way to integrate CI into the workflow, he said, is to ensure that developers get positive confirmation that their patch has passed the tests—or useful feedback if it has not. Because of the mailing-list-based workflow, more spam has to be created to give that feedback on the build and test status of patches. Patchwork does show CI status, but it is not necessarily obvious since it is buried in results for patches that are not relevant for various reasons.

GitLab

Fd.o started looking at different solutions for replacing its infrastructure and has decided on running the GitLab software. No one wants to re-experience the BitKeeper situation, so GitHub was not really considered, but there are others (e.g. pagure) that were looked at. GitLab's biggest downside is that it is an "open-core" solution but, largely due to Debian's efforts, it has a reasonable open-source approach, he said. Contributions just require a developer certificate of origin (DCO), which is what the kernel uses, and availability under the MIT license.

In addition, GitLab the company cares about big project workflows. Debian, GNOME, Khronos, and others are using it with good results. Vetter is hoping that it is easier for contributors to learn than existing workflows. If GitLab the company "goes evil", there are enough projects using the code that they will be able to keep the project going. GitLab comes with "batteries included": CI, issue tracking, Git repositories, and more. It will allow the fd.o administrators to get rid of a bunch of services on the fd.o infrastructure.

Large project workflows are important for the kernel and even for the DRM subsystem. The AMD and Intel drivers are big enough that putting them in a single repository would be problematic. The X.Org model is to have lots of different repositories, issue trackers, and discussion channels; each project has its own set. There is a need to be able to move issues back and forth between the sub-projects and to be able to establish a fork relationship between repositories after the fact so that pull requests can be sent from one to the other; GitLab supports both. In another sign that it cares about large project workflows, Vetter said, GitLab is working on a mechanism to be able to create multi-repository pull requests that the CI system will handle correctly; that would allow a change to go into the whole graphics stack—user space to kernel—as one entity.

There are also some GitLab features that are worth considering for adoption. Merge requests—similar to pull requests—collect up a Git branch, including its history, a target branch, discussion, review, and CI results all into one bundle that can be tracked and managed as a single thing. It is basically everything that a pull request in email provides, with CI status as followup emails, but much of that gets lost in Patchwork.

A bigger problem is patch review, Vetter said; people have panic attacks if you say that you are going to take their email setup away. They have invested a lot of effort into setting up for email review. GitLab has only recently added the idea of per-patch review, as a feature request from the GNOME project. The data model is reasonable, but the user interface for per-patch review is not really usable at this point; it will get better over time, but until then large and complex patch series will need to be reviewed on the mailing lists. The merge request feature may help track the evolution of the series, with links to the email discussion threads.

The CI features of GitLab are particularly good, he said. On the server side, Docker and scriptlets can be used to build a fully customized CI infrastructure. There is support for running tests in customized ways, so a GPU can be used for accelerating the tests rather than just using the software fallback, for example. Every repository and fork has its own CI settings file, which allows for more customization.

Testing the drivers on real hardware is not suitable for the cloud, so those kinds of tests can be run locally. The results of those tests can be fed into the merge request so they can be tracked as well. Failing the CI tests can block merging, which is important for some workflows. The CI features also provide full transparency; developers can watch their CI jobs run in the cloud, he said.

One downside is that Docker, which is used for CI, is "not so great". In theory, it gives full control over the build environment, but if you need to build Docker images inside Docker, the "root escape hatch is needed, which is not so great for a shared hosting setup". Graphics projects want to be able to use KVM and binfmt-misc for running cross-compiled code, but that is not well supported. The fd.o administrators have been wrestling with the cloud runners for GitLab for half a year or so. It is not really working right at this point but, once it gets worked out, it is hoped that it will all work well.

Automation and more

There is a need for automation to relieve the maintainers from having to do "silly repetitive work", like running scripts to check the code. GitLab has some support for that with Webhooks, but it requires a server somewhere to run the hooks and is not ("yet?") up to the level of GitHub Actions. Automating the repetitive part of a maintainer's job is an area that he thinks has a big potential for reducing burnout and generally smoothing out the development process.

The fd.o administrators are worried about Bugzilla and how it interacts with the GDPR; it looks like you are submitting private information, but it then kicks out that information as email to a public list. There are a bunch of warnings in the fd.o Bugzilla instances, but fd.o would like to stop running Bugzilla. The GitLab issue tracker has several nice features, including per-repository templates for bug reports. All of the customization is done using labels, which is "a bit unstructured, but powerful", Vetter said.

Most fd.o projects have migrated their Git repositories, at least, to gitlab.freedesktop.org. The kernel graphics migration is blocked until early 2019 because of the size of its repositories. There has been lots of experimenting with CI; around 8,000 CI runs have been done so far this year. Projects are migrating their issue tracking to GitLab and starting to use its features; there have been around 2,000 merge requests so far.

In summary, Vetter said that Patchwork is a solution to a self-inflicted problem; five years ago, he would have called that "nonsense", but his opinion has changed based on the loss of semantic information with Patchwork. Fundamentally, maintainers want to track merge requests, with all of the history, CI results, and so on collected up together. GitLab CI is "awesome", he said, but Docker and the cloud are less so. GitLab has fallen behind GitHub in terms of automation, but he is hopeful that it will catch up. GitLab patch review is currently "bad", but he thinks it has some potential; over time it will get better. Graphics developers and the fd.o administrators are excited about GitLab, but it remains to be seen if GitLab adoption spreads further than that.

There was some concern from the audience about the open-core nature of GitLab. Vetter noted that, unlike some other open-core projects, GitLab does all of its development in the open; there are quite a few contributors to the code from outside of the company. The issue tracker is open as well, though there are some bugs that are hidden because they are customer-specific. The company has been open to moving features from the enterprise edition to the community edition at the request of GNOME or other large projects that are adopting GitLab. That is no guarantee moving forward, of course, but for now it is working well.

A YouTube video of the talk is available, as are the slides [PDF].

[I would like to thank LWN's travel sponsor, The Linux Foundation, for assistance in traveling to Vancouver for LPC.]

Comments (24 posted)

Taming STIBP

By Jonathan Corbet
November 29, 2018
The Spectre class of hardware vulnerabilities was apparently so-named because it can be expected to haunt us for some time. One aspect of that haunting can be seen in the fact that, nearly one year after Spectre was disclosed, the kernel is still unable to prevent one user-space process from attacking another in some situations. An attempt to provide that protection using a new x86 microcode feature called STIBP ran into trouble once its performance impact was understood; now a more nuanced approach may succeed in providing protection where it is needed without slowing down everybody else.

The Spectre variant 2 vulnerability works by polluting the CPU's branch-prediction buffer (BPB), which is used during speculative execution to make a guess about which branch(es) the code will take; see this article for a refresher on the Spectre vulnerabilities if needed. Closing this hole requires changes at a number of levels, but a fundamental part of the problem is preventing any code that may be targeted from running with a BPB that has been trained by an attacker.

There are a few ways in which this can be accomplished; in many cases the appropriate tool is a new instruction called IBPB, which flushes the BPB. Developers have been discussing the right times to execute IBPB instructions for some time, but the overall strategy is relatively straightforward: an IBPB instruction should be run whenever the CPU switches between tasks that do not trust each other. A few modes for determining when IBPB should be used have been implemented and can be selected with command-line options.

IBPB leaves one part of the problem unsolved, though. When simultaneous multithreading (SMT, or "hyperthreading") is in use, two threads of execution are, for all practical purposes, executing on the same CPU simultaneously. Those threads will share the same BPB; if one thread populates the BPB with hostile entries, the other thread will be affected by them until the next IBPB instruction is executed. In other words, SMT processors create an ongoing series of time windows in which one thread may attack another, even when IBPB is in use. Some security-sensitive users have disabled SMT entirely in response to this problem (and others), but not everybody wants to pay that cost.

That is where STIBP comes in. It is a processor mode (rather than an instruction) that, according to Intel's press materials [PDF], "prevents indirect branch predictions from being controlled by the sibling Hyperthread". This sounds like just what is needed to keep threads from attacking each other. After some discussion, STIBP support was added to the kernel during the 4.20 merge window. At that time, the decision was made to enable STIBP by default and to leave it on, so that systems would automatically be protected. This patch was subsequently backported to the 4.19.2, 4.18.19, 4.14.81, and 4.9.137 stable updates.

It turns out, however, that there is a problem with STIBP: it slows the system down significantly for many workloads. Linus Torvalds managed to keep his promise to be more polite when he described what is going on, but it must have been a strain:

Yes, Intel calls it "STIBP" and tries to make it out to be about the indirect branch predictor being per-SMT thread.

But the reason it is unacceptable is apparently because in reality it just disables indirect branch prediction entirely. So yes, *technically* it's true that that limits indirect branch prediction to just a single SMT core, but in reality it is just a "go really slow" mode.

As reports of performance regressions started rolling in, it became clear that the decision to enable STIBP by default would have to be revisited. In the resulting discussion, Torvalds said that STIBP needed to be made an optional feature that could be enabled by "crazy people" who are willing to pay the performance cost it brings. Arjan van de Ven said that both Intel and AMD recommend against enabling it by default (though Intel has apparently not actually documented that recommendation anywhere). Ingo Molnar promised to require performance measurements for any future mitigations before they can be merged. The STIBP patch was reverted in the 4.19.4, 4.14.83, and 4.9.140 stable updates; it remains in 4.18 since that series is no longer receiving updates.

As of this writing, the STIBP patch is also still in the mainline kernel, pending the finalization of a better solution. That solution is likely to take the form of this patch set posted by Thomas Gleixner, containing the work of a number of developers. STIBP is disabled on any system that does not actually have running processors with SMT enabled, even if such processors could materialize in the future. It is also disabled by default for most processes on the system, but it can be globally enabled with the spectre_v2_user=on command-line option.

There is also a new set of values for the spectre_v2_user= command-line option that can be used to enable more control over branch prediction:

  • spectre_v2_user=prctl leaves both IBPB and STIBP disabled by default, but allows them to be enabled for individual processes via a new prctl() operation (see the sketch after this list). In this mode, the system can generally run without the extra overhead of the Spectre mitigations, but those mitigations can be turned on for specific processes that need extra protection.
  • spectre_v2_user=seccomp is the same as the prctl mode, with the exception that any processes running under seccomp() will have the mitigations enabled unconditionally.
  • spectre_v2_user=prctl,ibpb enables IBPB globally in the system, but only enables STIBP for processes that have turned it on with prctl().
  • spectre_v2_user=seccomp,ibpb enables IBPB globally, and STIBP for all seccomp() processes and those that have enabled it explicitly.
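
For processes that want the extra protection in the prctl modes, the control is the speculation-control prctl() that already handles speculative store bypass. The following is a minimal user-space sketch, assuming the PR_SPEC_INDIRECT_BRANCH control lands as proposed in the patch set; the fallback constant definitions mirror the kernel's UAPI values and are only needed if the installed headers predate the feature.

    #include <stdio.h>
    #include <sys/prctl.h>

    /* Fallback definitions for older header files; the values mirror the
     * kernel's uapi/linux/prctl.h. */
    #ifndef PR_SET_SPECULATION_CTRL
    #define PR_GET_SPECULATION_CTRL 52
    #define PR_SET_SPECULATION_CTRL 53
    #endif
    #ifndef PR_SPEC_INDIRECT_BRANCH
    #define PR_SPEC_INDIRECT_BRANCH 1
    #endif
    #ifndef PR_SPEC_DISABLE
    #define PR_SPEC_DISABLE (1UL << 2)
    #endif

    int main(void)
    {
        /* Disable indirect-branch speculation (i.e. turn on the STIBP/IBPB
         * mitigations) for this task. */
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                  PR_SPEC_DISABLE, 0, 0))
            perror("PR_SET_SPECULATION_CTRL");

        /* Read back the current state as a sanity check. */
        printf("speculation-control state: 0x%lx\n",
               (unsigned long)prctl(PR_GET_SPECULATION_CTRL,
                                    PR_SPEC_INDIRECT_BRANCH, 0, 0, 0));
        return 0;
    }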

This set contains 28 individual patches; it is not a trivial thing to merge this late in the development cycle (or into a stable kernel update). That appears to be the plan, though; the patches have been pulled into the tip tree and are likely to hit the mainline in the near future. Invasive changes like this are just part of the deal in the post-Spectre world, it seems. Once the dust settles, though, Linux systems will have more complete protection against Spectre variant 2, but the cost of that protection will only need to be paid by those who feel that they need it.

Comments (19 posted)

Binary portability for BPF programs

By Jonathan Corbet
November 30, 2018

Linux Plumbers Conference
The BPF virtual machine is the same on all architectures where it is supported; architecture-specific code takes care of translating BPF to something the local processor can understand. So one might be tempted to think that BPF programs would be portable across architectures but, in many cases, that turns out not to be true. During the BPF microconference at the Linux Plumbers Conference, Alexei Starovoitov (assisted by Yonghong Song, who has done much of the work described) explained the problem and the work that has been done toward "compile once, run everywhere" BPF.

Many BPF programs are indeed portable, in that they will load and execute properly on any type of processor. Packet-filtering programs, in particular, usually just work. But there is a significant class of exceptions in the form of tracing programs, which are one of the biggest growth areas for BPF. Most tracing tools have two components: a user-space program invoked by the user, and a BPF program that is loaded into the kernel to filter, acquire, and possibly boil down the needed data. Both programs are normally written in C.

The BPF side of a tracing program may have to dig deeply into the guts of the kernel, and those guts can change significantly from one kernel to the next. The offsets of specific fields within structures are a particular problem; they can differ depending on architecture, kernel configuration options, and more. Tracing programs often need to use those offsets to get the data they are looking for. If the offsets built into a given BPF program do not match the current kernel, the program will not produce the correct results.
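
To make the offset problem concrete, here is a small, purely illustrative user-space program (not from the talk); the structure and the CONFIG_EXTRA_DEBUG_FIELD option are invented, but they show how an offset fixed at compile time can be wrong for a kernel built with a different configuration.

    #include <stdio.h>
    #include <stddef.h>

    /* Invented miniature of a kernel structure whose layout depends on a
     * configuration option. */
    #ifdef CONFIG_EXTRA_DEBUG_FIELD
    struct task_like {
        long debug_cookie;      /* present only in some configurations */
        int  pid;
    };
    #else
    struct task_like {
        int  pid;
    };
    #endif

    int main(void)
    {
        /* A tracing program compiled against one layout bakes this number
         * into its BPF instructions; run it against a kernel built with
         * the other layout and it reads the wrong bytes. */
        printf("offsetof(pid) = %zu\n", offsetof(struct task_like, pid));
        return 0;
    }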

[Alexei Starovoitov]

This problem is "solved" now by compiling BPF programs on the fly, just prior to loading them into the kernel. To do that, the BPF Compiler Collection (BCC) bundles a copy of the Clang compiler, which is a lot of code to haul around — and much of that code has to be linked into the tracing program itself, where it consumes RAM. This toolchain, along with the kernel development headers, must be installed on the system being traced, a painful task on embedded systems. Even then, it's often necessary to paste specific structure definitions into BPF programs to be able to access the needed fields.

The proposed solution is to introduce structure-field offset information into the BPF Type Format (BTF) section describing a compiled BPF program. Those offsets are built into BPF programs by the compiler now; what is needed is a set of pointers to where those offsets are used and their associated field names; then the libbpf library will be enhanced to "relocate" those offsets to match the current kernel before a given program is loaded into the kernel.

Parts of this problem are hard. In particular, getting the field-name information through LLVM's intermediate representation is difficult; there is "a lot of compiler work" to be done to support this feature. The information needed to perform relocation is more readily available from the vmlinux kernel image file on the target system. Ongoing work includes converting the data-type information stored in the DWARF format in the kernel image to BTF, a process that reduces the size of that information from 120MB to 2MB.

Offsets to structure fields are not the only problem that needs to be solved, though. Imagine a bit of code that looks like:

    #if KERNEL_VERSION == 406
        minrtt = ms.v1;
    #else
        minrtt = ms.v2;
    #endif

The branch that is pruned by the preprocessor never appears in the output, with the result that the generated BPF code is dependent on the kernel version. The planned solution here is to turn the preprocessor variable into a BPF variable, so that the above code could be written as:

    if (__bpf_kernel_version == 406)
        minrtt = ms.v1;
    else
        minrtt = ms.v2;

Both paths are now present in the generated BPF code, which will do the right thing regardless of the kernel version. Other cases are harder; imagine, for example, code that is dependent on whether the REQ_OP_SHIFT macro is defined. Once again, a global variable (__bpf_req_op_shift) is created to delay the decision until run time and keep all paths present in the generated code. Things get more complicated when it comes to types that may not exist at all depending on something like a configuration variable. Solutions here include a complex "fuzzy struct-type matching" mechanism, or just creating a massive file full of type information (in the BTF format) for a wide range of kernel versions.

The problem can be made arbitrarily complex, though; Jes Sorensen asked whether it would be possible to handle CPU masks, which are stored on the kernel stack — unless the system is too large, in which case they are pushed out to heap storage. The answer was that some things will just never be possible.

Other problems include calling static inline functions and preprocessor macros from BPF programs; there does not appear to be a better solution than just copying them into the program at this point. That will bloat the size of the program, of course, and getting some of those functions past the BPF verifier could prove to be a challenge.

Some related work has to do with adding global variables and read-only data to BPF programs. Globals, which are needed to support some of the techniques described above, can be added without any compiler changes, but the kernel API to support them still needs to be designed and implemented. That is also true of read-only data, which would be especially useful for the handling of strings in BPF programs.

There are clearly a few things to be worked out in this area still, and it may never be possible to run an arbitrary BPF program on any system. But it seems likely that BPF users will see a solution that works for a lot of the commonly-used tools in the BCC collection, which should make life easier in a lot of use cases.

(The slides from this presentation [PDF] are available.)

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the event.]

Comments (21 posted)

Bounded loops in BPF programs

By Jonathan Corbet
December 3, 2018

Linux Plumbers Conference
The BPF verifier is charged with ensuring that any given BPF program is safe for the kernel to load and run. Programs that fail to terminate are clearly unsafe, as they present an opportunity for denial-of-service attacks. In current kernels, the verifier uses a heavy-handed technique to block such programs: it disallows any program containing loops. This works, but at the cost of disallowing a wide range of useful programs; if the verifier could determine whether any given loop would terminate within a bounded time, this restriction could be lifted. John Fastabend presented a plan for doing so during the BPF microconference at the 2018 Linux Plumbers Conference.

Fastabend started by noting that the lack of loops hurts; BPF developers are doing "crazy things" to work around their absence. He is working to enable the use of simple loops that can be modeled by the verifier. There is academic work on ways to verify more complex loops, but that is a problem for later. For now, the objective is to detect simple loops and verify that they will terminate; naturally, it's important that the verifier, too, is able to terminate in a reasonable amount of time.

The key to determining the behavior of a loop is to find the induction variable that controls it. For a simple loop like:

    for (i = 0; i < MAX; i++)
        /* loop body */

the induction variable is i. If the variable can be shown to be both monotonically increasing and bounded in value, then the verifier can conclude that the loop will terminate. Once the induction variable and its bounds have been identified, the verifier can also check that any memory references using that variable remain in range.

[John Fastabend]

This kind of verification could be done relatively easily with knowledge of where the loops are. But BPF is an unstructured virtual machine language that doesn't contain information about loops, so the verifier has to figure out where they are itself. This is done by creating a dominator tree that describes the program to be examined; it will identify tests that control the execution of specific blocks of code. From there, it is possible to identify loops (by looking for reverse jumps to the dominator node) and the blocks of code that belong to each loop.

Doing so is not entirely easy. The use of a dominator tree requires that there be a single entry point into every loop; code that jumps into the middle of a loop cannot be verified. Identifying the looped-over code is also an expensive (O(n²)) algorithm.
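
As a rough illustration of the back-edge test described above (this is a sketch, not the verifier's code): given the immediate-dominator array for a control-flow graph, an edge from u to v closes a loop exactly when v dominates u. The tiny graph and idom[] array below are invented; repeating this ancestor walk for every edge is part of what makes the naive approach quadratic.

    #include <stdbool.h>
    #include <stdio.h>

    #define ENTRY 0

    /* Does node v dominate node u?  Walk u's chain of immediate
     * dominators up to the entry node. */
    static bool dominates(const int *idom, int v, int u)
    {
        for (;;) {
            if (u == v)
                return true;
            if (u == ENTRY)
                return false;
            u = idom[u];
        }
    }

    int main(void)
    {
        /* Tiny CFG: 0 -> 1 -> 2 -> 3, plus an edge 2 -> 1 forming a loop.
         * idom[n] is the immediate dominator of node n. */
        int idom[] = { 0, 0, 1, 2 };

        printf("2 -> 1 is a back edge: %s\n",
               dominates(idom, 1, 2) ? "yes" : "no");   /* yes: a loop */
        printf("2 -> 3 is a back edge: %s\n",
               dominates(idom, 3, 2) ? "yes" : "no");   /* no */
        return 0;
    }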

Fastabend outlined a few approaches to this problem, describing the first as the "by the book" method. This algorithm builds the dominator tree, then works to detect (and abort on) loops that cannot be reduced to a verifiable case. For each loop, it finds the induction variable, and verifies that variable's bounds; the execution of the loop is then simulated with the induction variable's largest and smallest values. The problem here is that the induction variable must be found with pattern matching, and the LLVM optimizer creates a wide variety of patterns that change with every release. That makes the code fragile.

The next approach is to get the compiler to help by limiting the number of loop types that it generates. That makes the pattern-matching task easier, since the patterns that identify loops will be reduced in number and less prone to change. The verifier still has to do all of the work, but it becomes quite a bit more robust.

But the best solution, he said, would be to create a new set of BPF instructions specifically to implement loops. They would mark the beginning and end of each loop, and include the test of the induction variable; the verifier would replace those instructions with actual tests and jumps. With these markers to denote the loop blocks, there would be no need for the dominator tree, since the markers themselves would make it clear which code is controlled by the loop. That would keep the verifier code minimal, which, he said, is the right tradeoff in the end.

The description of the session said that the goal was "to come to a consensus on how to proceed to make progress on supporting bounded loops". That did not happen, but there was some discussion of the options, and the development community is now more aware of how this complex work is proceeding. In the end, real consensus is likely to come about in the usual way: through the posting of code that shows how the idea is implemented in the real world.

The slides from this talk [PDF] and a video recording [YouTube] are available. Curious readers can also see the implementation of the first alternative as it was posted in June.

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the event.]

Comments (58 posted)

Unexpected fallout from /usr merge in Debian

December 4, 2018

This article was contributed by Alexander E. Patrakov

Back in 2011, Harald Hoyer and Kay Sievers came up with a proposal for Fedora to merge much of the operating system into /usr; former top-level directories, /bin, /lib, and /sbin, would then become symbolic links pointing into the corresponding subdirectories of /usr. Left out of the merge would be things like configuration files in /etc, data in /var, and user home directories. This change was aimed at features like atomic upgrades and easy snapshots. The switch to a merged /usr was successful for Fedora 17; many other distributions (Arch, OpenSUSE, Mageia, just to name a few) have followed suit. More recently, Debian has been working toward a merged /usr, but it ran into some surprising problems that are unique to the distribution.

Debian and its derivatives are definitely late to the /usr merge party. Systems running Debian testing that were initially installed before June 2018 still have /bin, /sbin, and /lib as normal directories, not as symbolic links. The same applies to Ubuntu 18.10. But both Debian and Ubuntu want to make the switch to a merged /usr. Debian tried, but it hit something completely unexpected.

The Debian /usr merge history started in 2016, when Marco d'Itri got the usrmerge package into Debian unstable. This package contains a Perl script that converts an existing system into the state with a merged /usr. Also, a change was made to the debootstrap program (which installs a Debian system into a chroot), so that it could create the needed symbolic links by itself before installing any packages. The end result is the same in both cases.

The plan was to default to a merged /usr when it was ready (the initial testing revealed only three broken packages), but it was hoped that both merged and non-merged setups would be supported. In other words, the expectation was that there would be no flag day when everyone must switch.

The decision on whether /usr merge would be done by default has changed multiple times during the debootstrap development timeline. The initial support was coded in September 2016, and released in debootstrap version 1.0.83, in a disabled-by-default state. In October 2016 there was an attempt to enable it by default, but this was reverted in November, because the dpkg-shlibdeps program (which is used during package builds to automatically generate dependencies on packages that provide the needed shared libraries) broke. Therefore, Debian 9.0 (with the code name "Stretch") was released in June 2017 without this feature.

A second attempt to re-enable /usr merge by default happened in June 2018, with debootstrap version 1.0.102. Since June 25, 2018, new default installations of Debian testing from the "daily" builds of the install CD have a merged /usr, and that has not been reverted so far.

Happy ending? Not so fast ...

Problems with R

On November 10, a new version of debootstrap was uploaded to the "backports" repository for the Debian stable distribution. This repository is not enabled on end-user systems by default, but it is enabled on the build daemon (buildd) machines for the purpose of getting a few updated packages including debootstrap. Therefore, shortly thereafter, Debian build daemons started using a merged /usr for their chroots. One of them attempted to automatically rebuild the r-base package and produced an R binary that did not work on non-merged systems. As the bug report says:

    /usr/bin/R: line 193: /usr/bin/sed: No such file or directory

Traditionally, sed was always placed in /bin; the Debian sed package likewise ships /bin/sed, not /usr/bin/sed. In the bug report, the problem is treated like a one-off issue, to be solved by a rebuild. However, on the debian-devel mailing list, Ian Jackson quickly pointed out that the problem is, in fact, due to the /usr merge on the build daemons. He suggested that the change should be reverted. Dirk Eddelbuettel seconded that suggestion, and noted that he expects "much more breakage to follow". Indeed, similar problems were triggered in sympow, pari, and monitoring-plugins. Other bugs of this nature can be found by searching the Debian bug tracking system for a special tag (but this search also finds other kinds of issues).

Jackson provided a good explanation of the mechanics of what has happened, quoted below.

  1. R's autoconfery is autodetecting the location of (say) sed at build time by searching the PATH. R then bakes the discovered path into the built binaries.
  2. With usrmerge, /bin is a symlink to /usr/bin [...].
  3. Consequently the R autoconfery always detects (say) sed in the first place out of /usr/bin and /bin it looks, and bakes /usr/bin/sed into its binaries.
  4. On a system without usrmerge, /usr/bin/sed does not exist, because sed is in /bin.

Matthias Klumpp suggested that a sensible build system would automatically detect the correct paths even on a /usr-merged system; he also wondered why the issue with incorrect paths has not been encountered by other distributions. The crucial difference was found to be that all other distributions had a flag day, before which merged /usr was unsupported, and after which it was mandatory. That is, on other distributions there was no requirement that a package built on a /usr-merged system should work correctly on a system without a merged /usr.

D'Itri said that there is only a handful of packages affected and provided an example of what a fix would look like. Namely, the proper locations of programs should be provided to the configure script or its equivalent, or the software has to be modified so that it always looks for programs using $PATH at runtime. In fact, Debian maintainer scripts (that is, scripts written by Debian maintainers that run before or after the package installation, upgrade, or removal) are already required, by Debian policy, to use $PATH instead of hard-coding the paths to the tools they use. There is, however, no such requirement in the policy for programs other than maintainer scripts.
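
As a hypothetical illustration of the second option (the program names here are arbitrary, and this is not code from any of the affected packages), looking a tool up in $PATH at run time avoids baking in whichever path the build machine happened to have:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char *argv[] = { "sed", "--version", NULL };

        /* execv("/usr/bin/sed", argv) would fail on a non-merged system,
         * where sed lives in /bin; execvp() searches $PATH instead, so
         * the same binary works on both layouts. */
        execvp("sed", argv);
        perror("execvp");
        return 1;
    }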

The argument that there is only a handful of affected packages was not compelling enough, because the number of packages affected is, in fact, unknown without someone trying to rebuild them all. So Simon McVittie added a new check to the reproducible builds infrastructure, specifically for differences caused by /usr merge. This check caught problems in perl, quilt, and systemd.

On November 20, the change was reverted — build daemon chroots were recreated without a merged /usr.

Plan B?

Due to the perceived damage to the distribution caused by /usr merge, Adam Borowski started a new discussion thread on the debian-devel list, with the subject "usrmerge — plan B?". He proposed to scrap the usrmerge package, and, instead, move binaries and libraries to /usr one by one. "If it takes 10 years, so what?" McVittie pointed out that moving binaries on a package-by-package basis would not solve anything. Indeed, any move of a binary would break all packages that hard-coded its expected location, which is exactly the same problem that was triggered by /usr merge on the build daemons.

The arguments usually presented by /usr-merge proponents have also been questioned, for example on the basis that atomic upgrade is worthless if it leaves the system with mismatching /etc or /var. McVittie wrote a long email in response to the complaint about a lack of detailed justification of the change. The criticism regarding /var and /etc mismatch was answered by showing how existing systems with immutable /usr (like OSTree and systemd-based stateless systems) are dealing with it.

McVittie explained the /usr merge failure mode again and mentioned the one-way compatibility property: packages built on a system without a merged /usr would work fine on a merged-/usr system, but not the other way around. Therefore, he concluded that the transition simply happened, by accident, in the wrong order: the build daemons should have transitioned last.

The rest of the email presents the benefits of /usr merge. In particular, "it's a significant simplification for reliable special-purpose systems" such as consumer appliances. It brings benefits for end-user systems, too, because of increased portability of scripts from other systems. This is important because some upstream authors work on merged-/usr systems and don't know (and don't need to know) the "proper" location of binaries. In addition, it was noted that the change would simplify the security-relevant code of sandboxing systems like bubblewrap.

But Borowski's objection was not to /usr merge itself, but to the need to support systems both with and without a merged /usr. In other words, according to him, in order to be supportable, /usr merge must either be mandatory or not happen at all.

The discussion also touched on the transition freeze coming in January 2019, which limits the time available for planning and testing any changes. So far, there is no consensus, but the most popular opinion seems to be that a mandatory /usr merge has to be postponed until after the release. As Russ Allbery put it: "That's going to be a disappointing delay for a lot of people, I'm sure, but it's still better for them than never doing this at all."

Ubuntu also plans to go through /usr merge. Dimitri John Ledkov described the situation this way: in the upcoming Ubuntu 19.04 release, new installations will use merged /usr, but existing ones will not have their /usr merged on upgrade. Of course, the official Ubuntu build daemons will not have merged /usr.

The status quo in Debian is that debootstrap (and, thus, the Debian installer) installs a system with /usr merged, and the build daemons explicitly pass a command-line switch to avoid this. This is still dangerous, because initial uploads of new packages in Debian (but not in Ubuntu) require a binary package built by the developer on his system. As long as a single uploader has a system with merged /usr, there is a potential for the bug to resurface in his Debian package. On the other hand, as d'Itri noted, there are already too many possibilities for a Debian maintainer to misbuild a package by not using a chroot, so there is nothing new in this danger. Anyway, support for marking packages as "tainted" by the build environment has been implemented in dpkg in response to this concern, and dpkg taints the package if it detects the symbolic links related to /usr merge.

The discussion is still in progress, though; no consensus has been reached. Jackson filed a bug against debootstrap to revert the merged-/usr default for the next release of Debian. Because the debootstrap maintainer disagreed with the proposed change, Jackson reassigned the bug to the Debian Technical Committee, which is the ultimate authority for resolving otherwise unresolvable technical disputes within Debian. There is also a request from the Debian backports FTP master that the default should be the same in Debian stable backports and in Debian testing. Emilio Pozuelo Monfort, a member of the release team, also spoke in favor of reverting to non-merged /usr in new installations.

It is impossible to predict now how the Technical Committee will rule. In the worst case for /usr-merge proponents, proper introduction of a merged /usr into Debian may be delayed by a few more years. But, if it votes for keeping the status quo, new end-user systems in the next stable release of Debian will have merged /usr, old but upgraded ones won't, and the build daemons will reliably build packages suitable for both cases, just like what's planned for Ubuntu 19.04. No flag day is needed in this scenario, so it would follow the best Debian traditions of not forcing transitions onto users.

Comments (89 posted)

Page editor: Jonathan Corbet


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds