Leading items
Welcome to the LWN.net Weekly Edition for June 24, 2021
This edition contains the following feature content:
- Pulling GitHub into the kernel process: the kernel community debates a system to interface between GitHub pull requests and the email-based process.
- New features and other changes in Python 3.10: what to expect in the upcoming major Python release.
- Landlock (finally) sets sail: after more than five years, a new sandboxing mechanism makes it into the mainline kernel.
- Protecting control dependencies with volatile_if(): compilers don't understand control dependencies, but the kernel depends on them.
- A stable bug fix bites proprietary modules: a potential surprise for stable-kernel users shows the limits of the "no regressions" policy.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Pulling GitHub into the kernel process
There is an ongoing effort to "modernize" the kernel-development process; so far, the focus has been on providing better tools that can streamline the usual email-based workflow. But that "email-based" part has proven to be problematic for some potential contributors, especially those who might want to simply submit a small bug fix and are not interested in getting set up with that workflow. The project-hosting "forge" sites, like GitHub and GitLab, provide a nearly frictionless path for these kinds of one-off contributions, but they do not mesh well—at all, really—with most of mainline kernel development. There is some ongoing work that may change all of that, however.
Konstantin Ryabitsev at the Linux Foundation has been spearheading much of this work going back at least as far as his September 2019 draft proposal for better kernel tooling. Those ideas were discussed at the 2019 Kernel Maintainers Summit and at a meeting at Open Source Summit Europe 2019 in October. Throughout, Ryabitsev has been looking at ways to make it easier for non-email patch submitters; along the way, he has also released the b4 tool for collecting up patches and worked on patch attestation.
A recent post to the kernel workflows mailing list shows some progress toward a bot that can turn a GitHub pull request (PR) into a well-formed patch series to send to the proper reviewers and mailing lists. "This would be a one-way operation, effectively turning Github into a fancy 'git-send-email' replacement." He also laid out some of the benefits that this bot could provide both for maintainers and patch submitters:
- submitters would no longer need to navigate their way around git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to have a patch-friendly outgoing mail gateway to properly contribute patches
- subsystem maintainers can configure whatever CI pre-checks they want before the series is sent to them for review (and we can work on a library of Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
- the bot should (eventually) be clever enough to automatically track v1..vX on pull request updates, assuming the API makes it straightforward
He had some questions about whether the bot should be centralized in a single repository (per forge platform) that would serve as the single submission point, or whether subsystem maintainers would want to configure their own repositories. The latter would give maintainers the opportunity to set their own criteria for checks that would need to pass (e.g. checkpatch.pl) before the PR was considered valid, but would mean that they might have to ride herd on the repository as well.
In addition, Ryabitsev wondered when and how PRs would get closed. The bot could potentially monitor the mainline and auto-close PRs once the patch set gets merged, but that won't be perfect, of course. An easier approach for him would be "to auto-close the pull request right after it's sent to the list with a message like 'thank you, please monitor your email for the rest of the process'", but he was unsure if that would be best.
As might be guessed, reactions from those participating in the thread were all over the map. While there is a lack of many kinds of diversity within the kernel community, opinions on development workflow—opinions, in general, in truth—do not have that problem. Some maintainers have zero interest in this kind of effort at all. As Christoph Hellwig put it: "Please opt all subsystems I maintain out of this crap. The last thing I need is patches from people that can't deal with a sane workflow."
Hellwig's complaint, which Jiri Kosina agreed with, may be more about the expectations of those who use GitHub (and the like), and less about the possibility of having a web-based interface to kernel development. Dmitry Vyukov asked why Hellwig and Kosina would be unwilling to accept patches from the system if they cannot really distinguish them from a regular submission. Vyukov said that he is currently experiencing a Git email submission problem that he is uninterested in working around, so he can see why others might be similarly inclined. Meanwhile, though, he sees benefits from this kind of bot:
On the other hand this workflow has the potential to ensure that you never need to remind to run checkpatch.pl, nor spend time on writing code formatting comments and re-reviewing v2 because code formatting will be enforced, etc. So I see how this is actually beneficial for maintainers.
Hellwig is not opposed to a web-based solution, though he wants nothing to do with GitHub. But Ryabitsev seems uninterested in "reimplementing a lot of stuff that we already get 'for free' from Github and other forges". Both Mark Brown and Laurent Pinchart suggested that there are mismatches between GitHub-normal practices and those of the kernel community. Pinchart mentioned the inability to comment on a patch's commit message on GitHub as something that generally leads to poor messages; the platform is training these developers to a certain extent:
Developers who have only been exposed to those platforms are very likely to never have learnt the importance of commit messages, and of proper split of changes across commits. Those are issues that are inherent to those platforms and that we will likely need to handle in an automated way (at least to some extent) or maintainers will become crazy [...]
But Miguel Ojeda thinks that it is really no different from new developers showing up on the mailing list with patches. "The same happens in the LKML -- some people have sent bad messages, but we correct them and they learn." He also noted that automated checking of patches can help both developers and maintainers:
[...] it is particularly useful to teach newcomers and to save time for maintainers having to explain things. Even if a maintainer has a set of email templates for the usual things, it takes time vs. not even having to read the email.
Ojeda is working on the Rust for Linux project, which we looked at back in April; he said that he has also been working on a bot:
For Rust for Linux, I have a GitHub bot that reviews PRs and spots the usual mistakes in commit messages (tags, formatting, lkml vs. lore links, that sort of thing). It has been very effective so far to teach newcomers how to follow the kernel development process.

I am also extending it to take Acks, Reviewed-by's, Tested-by's, etc., and then performing the merge only if the CI passes (which includes running tests under QEMU, code formatting, lints, etc.) after applying each patch.
But Ojeda is taking things in a rather different direction than what Ryabitsev is envisioning. Ojeda wants to move the main place for patch review and the like from the mailing lists to GitHub. He is also considering having his bot pick up patches from the mailing list and turning them into GitHub PRs—the reverse of what Ryabitsev is doing.
For his part, Ryabitsev said: "That's pretty cool, but I'm opposed to this on theological grounds. :)" In particular, he is concerned about the "single point of failure" problem for the kernel-development infrastructure. If his bot is unavailable for any reason, it may be inconvenient for those who use it, but that will not hobble development. He sees GitHub as simply a "developer frontend tool".
Somewhat similar to Ojeda's intentions, Brendan Higgins has a tool to pick up patches from a mailing list (kselftest in this case) and upload them to a Gerrit instance. He sees some potential synergies between his bot and the one Ryabitsev is working on. Similarly, Drew DeVault has been working on the reverse direction, from a mailing list to a project forge, as well. Patchwork is a longstanding code-review project that also collects up patches from mailing lists to populate a web application. It would seem that much of the focus is on getting patches out of mailing lists, though, which is not where Ryabitsev is headed.
While some maintainers want no part of this "GitHub Future", others are enthusiastic about the possibilities it could bring. Vyukov thinks that having a single GitHub repository with multiple branches will help consolidate the kernel-development landscape, which is currently fragmented on subsystem lines. He sees it as an opportunity to apply consistent coding-style standards; it does not matter which, he said, "as long as it's consistent across the project". It would also allow for consistent testing throughout the tree, and the same for the development process:
For once: it will be possible to have proper documentation on the process (as compared to current per-subsystem rules, which are usually not documented again because of low RoI [return on investment] for anything related to a single subsystem only).
It is not at all clear that Vyukov's interest in consistency throughout the tree is shared widely, but there have certainly been complaints along the way about the difficulty of navigating between the different subsystem processes and requirements for submissions. There is also interest in making things easier for quick, one-off contributions; as Ryabitsev put it:
Our code review process must also allow for what is effectively a "report a typo" link. Currently, this is extremely onerous for anyone, as a 15-minute affair suddenly becomes a herculean effort. The goal of this work is to make drive-by patches easier without also burying maintainers under a pile of junk submissions.
Clearly keeping "junk submissions" to a bare minimum is going to be important. Linus Torvalds said that he has had to turn off email from GitHub because it is too noisy; people have apparently signed him up as a project member without any kind of opt-in check. Beyond that, any kind of patch submission from PRs would need to have some sanity checks, including size limits, so that PRs like one pointed out by Ryabitsev do not end up on the mailing list.
That kind of PR highlights another problem: repository maintenance. Greg Kroah-Hartman said that there will be a need to monitor whatever repositories are being used for this purpose. It is not a small task:
What ever repo you put this on, it's going to take constant maintenance to keep it up to date and prune out the PRs that are going to accumulate there, as well as deal with the obvious spam and abuse issues that popular trees always accumulate.
Torvalds does not want his GitHub tree used for this purpose and Kroah-Hartman said the same. However it plays out, someone will have to be tasked with keeping the repository tidy, which is "a thankless task that will take constant work". But Ryabitsev is hopeful that the Linux Foundation could fund that kind of work if it becomes necessary.
In the end, it will likely come down to how seamlessly the GitHub bot fits in. If maintainers truly cannot really tell the difference in any substantive way, it is hard to see many of them rejecting well-formed patches that fix real problems in their subsystems. That ideal may not be reached right away, however, which might lead to a premature end to the experiment. It will be interesting to see it all play out over the coming months and years.
New features and other changes in Python 3.10
Python 3.10 is proceeding apace; everything looks to be on track for the final release, which is expected on October 4. The beta releases started in early May, with the first of those marking the feature-freeze for this version of the language. There are a number of interesting changes that are coming with Python 3.10, including what is perhaps the "headline feature": structural pattern matching.
As we did with Python 3.9, and Python 3.8 before it, taking a look at what is coming in the (now) yearly major release of the language has become something of a tradition here at LWN. The release notes that are compiled as part of the release process are invaluable in helping track all of the bits and pieces that make up the release. "What's New In Python 3.10" does not disappoint in that regard; those looking for more information about this release are encouraged to give it a look. In addition, we have covered some of these changes as they were discussed and developed over the last year or so.
Headlining
The structural pattern matching feature fills a longstanding hole that many have complained about along the way, but it also does a whole lot more than that. Python has never had a "switch" statement or its equivalent; programmers have relied on a series of if/elif/else blocks to handle the various values of a particular expression instead. But there have been proposals to add a switch statement going back at least 20 years.
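For comparison, here is a minimal sketch of that traditional idiom; the handle() function and its command strings are invented purely for illustration:

    def handle(command):
        # The long-standing Python stand-in for a switch statement:
        # a chain of if/elif/else tests on the same value.
        if command == "start":
            return "starting"
        elif command == "stop":
            return "stopping"
        elif command == "restart":
            return "restarting"
        else:
            raise ValueError(f"unknown command: {command}")

    print(handle("start"))   # -> starting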
A year ago, Python creator Guido van Rossum and a few other folks resurrected the idea, but in a much more sweeping form. That led to multiple large threads on the python-dev mailing list, and to a second version of the Python Enhancement Proposal (PEP) for the feature. After the steering council looked things over, though, that original proposal became three PEPs (two informational) in October 2020, and two other competing PEPs were added into the mix. In February, the council decided to accept one of the three, PEP 634 ("Structural Pattern Matching: Specification"), along with its two companions. The other two PEPs were rejected.
The basic idea of the feature is that the value being "matched" (the new Python statement is match) can be unpacked in various ways, so that pieces of the object can be extracted. The example that probably gives the most "bang for the buck" comes from PEP 622 ("Structural Pattern Matching"):
    def make_point_3d(pt):
        match pt:
            case (x, y):
                return Point3d(x, y, 0)
            case (x, y, z):
                return Point3d(x, y, z)
            case Point2d(x, y):
                return Point3d(x, y, 0)
            case Point3d(_, _, _):
                return pt
            case _:
                raise TypeError("not a point we support")
Perhaps the most unfamiliar piece of that example is the use of "_" as a "wildcard" (i.e. match anything), which was a major point of contention during the discussions of the feature. But the look of match is only really Pythonic if you squint ... hard. The case statements are unlike anything else in the language, really. If pt is a 2-tuple, the first case will be used and x will get the value of pt[0] and y will get pt[1].
The third and fourth cases are even weirder looking, but the intent should be reasonably clear: objects of those types (Point2d and Point3d) will be matched and the variables will be filled in appropriately. But the normal rules for reading Python are violated, which was another controversial part of the proposal; "case Point2d(x, y):" does not instantiate an object; instead, it serves as a template for what is to be matched. The references to x and y in the case do not look up the values of those variables; rather, they are used to specify the variables that get assigned from the unpacking.
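A small, runnable sketch of that binding behavior follows; the Point2d class here is just an illustrative stand-in, implemented as a dataclass because dataclasses automatically supply the __match_args__ attribute that positional class patterns rely on:

    from dataclasses import dataclass

    @dataclass
    class Point2d:
        x: float
        y: float

    def where_is(pt):
        match pt:
            case Point2d(x, y):   # binds x and y from pt; does not construct a Point2d
                return f"2-D point at ({x}, {y})"
            case _:               # wildcard: matches anything else
                return "not a point we know about"

    print(where_is(Point2d(3, 4)))   # -> 2-D point at (3, 4)
    print(where_is("hello"))         # -> not a point we know about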
There is a lot more to the match statement; those interested should dig into the PEPs for more information, or run the most recent beta (3.10.0b3 at the time of this writing) to try it out. There are also other parts of the syntax (and semantics) that are at least somewhat controversial; the LWN articles, and the mailing-list threads they point to, will help unravel those concerns as well.
Parsing and error reporting
One of the bigger changes that came with Python 3.9 was the new parsing expression grammar (PEG) parser for CPython. The PEG parser was added as the default in 3.9, but the existing LL(1) parser (with "hacks" to get around the one-token lookahead limitation) would remain as an option. In 3.10, that option has disappeared, along with the code for the older parser. In addition, the deprecated parser module has been removed.
Removing the requirement to stick (mostly) to LL(1) for CPython parsing opens up other possibilities for the syntax of the language. In a semi-prescient post as part of a discussion about the PEG-parser proposal in April 2020, Van Rossum suggested one possibility: "(For example, I've been toying with the idea of introducing a 'match' statement similar to Scala's match expression by making 'match' a keyword only when followed by an expression and a colon.)"
For 3.10, there is another example of a place where the new parser improves the readability of the language: multiple context managers can now be enclosed in parentheses. A longstanding enhancement request was closed in the process. Instead of needing to use the backslash continuation for multi-line with statements, they can be written as follows:
    with (open('long_file_name') as foo,
          open('yet_another_long_file_name') as bar,
          open('somewhat_shorter_name') as baz):
        ...
Various error messages have been improved in this release as well. The SyntaxError exception has better diagnostic output in a number of cases, including pointing to the opening brace or parenthesis when the closing delimiter is missing, rather than pointing to the wrong location or giving the dreaded "unexpected EOF while parsing" message:

    expected = {9: 1, 18: 2, 19: 2, 27: 3, 28: 3, 29: 3, 36: 4, 37: 4,
                38: 4, 39: 4, 45: 5, 46: 5, 47: 5, 48: 5, 49: 5, 54: 6,
    some_other_code = foo()

Previous versions of the interpreter reported confusing places as the location of the syntax error:

    File "example.py", line 3
        some_other_code = foo()
                              ^
    SyntaxError: invalid syntax

but in Python 3.10 a more informative error is emitted:

    File "example.py", line 1
        expected = {9: 1, 18: 2, 19: 2, 27: 3, 28: 3, 29: 3, 36: 4, 37: 4,
                   ^
    SyntaxError: '{' was never closed
That fix was inspired by similar error messages in PyPy. Several other syntax errors, for things like missing commas in dict or list literals, missing colons before blocks (e.g. after while, if, for, etc.), unparenthesized tuples as targets in comprehensions, missing colons in dict literals, and more, have all gotten revamped messages and indicators to make it easier to diagnose the problem. Beyond that, an IndentationError will indicate what kind of block was expecting the indentation, which should help track down a problem of that sort. The AttributeError and NameError exceptions will now give suggestions of similar names, under the assumption that the error is actually a typo; those suggestions are only given if PyErr_Display() is called, however, which is not the case for some alternate read-eval-print loops (REPLs), such as IPython.
Type hints
There are several upgrades to the feature introduced in PEP 484 ("Type Hints"). The most visible new feature will likely be the new union operator specified in PEP 604 ("Allow writing union types as X | Y"). As the title indicates, types can now be separated by the "|" operator to indicate that multiple types are accepted. The release notes show that it is a big upgrade in readability:
In previous versions of Python, to apply a type hint for functions accepting arguments of multiple types, typing.Union was used:

    def square(number: Union[int, float]) -> Union[int, float]:
        return number ** 2

Type hints can now be written in a more succinct manner:

    def square(number: int | float) -> int | float:
        return number ** 2
The operator can be used to "or" types in isinstance() and issubclass() calls as well.
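A quick illustration of that usage (the values here are arbitrary, and no import from the typing module is needed for this form):

    # int | str is itself a type object that isinstance() and issubclass() accept
    print(isinstance(3.14, int | str))      # False
    print(isinstance("spam", int | str))    # True
    print(issubclass(bool, int | float))    # True, since bool subclasses int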
("Explicit Type Aliases
") so that static type-checkers and
other programs can more easily distinguish type aliases from other
module-level variables. As would be expected, the PEP gives lots of
examples of the kinds of the problems the PEP is meant to solve, but
the example in the release notes gives the general idea:
PEP 484 introduced the concept of type aliases, only requiring them to be top-level unannotated assignments. This simplicity sometimes made it difficult for type checkers to distinguish between type aliases and ordinary assignments, especially when forward references or invalid types were involved. Compare:StrCache = 'Cache[str]' # a type alias LOG_PREFIX = 'LOG[DEBUG]' # a module constantNow the typing module has a special value TypeAlias which lets you declare type aliases more explicitly:StrCache: TypeAlias = 'Cache[str]' # a type alias LOG_PREFIX = 'LOG[DEBUG]' # a module constant
One related change that was planned for 3.10 has been put on the back burner, at least for now. When originally specified in PEP 3107 ("Function Annotations") and PEP 526 ("Syntax for Variable Annotations"), annotations were presented as a way to attach the value of an expression to function arguments, function return values, and variables. The intent was to associate type information, which could be used by static type-checkers, with those program elements.
Forward references in annotations and a few other problems led to PEP 563 ("Postponed Evaluation of Annotations"), which sought to delay the evaluation of the annotation values until they were actually being used. That new behavior was gated by a __future__ import, but was slated to become the default in 3.10, with no way to request the previous semantics. That would not change things for static type-checkers, which do their own parsing separate from CPython, but it was a rather large change for run-time users of the annotations.
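A brief sketch of the postponed-evaluation behavior as it can be observed at run time; the NotYetDefined name is deliberately left undefined to show that the annotation is never evaluated:

    from __future__ import annotations   # opt in to PEP 563 semantics

    def f(x: NotYetDefined) -> int:      # would raise NameError without the import
        return 0

    # Annotations are stored as strings rather than evaluated objects:
    print(f.__annotations__)   # {'x': 'NotYetDefined', 'return': 'int'}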
There seems to have been an unspoken belief that run-time users of annotations would be rare—or even nonexistent. But, as the 3.10 alpha process proceeded, it became clear that the PEP 563 solution might not be the best way forward. In PEP 649 ("Deferred Evaluation Of Annotations Using Descriptors"), Larry Hastings pointed out a number of problems he saw with PEP 563 and offered an alternate solution. The maintainer of the pydantic data-validation library, which uses type annotations at run time, noted the problems he has encountered trying to support PEP 563; he implored the steering council to adopt PEP 649 in its stead.
While the council did not do that, it did put the brakes on making PEP 563 the default in order to give everyone some time (roughly a year until Python 3.11) to determine the best course without the time pressure of the imminent feature-freeze. In the meantime, though, the annotation oddities that Hastings noticed elsewhere in the language did get fixed so that annotations are now handled more consistently throughout Python.
Other bits
There are plenty of other features, fixes, and changes coming in the new version. The release notes show a rather eye-opening body of work for roughly a year's worth of development. For example, one of the most basic standard types, int, had a new method added in 3.10: int.bit_count() gives the "population count" of the integer, which is the number of ones in the binary representation of its absolute value. Discussion of this "micro-feature" goes back to 2017.
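A quick illustration (the value 42 is arbitrary):

    n = 42
    print(bin(n))              # 0b101010
    print(n.bit_count())       # 3, the number of one bits
    print((-42).bit_count())   # also 3; the absolute value is used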
The pair of security vulnerabilities from February that led to fast-tracked releases for all of the supported Python versions have, naturally, been fixed in 3.10. The fix for the buffer overflow when converting floating-point numbers to strings was not mentioned in the release notes, presumably because it is not exactly highly visible—the interpreter simply no longer crashes. The second vulnerability led to a change in the urllib.parse module that users may need to pay attention to.
Back in the pre-HTML5 days, two different characters were allowed in URLs for separating query parameters: ";" and "&". HTML5 restricts the separator character to only be "&", but urllib.parse did not change until a web-cache poisoning vulnerability was reported in January. Now, only a single separator is supported for urllib.parse.parse_qs() and urllib.parse.parse_qsl(), which default to "&". Those changes also affect cgi.parse() and cgi.parse_multipart() because they use the urllib functions. In addition, urllib.parse has been fixed to remove carriage returns, newlines, and tabs from URLs in order to avoid certain kinds of attacks.
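A short sketch of the new behavior; the separator keyword argument is the escape hatch that came with the fix, for callers that still need semicolon handling:

    from urllib.parse import parse_qs

    # "&" is now the only default separator; ";" is treated as ordinary data
    print(parse_qs("a=1;b=2"))                 # {'a': ['1;b=2']}
    print(parse_qs("a=1&b=2"))                 # {'a': ['1'], 'b': ['2']}

    # Callers that really want semicolon-separated parameters must say so
    print(parse_qs("a=1;b=2", separator=";"))  # {'a': ['1'], 'b': ['2']}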
Another security change is described in PEP 644 ("Require OpenSSL 1.1.1 or newer"). From 3.10 onward, the CPython standard library will require OpenSSL version 1.1.1 or higher in order to reduce the maintenance burden on the core developers. OpenSSL is used by the hashlib, hmac, and ssl modules in the standard library. Maintaining support for multiple older versions of OpenSSL (earlier Pythons support OpenSSL 1.0.2, 1.1.0, and 1.1.1) combined with various distribution-specific choices in building OpenSSL has led to something of a combinatorial explosion in the test matrix. In addition, the other two versions are no longer getting updates; OpenSSL 1.1.1 is a long-term support release, which is slated to be supported until September 2023.
One feature that did not make the cut for the language is indexing using keywords, which is an oft-requested feature that has now presumably been laid to rest for good. The basic idea is to apply the idea of keyword function arguments to indexing:
    print(matrix[row=4, col=17])
    some_obj[1, 2, a=43, b=47] = 23
    del(grid[x=1, y=0, z=0])
The most recent incarnation is PEP 637 ("Support for indexing with keyword arguments"), but the idea (and lengthy discussions of it) goes back to 2014—at least. The steering council rejected the PEP in March: "fundamentally we do not believe the benefit is great enough to outweigh the cost of the new syntax". The PEP will now serve as a place for people to point the next time the idea crops up on python-ideas or elsewhere; unless there are major changes in the language or use cases, there may well be no need to discuss the idea yet again. That is part of the value of rejected PEPs, after all.
The future
Development is already in progress for Python 3.11; in fact, there is already a draft of the "what's new" document for the release. It can be expected in October 2022. With luck, it will come with major CPython performance improvements. It will likely also come with the exception groups feature that we looked at back in March; the feature was postponed to 3.11 in mid-April. In addition, of course, there will be lots of other changes, fixes, features, and such, both for 3.11 and for the much nearer 3.10 release. Python marches on.
Landlock (finally) sets sail
Kernel development is not for people who lack persistence; changes can take a number of revisions and a lot of time to make it into a mainline release. Even so, the story of the Landlock security module, developed by Mickaël Salaün, seems like an extreme case; this code was merged for 5.13 after more than five years of development and 34 versions of the patch set. This sandboxing mechanism has evolved considerably since LWN covered version 3 of the patch set in 2016, so a look at what Landlock has become is warranted.

Like seccomp(), Landlock is an unprivileged sandboxing mechanism; it allows a process to confine itself. The long-term vision has always included adding controls for a wide range of possible actions, but those in the actual patches have been limited to filesystem access. In the early days, Landlock worked by allowing a process to attach BPF programs to various security hooks in the kernel; those programs would then make access-control decisions when asked. BPF maps would be used to associate specific programs with portions of the filesystem, and a special seccomp() mode was used to control the whole thing.
The goals behind Landlock have never been particularly controversial, but the implementation is a different story. The use of BPF was questioned even before making BPF available to unprivileged users in any context fell out of favor. It was also felt that seccomp(), which controls access to system calls, was a poor fit for Landlock, which does not work at the system-call level. For some time, Salaün was encouraged by reviewers to add a set of dedicated system calls instead; it took him a while to give that approach a try.
In the end, though, dedicated system calls turned out to be the winning formula. Version 14 of the patch set, posted in February 2020, dropped BPF in favor of a mechanism for defining access-control rules and added a multiplexing landlock() system call to put those rules into force. The 20th version split the multiplexer into four separate system calls, but one of those was dropped in the next revision. So Landlock, as it will appear in 5.13, will bring three system calls with it.
The 5.13 Landlock API
The first of those system calls creates a rule set that will be used for access-control decisions. Each rule set must be given a set of access types that it will handle. To define a rule set that can handle all action types, one would start like this:
    struct landlock_ruleset_attr ruleset_attr = {
        .handled_access_fs =
            LANDLOCK_ACCESS_FS_EXECUTE |
            LANDLOCK_ACCESS_FS_WRITE_FILE |
            LANDLOCK_ACCESS_FS_READ_FILE |
            LANDLOCK_ACCESS_FS_READ_DIR |
            LANDLOCK_ACCESS_FS_REMOVE_DIR |
            LANDLOCK_ACCESS_FS_REMOVE_FILE |
            LANDLOCK_ACCESS_FS_MAKE_CHAR |
            LANDLOCK_ACCESS_FS_MAKE_DIR |
            LANDLOCK_ACCESS_FS_MAKE_REG |
            LANDLOCK_ACCESS_FS_MAKE_SOCK |
            LANDLOCK_ACCESS_FS_MAKE_FIFO |
            LANDLOCK_ACCESS_FS_MAKE_BLOCK |
            LANDLOCK_ACCESS_FS_MAKE_SYM,
    };
(This example and those that follow were all taken from the Landlock documentation).
Once that structure is defined, it can be used to create the rule set itself:
    int landlock_create_ruleset(struct landlock_ruleset_attr *attr,
                                size_t attr_size, unsigned int flags);
The attr_size parameter must be the size of the landlock_ruleset_attr structure (which allows for future expansion in a compatible manner); flags must be zero (with one exception, described below). If all goes well, the return value will be a file descriptor representing the newly created rule set.
That set does not actually contain any rules, yet, so it is of limited utility. The 5.13 version of Landlock only supports a single type of rule, controlling access to everything contained within (and below) a given directory. The first step is to define a structure describing what accesses will be allowed for a given subtree; to limit access to reading and executing, one could use something like this:
    struct landlock_path_beneath_attr path_beneath = {
        .allowed_access = LANDLOCK_ACCESS_FS_EXECUTE |
                          LANDLOCK_ACCESS_FS_READ_FILE |
                          LANDLOCK_ACCESS_FS_READ_DIR,
    };
The landlock_path_beneath_attr structure also contains a field called parent_fd that should be set to a file descriptor for the directory where the rule is to be applied. So, for example, to limit access to /usr to the above operations, a process could open /usr as an O_PATH file descriptor, assigning the result to path_beneath.parent_fd. Finally, this rule should be added to the rule set with:
    int landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
                          void *rule_attr, unsigned int flags);
Where ruleset_fd is the file descriptor representing the rule set, rule_type is LANDLOCK_RULE_PATH_BENEATH (the only supported value, currently), rule_attr is a pointer to the structure created above, and flags is zero. The return value will be zero if all goes well. Multiple rules can be added to a single rule set.
The rule set has now been defined, but it is not yet active. To bind itself to a given rule set, a process will call:
int landlock_restrict_self(int ruleset_fd, unsigned int flags);
Once again, flags must be zero. This operation will fail unless the process has previously called prctl() with the PR_SET_NO_NEW_PRIVS operation to prevent the acquisition of capabilities through setuid programs. Multiple calls may be made to landlock_restrict_self(), each of which will increase the number of restrictions in force. Once a rule set has been made active, it cannot be removed for the life of the process. Rules enforced by Landlock will be applied to any child processes or threads as well.
For the curious, there is a sample sandboxing program using Landlock that was added in this commit.
After 5.13
Landlock is useful in its current form, but it can be expected to gain a number of new features in future kernel releases now that the core infrastructure is in place. That could present a problem for sandboxing programs, which would like to use those newer features but must be prepared to cope with older kernels that lack them. To help future application developers, Salaün added a mechanism to help determine which features are available. If landlock_create_ruleset() is called with flags set to LANDLOCK_CREATE_RULESET_VERSION, it will return an integer value indicating which version of the Landlock API is supported; currently that value will always be one. When new features are added, the version number will be increased; developers will thus be able to use the version to know which features are supported on any given system.
Landlock has clearly reached an important milestone after more than five years of work, but it seems just as clear that this story is not yet done. After perhaps taking a well-deserved break, Salaün can be expected to start fleshing out the set of Landlock features; with luck, these will not take as long to find acceptance in the kernel community. There may come a time when Landlock can do much of what seccomp() can do, but perhaps in a way that is easier for application developers to use.
Protecting control dependencies with volatile_if()
Memory ordering issues are, as Linus Torvalds recently observed, "the rocket science of CS". Understanding memory ordering is increasingly necessary to write scalable code, so kernel developers often find themselves having to become rocket scientists. The subtleties associated with control dependencies turn out to be an especially tricky sort of rocket. A recent discussion about how to force control dependencies to be observed shows the sorts of difficulties that arise in this area.
Control dependencies
The C programming language was designed in the era of simple, uniprocessor computers. When a developer wrote a series of C statements, they could expect those statements to be executed in the order written. Decades later, though, the situation has become much more complicated; code can be extensively optimized by both compilers and CPUs to the point that it bears little resemblance to what was originally written. Code can be reordered and even eliminated if the compiler (or the processor) thinks that the end result will be the same. The effects of this reordering on single-threaded code are (in the absence of bugs) limited to making it run faster. When there are multiple threads of execution running simultaneously, though, there can be surprises in store. One thread may observe things happening in a different order than others, leading to all sorts of unfortunate confusion.

When the visible order of operations across processors is important, developers will often use barriers to ensure that operations are not reordered in damaging ways. There are, however, cases where developers can count on things happening in the right order because there is no alternative; these are described in terms of "dependencies". There are three broad classes of dependencies, described in this article from our recent lockless patterns series. Consider, for example, a simple data dependency:
    int x = READ_ONCE(a);
    WRITE_ONCE(b, x + 1);
The write to b simply cannot be reordered ahead of the read of a because neither the compiler nor the CPU knows what value should be written. The write has a data dependency on the preceding read; that dependency will prevent those two operations from being reordered. That, of course, assumes that the compiler does not conclude that it already knows what the value of a will be, perhaps from a previous read; that is why READ_ONCE() is used. The second article in the lockless patterns series describes READ_ONCE() and WRITE_ONCE() in detail.
Control dependencies are a bit more complex. Consider code like this:
    if (READ_ONCE(a))
        WRITE_ONCE(b, 1);
There is no data dependency linking the read of a and the write to b, but that write can only occur if a has a non-zero value; the read of a must thus occur before the write. This ordering forced by a conditional branch is a control dependency. More generally, there are three things that must be present to establish a control dependency:
- A read from one location (a in the case above)
- A conditional branch that depends on the value that was read
- A write to another location in one or more branches
When those conditions exist, there is a control dependency from the read to the write that prevents the two operations from being reordered with respect to each other.
The evil optimizing compiler
Or, at least, it would be nice if things worked that way. The problem is that, while the hardware works that way, the C language does not recognize the existence of control dependencies or, as the infamous kernel memory-barriers.txt document puts it: "Compilers do not understand control dependencies. It is therefore your job to ensure that they do not break your code." While there does not appear to be much of a history of code being broken through overly aggressive optimization of code with control dependencies, it is something that developers worry about. That has led to the proposal by Peter Zijlstra of a mechanism called volatile_if().
What sort of problem is this patch trying to address? Consider an example posted by Paul McKenney in the discussion:
    if (READ_ONCE(A)) {
        WRITE_ONCE(B, 1);
        do_something();
    } else {
        WRITE_ONCE(B, 1);
        do_something_else();
    }
This code has a control dependency between the read of A and the writes to B; each write is in a branch of the conditional statement and the fact that they write the same value does not affect the dependency. So one might conclude that the two operations could not be reordered. Compilers, though, might well rearrange the code to look like this instead:
    tmp = READ_ONCE(A);
    WRITE_ONCE(B, 1);
    if (tmp)
        do_something();
    else
        do_something_else();
This code looks equivalent, but the test on the value read from A no longer occurs before the write to B. That breaks the control dependency, freeing a sufficiently aggressive CPU to move the write ahead of the read, possibly creating a subtle and unpleasant bug.
Since C doesn't recognize control dependencies, avoiding this kind of bug can be difficult, even in cases where the developer is aware of the problem. One sure solution is to read A with acquire semantics and write B with release semantics, as described in the lockless patterns series, but acquire and release operations can be expensive on some architectures. That expense is not usually needed in this case.
volatile_if()
Zijlstra wrote in his proposal that a good solution would be to add a qualifier to the if statement to indicate that a dependency exists:
    volatile if (READ_ONCE(A)) {
        /* ... */
The compiler would respond by ensuring that a conditional branch is emitted and that code from within the branches is not lifted out of those branches. That, however, requires cooperation from compiler writers; as Segher Boessenkool noted, that is unlikely to happen unless the standards committee gives its blessing to the idea of putting qualifiers like volatile on statements. Failing that, Zijlstra proposed a magic macro:
    volatile_if(condition) {
        /* true case */
    } else {
        /* false case */
    }
He provided implementations for a number of architectures; these generally depend on hand-written assembly code to manually emit the conditional branch instruction needed to create the control dependency at the CPU level.
The resulting discussion focused on two main topics: the implementation of volatile_if() and whether it is needed at all. On the implementation side, Torvalds suggested a simpler approach:
    #define barrier_true() ({ barrier(); 1; })
    #define volatile_if(x) if ((x) && barrier_true())
The barrier() macro causes no code to be emitted; it is just an empty block presented to the compiler as assembly code. That keeps the compiler from reordering operations from one side of the barrier to the other; it also, Torvalds said, would force the compiler to emit the branch since it could only be evaluated on the "true" side of the branch. Life turned out to not be so simple, though; a redefinition of barrier() along the lines suggested by Jakub Jelinek would be required to make this scheme actually work.
But Torvalds also wondered why developers were worried about this problem in the first place, since he does not think it can manifest in real code:
Again, semantics do matter, and I don't see how the compiler could actually break the fundamental issue of "load->conditional->store is a fundamental ordering even without memory barriers because of basic causality", because you can't just arbitrarily generate speculative stores that would be visible to others.
And, indeed, evidence of such problems actually occurring is hard to find. He did eventually come around to seeing that a problem could potentially exist but also made it clear that he doesn't think there is any code in the kernel now that would be affected by it.
The conversation (eventually) wound down without coming to any real conclusion on whether volatile_if() is needed or not. Experience says, though, that wariness toward compiler optimizations is usually a good idea. Even if no mechanism for explicitly marking control dependencies is merged into the mainline now, it will be waiting in the wings should future compiler releases create problems.
A stable bug fix bites proprietary modules
The kernel-development community has long had a tense relationship with companies that create and ship proprietary loadable kernel modules. In the view of many developers, such modules are a violation of the GPL and should simply be disallowed. That has never happened, though; instead, the community has pursued a policy of legal ambiguity and technical inconvenience to discourage proprietary modules. A "technical-inconvenience" patch that was merged nearly one year ago has begun to show up in stable kernel releases, leading at least one developer to complain that things have gone a little too far.

Code that is directly linked into the kernel can access any symbol that is visible to it. Loadable modules, instead, are restricted to a smaller (though still large) set of symbols that are explicitly "exported" for that purpose. Symbols that are exported with EXPORT_SYMBOL() are available to all loadable modules, while those exported with EXPORT_SYMBOL_GPL() can only be used by modules that declare a GPL-compatible license. A non-GPL-compatible module that tries to use a GPL-only symbol will fail to load.
The idea behind GPL-only exports is that the affected symbols are so deeply situated within the kernel that any module using them must be a derived product of the kernel and, thus, be subject to the requirements of the GPL. In practice, that sort of analysis is rarely (if ever) done, and the decision of whether to use a GPL-only export is left to individual developers. Many developers habitually use EXPORT_SYMBOL_GPL() for every symbol they export out of a general distaste for proprietary modules; some maintainers encourage this practice for code that passes through their hands.
Over the years, purveyors of proprietary modules have engaged in a number of tricks to get around GPL-only exports. One of those was manually looking up symbol addresses with kallsyms_lookup_name(); that practice was shut down in early 2020. Another is to split a module into two, one GPL-licensed and one proprietary. The GPL-licensed module interfaces directly with the kernel, using GPL-only symbols where needed; it then calls into the proprietary module, where all the real work gets done.
In July 2020, the posting of this kind of shim module created a stir on the mailing lists, leading to the posting by Christoph Hellwig of a patch set making this trick harder to exploit. Specifically, any module that uses symbols exported by a proprietary module is itself marked proprietary, regardless of the license it declares to the kernel. Modules that hook into proprietary modules, thus, will lose access to GPL-only symbols, making it impossible to perform the shim function they were created for in the first place. This series was merged for the 5.9 kernel release in October.
That was the end of that story — until May 2021, when that patch series found its way into the large 5.4.118, 4.19.191, and 4.14.233 updates. It seemingly took nearly another month for 5.4.118 to find its way into at least one distribution and create trouble for users, at which point Krzysztof Kozlowski asked why such a change was being included in a stable update:
How this is a stable material? What specific, real bug that bothers people, is being fixed here? Or maybe it fixes serious issue reported by a user of distribution kernel? IOW, how does this match stable kernel rules at all?

For sure it breaks some out-of-tree modules already present and used by customers of downstream stable kernels. Therefore I wonder what is the bug fixed here, so the breakage and annoyance of stable users is justified.
Stable-kernel maintainer Greg Kroah-Hartman responded:
It fixes a reported bug in that somehow symbols are being exported to modules that should not have been. This has been in mainline and newer stable releases for quite some time, it should not be a surprise that this was backported further as well.
That pretty much ended the conversation; others may be unhappy about this change making it into older kernels, but it is doubtful that anybody realistically expects that it could be reverted. It might be interesting, though, to watch kernel updates from distributors to see whether this additional restriction on proprietary modules is retained or quietly removed.
This change does show where at least one limit to the kernel's "no regressions" policy is to be found, though. The core idea behind that policy is that a kernel upgrade should never break a working system; kernel developers want users to feel confident that they can move to newer kernels without risking unpleasant surprises. This change does indeed create such a surprise for some users, according to Kozlowski. But kernel modules have never been included in the kernel's stability guarantees — not even the GPL-licensed ones. Kernel-developer tears are rarely shed when proprietary modules break, and they are not in evidence this time either.
When Hellwig's patch series was first posted, LWN noted that it did not close all of the loopholes. Specifically, a GPL-licensed module that wraps and re-exports symbols to a proprietary module will still work as designed as long as no symbols are imported from that proprietary module; this problem was pointed out by David Laight in this discussion. Kroah-Hartman responded with a promise to "work on fixing that up in a future patch series next week", so there may be more unpleasant surprises in store for the creators and users of proprietary loadable kernel modules.
The kernel community's policy on loadable modules has, over the years, drawn criticism from many sides. Some see even a grudging tolerance of proprietary modules as a weakening of the protections provided by the GPL and a cover for vendors that don't want to play by the rules. Others see it as an impediment to the use of Linux in general that reduces available hardware support and makes users jump through unnecessary hoops. The best way to judge this policy, though, is to look at what its results have been over nearly three decades.
Proprietary modules still exist, but they are in the minority; most hardware is supported with free drivers, and the situation seems to continue to slowly improve. Vendors that have clung to proprietary modules in the past have found ways to change their approach; this might not have happened if those vendors had been excluded from the Linux community entirely. So, perhaps, making life uncomfortable for distributors of such modules while not trying to ban them outright may be the most productive policy in the long run.
Page editor: Jonathan Corbet