
LWN.net Weekly Edition for February 3, 2022

Welcome to the LWN.net Weekly Edition for February 3, 2022

This edition contains the following feature content:

  • Fedora and pkexec: does the distribution still need the vulnerable polkit helper?
  • Python and deprecations redux: postponing deprecations that break too many downstream packages.
  • An attic for LibreOffice Online: what to do with the abandoned LOOL code.
  • Handling argc==0 in the kernel: closing the hole behind the pkexec vulnerability.
  • Restartable sequences in glibc: the user-space API for lock-free access to per-CPU data.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Fedora and pkexec

By Jake Edge
February 2, 2022

The nasty vulnerability in pkexec has been rippling through the Linux world, leading to lots of security updates to the underlying polkit authorization toolkit. It also led to a recent discussion on the Fedora devel mailing list about whether pkexec, which runs a program as another user, is actually needed—or wanted—in some or all of the distribution's editions. But pkexec is used by quite a few different Fedora components, particularly in desktop-oriented editions, and it could perhaps be a better choice than the alternatives for running programs with the privileges of another user.

Adam Williamson raised the issue on the day after the disclosure of the PwnKit flaw. It is, as he noted, a longstanding (since the first version of pkexec in 2009) local root privilege escalation. That started him thinking:

The issue and some of the comments around it prompted me to wonder - why is `pkexec` still a thing? Particularly, why is it still a thing we are shipping by default in just about every Fedora install?

My best recollection is that pkexec was kinda a kludge to allow us to get rid of consolehelper: some apps weren't getting rewritten to the Right Way of doing things under policykit, they still just wanted to have the entire app run as root, and pkexec was a way to make that happen.

consolehelper is another program that allows users to run code as other users, while PolicyKit is a former name for polkit. Back in 2013, the Fedora Usermode Migration feature targeted moving all users of consolehelper to polkit. Williamson wondered which parts of Fedora still needed pkexec at this point (given its "kludge" status). But, since it is included in the polkit package, which is used far more widely, perhaps pkexec could be split out into a sub-package that would only be installed where that piece is actually needed, he suggested.

In a followup message, Williamson noted that the bug ticket for the Usermode Migration showed that it had not been completed. There are still 15 packages in Fedora that require consolehelper. Peter Robinson said that most of those "appear to be legacy" packages that should not be in general use.

Michael J Gruber jokingly asked if he should switch his package away from pkexec and back to using the consolehelper-based beesu wrapper for the su command since beesu still exists in Fedora. But Williamson said that is not his point at all:

The issue is not that pkexec is inherently worse than any other tool to do approximately the same thing (prompt for some kind of password, then run the entire app as root) - it's unfortunate that pkexec happened to have a giant security flaw, but it's not unlikely that other tools to do the same thing will turn out to have security flaws if someone decides to take a close look at them. The issue is that *that whole design* is suboptimal.

The idea was that applications would move to using the other mechanisms provided by polkit in order to do their privileged operations, he said. For applications that will not make that transition, pkexec was a "less-good second choice option" to allow them to continue to work. Removing consolehelper was meant to reduce the number of "run things as root" tools that the distribution needed to pay attention to, though that effort has never been fully completed. "Switching from pkexec to any other 'run-this-thing-as-root' tool would not be an improvement."

If you are going to run programs as root anyway, though, pkexec is probably better than using sudo or other options, Lennart Poettering said. pkexec already uses the polkit interprocess communication (IPC) mechanism to request running a binary as root, which is a superior model in his mind:

I mean, polkit has some issues, but I am pretty sure that "pkexec" is not what I'd consider the big problem with it. Or to say this differently: the whole concept of tools like su/sudo/setpriv/runuser/suid binaries is [questionable]: i.e. I am pretty sure we'd be better off if we would systematically prohibit acquiring privs through execve(), and instead focus on delegating privileged operations to IPC services — but of course that would be quite a departure from traditional UNIX.

[...] "pkexec" is a *short* program, it runs very little code with privileges actually. That makes it a *ton* better than the [humongous] code monster that "sudo" is. It has a smaller security footprint, and is easier to review than "sudo". That's worth a lot actually.

But Sam Varshavchik objected to the idea that moving the privileged operation to the other end of an IPC mechanism, such as a socket, really would solve the underlying problem. The real problem at hand, he said, is buggy programs that run as root:

The same bug that's exploitable in a suid [or setuid] binary will also be exploitable, exactly the same way, in its suid-less equivalent. If you have a buffer overrun in a suid binary as a result of carefully-crafted command-line parameters or environment, then if you replace the suid binary with an identical bug-for-bug implementation that uses a socket to carefully pass along the same environment or parameters to a native root binary, and the buffer overrun is the same, guess what: you still have the same exploit.

suid is not the problem. An execved program will inherit the environment, some open file descriptors, and maybe a few other things that a standalone daemon that accepted a socket connection does not have. But that's what most exploits leverage, so cleaning up the environment and open file descriptors would remedy that. It will take some effort to exploit whatever remains.

Poettering did not see things that way, however. There is an enormous amount of state that comes along for the ride when executing a setuid binary, and that state is growing over time:

The thing with setuid/setgid is that the invoked privileged process inherits a lot of implicit state and context that people aren't really aware of or fully understand. i.e. it's not just env vars and argv[], it's cgroup memberships, audit fields, security contexts, open fds, child pids, parent pids, cpu masks, IO/CPU scheduling priorities, various prctl() settings, tty control, signal masks + handlers, … and so on. And it's not even clear what gets inherited as many of these process properties are added all the time.

If you do privileged execution of specific operations via IPC you get the guarantee that whatever is done, is done from a well-defined, pristine execution environment, without leaking context implicitly. The IPC message is the *full* vulnerable surface, and that's as minimal as it can get. And that's great.

While all of that context and state is present for the setuid program, the underlying problem is in the program itself, Varshavchik said:

And none of that makes any difference, whatsoever, unless there's also an underlying logical bug in the code itself. And that's going to be the real problem. So then, question becomes, how many exploits leveraged any of these exotic resources, like cgroup members, tty control, and all of those other fringe attributes? I suppose there might've been a few, but I can't recall many. On the other hand, exploits that accomplish execve("/bin/bash") would work equally well, given the same underlying bug in the root daemon, that's triggerable via its front end, or in a suid program. A sanitized environment won't be much of a help.

But sanitizing the environment fully is difficult or impossible to do for the setuid program, Poettering said. There are a number of interacting attributes that are under the control of the caller (read attacker) of the setuid binary—and that number keeps increasing:

You might *hope* that there was no bug introduced through the interaction of all that code, but the point I am making is that "struct task" and associated objects in kernel get extended all the time, and suid programs that didn't expect these additions will be exposed to them, and cannot sanely reset them even if they wanted, since typically developers cannot look into the future (and the cgroup stuff cannot even be reset reasonably at all).

Varshavchik wanted "hard data" on the kinds of vulnerabilities being envisioned here. He and Poettering went back and forth about what constituted said data, without coming to any real agreement. Poettering ended his participation in that sub-thread by noting that he understands "your love [for] suid and sudo" but that he and others consider them to be "a very bad idea". For his part, Varshavchik is not particularly enamored with those mechanisms:

I love suid and sudo no more, no less, and exactly the same as I love my screwdriver. I'm married to neither. Both are just tools. Both can be used, by people, correctly. Both can be used also, by people, incorrectly. But it makes no sense to use that as a reason for replacing screwdrivers with wrenches.

Demi Marie Obenour suggested that the state information that gets inherited by a setuid binary should be chosen by that binary: "State inheritance definitely should be opt-in, with the possible exception of certain audit data such as the audit user ID." Steve Grubb, who works on the Linux audit system, said that auditing relies on binaries inheriting security contexts and audit information, so the existing polkit mechanisms fall short. Had a kernel mechanism, like kdbus, been adopted, that problem would have been avoided. As it stands, polkit "doesn't satisfy our security requirements":

[...] it's impossible to reason about what's allowed and what's not because the policy is free-form javascript rather than assignments that can be checked by any configuration scanners. And access decisions do not go through the audit system. So, we really have little insight into access control and information flows from anything using IPC started applications.

Setuid may be distasteful to some people. But it's properties are well known and its fully plumbed to make some assurance arguments about the platform.

He was not alone in complaining about polkit's JavaScript configuration mechanism, which many see as a poor design decision. Meanwhile, Andrew Lutomirski said that getting an in-kernel IPC mechanism is a difficult problem; he had some thoughts on setuid as well:

I would love to see a nice security model for IPC in the Linux kernel, but I haven't seen a credible proposal. This problem is *hard*.

As for setuid, I'm strongly of the opinion that setuid was and remains a mistake. Executing a process should never add privileges.

Circling back to his original message, Williamson noted that the thread made it clear that pkexec is currently needed by at least some of the desktop editions/spins of Fedora, though: "It's not an easy question to answer since our packaging doesn't distinguish between something needing *polkit* and something needing *pkexec*." Fedora project leader Matthew Miller thought that at least could be changed: "I wonder if a first step might be to split out pkexec into a subpackage, and then gradually remove requirements on it."

That's where things stand at this point. Pulling pkexec out into its own package seems like a good plan, regardless of whether it should slowly be removed from use or not; it will make tracking which other packages use it that much easier should another vulnerability arise. A longer-term shift toward more—or exclusively—polkit-style privilege enhancement may not necessarily be in the offing. The consensus on that path is not entirely clear and it is also an area where the distribution does not want to see a lot of churn—at least without a lot of code review to go with it.

As with most (all?) vulnerabilities, the real culprit is the lack of focused security review of code. For setuid binaries and other code that runs as root, that review is imperative, but it is far from clear that a huge amount of it is being done. Every time another widespread vulnerability crops up we get reminded of that, but the impact seems to fade rather quickly and things return to "normal". That is a rather worrisome state of affairs.

Comments (41 posted)

Python and deprecations redux

By Jake Edge
February 1, 2022

The problem of how to deprecate pieces of the Python language in a minimally disruptive way has cropped up in various guises over the last few years—in truth, it has been wrangled with throughout much of the language's 30-year history. The scars of the biggest deprecation, that of Python 2, are still rather fresh, both for users and the core developers, so no one wants (or plans) a monumental change of that sort. But the language community does want to continue evolving Python, which means leaving some "baggage" behind; how to do so without leaving further scars is a delicate balancing act, as yet another discussion highlights.

We looked in on some discussion of the topic back in December, but the topic pops up frequently. There is a policy on handling deprecations that is described in PEP 387 ("Backwards Compatibility Policy"), but the reality of how they are handled is often less clear-cut. Python has two warnings that can be raised when features slated for deprecation are used: PendingDeprecationWarning and DeprecationWarning. The former is meant to give even more warning for a feature that will coexist with its replacement for multiple releases, while the latter indicates something that could be removed two releases after the warning is added—effectively two years based on the relatively recent annual release cycle.

But, as noted in that earlier discussion, the deprecation period is for a minimum of two release cycles. There are concerns that time frame is being treated as a deadline of sorts—to the detriment of some parts of the ecosystem. So on January 18, Victor Stinner, Tomáš Hrnčiar, and Miro Hrončok proposed postponing some deprecations that had been scheduled for Python 3.11, which is due in October. The message referred to an early January posting by Hrnčiar to the Python discussion forum that described the problems Fedora had encountered when building its packages using a development version of 3.11.

In particular, two specific sets of deprecations were causing the most trouble for Fedora packages. Removing deprecated aliases from the unittest module (bug 45162) and getting rid of deprecated pieces from the configparser module (bug 45173) led to the bulk of the problems that Fedora encountered. The unittest deprecation caused 61 Fedora packages to break, while the configparser changes broke another 28. In the proposal, Stinner said that they and others had reported the problems upstream and often contributed a fix, but that there is still a lengthy process before the changes actually reach the distribution:

The problem is that fixing a Fedora package requires multiple steps:
  1. Propose a pull request upstream
  2. Get the pull request merged upstream
  3. Wait for a new release upstream
  4. Update the Fedora package downstream, or backport the change in Fedora (only needed by Fedora)

Reverting those two changes, which caused most of the problems Fedora has run into in its testing of the new version of Python, will allow for "more time on updating projects to Python 3.11 for the other remaining incompatible changes". As reported by Hrnčiar, four other changes led to problems building Python packages, but those were fewer in number.

Silencing deprecations

In a reply to the proposal, Antoine Pitrou wondered whether it showed "that making DeprecationWarning silent by default was a mistake?" He is referring to the changes to the visibility of DeprecationWarning that have occurred over the years. While DeprecationWarning is useful for the developers of a Python package, it is often seen by users, who may not be in a position to do much about it. The warnings were made invisible by default for Python 2.7 and 3.2 (in 2010 and 2011), but that policy was changed for Python 3.7 in 2017 with PEP 565 ("Show DeprecationWarning in __main__").

Guido van Rossum did not think that the evidence was quite that clear, but deprecations are tricky:

At best it shows that deprecations are complicated no matter how well you plan them. I remember that "noisy by default" deprecation warnings were widely despised.

Some ideas of further tweaks that could be made to the visibility of the warnings were raised. Richard Damon suggested having them only be visible when running unit tests. It turns out that pytest already enables those warnings, as Brett Cannon pointed out. That is something of a double-edged sword, though, Christopher Barker noted: "It's really helpful for my code, but they often get lost in the noise of all the ones I get from upstream packages." Gregory P. Smith pointed out that the standard library unit tests enable the warnings as well; "Getting the right people to pay attention to them is always the hard part."

Fixing deprecations

There was a bit of discussion about how to silence warnings from imported modules, possibly semi-automatically, but Steven D'Aprano had a bit of a warning about that approach:

If we use a library, then we surely care about that library working correctly, which means that if the library generates warnings, we *should* care about them. They are advanced notice that the library is going to break in the future.

Of course I understand that folks are busy maintaining their own project, and have neither the time nor the inclination to take over the maintenance of every one of their dependencies. But we shouldn't just dismiss warnings in those dependencies as "warnings I don't care about" and ignore them as Not My Problem.

Like it or not, it is My Problem and we should care about them.

In the world of open-source software, the lines between users and "vendors" of software are blurred, he said. Users often have the ability, and certainly have the legal right, to change the code based on observing problems of this (or any other) nature, but there is something of a social problem, "and you cannot fix social problems with technology". Ignoring warnings breaks some assumptions about how open source works:

The open source mantra about many eyes making bugs shallow doesn't work when everyone is intentionally closing their eyes to the warnings of pending bugs.

Barker said that he does try to submit fixes upstream when he notices problems of that sort, as did others in the thread. There is still the problem, mentioned by Stinner, that even once fixes are contributed, releases including them may still take a while; as Stephen J. Turnbull put it: "even if you submit a patch, there's no guarantee that the next version (or three) will contain it".

With regard to silencing DeprecationWarning, Steve Dower said that it was not necessarily a mistake to do so:

If we'd gone the other way, perhaps we'd be looking at massive complaints from "regular" end users about all the noisy warnings that they can't fix and saying that making it noisy was the mistake.

He was not opposed to reverting the changes as proposed, though he thought it might be "a bit premature" to do so now, roughly nine months before the release. They can be reverted closer to the release if the packages in question still are not fixed (and released). If they do get reverted now, because "they cause churn for no real benefit", that would be reasonable; those who are opposed can argue that the benefit is real, however, "as long as they also argue in favour of the churn". He also made a broader point:

We shouldn't pretend to be surprised that something we changed causes others to have to change. We *know* that will happen. Either we push forward with the changes, or we admit we don't really need them.

Stinner pointed to two different examples of the kinds of problems that Fedora has found by testing with development versions of upcoming Python releases. There are advantages to finding these problems as early as possible: "If issues are discovered earlier, we get more time to discuss and design how to handle them." He thinks it makes sense to revert these particularly problematic deprecations now because it will help flush out more problems further down in the dependency chain:

In Fedora, if a frequently used dependency is broken, a long list of packages "fail to build". (In Fedora, the package test suite must pass to build a package successfully.) If it takes 9 months to fix this dependency, we will likely miss other issues before the Python final version in dependent packages.

Sebastian Rittau said "that some (semi-) automated way to actively test and notify important projects of deprecations/removals before a release would be a great addition to the Python ecosystem", though he acknowledged that it might be difficult to do. Stinner replied that, in effect, Fedora is already doing that, albeit with "changes already merged in Python". He has done some work on ways to automatically test Python with patches applied, to test upcoming or proposed changes, but it turned out to be rather complicated.

Smith was also in favor of the reversions; he thanked the Fedora team for helping bring these problems to light, and noted that being proactive is a better way forward:

Deprecation removals are hard. Surfacing these to the impacted upstream projects to provide time for those to integrate the changes is the right way to make these changes stick in 3.12 or later. [...]

As you've done the work to clean up a lot of other OSS projects, I suggest we defer this until 3.12 with the intent that we won't defer it again. That doesn't mean we can't hold off on it, just that we believe pushing for this now and proactively pushing for a bunch of cleanups has improved the state of the world such that the future is brighter. That's a much different strategy than our passive aggressive DeprecationWarnings.

Toward the end of the original proposal message, Stinner had some thoughts on being even more proactive in the future. He suggested that, before making an incompatible change, developers could search the Python Package Index (PyPI) for uses of the feature in question "and try to update these projects *before* making the change". Once the number of affected projects has been reduced to some low number (he suggested 15), the change could be made in Python.

The Python ecosystem is huge, with an amazing number of projects, libraries, packages, tools, and so on, subsets of which are gathered up together into Linux (and other) distributions. All of those packages support differing ranges of Python versions, which makes the job of distributions that much harder, since they typically settle on one Python version to maintain throughout the life of a particular distribution release. Deprecating pieces along the way makes that ever more difficult, of course.

There are other software projects that take a different approach; the Linux kernel somewhat famously almost never deprecates something unless it truly can no longer be supported (e.g. ancient hardware or an API that leads to a security hole), but Python (and some other languages) have not chosen that course. There are certainly advantages to leaving things behind, especially when replacing them with something emphatically and unquestionably better, but it does have its downsides as well. It would seem that Python is drawing closer to finding the right balance when the deprecation route is taken, though there are always likely to be bumps along the way.

Comments (115 posted)

An attic for LibreOffice Online

By Jonathan Corbet
January 27, 2022
In mid-December, Thorsten Behrens, a board member for the Document Foundation (TDF), posted a seemingly simple proposal for an "attic" that would become the home of abandoned projects. No specific projects were named as the first intended residents of the attic, but the proposal clearly related to the LibreOffice Online (LOOL) project. The following discussion made it clear that the unhappiness around LOOL has yet to fade away, and that the Foundation still has some work to do when it comes to defining its relationship with its corporate members.

The Document Foundation is, of course, the entity charged with furthering the development of LibreOffice and related software. LOOL, which allows collaborative editing of documents, has been under development for a decade or so; that work picked up in 2015 when TDF announced that LOOL would be "developed into a state of the art cloud application". Over the years, though, the relationship between TDF and Collabora, the company doing the bulk of the development work for LOOL, began to sour. In mid-2020, the company complained that it could not bring in revenue to support LibreOffice developers when everything was available for free; among other things, Collabora wanted a clearer path toward making money on LOOL. Despite efforts on all sides to find a solution, Collabora stopped working on LOOL later that year, choosing instead to continue to develop that code outside the TDF under its "Collabora Online" brand.

The elephant in the attic

Since then, development on LOOL has come to a halt; without Collabora's contributions, there wasn't much to sustain an ongoing project. That left TDF with a body of abandoned code with no maintenance or credible prospects for future development. Not wanting to emulate Apache OpenOffice, TDF started looking for a solution to this problem; the answer was the attic proposal. Code that TDF cannot support would be "atticized" — moved to a read-only repository — available for anybody who wanted to do something with it, but not a target for ongoing development. The proposal also included a process for "un-atticization" should enough developers demonstrate interest (in the form of patches) in carrying the project forward.

The response to the proposal was swift, but few participants wanted to talk about the attic itself; instead, there was a fair amount of rehashing of the LOOL story and discussion of what should be done with that code now. Not everybody feels that LOOL should be moved into any sort of attic. Marco Marinello, for example, replied that what was really needed was a new set of agreements with TDF's member companies:

I have already said this many times but I want to repeat it: it has to be clear (and hopefully stated by legal contracts) to the companies working in the LibreOffice ecosystem that they cannot wake up one day and bring their development outside the LibreOffice project. They cannot stay with one foot inside the ecosystem, contributing to it, and with the other one bringing their development effort outside.

Just-elected TDF board member (and Collabora employee) Jan Holesovsky answered that a different approach was called for:

From my point of view, rather than legal contracts, a much better strategy is to listen to what the ecosystem companies or other contributors are telling you; work with them, instead of against them; treat them as partners, not as enemies. If you do that, there is no reason for anybody to leave the community, move the code away, or fork.

The fork of LOOL, Holesovsky continued, was the result of a failure to listen to contributors (and Collabora in particular), especially when it came to distributing a binary version of LOOL under the LibreOffice brand. Board member Paolo Vecchi responded with a somewhat different view of the events surrounding LOOL. The proposal to publish binary LOOL builds, he said, was meant to help schools and nonprofits dealing with COVID-related disruptions, which is well within TDF's public-benefit mission. He, too, called for agreements to be made with companies working on TDF projects:

It is totally fine if someone wants to start his/her own projects based on LibreOffice and host them under his/her rules. However, I don't think it is fine to benefit from TDF and the work of its community for years, and then change the rules and walk away.

He later added that, in his opinion, companies making major contributions to TDF projects should be made to agree to a "backporting agreement" for a one-year period; Behrens disagreed, saying that such a requirement would kill any willingness to bring projects into TDF at all.

On January 9, Marinello posted a counterproposal to putting LOOL in an attic. Under this plan, TDF would claim that Collabora Online was really "always LibreOffice Online" and provide daily builds of LibreOffice integrated with the Collabora Online code. Others would be given the right to redistribute these builds under the LOOL name. Unsurprisingly, outgoing board member (and Collabora employee) Michael Meeks disliked this idea, characterizing it as "let's de-fund the developers". Focusing on Collabora Online, he said, has led to "a new and better model for everyone". While it would be a good thing if TDF could drive sales for its members, that "has been repeatedly shown to be structurally impossible for TDF"; he suggested that was not an entirely bad thing. Meanwhile, others, including Vecchi and Behrens, expressed opposition to redistributing Collabora Online under the TDF umbrella.

Keeping this from happening again

The discussion went on for quite some time, with numerous people trying to place the blame for the "loss" of LOOL according to their view of the events. For all that, though, it was a surprisingly polite conversation, most of the time. The participants seem to feel that, regardless of who is to blame for what happened with LOOL, the important thing is to figure out how to keep TDF viable and avoid similar problems in the future.

From the point of view of at least one large commercial member of TDF, the solution is to make sure that member companies don't feel like they are funding an organization that is then sabotaging their business plans. So TDF would need to avoid taking actions that reduce the value of commercial offerings — actions like offering binary versions of LOOL. Otherwise, it looks like TDF is using the members' own donations to compete with them. Industry consortia often need to walk this sort of tightrope, but many TDF members don't seem to think of it as an industry consortium.

The opposing point of view is that member companies should not be able to use TDF's resources and support to build a product, only to remove it from TDF once it can stand alone. Supporters of this viewpoint would like to see additional agreements put in place with member companies to prevent this from happening, especially for projects where a single company dominates the development effort. TDF, to some members, is meant to provide a public benefit rather than services to corporate members, and they are not happy when they feel that this benefit has been compromised.

As it happens, TDF has just elected a new board including many of the participants in this discussion. It would seem that this board should have, at the top of its list of priorities, coming up with a solution to this disagreement that can keep all of the stakeholders, if not happy, at least out of a state of complete disgruntlement. The January 14 board meeting, with both the new and old members, discussed the attic proposal (and punted any action pending further discussion) but did not address the bigger question; the agenda for the January 28 meeting does not mention it at all. TDF is an important institution in our community; its governance needs to come up with a solution to this question.

Comments (5 posted)

Handling argc==0 in the kernel

By Jonathan Corbet
January 28, 2022
By now, most readers are likely to be familiar with the Polkit vulnerability known as CVE-2021-4034. The fix for Polkit is relatively straightforward and is being rolled out across the net. The root of this problem, though, lies in a misunderstanding about how programs are run on Unix-like systems. This problem is highly likely to exist in other programs, so it would be nice to find a more general solution. The best place to address this issue may be in the kernel, but properly working around this misunderstanding without causing regressions is not an easy task.

I'd like to have an argument, please

Most developers are familiar with the prototype of the main program as expressed in the C language:

    int main(int argc, char *argv[], char *envp[]);

The program is invoked with its command-line arguments in argv and the environment in envp; both are pointers to null-terminated arrays of char * strings. The number of non-null entries in argv is stored in argc. This API is a user-space creation, though; what happens when the kernel first runs a program is a little bit different: on Linux, that program is passed a single pointer to the argv array. The envp array begins immediately after the NULL value that terminates argv. Thus, in a C program, the following statement will be true on entry to main():

    envp == argv + argc + 1

By convention, argv[0] is the name of the program that is being executed, and many programs rely on that convention. As it happens, though, this convention is exactly that: a convention, but not a guarantee. The actual contents of argv are entirely under the control of whoever calls execve() to run the program in the first place, and that caller is not required to put the program name in argv[0].

Indeed, the caller is not required to provide argv[0] at all. If the argv array passed to execve() is empty (or the argv pointer is simply NULL), the first pointer in the new program's argv array will be NULL, and the envp array will start immediately thereafter. Unfortunately, Polkit (or, more specifically, the setuid pkexec utility) "knew" that argv[0] would always be present, so it performed its argument processing by iterating over the argv array starting at argv[1]. If there are no arguments at all, argv[1] is actually envp[0], so pkexec was iterating through its environment variables instead. Throw in some in-place argument modification (pkexec overwrites its argv array), and pkexec could be convinced to rewrite its environment variables, thus bypassing the sanitizing of those variables done for setuid programs. At that point, the game was over.
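
The mechanics are easy to see with a small demonstration program (a hypothetical illustration, not pkexec itself) that re-executes itself with an empty argv; on a kernel that allows an empty argument list, the re-executed copy sees argc as zero, and its argv[1] is really the first environment string:

    /* Illustrative sketch only: re-exec with an empty argv and look at
       what the new copy of the program sees. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char *argv[], char *envp[])
    {
        if (getenv("EMPTY_ARGV_DEMO")) {
            /* Second invocation, started with an empty argv */
            printf("argc = %d\n", argc);
            if (argc == 0)
                /* argv[1] lies one past argv's NULL terminator,
                   which is exactly where envp begins */
                printf("argv[1] == envp[0]? %s: \"%s\"\n",
                       argv[1] == envp[0] ? "yes" : "no", argv[1]);
            return 0;
        }
        char *empty_argv[] = { NULL };
        char *env[] = { "EMPTY_ARGV_DEMO=1", NULL };
        execve("/proc/self/exe", empty_argv, env);
        perror("execve");
        return 1;
    }

On a kernel that adopts one of the restrictions discussed below, the execve() call would instead fail, or the new program would see a substitute argument.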

This problem is not new, and neither is awareness of it. Ryan Mallon wrote about it in 2013, noting that "it does allow for some amusing behaviour from setuid binaries"; he also evidently sent a Polkit patch to address it, but that patch was never applied. Even further back, in 2007, Michael Kerrisk reported the kernel's behavior as a bug, but the report was closed with little discussion. So the problem persisted, culminating in the vulnerability administrators are scrambling to patch now.

Toward a more general fix

Fixing this issue is a simple matter of making pkexec check that argc is at least one. But there are surely other programs out there containing similar assumptions. Given the strength of the argv[0] convention, it is natural to ask whether it makes sense to allow programs to be run with an empty argv array at all. Perhaps it doesn't, but the current API has a lot of history and cannot be changed without a lot of thought.

Ariadne Conill started the linux-kernel discussion with a patch that would simply disallow calls to execve() without at least one argv entry. Offending callers would get an EFAULT return value instead. This would solve the problem by providing a guarantee that argv would not be empty, but at the potential cost of introducing problems of its own. One is that, as Kees Cook discovered, there is actually a fair amount of code out there that calls execve() with an empty argv array. Conill wrote those off as "lazily-written test cases which should be fixed", but regressions in lazily-written test cases are still regressions. Also, as Heikki Kallasjoki and Rich Felker both pointed out, an empty argv array is actually allowed by the POSIX standard.

Felker also suggested an alternative with less potential for regressions: only enforce a non-empty argv at privilege boundaries — when execve() is being called to run a setuid program, in other words. Cook said that he would rather avoid taking the privilege boundary into account if possible, though. He proposed a different solution to the problem: inject an extra null pointer at the end of an empty argv array so that even code that tries to skip argv[0] will notice that there is nothing there. This solution will not work either, as it turns out: the ABI promise is that envp starts immediately after argv, and the extra NULL breaks that promise. There are evidently programs that rely on that layout and would break if it were changed.

Yet another approach, first suggested by Eric Biederman, would be to replace an empty argv with one containing a single pointer to an empty string. This proposal had some support (though nobody has implemented it as of this writing), but also provoked some worries of its own. Perhaps there are programs out there that will respond badly to an empty-string argument, or which do something special when argc is zero. Changing the number of arguments passed to the program run by execve() just looks like it could create surprises.

Cook eventually summarized the situation this way:

Given the code we've found that depends on NULL argv, I think we likely can't make the change outright, so we're down this weird rabbit hole of trying to reject what we can and create work-around behaviors for the cases that currently exist.

That notwithstanding, he then went on to express a preference for the initial change (simply disallowing a zero-argument execve() call anyway, albeit with an EINVAL return rather than EFAULT), with a suggestion to fix a Valgrind test that is already known to break with that restriction. Conill responded with a new version of the original patch; this time it emits a warning before failing an empty-argv call to execve() with EINVAL. Cook acked the patch, saying: "Let's do it and see what breaks"; Biederman concurred: "Especially since you are signing up to help fix the tests".

That is where the discussion stands as of this writing, but it is far from clear that this is how the problem will eventually be addressed. This patch has, crucially, not yet survived its first encounter with Linus Torvalds, who may take a dim view of its potential for regressions. It is an ABI change, after all, and there may well be code out there that responds badly to it, though the fact that BSD systems already prohibit an empty argv will, with luck, have already shaken out most of those. Should unfortunate regressions arise anyway, a different solution will almost certainly need to be found.

Comments (90 posted)

Restartable sequences in glibc

By Jonathan Corbet
January 31, 2022
"Restartable sequences" are small segments of user-space code designed to access per-CPU data structures without the need for heavyweight locking. It is a relatively obscure feature, despite having been supported by the Linux kernel since the 4.18 release. Among other things, there is no support in the GNU C Library (glibc) for this feature. That is about to change with the upcoming glibc 2.35 release, though, so a look at the user-space API for this feature is warranted.

The kernel makes extensive use of per-CPU data structures to avoid locking. This technique works well if the kernel takes care to disable preemption while those data structures are being manipulated; as long as a task running in the kernel has exclusive access to the data, it can safely make changes. It would be nice to be able to use similar techniques in user space, but user-space code lacks the luxury of being able to disable preemption. So a different approach, which relies on detecting rather than preventing preemption, must be used.

A restartable-sequences refresher

That approach is restartable sequences, which were first proposed by Paul Turner in 2015, then later pursued by Mathieu Desnoyers and merged in 2018. Restartable sequences rely on a couple of simple rules for the creation of safe, lock-free critical sections. The first rule is that the critical section cannot make any changes to the protected data structure that are visible to other threads until the final instruction in that section. That last instruction will typically be a pointer assignment making the new state of things visible. The other rule is that the section can be interrupted at any time prior to that last instruction; when that happens, the code must be able to recover and restart the operation from the beginning.

Using restartable sequences is a bit tricky because user space must be able to tell the kernel when such a sequence is running. Executing a system call would defeat the purpose of the entire exercise, though; at that point, the thread might as well just grab a lock. So, instead, restartable sequences are managed with a special region of memory shared between user space and the kernel. Specifically, user space sets up a special structure, struct rseq, and informs the kernel of this structure using the rseq() system call. The structure is a bit complex, but at its core is a field called rseq_cs, which is a pointer to a structure also called rseq_cs, containing the description of a critical section:

    struct rseq_cs {
	__u32 version;
	__u32 flags;
	__u64 start_ip;
	__u64 post_commit_offset;
	__u64 abort_ip;
    };

To set up a critical section, a user-space thread fills in an rseq_cs structure, setting start_ip to the address of the first instruction in that section. The post_commit_offset is the length of the critical section in bytes; when added to start_ip the result is the first instruction after the end of the section. abort_ip, instead, is the address of the instruction to jump to if the sequence is interrupted (via preemption or CPU migration, for example) before it completes. version should be zero, and the flags field can be used to tweak restart behavior; some information on that can be found in this man page source.

Actually running the critical section is a matter of storing the address of the rseq_cs structure into the rseq structure that was registered with the kernel; this should be done just prior to entry into the section. Whenever the kernel preempts the thread, it will check the instruction pointer to see whether the critical section was executing at the time; if so, when the thread resumes execution, it will jump to the abort_ip address, at which point it should recover and try again.
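
As a minimal sketch of the registration step (assuming an x86-64 system, kernel headers that provide <linux/rseq.h>, and the RSEQ_SIG value conventionally used on x86), a thread could register its structure and read the kernel-maintained CPU number as follows; the critical section itself, which requires architecture-specific assembly, is omitted here:

    /* Sketch: register a struct rseq and read the CPU number that the
       kernel keeps up to date in it. */
    #define _GNU_SOURCE
    #include <linux/rseq.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdio.h>

    #define RSEQ_SIG 0x53053053    /* x86 value; architecture-specific */

    /* Must stay valid (and registered) for the life of the thread */
    static __thread struct rseq rseq_area;

    int main(void)
    {
        if (syscall(__NR_rseq, &rseq_area, sizeof(rseq_area), 0, RSEQ_SIG)) {
            perror("rseq");    /* EBUSY likely means the C library beat us to it */
            return 1;
        }
        /* A critical section would now store the address of its rseq_cs
           structure into rseq_area just before entering its code. */
        printf("running on CPU %u\n", rseq_area.cpu_id);
        return 0;
    }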

One potential problem with the restartable-sequences ABI is that any given thread can only register a single rseq structure with the kernel. Even checking a single structure adds a bit of overhead to the hottest parts of the scheduler; checking a list of them would be unacceptable. The restriction makes sense, but it does pose a problem in situations where there might be more than one user of restartable sequences in a thread; some of them might be buried inside libraries and invisible to users of those libraries, perhaps several layers up the call stack. For restartable sequences to be a reliable mechanism, there must be a way to prevent these users from stepping on each other's toes.

The GNU C Library's approach

If glibc is to expose restartable sequences to its users, it must have a plausible answer to the sharing problem. The implementation put together by Florian Weimer takes the approach of putting glibc in the middle for users of this mechanism. Thus, the registration of the rseq structure with the rseq() system call is done by glibc itself during initialization; by the time user code runs, that setup will have already been performed. Should an application want to perform its own registration (and not use the glibc support at all), the glibc.pthread.rseq tunable can be used to disable the automatic registration.

Applications using restartable sequences via glibc should include <sys/rseq.h>. This header defines the rseq and rseq_cs structures and a few important variables, the first of which is __rseq_size. That will be the size of the rseq structure registered by the library, or zero if registration didn't happen for whatever reason (no support in the kernel or disabled, for example).

Finding the rseq structure registered by glibc is not quite as straightforward as one might think. It is stored in the thread control block (TCB) maintained by the library; specifically, it can be found at an offset of __rseq_offset bytes from the thread pointer. Actually getting at the thread pointer is an architecture-specific affair, though; GCC offers __builtin_thread_pointer() for some architectures but not all. As it happens, x86 is one of the exceptions; there the thread pointer is stored in the FS register and applications must fetch it themselves.
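
Putting those pieces together, locating the glibc-registered structure might look like the following sketch (assuming glibc 2.35 and a compiler that implements __builtin_thread_pointer() for the target architecture):

    /* Sketch: find the glibc-registered rseq area and read the CPU number */
    #include <stdio.h>
    #include <sys/rseq.h>

    int main(void)
    {
        if (__rseq_size == 0) {
            puts("no rseq area was registered");
            return 1;
        }
        /* The shared area lives at a fixed offset from the thread pointer */
        struct rseq *rs = (struct rseq *)
            ((char *)__builtin_thread_pointer() + __rseq_offset);
        printf("rseq area is %u bytes; currently on CPU %u\n",
               __rseq_size, rs->cpu_id);
        return 0;
    }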

The glibc-registered rseq structure is shared by all users within a given thread, but each user should create its own rseq_cs structure describing its critical section. Immediately prior to entering its critical section, a thread should store the address of its rseq_cs structure into the rseq_cs field of the global rseq structure; it should reset that field to NULL on exit. This setup implies that critical sections cannot nest, but these sections are meant to be short and should not be calling into other code anyway, so that will not be a problem.

The code located at abort_ip must begin with the special RSEQ_SIG sentinel, which is defined in an architecture-dependent manner. Note that, if the abort code is invoked, the rseq_cs field will be zeroed by the kernel and must be assigned anew before reentering the critical section.

There is also an __rseq_flags variable containing the flags that were used when registering with the kernel; according to Weimer's documentation patch, that variable is always set to zero for now.

With that structure in place, applications using glibc can now use restartable sequences in a cooperative way. Unfortunately, there aren't really any useful examples of code using this new API to point to; this is all new stuff at this point.

As readers have likely understood by now, actually coding the critical section almost certainly requires resorting to assembly language. This is clearly not a feature that is intended for casual or frequent use, but it can evidently produce significant performance gains in systems with high scalability requirements. Support in the GNU C Library will make restartable sequences a bit more accessible, but it seems destined to remain a niche feature used by few developers.

Comments (20 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: GPU driver sans hardware; Kasper; Debian resolution process; Nitrux 2.0; LibreOffice 7.3; Quotes; ...
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Copyright © 2022, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds