LWN.net Weekly Edition for December 23, 2021
Welcome to the LWN.net Weekly Edition for December 23, 2021
Welcome to the final LWN.net Weekly Edition for 2021. As is our tradition, we will be taking the final week of the year off to rest and recuperate with family and friends — in a suitably socially distanced manner, of course.
The Weekly Edition will be back on January 6. Meanwhile we have a feature or two saved up for the off week, so wander by if you get a chance.
Best wishes to all LWN readers for a great holiday season and a promising beginning to the new year.
This edition contains the following feature content:
- LWN's 2021 retrospective: in which we look back at our January predictions and laugh.
- A farewell to LWN: a longtime LWN staff member retires.
- Lessons from Log4j: what does the ongoing series of Log4j vulnerabilities tell us?
- Locked root and rescue mode: what to do when rescue mode is rendered inaccessible by the locked-root policy.
- SA_IMMUTABLE and the hazards of messing with signals: an apparent cleanup patch creates some surprising kernel regressions.
- Content blockers and Chrome's Manifest V3: why authors of content blockers are unhappy about Google's new API and associated policies.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
LWN's 2021 retrospective
It may have seemed questionable at times, but we have indeed survived yet another year — LWN's 22nd year of publication. That can only mean one thing: it is time to take a look back at our ill-advised attempt to make predictions in January and see how it all worked out. Shockingly, some of those predictions were at least partially on the mark. Others were ... not quite so good.
The predictions
The first prediction made in January was that the world would emerge from the depths of the pandemic, and that in-person events would return. Needless to say, things didn't quite work out that way. The pandemic is still very much with us (and possibly about to take another turn for the worse) and, while a few in-person gatherings did take place toward the end of the year, most Linux events are still being held online. The free-software community does still appear to be holding up well, though; it may be true that staying home and interacting with our screens is all we ever wanted to do in the first place.
The prediction that support for CentOS 8 would end was, obviously, obvious; that is still scheduled to happen at the end of this year. Tied to that prediction was a suggestion that, in fact, CentOS 8 Stream might turn out to be good enough for many users, and perhaps even better for some. The lack of "CentOS 8 Stream broke my production system" stories suggests that may have come true, at least to an extent, though it is hard to know for sure.
We also included the prediction that there would be attempts to recreate old-style CentOS 8; given that those attempts were already underway at the time, we cannot claim credit for a lot of foresight. We highlighted Rocky Linux as the highest-profile effort, but lamented its lack of public discussions. Rocky Linux is still out there, and some public mailing lists have been added, but anybody looking for insights in the rocky-devel archives will be disappointed. Meanwhile, AlmaLinux appears to have stolen the spotlight and seems to be doing well, though its communication channels are not particularly friendly to casual browsers. In any case, the prediction that "most or all" of the CentOS 8 recreation efforts were likely to fail does not appear to have been borne out, so far at least.
It is hard to say what became of the prediction that openSUSE would need to better define its relationship with the SUSE mothership; things have been rather silent on that front. The effort to create an independent foundation as the home for openSUSE appears to have stalled, though it would not be surprising to learn that private discussions are ongoing. From what can be seen publicly, nothing has really changed.
Did it become possible to submit kernel patches without touching an email client? In truth, that was possible even before with tools like git send-email. Since then, some further progress has been made; a tool to turn a GitHub pull request into an email series is one relatively prominent example. Coming in the near future will be enhancements to the b4 tool and a new web service that will further remove the necessity for email to submit patches.
So that prediction can be counted as being at least partially successful, but it missed the other half of the equation, which could be expressed as "it will become possible to receive and apply kernel patches without using email". This, of course, refers to the lore+lei work that was recently made public, along with the ongoing work on b4. The kernel community isn't moving away from email anytime soon, but it will become increasingly easy to avoid many of the more annoying aspects of email in kernel development.
Did the commercial side of BPF become more prominent, as predicted? The increasing number of developers working on BPF and the commercial support behind projects like Cilium suggest that the answer is "yes".
The new GNOME 40 interface did make an appearance as predicted, but the expected complaints, for the most part, did not. The changes were not all that disruptive in the end, and it seems certain that most desktop users have, at this point, either made their peace with GNOME or found another solution that is more to their liking.
Included in our predictions was the statement that Python developers would have to think about the future of the language and when it might actually be "done". If long email threads are any evidence, there is clearly some thinking going on, but it still seems to be focused on the shape the new language features should take rather than how many more new features are appropriate at all. At least the prediction that there would be no Python 4 seems to be holding for now.
We predicted that software supply-chain attacks would be a serious threat. That threat is always with us and, on occasion, malicious packages (UAParser, Great Suspender, PHP) were indeed discovered to have been injected into popular repositories. But, as the series of Log4j vulnerabilities makes clear, our worst enemy in this regard is probably still ourselves. We are quite accomplished at injecting our own vulnerabilities and have no need to outsource that work to outside attackers.
There has been an increase in antitrust enforcement activities, both in the US and Europe, as predicted. As also predicted, things are moving slowly, and there has been little practical effect so far. The influence of OpenStreetMap continues to grow, also as predicted; the predicted clashes with its hobbyist base, though, appear instead to have calmed down somewhat.
What was missed
All told, our 2021 predictions did not go all that badly; arguably that is the result of having not gone too far out on a limb back in January. But there is the associated question of what was missed: what did we fail to predict that we should have perhaps foreseen?
Sometimes the most obvious things can be the hardest to anticipate; consider, for example, the case of minor revision numbers for stable kernel releases. A single byte was set aside for those numbers, since nobody ever thought there would be more than 255 releases of a single stable series. But we live in a different world, with fast-arriving stable updates and kernels that are supported for several years. It should have been possible for us, and for the developers involved, to not only predict that those numbers would overflow, but to make a good guess as to exactly when that would happen. We were all caught by surprise anyway.
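The arithmetic makes the surprise all the more embarrassing in retrospect. The kernel's KERNEL_VERSION() macro packs the major, minor, and sublevel numbers into a single integer, with eight bits each for the minor and sublevel fields; a quick sketch of that packing (in Python rather than the kernel's C, but the same computation) shows exactly where things go wrong:

```python
# The packing used by the kernel's KERNEL_VERSION() macro:
# eight bits each for the minor and sublevel fields.
def kernel_version(major: int, minor: int, sublevel: int) -> int:
    return (major << 16) + (minor << 8) + sublevel

# With only eight bits for the sublevel, the 256th update of one
# stable series produces the same code as the .0 release of the next:
assert kernel_version(4, 9, 256) == kernel_version(4, 10, 0)

print(hex(kernel_version(4, 9, 255)))  # 0x409ff — the last safe value
```

Given the release cadence of the stable series, predicting roughly when a long-lived series would cross 255 updates would have been a matter of simple division.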
The addition of structural pattern matching to Python had been a long time in coming and had a good chance of happening in 2021. We even mentioned it while talking about Python, but didn't think to predict its acceptance.
It didn't occur to us that Richard Stallman might return to the Free Software Foundation's board of directors, but perhaps it should have. The FSF has always struggled to have an identity separate from its association with Stallman — if, indeed, it wants that at all. So letting him back in may have seemed like the best path forward despite the loud public backlash that resulted.
Perhaps it was not easily predictable that the whole UMN episode, where university researchers intentionally tried to insert buggy patches into the kernel, would happen in 2021. But certainly something like that was going to happen sooner or later; some developers had been warning about the prospect for years. Happily, the kernel's processes worked and, aside from a lot of wasted time, no real harm was done.
Another thing we perhaps should have foreseen but didn't was the intersection of machine-learning technologies and software development. That, too, was only a matter of time; it came to light this year in the form of the GitHub Copilot offering. The controversy over whether Copilot is a violation of free-software licenses has faded over the months, but it seems likely to return as these technologies mature and proliferate.
We have often predicted that the realtime preemption code would be merged into the mainline kernel — and just as often had to make excuses for that prediction when things didn't work out that way. So it is entirely fitting that, in the year when the realtime code really was merged, we didn't even think about predicting it.
Other notes
We lost a number of members of our community this year, including Kent Fredric, Karsten Loesing, Fredrik Lundh, and Jörg Schilling. They will be missed.
This year we produced yet another 50 LWN Weekly Editions; those contained 260 feature articles, of which 227 were written in house. Over 7,200 security alerts went into the weekly editions, along with about 4,300 kernel patches. Our conference coverage reflects the pandemic era, of course, but we still had coverage of eight events in 2021, and assisted with the organization of three of those. It has been yet another busy year — as usual.
At the end of the year we say goodbye to Rebecca Sobol, who is headed off into a well-earned retirement. While she was not present at the very beginning of LWN, she joined us shortly thereafter as the first person who was actually paid to work on LWN, and has been here ever since, even through the times when things looked grim and there was no money for payroll. Over the years she has written articles, attended conferences, herded authors, and interfaced with our group subscription managers; see her goodbye note for details. LWN would not be what it is without her participation, and she, too, will be missed.
There will be some changes at LWN next year as we seek to replace her, but it's not yet clear what form those changes will take; stay tuned. Meanwhile, we wish the best of the year-end holidays for all of our readers; it remains a privilege and an honor to write for, and be supported by, all of you. We will see you all back here next year.
A farewell to LWN
Back at the beginning of 2020, it was predicted that retirements would increase during this decade. In 2021, the prediction was that retirements would increase over the next couple of years. It is happening and LWN is no exception. I am retiring at the end of this year after more than 20 years with LWN.
So who am I and how did I get here? To some, I'm a name at the bottom of some LWN page. To a few, I'm the one who reminds them when their LWN group subscription is about to expire. You might have even met me at a conference, not that I have been to very many. Mostly I tend to be quietly in the background watching the LWN mailbox, looking for brief items and quotes of the week (sorry I haven't found much lately), proofreading articles, managing subscriptions, and more. But I'm older than most of you and this is my last LWN weekly edition. Getting here is a bit of a story.
I got my first paying job in 1968 when I was in my late teens. It had nothing to do with computers. It was 10 years later when I decided to study computers and programming. After graduating from high school I had various, low-paying odd jobs, until finally I was ready for more education. I started going to Colorado Mountain College, located near Glenwood Springs, in the mid-1970s. I took a lot of math and physics classes, skied in the winter, and rafted the Colorado River in the summer. Just before I graduated in 1978, I had a class where one assignment was to write a program in BASIC. I forget what kind of computer it was; an early type of PC that belonged to one of the professors. It was my first encounter with programming a computer and I wanted to learn more; something that could lead to a real career.
I decided to take a year off and then go to the University of Colorado (CU) and study computers. If I had any doubts about that decision, they were quenched after spending the winter shoveling snow in the little ski resort town of Snowmass Village. One week the high temperature was -20 F. Another week it snowed so much that all I did was shovel the same staircase over and over and couldn't keep up. Cold and snow were replaced by spring cleanup, when the snow melts away and reveals lots of trash and lost items; the $100 bill was a nice find, but mostly it was picking up trash. Then the boss offered me a job as a manager and, in the conversation that followed, he told me "no woman in the world is worth $5/hour, ever". While the two summers spent working for the Snowmass Village golf course were more pleasant and I did get a raise to $5/hour before I left, I wanted a better-paying office job for the future.
Thus, in fall 1979, I moved to Boulder and started at CU, where I met Jon Corbet in our first computer class. We used punch cards and programmed in Pascal on the school's CDC mainframe. A couple of years later, we met Liz Coolbaugh, LWN's other founder, in an assembly-language programming class, using the mainframe. By this time, Jon had a printer terminal and a modem, so we did our assignments with reams of printer paper. I also worked on a project where I got access to a VAX 11/780 running Berkeley Unix and learned some Fortran.
After graduation, it was the Fortran that got me my first computer job: my new employer had an HP 1000 minicomputer and a lot of old Fortran code. Over the years that followed there were other jobs where I learned some C and worked on Unix systems. I was the systems administrator for a variety of computers with different operating systems, but these computers were not networked. As a sysadmin, I applied updates, did fresh installs of the occasional OS, and handled backups, but not much else. As a programmer, there was number crunching, of course; I also wrote things like text-based user interfaces that interacted with databases someone else wrote. Nothing fancy, and not any kind of systems programming.
Around 1994, I got a job that came with an office and a Sun Workstation running SunOS with the Motif Window Manager, fully connected to the internet. I thought that was the best computer environment ever. I didn't have root and I didn't really miss it. I liked the Unix environment and the internet access, but one job led to another, and while it still came with a Sun running Solaris, eventually I was maintaining Fortran code. By the late 1990s I was tired of maintaining Fortran code.
In the (northern hemisphere) spring of 1999, Liz offered me a part-time job as her assistant at LWN. Eklektix, LWN's parent company, was running Linux System Administration classes and I attended one of the last of those classes. They were hauling around workstations for the classes and I took one home after the class, installed Red Hat Linux 5, and started working for LWN. Within a year Eklektix was solely focused on LWN and I became a part owner of Eklektix and a full-time employee; just in time for LWN to become a wholly owned subsidiary of Tucows.
The company grew to several employees with Tucows paying salaries. That lasted a little over a year before Tucows decided they didn't want us after all. It was the dot-com boom and bust. It took some time for Eklektix to become a private company once again and we were without funding. We thought that was the end, but within a week donations from our readers poured in. Until the credit card clearing house decided that something fraudulent was going on and put a stop to that. There were several lean months before we began the transition to the subscription model. That worked and LWN is fully funded by subscribers now.
Different people have come and gone from LWN over the years. Liz, Dennis Tenney, Forrest Cook, and others worked for LWN at one time, but moved on some years ago. Jake Edge joined 14+ years ago and now he and Jon will be the only ones left to run LWN. I hear they will be hiring.
I have enjoyed working for LWN, learning about free-software development, without actually developing any software myself. I'm in awe of the kernel community and what it has accomplished. Once it was news if some company used Linux in some way. Now it's commonplace. Linux has taken over the computing world. It's in supercomputers, Android phones, the cloud, embedded devices; it is almost everywhere. Thanks to you subscribers, LWN has been there to cover it. Back in 2002, when subscribing to LWN first became possible, I don't think we imagined that some of the largest tech companies in the world would have LWN group subscriptions.
So what's next? That remains to be seen. Once I thought programming was my career of the future, but that turned out to be LWN instead. Now retirement is unlikely to be quite what I imagine, but I'm looking forward to the change in pace. I'll spend more time outdoors when the weather is nice, choosing when to spend time outside based on the weather forecast rather than LWN's publishing schedule. We'll see about all those "I'll do this when I have more time" projects.
Thanks again to all LWN subscribers who helped make LWN the success that it is and helped fund my retirement. Stay safe, have a good holiday season, and a prosperous New Year.
Lessons from Log4j
By now, most readers will likely have seen something about the Log4j vulnerability that has been making life miserable for system administrators since its disclosure on December 9. This bug is relatively easy to exploit, results in remote code execution, and lurks on servers all across the net; it is not hyperbolic to call it one of the worst vulnerabilities that has been disclosed in some years. In a sense, the lessons from Log4j have little new to teach us, but this bug does highlight some problems in the free-software ecosystem in an unambiguous way.
What went wrong
There are a lot of articles describing the mechanics of this bug and how to exploit it in great detail; see this page for an extensive collection. In short: Log4j is a Java logging package distributed by the Apache Software Foundation. It has found its way into many other projects and can be found all over the Internet. Indeed, according to this article, Log4j has been downloaded over 28 million times — in the last four months — and is a dependency for nearly 7,000 other projects. So a vulnerability in Log4j is likely to become a vulnerability in many other systems that happen to use it.
As the Apache Software Foundation proudly tweeted in June, it's even on the Ingenuity helicopter on Mars.
Normally, one thinks that a logging utility should accept data of interest and reliably log it. Log4j seemingly does this, but it also does something that, arguably, no logging system should do: it actively interprets the data to be logged and acts upon it. One thing it can do is query remote servers for data and include that in the log message. For example, it can obtain and incorporate data from an LDAP server, a feature that might be useful when one wishes to add data to the log that includes information about a user's account.
It turns out, though, that the remote directory server can supply other things, including serialized Java objects that will be reconstituted and executed. That turns this feature into a way to inject code into the running application that, presumably, only wanted to log some data. To exploit this opening, an attacker needs to do two things:
- Put up a server running a suitable protocol in a place where the target system can reach it. LDAP seems to be the protocol of choice at the moment, but others are possible; a grep of LWN's logs shows attempts to use DNS as well.
- Convince the target system to log an attacker-supplied string containing the incantation that will load and execute the object from the malicious server.
The second step above is often easier than it might seem; many systems will happily log user-supplied data. The hostile string may take the form of a user name that ends up in the log; the browser's user-agent string also seems to be a popular choice. Once the target takes the bait and logs the malicious string, the game is over.
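To see why interpreting logged data is so dangerous, consider a toy logger — a deliberately simplified Python sketch, not Log4j itself, with invented names (naive_log, FAKE_ENV) — that expands ${...} lookups found anywhere in a message, including in attacker-supplied fields like a user-agent string:

```python
import re

# A hypothetical secret standing in for real environment data.
FAKE_ENV = {"DB_PASSWORD": "hunter2"}

# A naive logger in the spirit of the vulnerable behavior: it scans
# the message for ${scheme:key} patterns and performs the lookups.
LOOKUPS = {
    "env": lambda key: FAKE_ENV.get(key, ""),
}

def naive_log(message: str) -> str:
    def expand(match):
        scheme, key = match.group(1), match.group(2)
        handler = LOOKUPS.get(scheme)
        return handler(key) if handler else match.group(0)
    return re.sub(r"\$\{(\w+):([\w.]+)\}", expand, message)

# A well-meaning call site logs a user-supplied value verbatim...
user_agent = "${env:DB_PASSWORD}"   # ...which the attacker controls.
print(naive_log(f"request from agent {user_agent}"))
# The "secret" appears in the log output. With Log4j's JNDI lookups,
# the equivalent step fetched and executed attacker-supplied code.
```

The point of the sketch is that the call site never asked for interpretation at all; simply passing hostile data through the logger was enough.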
This is, in other words, a case of interpreting unsanitized data supplied by the Internet, with predictable consequences; it is a failure that should have been caught in any reasonable review process. Note that the malicious strings can also be passed by front-end software to internal systems, which might then decide to log them. In other words, not being directly exposed to the Internet is not necessarily a sufficient defense for vulnerable systems. Every system using Log4j needs to be fixed, either by upgrading or by applying one of the other mitigations found in the above-linked article. Note that the initial fixes have proved to be insufficient to address all of the problems in Log4j; users will need to stay on top of the ongoing stream of updates.
The reaction to this vulnerability has been swift and strong. Some commenters are asserting that "open source is broken". Anybody who hadn't seen xkcd #2347 before has probably encountered it by now. Has our community failed as badly as some would have it? In short, there would appear to be two broad shortcomings highlighted by this episode, relating to dependencies and maintainers.
Dependencies galore
In the early days of free software, there simply was not much free code out there, so almost everything had to be written from scratch. Thus, at that time, there were few vulnerable packages available for free download and use, so every project had to code up its own security bugs. The community rose to the challenge and, even in those more innocent days, security problems were in anything but short supply.
For as long as your editor has been in this field — rather longer than he cares to admit — developers and academics both have talked about the benefits of reusable software. Over the years, that dream has certainly been accomplished. Many language communities have accumulated a massive collection of modules for many common (and uncommon) tasks; writing a program often just becomes an exercise in finding the right modules and gluing them together correctly. Interfaces to repositories automate the process of fetching the necessary modules (and the modules they depend on). For those of us who, long ago, became used to the seemingly infinite loop of running configure then tracking down the next missing dependency, modern environments seem almost unfair. The challenge is gone.
This is a huge success for our community; we have created development environments that can be a joy to work within, and which allow us to work at a level of productivity that couldn't really be imagined some decades ago. There is a problem lurking here, though: this structure makes it easy for a project to accumulate dependencies on outside modules, each of which may bring some risks of its own. When you are, essentially, importing random code off the Internet into your own program, any of a number of things can happen. One of those modules could be overtly hostile (as happened with event-stream), it could simply vanish (left-pad), or it could just suffer from neglect, as appears to have happened with Log4j.
When the quality of the things one consumes is of concern, one tends to fall back to known brands. Log4j is developed under the Apache Software Foundation brand which, one might hope, would be an indicator of quality and active maintenance. Appearances can be deceiving, though; one need not look further than Apache OpenOffice, which continues to be downloaded and used despite having been almost entirely starved of development effort for years, for an example. OpenOffice users will be relieved to know, though, that (according to the project's October 2021 report) OpenOffice has finally managed to put together a new draft mission statement. Log4j is a bit more active than that, but it still depends on the free-time effort of unpaid maintainers. Apache brand or not, this project, which is widely depended on, has nobody paid to maintain it.
But, even if the brand signals were more reliable, the problem remains that it is hard to stay on top of hundreds of dependencies. A library that appeared solid and well maintained when it was adopted can look rather less appealing a year or two later, but projects lacking good maintenance often tend not to attract attention until something goes badly wrong. Users of such a project may not understand the increasing level of risk until it is too late. Our tooling makes adding dependencies easy (to the point that we may not even be aware of them); it is less helpful when it comes to assessing the maintenance state of our existing dependencies.
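Nothing in common packaging tooling answers the question "which of my dependencies looks abandoned?", even though the check itself is trivial once release dates are in hand. As a hypothetical sketch — the package names and dates below are invented, and a real tool would scrape them from a package index or forge:

```python
from datetime import date

# Invented data: dependency name -> date of its last release.
last_release = {
    "fastjson-ish": date(2021, 11, 30),
    "leftpad-ish":  date(2016, 3, 22),
    "logging-ish":  date(2019, 5, 1),
}

def stale(deps, today, max_age_days=365):
    """Return dependencies with no release in the last max_age_days."""
    return sorted(name for name, released in deps.items()
                  if (today - released).days > max_age_days)

print(stale(last_release, today=date(2021, 12, 23)))
# ['leftpad-ish', 'logging-ish']
```

Release recency is, of course, only a crude proxy for maintenance health — but even that crude signal is rarely surfaced by the tools that so cheerfully install the dependencies in the first place.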
Maintainers
A related problem is lack of development and maintenance support for projects that are heavily depended on. The old comparison between free software and a free puppy remains on point; puppies are wonderful, but if somebody isn't paying attention they will surely pee on the carpet and chew up your shoes. It is easy to take advantage of the free-of-charge nature of free software to incorporate a wealth of capable code, but every one of those dependencies is a puppy that needs to be under somebody's watchful eye.
As a community, we are far better at acquiring puppies than we are at training them. Companies will happily take the software that is offered, without feeling the need to contribute back even to the most crucial components in their system. Actually, we all do that; there is no way for anybody to support every project that they depend on. We all get far more out of free software than we can possibly put back into it, and that is, of course, a good thing.
That said, there is also a case to be made that the corporate side of our ecosystem is too quick to take the bounty of free software for granted. If a company is building an important product or service on a piece of free software, it behooves that company to ensure that said software is well supported and, if need be, step up to make that happen. It is the right thing to do in general, but it is far from an altruistic act; the alternative is a continual stream of Log4j-like crises. Those, as many companies are currently discovering, are expensive.
"Stepping up" means supporting maintainers as well as developers; it is with maintainers that the problem is often most acute. Even a project like the Linux kernel, which has thousands of developers who are paid for their work, struggles to find support for maintainers. Companies, it seems, see maintainership work as overhead at best, helping competitors at worst, and somebody else's problem in any case. Few companies reward their employees for acting as maintainers, so many of them end up doing that work on their own time. The result is projects with millions of downloads whose maintenance is done in somebody's free time — if it is done at all.
These problems are not specific to free software; discovering that a piece of proprietary software is not as well supported as was claimed is far from unheard of. Free software, at least, can be fixed even in the absence of cooperation from its creators. But the sheer wealth of software created by our community makes some of these problems worse; there is a massive amount of code to maintain, and little incentive for many of its users to help make that happen. We will presumably get a handle on these issues at some point, but it's not entirely clear how; until that happens, we'll continue deploying minimally supported software to Mars (and beyond).
Locked root and rescue mode
Fedora is among the group of Linux distributions that, by default, lock out the root account such that it does not have a password and cannot be logged into. But, traditionally, "rescue mode" boots the system into single-user mode, which requires a root password—difficult to provide if it does not exist. A Fedora proposal to remove the need for the password in that case, and just drop into a root shell, does not seem likely to go far in that form, but it would seem to have pointed toward some better solutions for the underlying problem.
The proposal
The proposal for Fedora 36, "Make Rescue Mode Work With Locked Root", was posted on December 6 by Fedora program manager Ben Cotton on behalf of the feature owners: Michel Alexandre Salim, Neal Gompa, and David Duncan.
The problem is that the "out-of-the-box user experience" is poor for systems with a locked root if users have a need to fix their systems via single-user mode; they will be prompted for a password that they cannot provide and have to resort to other means of booting their ailing system (e.g. rescue boot media). Another option is to boot with a kernel command-line option such as "init=/sysroot/bin/bash", but that is not particularly user-friendly either.
The guts of the change would use the --force option to sulogin to skip the password requirement when entering single-user mode if the root password is not accessible or the root login is disabled. But, as that man page warns, the option should only be used "if you are sure the console is physically protected against unauthorized access". The proposal says that the change "does not pose an increased security risk", because attackers already have other means of bypassing the password (e.g. init=) or compromising the system if they have physical access. Those who want to enforce a password for single-user mode can simply set the root password.
The implementation would be based on a similar feature (patch) in Fedora CoreOS. By default, Fedora 36 would install an RPM that changes the systemd configuration to bypass the password. Users or Fedora variants that do not want that behavior can remove or not install that RPM.
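For the curious, the mechanics are small: recent systemd versions honor a SYSTEMD_SULOGIN_FORCE environment variable in the rescue and emergency services, which causes sulogin to be invoked with --force. A drop-in along these lines — a sketch of the approach with a hypothetical file name, not the exact contents of the Fedora RPM — is all it takes:

```ini
# /etc/systemd/system/rescue.service.d/sulogin-force.conf
# (a matching drop-in would be needed for emergency.service as well)
[Service]
Environment=SYSTEMD_SULOGIN_FORCE=1
```

Removing the file restores the password prompt, which is why shipping the change as a removable RPM is a workable opt-out mechanism.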
As might be guessed, the claim that there is no increased risk drew comments. Zbigniew Jędrzejewski-Szmek said that there are at least two cases where having physical access does not necessarily lead to the ability to compromise the system:
If the data is encrypted, then being able to override the init doesn't achieve anything, until the decryption has been performed. The second case is when the admin has actually locked down the kernel command line and relies on the normal authentication mechanisms to protect the system. In both cases your proposal creates an additional method of attack that activates at a later point where the system is already running and the integrity of the system must be maintained to protect unencrypted data. With the proposal, any mechanism which leads to the system entering emergency mode results in a compromise.
The problem being addressed does exist, he said, but the solution is not a good one:
Essentially, you are proposing a behaviour of "something is wrong, let's make everything open without authentication", which is good for debugging and development, but not acceptable for a real system. The correct solution is to enhance login mechanisms so that it is possible to authenticate using existing credentials also in the rescue mode. The fact that this is not possible right now is a bug that needs to be fixed.
He had some suggestions for ways forward, including potentially adding a password hash (using the stronger yescrypt hashing algorithm) to the initial ramdisk (initrd) or elsewhere that is accessible at boot time (e.g. EFI variables). In addition, the Trusted Platform Module (TPM) could be used to encrypt the password in a system-specific fashion.
Richard W.M. Jones wondered
about the threat model and how skipping the password would actually cause
problems; "On the flip side I have hit the problem described and it's incredibly
annoying - it makes rescue mode useless in the default case.
"
Jędrzejewski-Szmek acknowledged
that the problem is "annoying and real
", but noted that the
proposal violates some fundamental properties that govern system access:
There are many many different ways in which systems are installed, but the general principle is that [once] the system is up, you need valid credentials to log in. So protecting the system before it's running, i.e. protecting the data at rest, can be done in many different ways and is your responsibility, after it is up, you know that the normal system mechanisms apply. With the proposal this promise is broken.
One example he gave was for a kiosk-installed system, where users have
access to the keyboard; "if you can affect the system so that
it does not boot properly, even by causing a sufficient delay, [it] is
enough to get unrestricted access.
"
Vit Ondruch asked about any existing work toward the solution proposed. Jędrzejewski-Szmek said that recent versions of systemd have support for the low-level plumbing (i.e. encrypted-secret storing), but the higher-level pieces are still missing:
But we're missing the upper parts, i.e. how to actually use and update the passwords. I didn't even mention this, because we don't have a comprehensive story yet. I think it'd be necessary to write some pam module and/or authentication helper from scratch.
The "wheel" group on Fedora (and other Unix systems) is meant to contain administrative users, so it would make sense to accept any password for users in that group as equivalent to that of root, Lennart Poettering said. He agreed that there is still work to do to collect up those passwords, encrypt them with the TPM, and store them where they are accessible at boot time, but the net effect would be useful well beyond just booting into single-user mode:
With such a mechanism we would have quite nice semantics: if a user is designated to have admin privs, then that's sufficient to be able to log into the root account, no further manual work necessary, and it applies to the whole runtime of the OS: from initrd to regular system, to sudo.
Chris Murphy was concerned
about tying the solution to the existence of a TPM, since there are systems
that do not have the device or it is not supported by the kernel.
Poettering said that using
the TPM by default makes sense even if there is a need to "find
graceful fallbacks for environments that are more limited
".
But Chris Adams was in favor of
the original proposal;
administrators who want to further lock down their systems can simply do so
by removing the RPM or setting a root password. "Locking down a system beyond the
default requires changing a bunch of things, so I do not see adding this
to that list to be a problem.
" Jędrzejewski-Szmek said that was
a step backward:
I also don't think we should assume that the admin will do a series of "hardening steps". This is what we had in the 90's: you'd install a stock distro and then go over a checklist of basic steps to make things secure. Let's not go back to that.
Murphy also wondered
about the Fedora CoreOS change. If the idea of skipping password checking
is terrible: "Is it terrible enough that CoreOS should
revert?
" Jędrzejewski-Szmek said that it
makes the CoreOS image unsuitable for general-purpose use; "Maybe
it's OK in limited circumstances where 'physical' access is
only possible if you're the administrator on the host.
"
While the feature owners did not participate much in the discussion, Salim was obviously keeping track of the feedback. He said that he was inclined to not rush a full solution for Fedora 36 and to push that back until Fedora 37. He noted that the question of what CoreOS should do is a good one. Beyond that, the questions of using the wheel group to delimit which passwords are valid and whether or not to add TPM dependencies for the feature need to be resolved.
He also wondered about some kind of recovery password, which had been raised in the thread as a means to work with disks that need to be installed in another system (with a different TPM). That problem was seen to be a separate issue. In addition, Björn Persson thought that a recovery password was more appropriate for scenarios where a different authentication mechanism (e.g. hardware token) is used—and could get lost or broken:
As long as users normally log in with a passphrase, I see no need to have a separate passphrase for rescue mode. Root's or a wheel user's usual passphrase should be fine.
To address the immediate needs for Fedora 36, Salim suggested just
removing rescue mode as a boot option if no root password is set. In
addition, if someone tries to invoke rescue mode from the command line in that scenario,
"it should display an error rather
than prompting for a non-existent password
". Persson thought that
all sounded reasonable. The Fedora Engineering Steering Committee was slated to discuss the
proposal at its December 20 meeting,
but that was canceled due to a lack of quorum; it will presumably be
rescheduled for the next meeting on January 3.
Overall, the idea of using wheel group passwords seemed to gain a fair amount of traction, as did not opening up new ways to get root without any authentication. Our systems are complex enough, and installed in so many different ways and environments, that doing so was always going to raise some eyebrows. On the other hand, though, prompting for a password that cannot be provided is a sure path to user frustration—at a time when said user is probably already worried about the functioning of their system and does not need further headaches. The path to a better solution for that problem seems fairly clear; with luck we will see it in Fedora 37, coming in late 2022.
SA_IMMUTABLE and the hazards of messing with signals
There are some parts of the kernel where even the most experienced and capable developers fear to tread; one of those is surely the code that implements signals. The nature of the signal API almost guarantees that any implementation will be full of subtle interactions and complexities, and the version in Linux doesn't disappoint. So the inclusion of a signal-handling change late in the 5.16 merge window might have been expected to have the potential for difficulties; it didn't disappoint either.
Forced signals
The signal API normally allows any process to control what happens on receipt of a signal; that can include catching the signal, masking it temporarily, or blocking it outright. There are a few exceptions, of course, including for SIGKILL, which cannot be blocked or caught. Within the kernel, there is a more subtle exception wherein a process can be forced to take a signal and exit regardless of any other arrangements that process may have made. Most of these situations come about in response to hardware problems that cannot be recovered from, but the signals sent by the seccomp() mechanism are also forced in this way. At times, a signal is so important that it simply cannot be ignored.
The kernel has a function to force a signal in this way called force_sig_info_to_task(). It will unblock the intended signal if need be and deliver the signal to the target process; it can also remove the process's handler for this signal, resetting it to the default action (which is normally to kill the process for the signals in question). Interestingly, though, this function wasn't always being used in forced-signal situations; instead, the kernel would just kill the target process and set its exit status to look like the signal had done it. In October, Eric Biederman sent out a patch set causing the kernel to do what it was pretending to do and actually deliver the signal rather than fake it.
This change works as intended — except for a small problem that was pointed out by Andy Lutomirski. It seems that, in current kernels, a call to sigaction() in the target process can re-block a signal in the window between when force_sig_info_to_task() unblocks it and the actual delivery of that signal. A call to ptrace() can also race with forced signals in this way. This race can cause misbehavior at best; in some cases (such as the delivery of a signal from seccomp()) it may be a security problem.
Not wanting to introduce potential vulnerabilities, Biederman set out to close this race; the solution took the form of a new flag (SA_IMMUTABLE) that can be applied to a process's internal signal-handling information. This flag is normally not set; if that changes, then any subsequent attempts to change the handling of the signal in question will fail with an EINVAL error. The flag is hidden from user space and can only be set by the kernel, and that only happens in force_sig_info_to_task(). There is no way to clear SA_IMMUTABLE once it has been set; the assumption is that the process is on its way out anyway. This change fixes the race in question and, since the flag is invisible to user space, there are no user-space ABI concerns. Or so it was thought.
This patch was posted on October 29, and found its way into the mainline on November 10 — near the end of the 5.16 merge window — as part of a set of "exit cleanups". Once the 5.16-rc1 kernel was released, the patch was picked up by the stable series and appeared in 5.15.3 on November 18.
Debugger bugs
Unfortunately, the day before the 5.15.3 release, Kyle Huey reported that the SA_IMMUTABLE change breaks debuggers. It is common for interactive debuggers to catch signals on behalf of the debugged process, including some of the signals that are forced by the kernel (not all of which are fatal). With this change, ptrace() is no longer able to change the handling of SA_IMMUTABLE signals and, in fact, is no longer even notified of them. The patch, Huey said, should be reverted. It was nonetheless shipped in 5.15.3 the next day.
After some discussion, it was concluded that the SA_IMMUTABLE change was indeed overly broad; it blocked some legitimate signal-related operations that were possible before. Biederman posted a pair of patches on the 18th to address the problems. The first of those reflects the conclusion that not all signals forced by the kernel should have the SA_IMMUTABLE flag set on them; instead, that should be restricted to situations where the kernel intends for the target process to exit. That intent is communicated via a new parameter to force_sig_info_to_task(); the call from within the seccomp() subsystem is changed to indicate that intent. The second patch adds a new function (force_exit_sig()) to be used in other places where a forced exit is intended, and adds a number of callers.
It's worth noting that, in the forced-exit case, ptrace() is still unable to trap the signal after these changes. But that is no different from the way things worked before all of these patches went in. The previous implementation, remember, bypassed the signal-delivery mechanism entirely, so there was never any opportunity for a debugger to influence things. The kernel, as seen from user space, now behaves as it did before.
These patches appear to have fixed the problem; they were merged for 5.16-rc2, then found their way into 5.15.5 on November 25. The regression caused by the original patch, in other words, existed for a full week in the 5.15-stable kernel. The rules state that no patch should go into a stable kernel until after it has appeared in a mainline -rc release. That rule was followed in this case; 5.16-rc1 came out between the patch landing in the mainline and it showing up in a stable update. That same rule may have delayed the inclusion of the fixes into the stable 5.15 releases; they could not be considered until after 5.16-rc2 came out on November 21.
The relevant question may well be: is the one-release rule enough to prevent this kind of regression in the stable kernels? That rule was added in response to previous problems, and may well have prevented some bugs from being backported, but some still clearly get through. There is an argument to be made that, for patches that reach into tricky subsystems like signals, a more cautious approach to backporting would make sense. In the absence of the developer resources to make those decisions, though, the current policy is unlikely to change and the occasional regression will make a (hopefully brief) appearance in stable kernels.
Content blockers and Chrome's Manifest V3
A clarion call from the Electronic Frontier Foundation (EFF) warning about upcoming changes to the Chrome browser's extension API was not the first such—from the EFF or from others. The time of the switch to Manifest V3, as the new API is known, is growing closer; privacy advocates are concerned that it will preclude a number of techniques that browser extensions use for features like ad and tracker blocking. Part of the concern stems from the fact that Google is both the developer of a popular web browser and the operator of an enormous advertising network so its incentives seem, at least, plausibly misaligned.
Manifest V3 was first proposed in late 2018 as an eventual replacement for Manifest V2, which is the current extension API that is supported by both Chrome and Firefox. These APIs provide the tools that extensions use to manipulate the browser state to customize the web-browsing experience in some fashion. Extensions can change the user interface in various ways, observe and modify the browser behavior for things like bookmarks and tabs, manipulate the requests (and their content) that the browser makes, and more.
There are two main changes that Manifest V3 makes which are problematic for various types of content blockers. The first is the removal of the ability for extensions to block requests that the browser makes using the webRequest API. Instead, extension developers would need to use the much more restrictive declarativeNetRequest API, which has limits on the number of rules that can be used for blocking sites. The EFF described the impact of that change in a mid-2019 post highlighting problems with Manifest V3:
Currently, an extension with the right permissions can review each request before it goes out, examine and modify the request however it wants, and then decide to complete the request or block it altogether. This enables a whole range of creative, innovative, and highly customizable extensions that give users nearly complete control over the requests that their browser makes.

Manifest V3 replaces these capabilities with a narrowly-defined API (declarativeNetRequest) that will limit developers to a preset number of ways of modifying web requests. Extensions won't be able to modify most headers or make decisions about whether to block or redirect based on contextual data. This new API appears to be based on a simplified version of Adblock Plus. If your extension doesn't work just like Adblock Plus, you will find yourself trying to fit a square peg into a round hole.
[...] For developers of ad- and tracker-blocking extensions, flexible APIs aren't just nice to have, they are a requirement. When particular privacy protections gain popularity, ads and trackers evolve to evade them. As a result, the blocking extensions need to evolve too, or risk becoming irrelevant.
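The contrast is easy to see in concrete terms: a declarativeNetRequest rule is a static description that the browser itself evaluates; the extension never sees the request. A sketch of one such rule, following the published API's field names (the filter pattern is a made-up example):

```json
{
  "id": 1,
  "priority": 1,
  "action": { "type": "block" },
  "condition": {
    "urlFilter": "||tracker.example^",
    "resourceTypes": [ "script", "image" ]
  }
}
```

The limits the EFF objects to concern how many such rules an extension may register and how expressive the conditions can be; there is no hook for the extension's own code to examine a request and decide.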
The second change is a switch away from background pages, which are currently used by extensions to perform tasks behind the scenes, to service workers, which are meant for a different use case than extensions. Content blockers currently use background pages to continue to monitor requests being made from pages for as long as the user still has the page open in a tab. In that way, ads and other undesirable content that gets requested well after the main page gets loaded can be handled based on the extension's requirements. But service workers are shut down after five minutes, which means that any initialization done for the extension is lost; meanwhile, any background processing stops too. The shutdown is meant to reclaim the memory being occupied by those worker processes, but extension developers sometimes need to keep that state around. The EFF described some of the problems with all of that in a November blog post:
In short, Service Workers were meant for a sleep/wake cycle of web asset-to-user delivery—for example, caching consistent images and information so the user won't need to use a lot of resources when reconnecting to that website again with a limited connection. Web extensions need persistent communication between the extension and the browser, often based on user interaction, like being able to detect and block ad trackers as they load onto the web page in real time. This has resulted in a significant list of issues that will have to be addressed to cover many valid use cases.
The timeline for Manifest V2 support in Chrome, which was published in September, shows a limited runway for extension authors before they need to "upgrade". In January 2022, the Chrome Web Store will stop accepting new Manifest V2 extensions, except for those marked "private", which means they are only available to users in a specific organization. In June 2022, new private V2 extensions will be banned. Any existing V2 extensions can be updated until January 2023, but after that, Chrome is effectively V3-only.
Some of the changes in Manifest V3 seem unambiguously positive, however, including the removal of remotely hosted code for extensions. It is, of course, impossible to ensure that an extension does what it is supposed to, and does not, for example, send tracking information off to some dodgy site, if the code can be changed from afar.
The reasons behind Manifest V3 seem reasonable as well. In the announcement of its availability in Chrome beta versions, almost exactly a year ago, those reasons come down to security, performance, and privacy, though much of that is overblown or inaccurate according to the EFF. There seems to be a clear disconnect between how existing content-blocker extensions work—how their users want them to work—and how Google views the privacy threats inherent in the existing extension API:
For extensions that currently require passive access to web activity, we're introducing and continuing to iterate on new functionality that allows developers to deliver these use cases while preserving user privacy. For example, our new declarativeNetRequest API is designed to be a privacy-preserving method for extensions to block network requests without needing access to sensitive data.

The declarativeNetRequest API is an example of how Chrome is working to enable extensions, including ad blockers, to continue delivering their core functionality without requiring the extension to have access to potentially sensitive user data. This will allow many of the powerful extensions in our ecosystem to continue to provide a seamless user experience while still respecting user privacy.
Of course, content blockers are perfectly positioned to have a lot of detailed information about what their users are doing on the web. That is, obviously, exactly the point. As long as those extensions only use that information locally, and for the benefit of their users—not the developers or corporate owners—all should be well. That's not to say we have not seen malware in web extensions, including ad blockers, but those are the kinds of problems best addressed through extension vetting, both by browser-extension stores and the user community. Restricting content blockers to the actions Google (or anyone) thinks they should be able to perform might seem safer (perhaps it even is), but it seems to mean that the kinds of content blocking available to users are being drastically curtailed in Manifest V3.
The EFF had some suggestions for better ways to address the problems in its
November post. Not requiring a switch to declarativeNetRequest was first
on the list. Extensions that only need the functionality provided in that
API could make the switch, while: "Extensions that use the blocking
webRequest API, with its added power can be given extra scrutiny upon
submission review.
"
For its part, Mozilla has decided to take something of a middle course. It wants to support cross-browser extension development, so it will implement Manifest V3, but will not force the use of declarativeNetRequest (which Mozilla abbreviates as "DNR"), at least for now:
After discussing this with several content blocking extension developers, we have decided to implement DNR and continue maintaining support for blocking webRequest. Our initial goal for implementing DNR is to provide compatibility with Chrome so developers do not have to support multiple code bases if they do not want to. With both APIs supported in Firefox, developers can choose the approach that works best for them and their users.

We will support blocking webRequest until there's a better solution which covers all use cases we consider important, since DNR as currently implemented by Chrome does not yet meet the needs of extension developers.
That Mozilla blog post is from May, but it is in keeping with Mozilla's position as described in its Manifest V3 FAQ from 2019. While the company wants to be as compatible with Chrome as it can be, Mozilla says that it will not blindly follow Google's lead if those changes negatively impact its users and developers. The question of background pages versus service workers is less clear, however, though Mozilla was still working on the feature as of May. It is being tracked in a Bugzilla bug, which mentions several use cases where service workers will not provide an adequate replacement.
In general, this seems like a fairly heavy-handed approach from Google. It has a dominant position in the browser market and plausible reasons to want to restrict ad blocking, so the claims that Chrome is simply making ad blockers safer is not completely believable, at least for some. In addition, extension authors have said that it will be difficult or impossible to accomplish their tasks using Manifest V3. For example, the EFF's Privacy Badger extension for blocking invisible trackers will be hampered:
It appears that Privacy Badger will no longer be able to dynamically learn to block trackers, report what it blocked on a page, block cookies from being set or sent, strip referrer headers, nor properly support EFF's Do Not Track policy.

If you remove what makes Privacy Badger unique, replacing it with basic list-based blocking, what are you left with?
Similarly, the uBlock Origin ad-blocking extension will be unable to continue doing its job, as described in a GitHub issue. uBlock Origin developer Raymond Hill ("gorhill") has commented multiple times in the bug about problems stemming from the switch to Manifest V3. Most recently, he said:
Given how the deprecation of a blocking webRequest API put a lid on innovations (and regressions in capabilities in the case of uBO [uBlock Origin]) regarding content blocking, it does seem the move could be the "Not-Owned-But-Operated" strategy applied to content blocking -- the declarativeNetRequest API means the capabilities of (not-owned) content blockers are ultimately operated by Google through the limitations of the API the content blockers must use.
In a lengthy Reddit thread about the EFF's most recent Manifest V3 post (naturally there is another on Hacker News), "Brawl345" wrote that Manifest V3 made a lot of things easier for them, but did recognize that there are a number of shortcomings, including the ability to do dynamic filtering.
In general, Manifest v3 is a good idea but executed in the wrong way. I'm actually one of the few people in this thread (I bet) that actually ported some of my extensions to Manifest v3 and there are many good parts and some bad that make the whole experience kinda bad.
What seems clear, though, is that Chrome users will have less-capable content-blocking extensions available starting next year. That will, perhaps, decrease the capabilities of malicious extensions, thus increasing the security of Chrome users from those kinds of threats. But privacy-conscious users, who seemingly make up only a tiny fraction of browser users anyway, will want to make the switch to Firefox—if they haven't already.
Hopefully, by 2023 (or sooner), Google will relent on the most onerous restrictions, but that hope may well be forlorn. On the other hand, there is good reason to believe that Mozilla will keep providing features that are useful for content blocking even if it means extension developers have to maintain two versions. And in the meantime, maybe "everyone" can come up with a standard for browser extensions that serves the needs of all web users, regardless of their interest in being the product. Hope springs eternal.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: Log4j impacts; GCompris 2.0; Copyleft trolls; LF diversity report; Quote; ...
- Announcements: Newsletters; conferences; security updates; kernel patches; ...