LWN.net Weekly Edition for March 10, 2022
Welcome to the LWN.net Weekly Edition for March 10, 2022
This edition contains the following feature content:
- Fedora considers curl-minimal: restricting the protocols supported by curl by default may increase security, but the cost may be too high.
- Belenios: a system for secret voting: a system that tries to achieve both secrecy and verifiability in elections.
- Generalized address-space isolation: attackers cannot exfiltrate data that is not accessible, but adding address-space isolation to the kernel is not easy.
- When and why to deprecate filesystems: when is filesystem deprecation appropriate?
- Fedora's missing Chromium updates: a free version of the Chrome browser is nice for a distribution to have — if it can meet the challenges of packaging Chromium.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Fedora considers curl-minimal
The curl utility is a command-line program (and associated library) for interacting with various network protocols; it is commonly used to do things like transferring data from a remote server over HTTP or HTTPS using a URL. But curl also supports a lot more protocols, some of which are probably rarely used, obsolete, deprecated, or all three. As a recent discussion on the Fedora devel mailing list shows, though, it is hard to find agreement that support for only some of those protocols should be installed by default, while others might be left in an optional package for those who need them.
A proposal to install a minimal version of curl by default starting with Fedora 37 was posted to the list on February 22. As is usual for feature proposals, it was posted on behalf of the feature owners, Zbigniew Jędrzejewski-Szmek and Kamil Dudka, by Fedora program manager Ben Cotton. The idea is to make the curl-minimal package (and its companion libcurl-minimal) the default for installation on Fedora systems, while allowing users to switch to the full curl package (and libcurl) if they need it. The minimal variants "are compiled with various semi-obsolete protocols and infrequently-used features disabled: DICT, GOPHER, IMAP, LDAP, LDAPS, MQTT, NTLM, POP3, RTSP, SMB, SMTP, SFTP, SCP, TELNET, TFTP, brotli compression, IDN2 names", while both packages support HTTP, HTTPS, and FTP.
There are two benefits for Fedora described in the proposal. The infrequently used protocols are not as well tested as the others and "are a source of security bugs". Most people are not using them anyway, so removing them reduces the attack surface for the default installation. In addition, the minimal packages are smaller, saving 8MB, which is a reduction of 12%.
The problem with having "extra" protocols available for curl is that they might be invoked unexpectedly. Even if a program is using a URL with an http scheme (i.e. protocol), the (possibly malicious) server could redirect to a different URL with a different protocol entirely, which would then invoke that code in curl if it is present. In addition, if user input is used for the URL, it could refer to an unexpected protocol, which curl will happily try to satisfy. Thus protocols that are installed, but not actually needed, increase the potential attack surface of the distribution.
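A libcurl-based application can defend against both of those scenarios by telling the library exactly which protocols it is willing to speak. Here is a minimal sketch (the URL is a placeholder) using libcurl's CURLOPT_PROTOCOLS and CURLOPT_REDIR_PROTOCOLS options; with these set, even a full-featured libcurl will refuse a redirect to, say, a dict: or gopher: URL:

    #include <curl/curl.h>

    int main(void)
    {
        CURL *curl = curl_easy_init();
        if (!curl)
            return 1;

        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

        /* Allow nothing but HTTP and HTTPS for the initial request... */
        curl_easy_setopt(curl, CURLOPT_PROTOCOLS,
                         CURLPROTO_HTTP | CURLPROTO_HTTPS);
        /* ...and refuse any redirect that switches to another protocol. */
        curl_easy_setopt(curl, CURLOPT_REDIR_PROTOCOLS,
                         CURLPROTO_HTTP | CURLPROTO_HTTPS);

        CURLcode res = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
        return res == CURLE_OK ? 0 : 1;
    }

This per-application approach comes up again later in the discussion.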
IDN
Removing support for internationalized domain name (IDN) handling could be a problem even for those who only need the three protocols available in curl-minimal, Björn Persson said. IDNs are domain names containing Unicode characters; they are encoded into ASCII using Punycode so that they can be used in the Domain Name System (DNS), which in practice can only handle a subset of ASCII in domain names. The lack of support for IDN "makes libcurl-minimal suited only for programs that only communicate with a predefined set of servers in ASCII-only domains".
Jędrzejewski-Szmek wondered how many domains actually use IDN, noting that he had added support for it to systemd, but "realized that I have _never_ once used an idn domain outside of testing". He also pointed out that this change would not affect other programs, like web browsers, where IDN use might be more prevalent. Others in the thread thought that IDN support was a must, at least for certain regions of the world. Dudka said he was not necessarily opposed to adding IDN support to curl-minimal, but that it had been removed from the universal base image (UBI) for the ubi9 container images "and nobody has complained about it so far". It is not clear how much exposure ubi9 images have actually gotten at this point, however.
The question of the prevalence of IDN domains was at least partly answered by Jędrzejewski-Szmek himself; there are more in use than he expected. He would be in favor of adding IDN support to curl-minimal "_if_ there are people who'd actually use this for real". Persson noted that there are multiple programs that have incomplete support for IDN, including OpenSSH, Nmap, and the BIND utilities, but that is a "problem that you're about to make worse". Having a default curl without such support will "end up hampering the adoption of international domains even more". Dudka said that since there is a demand for it, he had created a pull request to add IDN support to libcurl-minimal.
Security benefit
Chris Adams thought that the security benefit of not shipping the other protocols was a "poor argument". If those protocols are still going to be available, "they need to be maintained to the same level". Beyond that, most of the security problems reported for curl seem to be in the protocols that would be retained:
Looking at the curl RPM changelog on F35, most CVE entries seem to be TLS and/or HTTP(S) related, with a couple of TELNET and one MQTT. Looking back to 2020, there were more TLS and a couple of FTP (which is staying in the minimal build).
If TELNET/etc. is a problem and not being maintained upstream, then just drop TELNET. Don't shuffle it off to the side and ignore security issues in a package still in the repos.
But Demi Marie Obenour disagreed; the purpose of the change is not to reduce the maintenance burden, but to reduce the impact of vulnerabilities in the less-used parts of curl. "Right now, a vulnerability in an obscure protocol impacts most users. With this change, it will only impact users that have installed the full version of curl." But Adams said that using security concerns as a justification for the change was not reasonable. If the code is prone to security problems, Fedora should not be shipping it at all.
In a followup message, Obenour said that the change is about the attack surface: "Secure enough to ship ≠ secure enough to enable by default."
Peter Robinson asked about removing FTP as well, "with most browsers obsoleting the protocol due to lack of security". Dudka thought that was premature, but suggested that the day was coming: "it may happen that FTP will be unavailable by default in a year or two".
Richard W.M. Jones thought a different approach was in order. While the minimal variants are smaller, that is "a non-goal for almost everyone". Beyond that, the security benefit "will be immediately negated once everyone unbreaks their Fedora by installing curl-full". He suggested that libcurl-using Fedora packages should use the CURLOPT_PROTOCOLS option to only allow the protocols they expect. In a message back in October, he said that instead of creating a single minimal version of curl, a more fine-grained approach could be pursued:
[...] my impression is that at a code level they [the protocols] are quite modular, so maybe upstream would be interested in turning them into real loadable modules. Then we could package each protocol ("curl-http.so") as a separate RPM which is really best of all worlds.
Dudka was in favor of using CURLOPT_PROTOCOLS for Fedora packages that use curl, "but it cannot be a replacement for libcurl-minimal because there is no algorithmic way to decide whether all users of libcurl disable a problematic protocol on all reachable code paths". The switch to minimal is not just for container images, where the installation footprint needs to be as small as possible, as Jones had suggested, because there are other (unspecified) Fedora installations where the size is also important. Dudka also reiterated the attack-surface reduction as a benefit for those who do not need any of the extras.
Jones mentioned the modularization of curl again, but said "I think this whole business of minimizing Fedora is getting way out of hand". Dudka said that it had been added as a wishlist item for curl, "but I do not remember anybody working on it". Neal Gompa agreed with Jones that defaulting to curl-minimal, instead of taking a modular approach, would cause more problems than it would solve, at least for many Fedora users:
This is a very big hammer that basically tells people that we're crippling curl by default for users and it has very large network effects across the entire distribution. It's quite one thing to use curl-minimal for containers where people expect tools to be broken in the endless pursuit of smaller base images, but when real people need to use real systems in complex configurations, having a reduced functionality curl by default is just going to lead to support nightmares and complaints about random breakages in applications on Fedora.
FESCo
After the mailing list discussion died down, inconclusively, the Fedora Engineering Steering Committee (FESCo) took up the proposal at its March 8 meeting (minutes, IRC log at the point it was discussed). After some discussion, much of it about the upgrade path for users who do want the full curl, the proposal was unanimously rejected as it stands, with an invitation to bring it back with some changes. It turns out that switching from the default curl-minimal to curl would not be done with the expected "dnf install" command or similar, but would need to use the less well-known "dnf swap" command. That was deemed confusing and surprising to users, so part of any re-submission will be changing the way the split into two packages was made.
In addition, the arguments for and against the security benefit of dropping the "extra" protocols were aired again, but not resolved there either. It is not clear whether simply changing the packaging approach will be enough to get the feature over the line or not. To some, the overall benefit is low, while user confusion is clearly a possible outcome. The need for the feature to be the default distribution-wide seems unclear as well; having container images and the like default to curl-minimal, while leaving other Fedora editions with the full curl package by default, might be a kind of middle ground.
Curl is used in lots of different ways within scripts and programs of various kinds, including in dnf itself. Since the curl upstream has not taken the modular approach, at least yet, any kind of attack-surface reduction for curl in Fedora is going to require this kind of "big hammer", where many of the protocols are shunted aside. That does leave open the possibility of having to squeeze a few more protocols or features into curl-minimal, as with IDN support, if they turn out to be widely needed. For now, at least, curl-minimal will not become the default for Fedora 37.
Belenios: a system for secret voting
As part of the recent discussion on switching to secret voting for Debian general resolutions (GRs), which has resulted in an ongoing GR of its own, the subject of voting systems that embody various attributes some would like to see for voting in Debian has been brought up. One of the systems mentioned, Belenios, provides an open-source "verifiable online voting system". Whether or not Debian chooses to switch to secret voting, Belenios would seem to provide what other projects or organizations may be looking for as a mechanism to handle their voting needs.
Background
As highlighted by the discussion of, and amendments to, the Debian GR, secret voting means different things to different people. Generally, though, people want trustworthy elections foremost, which means that voters (and those affected) need to understand and believe in the mechanisms used to cast their ballots and tabulate the results. There are cryptographic protocols that can be used to provide a technical solution to some or all of those problems, but there are social and other considerations that may render them unusable in real elections—at least those held by governments at various levels. Our coverage of a talk at linux.conf.au 2020 can help explain some of the reasons for that, along with accounts of poorly written implementations of the cryptographic protocols for voting systems.
Debian currently has two types of votes: an annual election for the Debian project leader (DPL)—this year's DPL election process started on March 5—and votes on GRs that get the requisite support from Debian developers (DDs). Six developers can force a vote on any issue via a GR: one sponsor and five seconds is all that it takes. All of the voting is done via PGP-signed email and the ballots and voter lists are published for all to see after the vote. The difference is that DPL election results do not provide a mapping from voter to ballot, while GR voters and ballots are matched up, so everyone can see how each voter chose to rank the options on the ballot.
Debian uses the Condorcet voting system, which allows voters to rank their choices of candidates or proposals, with some specific mechanisms for dealing with the relatively unlikely circular results that can arise at times. It is, already, a fairly complicated scheme that requires some sophisticated thinking from voters in order to understand how it works. Various other forms of ranked voting have been used in governmental (and other) elections to try to avoid some of those complexities for elections with a less-technical electorate.
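The pairwise comparisons at the heart of a Condorcet election are simple to demonstrate; here is a minimal sketch with invented ballots (Debian's actual implementation uses the Schulze method, which adds the cycle-resolution rules mentioned above):

    #include <stdio.h>

    #define NCAND    3
    #define NBALLOTS 7

    int main(void)
    {
        const char *cand[NCAND] = { "A", "B", "C" };
        /* rank[b][c]: position of candidate c on ballot b (0 = top) */
        int rank[NBALLOTS][NCAND] = {
            { 0, 1, 2 }, { 0, 1, 2 }, { 0, 1, 2 },   /* A > B > C */
            { 2, 0, 1 }, { 2, 0, 1 },                /* B > C > A */
            { 2, 1, 0 }, { 2, 1, 0 },                /* C > B > A */
        };
        int beats[NCAND][NCAND] = { 0 };

        /* Count, for each pair (i, j), the ballots ranking i above j. */
        for (int b = 0; b < NBALLOTS; b++)
            for (int i = 0; i < NCAND; i++)
                for (int j = 0; j < NCAND; j++)
                    if (rank[b][i] < rank[b][j])
                        beats[i][j]++;

        /* A Condorcet winner beats every other candidate head-to-head. */
        for (int i = 0; i < NCAND; i++) {
            int wins = 0;
            for (int j = 0; j < NCAND; j++)
                if (j != i && beats[i][j] > beats[j][i])
                    wins++;
            if (wins == NCAND - 1)
                printf("Condorcet winner: %s\n", cand[i]);
        }
        return 0;
    }

With these ballots, B wins every pairwise matchup even though A has the most first-place votes; the circular results mentioned above arise when no candidate manages that.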
For both Debian election types, anyone can use the published ballots to verify that the reported results match the votes cast; any DD can also directly see that their recorded vote matches how they voted for GR elections. For DPL elections, the devotee vote-collecting system returns a secret code to voters when a correctly signed ballot is submitted via email; that code can be used with the hash value reported in the tally sheet to verify that their vote was included in the ballots. In both cases, developers who did not vote can check to ensure that no vote was recorded for them, either by checking the separate voter list for DPL elections or on the tally sheet for a resolution. All of that adds up to a level of transparency for Debian voting.
Meanwhile, some voters have been uncomfortable with having their ballots published along with their names for GRs, especially contentious resolutions such as the 2021 GR about a position statement on Richard Stallman's return to the FSF board. That led to the current resolution, which is meant to change resolution voting to be reported the same way as DPL elections are now, breaking the link between voters and their ballots.
One of the other pieces of the original secret-voting proposal (there are currently two other ballot options that will be voted on as part of the GR) is to remove the requirement of voting by email from the Debian Constitution. That would allow the project secretary to explore alternative voting mechanisms that might augment the current email-based system; some developers are worried that might also lead to eliminating the ability to vote by email, however. Longtime secretary Kurt Roeckx plans to look into Belenios if the email-voting requirement is removed.
Belenios
The Belenios system would seem to provide much of what the proponents of secret voting on GRs are looking for. It is developed by Stéphane Glondu, who is a Debian developer, and provides both vote privacy and verifiability using homomorphic encryption. Using that mechanism, no one can actually see the ballot of any particular voter; the outcome can be calculated without decrypting the ballots. The home page describes the verifiability of the system as follows:
Every voter can check that her vote has been counted and only eligible voters may vote. [Verifiability] relies on the fact that the ballot box is public (voters can check that their ballots have been received) and on the fact that the tally is publicly verifiable (anyone can recount the votes). Moreover, ballots are signed by the voter credential (only eligible voters are able [to] vote).
The "how it works" page gives more detail on the process and the Belenios paper goes into even further detail. Voters have a set of login credentials for the web interface (username/password) as well as a separately communicated credential that is the private ElGamal key for the user. The server stores the associated public key; the voter makes their choices, then the private key is used client-side to encrypt the ballot, which is sent to the server.
The server uses the public key to validate the ballot, using a zero-knowledge proof supplied by the client that the ballot is valid (e.g. only contains votes for the proper number of options). A zero-knowledge proof is a way to prove an assertion without giving any additional knowledge to the entity who is verifying the proof, other than that the assertion is true. In this case, the validity of the ballot can be proved without giving the server any information about the actual vote choices in the ballot. The linked-to Wikipedia entry has some "real world" examples, along with the mathematical background for the idea of a zero-knowledge proof.
Once the server has validated the ballot, it then lists the "tracking number" (a hash of the encrypted ballot) for the ballot on the public information page for the election. Voters can check for their tracking number on that page and, using their private key locally, can see their actual vote recorded on the ballot; no one else can decrypt the ballot itself. The key pair is normally generated by the server and the private half is discarded after sending it to the user, but it is possible for another entity to generate the keys and communicate each half to its respective owner. Securely distributing the private key is clearly an important aspect; unencrypted email may not be the best choice.
The outcome of the election is determined by the tally process, which is described this way:
By default, the election server stores the decryption key. The encryption scheme has a particular property, called a homomorphism: from the encrypted ballots, anyone can compute the encryption of the result (the sum of the votes for each candidate) by combining the ciphertexts (without using any key). This way, only the final encrypted result needs to be decrypted, which guarantees vote privacy: the ballot of an individual voter is never decrypted.
The final encrypted result of the election can instead be decrypted by one or more election administrators who have been designated as decryption authorities; all of those authorities need to collaborate to decrypt the result. Another zero-knowledge proof is used to show that the encrypted outcome of the election is the result of the votes made, once again without decrypting the ballots. To a certain extent, it sounds like "magic cryptographic fairy dust", but the underlying principles are well-established and well-understood—at least within the cryptographic world.
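The homomorphic property itself can be demonstrated at toy scale. The sketch below uses "exponential ElGamal" (the vote lives in the exponent) with tiny, insecure, made-up parameters; it illustrates the mathematical idea, not Belenios's actual code. Multiplying two encrypted ballots yields an encryption of their sum, and only that sum is ever decrypted:

    #include <stdio.h>
    #include <stdint.h>

    #define P 2147483647ULL   /* small prime modulus (2^31 - 1); toy only */
    #define G 16807ULL        /* group element used as the generator */

    static uint64_t mulmod(uint64_t a, uint64_t b) { return (a * b) % P; }

    static uint64_t powmod(uint64_t base, uint64_t exp)
    {
        uint64_t r = 1;
        base %= P;
        while (exp) {
            if (exp & 1)
                r = mulmod(r, base);
            base = mulmod(base, base);
            exp >>= 1;
        }
        return r;
    }

    int main(void)
    {
        uint64_t x = 123456789;        /* decryption (private) key */
        uint64_t h = powmod(G, x);     /* public key */

        /* Encrypt each vote m as (g^r, g^m * h^r); r is a random nonce. */
        uint64_t m1 = 1, r1 = 987654321;
        uint64_t m2 = 1, r2 = 192837465;
        uint64_t c1a = powmod(G, r1), c1b = mulmod(powmod(G, m1), powmod(h, r1));
        uint64_t c2a = powmod(G, r2), c2b = mulmod(powmod(G, m2), powmod(h, r2));

        /* Homomorphic tally: multiply the ciphertexts component-wise. */
        uint64_t ta = mulmod(c1a, c2a), tb = mulmod(c1b, c2b);

        /* Decrypt only the combined result: g^(m1+m2) = tb / ta^x. */
        uint64_t gm = mulmod(tb, powmod(powmod(ta, x), P - 2)); /* Fermat inverse */

        /* Recover the (small) sum of the votes by brute-force discrete log. */
        for (uint64_t sum = 0; sum <= 10; sum++)
            if (powmod(G, sum) == gm)
                printf("tally: %llu votes\n", (unsigned long long)sum);
        return 0;
    }

Belenios layers voter credentials, zero-knowledge validity proofs, and distributed decryption on top of this basic property.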
Tools for auditing and verifying the process are part of the Belenios repository, though the project welcomes others to create tools for working with the system. The protocol specification "should provide all the necessary details (but of course, questions are welcome)".
The online Belenios server can be used to run elections of up to 2500 voters, or, of course, the code can be run elsewhere on a private server. The FAQ has information on doing so (as well as answers on other topics). One thing that does not seem to be mentioned on the site is whether there has been any security audit or formal verification of the Belenios code; one suspects that may need to happen before it can be adopted, at least for some kinds of votes.
The default mechanism for running the vote is a form of approval voting where voters can choose one or more of the candidates they find acceptable; "for example, she can select between 3 and 5 candidates among a list of 10 candidates". Support for other voting mechanisms (including Condorcet voting) is under development.
Back to Debian
The plan to remove the "by email" phrase from the constitution, which is part of the original proposal by Sam Hartman (Proposal A on the GR page), is one that does not sit well with some, even if they are inclined toward moving to secret ballots. That has resulted in Proposal B, which only adopts the secret-voting language, leaving "email" alone and not adding the two provisions about overriding or replacing the secretary that are part of Proposal A. There is also Proposal C, which would re-affirm public voting for resolutions; it was added, at least in part, because the implementation details of a secret-voting system are not specified in the other proposals.
It seems at least plausible that an email vote-collection mechanism could be added to Belenios if Proposal B carries the day. It could use Debian developers' public PGP keys on the distribution keyring to encrypt the ElGamal private key needed to create a properly signed ballot; those could be sent to each developer. Or, potentially, Belenios could be changed to use emails signed with developers' PGP keys directly. That might open up the ability for some voters to use the Belenios web application and others to use email as usual. Obviously, if Proposal C is the winner, none of that will be needed.
There are some known weaknesses in the system used for DPL elections; users could be forced to reveal their secret code, or a group of users could collaborate to unmask their votes, either of which would break the secrecy of the ballots. Those problems are not really alleviated with Belenios, as the ElGamal private key plays the same role as the devotee secret code, but the distribution has been living with those possible attacks for a long time now.
The discussion period on the secret-voting resolution runs until March 11, so other proposals could be added to the ballot before then. The two-week voting period will presumably start shortly thereafter. It seems at least possible, perhaps even likely, that Proposal A went a bit too far in terms of trying to clean up somewhat unrelated ambiguities (secretary overreach); in addition, eliminating the email language may also be a step too far for some. There would seem to be legitimate reasons to move away from public voting on GRs, even if the most contentious ones tend to be political, rather than technical or Debian-internal, questions that many DDs think shouldn't be voted on at all. In any case, we should know what the project thinks about secret voting by the end of March or so. If secret voting is chosen in some form, perhaps Belenios can be adapted to fit the bill.
Generalized address-space isolation
The disclosure of the Meltdown and Spectre vulnerabilities put a spotlight on the risks that come with sharing address spaces too widely. Even if the protection mechanisms provided by the hardware should prevent access to sensitive data, those vulnerabilities can often be used to leak that data anyway. So, from the beginning, mitigation strategies have included reducing the sharing of address spaces, but there is more that could be done and ongoing interest in doing so. Now, this patch set posted by Junaid Shahid (containing work from Ofir Weisse and inspired by earlier patches from Alexandre Chartre) shows what would be required to create a general address-space isolation (ASI) mechanism for the kernel.
Protecting data with ASI
Speculative-execution vulnerabilities come about when the CPU can be fooled into accessing arbitrary memory in a speculative mode, bypassing checks that (presumably) exist in the code to prevent such access. Whenever it becomes clear that the CPU has predicted wrongly, the effects of the speculative execution will be undone, but traces will be left behind in various hardware caches. Hostile code can look for those traces and use them to exfiltrate data that would otherwise not be accessible to an attacker.
These attacks cannot work, however, against memory that is not accessible when the attack is underway. That is why kernel page-table isolation is effective against Meltdown; if the kernel's memory is not mapped while an attacker's code can run, it cannot be exposed during speculative execution. Keeping the kernel's address space unmapped is sufficient to protect against Meltdown exploits running in user space.
Spectre attacks, instead, normally target the system while it is running in kernel mode and all of kernel memory is mapped, so the current kernel page-table isolation implementation offers no protection there. But the kernel almost never needs access to all of its address space, and often accesses almost none of it. Thus the appeal of greater use of address-space isolation: by walling the kernel off from even its own memory when there is no need for that memory, ASI can eliminate many possible attacks.
There are other ways of blocking Spectre attacks, some of which are implemented in the kernel now, but many of the current Spectre mitigations are expensive and incomplete. Flushing the memory caches on every return to user space will block many exploits, but at a significant run-time cost, for example. Administrators must also disable simultaneous multi-threading (SMT) — which can hurt performance badly — or leave open the possibility of attacks from a sibling CPU. If ASI can be made sufficiently effective at blocking attacks, it might be possible to dispense with the current mitigations and gain some CPU performance back.
One place where better address-space isolation could help is in the area of virtualization. Virtual machines running under KVM will often need to trap back into the host kernel to carry out various tasks, but those requests can usually be handled without access to most of the kernel's address space. Since virtual machines might well be running malicious code, and they may be running on mixed-tenant systems, protecting against Spectre attacks from that source has long been of special interest. So it is not surprising that the current ASI patches originate from cloud providers (Oracle originally, Google now) and address KVM in particular, even though the mechanism has been designed to be more general than that.
Sensitive and non-sensitive memory
The core idea behind this patch set is in the concept of "address-space-isolation classes", each of which describes a specific security context. The unrestricted class is for the kernel with full access to the entire address space — the way the kernel works now, in other words. The restricted classes are defined as a subset of the unrestricted class. Any page-table mapping that exists in the restricted classes is identical to the same mapping in the unrestricted class, but the restricted classes lack mappings for much of the sensitive data that is mapped in the unrestricted class.
A system running with ASI can be expected to be running in a restricted class just about any time that user-space code is running. The KVM-specific ASI class will be entered, for example, before giving control to the kernel running in the guest system. A kernel page-table isolation class (if it existed — the patch set describes the possibility but does not contain an implementation), instead, would be entered before returning to user space on the host system. But an important aspect of ASI is that restricted classes can also be used when running in the kernel if access to sensitive data is not required. Thus, for example, the kernel should be able to handle many KVM-related tasks without ever leaving the KVM ASI class.
There are three levels of sensitivity defined in the patch set for data stored in memory:
- "Sensitive" memory, which should never be leaked out of the kernel.
- "Locally non-sensitive" memory, which is harmless if leaked to the currently running process, but cannot be allowed to leak further.
- "Globally non-sensitive" memory can be leaked far and wide without unpleasant consequences.
When address-space isolation is in use, sensitive memory is only mapped while the kernel is running and, even then, only if the kernel actually needs it. Globally non-sensitive memory can remain mapped all the time. Unlike the other two classes, locally non-sensitive memory is different for every process; it can be mapped while the current process is running but is not mapped into any other process's address space.
These memory classifications apply for all of the restricted ASI classes; if there are many of those classes, they all restrict the kernel's address space in the same way. In other words, memory that is sensitive in one restricted ASI class is sensitive in all of them. The difference between ASI classes comes down to how much of user space is mapped and a set of hooks that are run whenever the kernel enters or exits one of those classes. Entry into the KVM class, for example, requires flushing the memory caches to frustrate any Spectre attack that may be running in the virtual machine. If the current kernel page-table isolation mechanism were implemented in this scheme, that class would not need to do a cache flush on entry, since removing the kernel's address-space mappings is sufficient.
There is an interesting decision built into this patch set: if the kernel tries to access sensitive data while running under a restricted ASI class, a processor trap will occur. One possible reaction at that point would be a kernel oops, which would certainly prevent speculative attacks against that data. The ASI patch set, instead, will exit the current ASI class and go into the unrestricted mode in this case, allowing the access to proceed. This response should be sufficient to block speculative access to that data (since speculative execution will simply stop rather than causing a trap) while, at the same time, not getting in the way of legitimate kernel accesses.
One reason for this approach should be reasonably clear: the kernel's address space is huge, and nobody could ever hope to properly determine the sensitivity of every data structure the kernel uses and mark it accordingly. Beyond that, somebody would have to find all of the places where the kernel must exit any restricted class to work with the sensitive data. Even if somebody did achieve all of that, there would be no hope of maintaining it going forward. This architecture thus embodies an acknowledgment that it will never be possible to properly mark all data and all places where sensitive data must be used, so it will always be necessary to make things work anyway.
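In rough pseudocode, the trap handling described above might look like the following; the helper names here are invented, since the real patches wire this logic into the kernel's page-fault handler:

    /* Sketch: early in the kernel page-fault path. */
    if (in_restricted_asi_class() && fault_is_on_asi_unmapped_data(addr)) {
        asi_exit_to_unrestricted();  /* switch to the full page tables */
        return;                      /* retry the faulting access */
    }
    /* ...otherwise, handle the fault normally. */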
Classifying memory
Still, an attempt must be made to try to properly mark at least the most sensitive and most frequently accessed data; much of the 47-part patch set is focused on that task. For dynamically allocated memory, there is a new set of GFP flags (__GFP_GLOBAL_NONSENSITIVE and __GFP_LOCAL_NONSENSITIVE) for marking allocations that do not hold sensitive contents. Calls to the page or slab allocator can use those flags to put the resulting memory into the desired class; memory allocated with vmalloc() can also be classified in this manner.
Internally, the kernel maintains two sets of page tables for its own address space. The unrestricted tables are the same as they appear in current kernels; included therein is the "direct map" that makes all of physical memory available in the kernel's address space. The restricted page tables start with almost no mappings at all. Whenever globally non-sensitive allocations are made, the mappings for the allocated pages are copied into that second set of page tables at the same addresses. When running in a restricted mode, the second set of page tables is made active in place of the first, providing access to the non-sensitive data but not to any sensitive data.
Handling locally non-sensitive allocations is a bit trickier. When the kernel is running in a restricted mode, only the globally non-sensitive part of the direct map is actually present in the page tables. Since this data is globally non-sensitive, there is a single restricted table that is used for all processes. But the locally non-sensitive mappings must be unique to each process, so they cannot live in a single, global page table. That raises the question of where, in the address space, those mappings can go.
The solution that was chosen was to duplicate the direct mapping, so that now the kernel has two complete mappings for all of physical memory. In a restricted mode, the first mapping contains the globally non-sensitive data, as before. Locally non-sensitive allocations, instead, are set up using the second mapping. Once again, only the pages that have been specifically allocated as locally non-sensitive are mapped in the restricted version of this range, and each process has its own version of this mapping. As a result, the locally non-sensitive data for the running process is accessible even in the restricted mode, but it is not accessible from any other process.
That solves the problem, at the cost of halving the amount of physical memory that can be managed by the kernel, since each physical page must now be mapped twice.
For static variables, there is a separate set of flags with names like __asi_not_sensitive and __asi_not_sensitive_readmostly that can be added to the declaration. As one might expect, there is no way to declare static variables as being locally non-sensitive. Static per-CPU variables add another level of complexity, leading to declaration macros with concise names like DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE().
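Put together, marking data might look something like the fragment below. The flag and annotation names are those from the patch set as described above, but the variables and call sites are invented for illustration (and, since the series is unmerged, none of this builds against a mainline kernel):

    /* Safe to leak anywhere: mapped in every restricted context. */
    static unsigned long vmexit_count __asi_not_sensitive;

    /* May leak, but only to the currently running process. */
    req->buf = kmalloc(len, GFP_KERNEL | __GFP_LOCAL_NONSENSITIVE);

    /* Safe to leak to any process; vmalloc() allocations can be
     * classified the same way. */
    table = __vmalloc(size, GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE);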
The developers have not attempted to mark every allocation and declaration properly; as noted above, that is not a task that a rational person (or even a kernel developer) would attempt. But they have spent some time running with the patch set and making note of which accesses caused traps and forced an exit into the unrestricted mode. By identifying and marking the busiest, non-sensitive data structures, they were able to reduce the overall performance impact of this mechanism.
Isolating KVM
With all that infrastructure in place, along with a mechanism to allow controlled mapping of user-space memory into the restricted context, it becomes possible to set up address-space isolation for KVM in particular, and to simultaneously disable some of the expensive Spectre mitigations. Whenever the kernel gives control to the guest, it ensures that the KVM ASI mode is enabled; on return to the kernel, an exit to the unrestricted mode may or may not be performed, depending on what the kernel has to do. If that exit can be avoided (because no sensitive data need be accessed), the cost of cache flushes can be avoided. Meanwhile, with luck, even a hostile guest system will be unable to exploit Spectre vulnerabilities in the host kernel.
For this protection to be complete, though, the kernel must protect itself against not only the guest, but also against a hostile process running on an SMT sibling. In the full implementation, if the cost of simply disabling SMT is to be avoided, that means "stunning" the sibling — suspending its execution — while the kernel is running in the unrestricted address space. When the kernel returns to KVM, the sibling CPU must then be resumed ("unstunned"); the kernel must also flush the memory caches to prevent any data leakage. The implementation of sibling stunning is not part of this patch set; as with the stunning of siblings in the real world, there are potential consequences that must be considered first. In this case, the interaction between stunning and the scheduler has not yet been fully reviewed, so this work has not yet been posted.
The end result of all this is a first look at a form of ASI that could increase both safety and performance for systems running guests with KVM, and which could be extended to cover any number of other situations where ASI might be indicated. As of this writing, the patch series has received no review comments at all, which may be a result of the size and complexity of the work as a whole. It does represent an interesting set of tradeoffs; it can improve both performance and security, but at the cost of a lot of code churn and ongoing maintenance of sensitivity annotations. In a world where one might hope that, before too long, hardware will no longer contain Spectre vulnerabilities, this cost may well appear to be too high. If, instead, we believe that Spectre may haunt us for a long time yet, though, the calculation could well be different.
When and why to deprecate filesystems
It is a good bet that a significant amount of code in the kernel is entirely unused. Even so, that code must still be maintained and shipped, posing an ongoing cost to the development community. What should be done with code that is unmaintained and, possibly, unused? Answering that question requires understanding which users still exist, if any, and taking a hard look at what the future support requirements for that code will be. The kernel community has recently discussed this problem in the context of filesystems, and the Reiserfs filesystem in particular, with a focus on the approaching 2038 deadline.
Removing support for old hardware is difficult enough, but there does often come a point where it becomes possible. If a particular device has been unavailable for years and nobody can point to one in operation, it may be time to remove the support from the kernel. Another strong sign is a complete lack of maintainer interest; that led to the recent decision to remove support for the nds32 architecture, for example. Filesystems can be harder, though; they are independent of the hardware and can thus live far longer than any particular device type. Users can hold onto a familiar filesystem type for a long time after most of the world has moved on.
Reiserfs is certainly a case in point; this filesystem was first covered in LWN in 1999; it found its way into the 2.4.1 kernel in January 2001 despite a fair amount of opposition based on the allegedly stable nature of 2.4 releases. There were a number of reasons for the inclusion of Reiserfs; chief among them, perhaps, was that it was the first Linux filesystem to support journaling. This filesystem attracted a fair amount of interest in its early days and some distributions adopted it as the default choice, but its own developers quickly moved on to other things; by 2004, Hans Reiser was arguing against enhancing Reiserfs, saying that his new Reiser4 filesystem should be adopted instead. In the end, Reiser4 was never merged, but Reiserfs lives on in the kernel.
Recently, Matthew Wilcox observed that maintenance of Reiserfs appears to have stopped. So he naturally wondered if there were still any active Reiserfs users, or whether perhaps the filesystem could be removed. Keeping it around has costs associated with it, after all, and it is getting in the way of some enhancements he would like to make.
As noted above, there is not normally a natural point where kernel developers can conclude that there is no value in keeping a filesystem implementation in the kernel. Reiserfs still works for any users still running it, and they would be within their rights to see its removal as causing just the sort of regression that the kernel community so loudly disallows. So the best that can usually be done is to place a prominent deprecation warning in the kernel itself and wait a few years; if opposition to the removal does not materialize during that time, the removal is probably safe. A patch adding that deprecation has been posted and seems likely to be merged for the 5.18 kernel release.
During the discussion, Byron Stanoszek did surface to confess his ongoing use of Reiserfs and desire to see it supported for a bit longer. Jan Kara responded by noting the limited nature of the support Reiserfs gets now:
Frankly the reality of reiserfs is that it gets practically no development time and little testing. Now this would not be a big problem on its own because what used to work should keep working but the rest of the common filesystem infrastructure keeps moving (e.g. with Matthew's page cache changes, new mount API, ...) and so it can happen that with a lack of testing & development reiserfs will break without us noticing. So I would not consider reiserfs a particularly safe choice these days and rather consider migration to some other filesystem.
Kara did offer to extend the deprecation period for Reiserfs, though, if it were really necessary.
As it happens, Stanoszek raised an issue that plays into the timing of the deprecation of Reiserfs, and highlights one of the issues associated with deprecation in general. Even the most dedicated users of Reiserfs will eventually find themselves wanting to move on because that filesystem has a year-2038 problem. In January of that year, the timestamps used within Reiserfs will overflow, leading to overall confusion. Since these timestamps are buried deeply within the on-disk filesystem format, they can't be fixed with a simple code tweak. Evacuating any data from Reiserfs filesystems before then seems like a prudent thing to do.
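The arithmetic behind the deadline is easy to check from user space with a short, standalone program (ctime() prints in the local time zone; the comments show UTC):

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    int main(void)
    {
        /* The largest second a signed 32-bit counter can hold... */
        time_t last = INT32_MAX;     /* Tue Jan 19 03:14:07 2038 UTC */
        /* ...and the value the next second wraps around to. */
        time_t wrapped = INT32_MIN;  /* Fri Dec 13 20:45:52 1901 UTC */

        printf("last:    %s", ctime(&last));
        printf("wrapped: %s", ctime(&wrapped));
        return 0;
    }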
This might not seem like an urgent problem, since the deadline is over 15 years away. But there are reasons to take action now, which is why Dave Chinner asserted that the time had come to deprecate all filesystems that are not 2038-ready. He later explained in more detail:
With that in mind, this is why we've already deprecated non-y2038k compliant functionality in XFS so that enterprise kernels can mark it deprecated in their next major (N + 1) release which will be supported for 10 years. They can then remove that support it in the N+2 major release after that (which is probably at least 5 years down the track) so that the support window for non-compliant functionality does not run past y2038k.
The point is that deprecating a filesystem in the mainline kernel does not change the fact that it was included in recent long-term-support kernels. The 5.15 kernel will, if past patterns hold, be supported until 2028; it will have Reiserfs in it that whole time. As Wilcox pointed out, that serves as a sort of lifeline for users who cannot move away from the filesystem immediately, which may be a good thing. But it also poses an ongoing problem for developers charged with maintaining those kernels.
That is because supporting deprecated code in a long-term-stable kernel will be increasingly difficult. Ted Ts'o noted that maintainers of stable kernels will not be able to rely on upstream for fixes anymore, and any fixes that are made may not propagate to all kernels due to the lack of a central location for them. So doing high-quality maintenance of Reiserfs for six years, after it has been removed from the mainline, will be a challenge. Any enterprise kernels based on that stable release may include Reiserfs for longer than that. Enterprise distributors have to think in terms of supporting current kernel features for as long as 15 years; if they want to avoid the additional challenge of supporting year-2038-incapable filesystems after that date, the deprecation process needs to start now.
The implication is that we are likely to see other filesystem deprecations before too long. The NFSv3 protocol, for example, uses 32-bit time values and will break in 2038. The ext3 filesystem has similar problems; in this case, the data can be read as an ext4 filesystem, but the on-disk format simply cannot represent times correctly. So, like the XFSv4 on-disk format, which is also not year-2038 capable, ext3 needs to be migrated away from.
These deprecations will likely cause some unhappiness among users who have stuck with a working solution for years; switching to a new filesystem type can make people nervous. But the alternative is worse. Year 2038 provides a nice (and legitimate) excuse for developers to remove some old and unloved filesystems a bit more quickly than they might otherwise be able to. Once that passes, deciding when old filesystems should go will, once again, be a more difficult problem.
Fedora's missing Chromium updates
Google's Chrome browser seemingly dominates the Internet at this point, but that does not mean that everybody wants to run it. Chrome, of course, is built on an open-source project called Chromium but is not an open-source product itself; it includes a number of proprietary add-ons. But the Chromium source is out there and can, with some effort, be used to build a working, open-source browser; a number of distributors do so. But Chromium is famously hard to package, and distributors have, at times, struggled to keep up with it; a recent discussion in the Fedora community has brought new attention to this problem.
Comparisons between Chrome and Chromium often focus on what the latter browser lacks. It doesn't have Google's automatic updates, for example, and it is missing a number of codecs for problematic media formats. Chromium's ability to use the Google bookmark-synchronization feature was taken away in 2021. But Chromium users can also point to what is gained, starting with the fact that it is free software. Chromium lacks many of the data-reporting mechanisms found in Chrome and is rather less insistent about using one's Google ID with random web sites. Distributors can add their own features as well.
The problem with Chromium is that it is a huge and messy program to build. The source tarball (compressed) weighs in at well over 1GB. The list of dependencies is long; some of those are bundled with the browser source, while others must be provided by the operating system. The result is that even an out-of-the-box build can be challenging; if the distributor has to make changes to meet its own requirements, the problem gets harder yet.
Fedora does have its own requirements. As a general rule, bundled libraries are not acceptable; packages are expected to use the shared libraries provided by the distribution. Chromium, like other applications, is expected to integrate with the rest of the Fedora environment — working well with the Wayland display system, for example. Red Hat's legal team places its own requirements on software that can be shipped, meaning that some of the code (codecs, primarily) that is part of Chromium must be excluded from the build. And, just because that all isn't challenging enough, Fedora builds the browser with GCC, despite the fact that the Chromium developers use LLVM.
It all comes down to a difficult impedance-matching task for anybody who works to build Chromium for Fedora. This is not a new situation; it first cropped up on LWN back in 2009, when including Chromium in Fedora seemed like an impossible task. Chromium was finally able to enter the Fedora repository in 2013 and has been there ever since. The task of packaging Chromium has gotten easier for distributors since, but "easier" is not the same as "easy".
One of the other advantages claimed for Chromium over Chrome is its faster update cycle. But source updates and updated packages in distribution repositories are two different things. The current Chromium release is 99.0.4844.48 but, as of this writing, Fedora is shipping version 96.0.4664.110, which was released in December. Jonathan Schleifer took to the Fedora development list to complain about this lag. He included a long list of CVE numbers that have been addressed since Fedora last shipped a Chromium release, noting that a number of them are said to be actively exploited on the net. Fedora Chromium users, he said, should "stop using it NOW", and Fedora should consider using the RPM Fusion version, which is currently at 98.0.4758.102.
Demi Marie Obenour went further, saying that Fedora should perhaps loosen its standards for the Chromium package: "In the case of something like Chromium, a sloppy package that gets timely updates is better than a fully conforming package that does not".
That led to a strong response from Neal Gompa, who expressed his disappointment with anybody "who thinks it's okay to do less than a good job on shipping software". Beyond addressing the integration issues (both for users and lawyers) with Fedora, he said, the Fedora build brings a number of advantages:
For example, Fedora's Chromium will attempt to use Wayland by default on a Wayland desktop. Upstream Chrom(e|ium) is not ready for that yet. We ship VA-API integration, which Google doesn't offer. We have working screencasting on Wayland, which upstream doesn't have right now by default. We can enable security features that upstream refuses to (CaBLE, for example). And so on.
Falling back to "sloppy packaging
", he said, would lose those
benefits. Tom Seewald responded
by saying that, if keeping Chromium current is too much work, the browser
should simply be removed from the repository.
Tom Callaway, who has done the bulk of the work to maintain Chromium in Fedora for all these years, jumped in to say that, due to family issues, he has been short of time to work on open-source projects. He defended the work that Fedora does with Chromium, though, and said that he had finally managed to get past some build failures that had been holding things up. The results of that work can be seen in the Rawhide distribution, which currently ships version 98.0.4758.102. Not that the problem is solved: "Of course, Google released a new major version this morning, so the terrifying carousel spins anew".
Fedora will likely get an updated Chromium in the near future. Meanwhile, the criticisms of its maintenance of the Fedora Chromium package may have encountered strong pushback on the mailing list, but they are not entirely without merit. Shipping an Internet-exposed application with known security holes is the sort of thing that distributors like Fedora normally go far out of their way to avoid, and any users who are compromised by one of those vulnerabilities will take little comfort from the otherwise high quality of the Chromium package. An outdated and vulnerable Chromium package falls short of the standards that the Fedora community sets for itself.
While there are developers who are paid to work on Fedora, the distribution also depends on volunteers working on their own time. The Chromium package may have suffered from this; it looks a lot like a heroic, one-person effort that could benefit from some extra help. Perhaps it is time for both Red Hat and the Fedora community to try to provide that help. Even users of other browsers benefit from a solid Chromium package for all of those "Chrome-only" web sites; failing to provide that could prove harmful for Fedora in the long run.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: Firefox 0-days; Spectre BHI; Dirty pipe; DENT 2.0; Blender 3.1; Firefox 98; PipeWire; Quote; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.