LWN.net Weekly Edition for October 29, 2020
Welcome to the LWN.net Weekly Edition for October 29, 2020
This edition contains the following feature content:
- The recurring request for keyword indexing in Python: function calls can have keyword arguments; why not object indexes as well?
- Constant-action bitmaps for seccomp(): speeding up a common seccomp() use case.
- The rest of the 5.10 merge window: what was merged in the latter part of the 5.10 merge window.
- Two address-space-isolation patches get closer: security through making memory invisible.
- Rejuvenating Autoconf: an unloved but widely used tool gets some much-needed attention.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
The recurring request for keyword indexing in Python
Python has keyword arguments for functions, which are a useful (and popular) feature; they can make code clearer to read and eliminate the possibility of passing arguments in the wrong order. Python can also index an object in various ways to refer to a subset or an aspect of the object. Bringing keywords to indexing would provide the same clarity benefit for indexing operations; doing so has been discussed in Python circles for a long time. Some renewed interest, in the form of lengthy discussions on the python-ideas mailing list and a new Python enhancement proposal (PEP), looks like it just might take keyword indexing over the finish line.
Back in 2014, PEP 472 ("Support for indexing with keyword arguments") was created to push the idea, but it was rejected in 2019, largely because it failed to "gain any traction" in the interim. Since then, it has featured twice in lengthy python-ideas threads. Caleb Donovick raised it roughly a year ago, which generated a lively thread. Then, in May, Andras Tantos brought it up again, though he did not get much of a response for several months. But when Stefano Borini, one of the PEP 472 authors, replied in July, the ball started rolling.
Keyword indexing
The idea is fairly straightforward; it simply applies the concept of keyword arguments to indexing:
    print(matrix[row=4, col=17])
    some_obj[1, 2, a=43, b=47] = 23
    del(grid[x=1, y=0, z=0])

Those examples exercise the three separate operations for indexed objects: getting, setting, and deleting a value. Those operations are implemented using special double-underscore (or "dunder") methods on objects: __getitem__(), __setitem__(), and __delitem__(). Python classes often have their own implementations of those operations, which could be extended to handle keyword indexes if the language allowed it. There are a number of reasons one might want the ability to use keywords; making the code more readable by documenting which indexes are being referred to is at the top of the list. But Tantos raised a use case that did not exist when PEP 472 was written: type hints. The keyword indexes could also be used in type hints to document what the types are being used for; keyword-index support in the parser would allow better type hints. As he put it, instead of:
    def func(in: Dict[str, int])

one could write:

    def func(in: Dict[key=str, value=int])
The general idea was not particularly controversial, even though it was unclear to many whether the feature was really needed. But, as is so often the case, the corner cases generated most of the discussion. In particular, there is the question of how to handle indexing an object that has both positional and keyword indexes. Jonathan Fine laid out a vision of the semantics of keyword indexing; in it, he noted that the way the dunder methods are currently called could make some things problematic.
In particular, when multiple positional indexes are used, they are all collected up into a tuple that is passed to the methods as a single argument. The slice notation (e.g. x[1:3] to indicate the second and third elements of x) is passed as a slice object like those returned from slice(). Fine described how current Python could be used to prototype the solution by creating a key (or K) class that would use function syntax to work around the fact that keywords in indexing are a syntax error right now:
    >>> value = d[K(1, 2, 3, a=4, b=5)]
    >>> d[K(1, 2, 3, a=4, b=5)] = value
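The trick behind this prototype can be reproduced in any current Python. The K class below is an illustrative sketch (not Fine's actual kwkey code): it bundles positional and keyword indexes into a single hashable object, so an ordinary dictionary can be indexed with it today:

```python
class K:
    """Bundle positional and keyword indexes into one hashable key."""
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def _key(self):
        # Sort keyword items so K(a=1, b=2) and K(b=2, a=1) compare equal.
        return (self.args, tuple(sorted(self.kwargs.items())))

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        return isinstance(other, K) and self._key() == other._key()

d = {}
d[K(1, 2, 3, a=4, b=5)] = "value"
print(d[K(1, 2, 3, a=4, b=5)])  # prints: value
```

Since K instances with the same contents hash and compare equally, an equivalent K constructed later retrieves the stored value, which is all the prototype needs for experimentation.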
Using a tuple to collect up the positional indexes has some downsides, but it is not something that can be changed at this point. It might be easier to work with positional indexes passed as individual arguments to the dunder methods, but that ship has sailed, as Steven D'Aprano pointed out:
    # Current Python
    myobj[1, 2]  # calls __getitem__ with a single tuple argument (1, 2)

    # How to distinguish the above from this?
    myobj[1, 2]  # call __getitem__ with two int arguments?

However, there is no such concern regarding keyword arguments.
In addition, D'Aprano said that there is no need to use the K class envisioned by Fine, except for experimentation; adding keywords to indexing could be done in a fairly straightforward way:
    def __getitem__(self, item, *, spam=None)
    def __setitem__(self, item, value, *, spam, eggs=None)

I trust the expected behaviour is obvious, but in case it's not:

    myobj[1, 2, spam=3]        # calls __getitem__((1, 2), spam=3)
    myobj[1, 2, spam=3] = 999  # calls __setitem__((1, 2), 999, spam=3, eggs=None)

(And similarly for delitem of course.)
I must admit I like the look of this, but I don't know what I would use it for.
He used the "*" in the argument list to make spam and eggs be keyword-only arguments, so those parameters can never be passed positionally.
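D'Aprano's proposed translation can be simulated in current Python by calling the dunder methods directly, since the index syntax itself is still a syntax error. The Grid class here is a hypothetical example, not code from the discussion:

```python
class Grid:
    """Sketch of D'Aprano's proposed dunder signatures.

    `g[1, 2, spam=3]` is a syntax error in current Python, so this
    demo calls the dunders directly to show the intended argument
    shapes: positional indexes bundled as a tuple, keywords passed
    as keyword-only arguments.
    """
    def __init__(self):
        self.store = {}

    def __getitem__(self, item, *, spam=None):
        return self.store[(item, spam)]

    def __setitem__(self, item, value, *, spam=None, eggs=None):
        self.store[(item, spam)] = (value, eggs)

g = Grid()
# What `g[1, 2, spam=3] = 999` would translate to:
g.__setitem__((1, 2), 999, spam=3)
# What `g[1, 2, spam=3]` would translate to:
value, eggs = g.__getitem__((1, 2), spam=3)
print(value)  # prints: 999
```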
After Guido van Rossum expressed interest in the idea, encouraging participants to work out the API and then to draft a PEP, Fine presented some further ideas and asked:
    >>> d = dict()
    >>> d[x=1, y=2] = 3
He argued that it should add a new entry to the dictionary, with a key of "x=1, y=2" and a value of three. But D'Aprano thought that it should raise a TypeError instead. "Just because something is syntactically allowed doesn't mean it has to be given a meaning in all circumstances." His idea would be that the existing object types would not change, but that objects defined in Python programs could use the facility if it was deemed useful.
In early August, Fine announced his kwkey module that is meant to be used to work out the design of the keyword-indexing feature. In particular, it implements Fine's vision of how things should work, as well as D'Aprano's, so that people can experiment with them. While the differences between them are somewhat subtle, they boil down to how to treat the positional indexes; Python currently collects them up into a tuple—that would need to be maintained for objects that do not support keyword indexing—but it could change for objects that accept keyword indexes. As Fine put it:
    d[1, 2]
    d[(1, 2)]

are semantically equivalent. There is a proposal, that

    d[1, 2, a=3, b=4]
    d[(1, 2), a=3, b=4]

be semantically equivalent. I find this troubling, for example because

    fn(1, 2, a=3, b=4)
    fn((1, 2), a=3, b=4)

are semantically different.
The decision made long ago to collect up the positional parameters into a tuple for __getitem__() and friends is really the crux of the matter. As Greg Ewing pointed out, that behavioral difference may be "messing up everyone's intuition on how indexing should be extended to incorporate keyword args, or even whether this should be done at all". That led Ricky Teachey to wonder if it made sense to add new get/set/del dunder methods with different semantics regarding positional indexes. It was discussed at some length, and recurred several times, though it seems unlikely to go anywhere, at least in part because of the run-time cost of choosing between the new and old mechanisms.
Toward the end of August, Fine suggested adding a dunder attribute to distinguish between the current style of positional-index handling and a new style that was more in keeping with the handling of arguments to functions. Chris Angelico thought there had already been more than enough different proposals, and that Fine had not shown "any advantage over just allowing the current dunder to receive kwargs".
A PEP arrives
Borini, on the other hand, said that perhaps it was time for him and Fine (and possibly others) to come up with a PEP, either by resurrecting PEP 472 or by starting a new one. D'Aprano asked for some direction on the python-dev mailing list; since PEP 472 had been rejected, a new PEP was suggested as the way forward. D'Aprano is a core developer, so he is the sponsor of PEP 637 ("Support for indexing with keyword arguments"). Borini announced the initial version, with Fine as co-author, on September 21.
The PEP outlines a few more use cases, including ways that the feature could clarify some workarounds that are being used by popular packages such as pandas and xarray. For example, the PEP shows the following simplification for xarray:
    >>> # old syntax
    >>> ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10

    >>> # new syntax
    >>> ds["empty"][lon=1:5, lat=6:] = 10
The proposal effectively applies the normal keyword-handling mechanisms to indexing, but there are some twists. For one thing, all indexes must support slice notation (i.e. lower:upper:stride) and the ellipsis object; current indexes support both of these, so keyword indexes are simply treated the same way. There is no change to the bundling of all positional parameters into a tuple—which would be effectively impossible for backward compatibility reasons—but indexing with only keyword parameters is allowed and an empty tuple will be passed for the positional parameters.
There are some "gotchas", especially with regard to mixing keyword and positional parameters. Most of those problems can be avoided by the use of keyword-only or positional-only parameters specified for the dunder methods. None of the built-in Python types (e.g. dictionaries, lists) will add keyword indexing, but developers who want to use it for their objects can provide appropriate dunder methods. Another potentially confusing behavior is avoided by maintaining the current practice that a single positional index is not turned into a tuple, regardless of any keywords:
    obj[(1,), a=3]  # calls __getitem__((1,), a=3)

In this case, a single positional element is being passed (as is, per the rule above); it just happens that the single element is a tuple.
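The positional behavior that the PEP preserves can be observed in any current Python with a small probe class (a hypothetical example for illustration) whose __getitem__() simply reports what it receives:

```python
class Probe:
    """Records exactly what Python passes to __getitem__()."""
    def __getitem__(self, item):
        return item

p = Probe()
print(p[1])      # 1                 - a single index arrives as-is
print(p[1, 2])   # (1, 2)            - multiple indexes arrive as one tuple
print(p[(1,)])   # (1,)              - an explicit tuple also arrives as-is
print(p[1:3])    # slice(1, 3, None) - slice notation becomes a slice object
```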
Though he was one of the authors of the PEP, Fine obviously was not entirely happy with the direction it had taken. In early October, he sought some examples of places where the existing language cannot support behavior that PEP 637 would allow. His kwkey package, which was updated and changed along the way, is meant to serve as the example of what can be done currently in Python. But it was not entirely clear what his purpose actually was, as D'Aprano pointed out. D'Aprano encouraged Fine to write a competing PEP or perhaps to oppose the PEP, since some PEPs are written as "the canonical record of why the feature has been rejected", if that is what he was after.
Fine did not reply, but it is clear that there were some back-channel discussions, which culminated in Borini asking Fine to step down as author, which Fine graciously accommodated. In a later posting, Fine seems to indicate that he will be working on a separate PEP to address the inconsistencies he sees in handling keyword arguments between PEP 637 and the existing function call syntax (among other things).
That is where things stand at this point. This incarnation of the discussion has been sprawling, with multiple mega-threads spread out over four months or so. As noted, it is far from the first time it has come up, but it would seem that we are approaching closure of some sort on the idea. Since PEP 472 died from lack of attention, some additional use cases have come to light, and the idea has been ever more thoroughly dissected and discussed. It would seem likely that either some PEP will be accepted (637 or one from Fine) or they will all be rejected and the idea can be put to rest for good. Stay tuned ...
Constant-action bitmaps for seccomp()
The seccomp() system call allows user space to load one or more (classic) BPF programs to be run whenever the calling process invokes a system call. Those programs can examine (to an extent) the arguments to each call and inform the kernel whether the call should be allowed to proceed or not. This feature is used in a number of containerization solutions (and beyond) as a way of reducing the kernel's attack surface. In some situations, though, using seccomp() can result in a significant performance reduction. There are currently two patch sets in circulation that are aimed at reducing the overhead of seccomp() for one common use case.

The argument-inspection feature of seccomp() is useful in a number of settings; it can, for example, block a write() call to any file descriptor other than the standard output. But many real-world use cases do not take advantage of this capability; instead, they make decisions based only on which system call is being invoked while paying no attention to the arguments to those calls. It turns out that the BPF mechanism is far from optimal for this case, which must be implemented as a long series of comparisons against the system-call number. The overhead of these comparisons can be reduced by using smarter algorithms (checking for the most commonly used system calls first, for example), but there are limits to how fast it can be. This overhead makes every system call slower.
Much of this work is wasted. If a seccomp() configuration of this type allows read() once, it will allow it every time, but the kernel must work it out the hard way each time regardless. If there were some way of knowing that a given seccomp() filter program allows or denies specific system calls without looking at their arguments, it would be possible to implement those decisions much more quickly.
Optimizing seccomp()
In June, Kees Cook posted a patch implementing this sort of optimization. It creates three bitmaps (called allow, kill_thread, and kill_process) within a process; they are indexed by the system-call number. When a system call is intercepted by seccomp(), the associated number is used to consult those bitmaps; if the relevant bit is set in a bitmap, the associated action is taken without ever actually running the BPF program. Thus, the bits for always-allowed system calls can be set in the allow bitmap; they will then execute far more quickly.
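The lookup side of this scheme is simple; the toy model below (in Python, purely for illustration; the real code is kernel C, and the class and method names here are hypothetical) shows the idea of consulting per-action bit vectors, indexed by system-call number, before the filter is run at all:

```python
# Three "bitmaps", loosely following the names in Cook's patch set,
# each a Python int used as a bit vector indexed by syscall number.
ALLOW, KILL_THREAD, KILL_PROCESS = "allow", "kill_thread", "kill_process"

class SeccompCache:
    def __init__(self):
        self.bitmaps = {ALLOW: 0, KILL_THREAD: 0, KILL_PROCESS: 0}

    def mark(self, action, nr):
        """Record that syscall nr always produces this action."""
        self.bitmaps[action] |= 1 << nr

    def fast_path(self, nr):
        """Return a cached action, or None if the BPF filter must run."""
        for action, bits in self.bitmaps.items():
            if (bits >> nr) & 1:
                return action
        return None

cache = SeccompCache()
cache.mark(ALLOW, 0)        # pretend syscall 0 is always allowed
print(cache.fast_path(0))   # cached "allow": no BPF execution needed
print(cache.fast_path(57))  # None: fall back to running the filter
```

A set bit replaces an entire run of the BPF program with one shift-and-mask test, which is where the speedup comes from.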
The trick is setting those bits in the first place. Cook's patch set works by actually executing the loaded BPF program(s) at load time for every supported system call and watching what happens. If, for a given system call, the BPF code does not access the system-call arguments, the kernel can conclude that the result will always be the same for that call and set a bit in the appropriate bitmap. If, instead, the arguments are accessed, the bit for that system call is cleared in all bitmaps; the BPF program will thus be executed on every invocation of that call.
There is another challenge here: observing whether the BPF program does, in fact, access the system-call arguments. The first version of the patch set did that by placing the arguments in a separate page, running the BPF code, then looking at the page-table entry to see whether the page had been referenced or not. This mechanism worked, but relied on some complex memory-management trickery.
Jann Horn had a better idea: simply emulate the execution of the BPF program and watch what it does directly. The key observation was that the emulator need not be complete, since programs that only compare system-call numbers tend to be quite simple. Only a small subset of the available instructions would need to be emulated; anything that the emulator does not recognize can be taken as an indication that more complex logic is involved and the bitmap cannot be used.
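To make that suggestion concrete, here is a toy emulator sketch, again in Python and purely illustrative (the kernel's emulator works on real classic-BPF instructions; the three-instruction dialect here is made up). It runs a filter for one fixed syscall number and gives up the moment the program loads the argument area or uses an unrecognized instruction:

```python
LD_ABS, JEQ, RET = "ld_abs", "jeq", "ret"  # tiny stand-in instruction set
NR_OFFSET = 0      # offset of the syscall number in seccomp_data
ARGS_OFFSET = 16   # the syscall arguments start at this offset

def emulate(prog, syscall_nr):
    """Return the filter's action for syscall_nr, or None if the
    outcome cannot be proven constant (e.g. arguments are read)."""
    acc = 0
    pc = 0
    while pc < len(prog):
        op, *operands = prog[pc]
        if op == LD_ABS:
            (offset,) = operands
            if offset >= ARGS_OFFSET:
                return None    # filter inspects arguments: no bitmap bit
            acc = syscall_nr if offset == NR_OFFSET else 0
            pc += 1
        elif op == JEQ:        # jump-if-equal with true/false offsets
            k, jt, jf = operands
            pc += 1 + (jt if acc == k else jf)
        elif op == RET:
            (action,) = operands
            return action
        else:
            return None        # unknown instruction: be conservative
    return None

# Filter: allow syscall 0, kill everything else.
prog = [(LD_ABS, NR_OFFSET), (JEQ, 0, 0, 1), (RET, "allow"), (RET, "kill")]
print(emulate(prog, 0))   # allow - this result can be cached in a bitmap
print(emulate(prog, 1))   # kill
```

Running such an emulation once per syscall number at filter-load time is what lets the kernel populate the bitmaps safely: any answer other than None is guaranteed to hold for every future invocation.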
On September 21, YiFei Zhu showed up with a patch series implementing a constant-action seccomp() bitmap using an emulator to determine whether the system-call arguments were being accessed or not. There were a number of other differences from Cook's approach; for example, only the "allow" bitmap is implemented on the understanding that the "deny" cases do not really need to be optimized. Two days later, Cook posted a new version with a rather simpler emulator that is closer to the design first suggested by Horn. Less than one day after that, Zhu returned with a revised series with a simplified emulator that borrowed some ideas from Cook's version.
Cook described Zhu's initial patch set as "significantly over-engineered" and said that he had rushed out his updated version to show "how small I would like the emulator to be" and how the architecture support could be improved. Since then, it would appear that many of the ideas from Cook's implementation have found their way into Zhu's. Version 5 of Zhu's patch set, posted on October 11, adds 292 lines of code to kernel/seccomp.c (compared to 400 in the initial version) while supporting more functionality. Cook has not reposted his work since, suggesting that Zhu's version may be the one that is ultimately merged.
Paths not taken
There is an interesting question to be considered here. Emulating BPF execution and watching what happens does not seem like the most elegant solution to the problem; there are at least two other approaches that could be considered:
- The developers writing the seccomp() programs surely know what the desired behavior is. A new seccomp() API could be created to allow user space to pass the bitmap in directly rather than having to reverse-engineer it in the kernel.
- seccomp() is one of the few places in the kernel still using the classic BPF dialect. Switching to extended BPF would allow the writing of programs that could make these decisions much more quickly, again without the need to add code to the kernel to guess what the programs do.
The question of why these approaches have not been explored has seen relatively little discussion as these patch sets were considered. Horn did note that creating a new API would require changes in user space to take advantage of it, while the current patch sets will simply make existing programs run more quickly with no changes required. The proposed patches also make no changes to the user-space API for seccomp(), meaning that the kernel community is not committing to anything new by adopting them. A new API for loading the bitmask, instead, would have to be supported forever.
With regard to eBPF, adding that support to seccomp() has come up a few times in the past; the last such would appear to be this patch series posted by Sargun Dhillon in 2018. There are a number of obstacles to be overcome before this support will ever make it into the mainline, though. The BPF maintainers are concerned that use of eBPF in seccomp() could constrain the future development of eBPF itself. Security-oriented developers, instead, worry about the extra capabilities and attack surface provided by eBPF; it would not be hard to introduce new vulnerabilities by putting seccomp() and eBPF together. There is also the little problem that seccomp() filters can be loaded by unprivileged processes, and giving unprivileged code the ability to load eBPF programs is an idea that has fallen on hard times.
The end result is that the current patch sets would appear to be the best that is on offer for improved seccomp() performance anytime soon. The performance increase that comes with using the bitmap is real, according to some benchmarks included with Zhu's patch set. So one should expect to see this optimization merged, presumably for the 5.11 development cycle.
The rest of the 5.10 merge window
Linus Torvalds released 5.10-rc1 and closed the 5.10 merge window on October 25; by that time, 13,903 non-merge changesets had been pulled into the mainline repository. Of those, over 6,700 were merged since LWN's summary of the first half of the merge window. A fair number of interesting features found their way into the kernel among those commits; read on to catch up with what's coming in 5.10.
Architecture-specific
- The s390 architecture has never quite gotten the hang of leap seconds; when a leap second happens, the system must be rebooted to make its time reflect the new reality. This, as noted in this commit, "is not desired". As of 5.10, s390 systems will be able to adjust to leap seconds on the fly.
- The MIPS architecture can now boot kernels compressed with the zstd algorithm.
- The RISC-V architecture has gained support for booting on systems with EFI firmware.
- Support for non-devicetree i.MX platforms has been removed after having seen no activity for some years.
Core kernel
- Some types of BPF programs can now sleep during their execution. This feature is limited to tracing and security-module programs for now.
- The new BPF_PROG_BIND_MAP command for the bpf() system call binds a map to a loaded program; its purpose is to facilitate the storage of metadata that the program involved does not use directly.
- BPF programs can now access per-CPU variables with the bpf_per_cpu_ptr() and bpf_this_cpu_ptr() helper functions.
- The process_madvise() system call has been added; it allows one process to perform an madvise() call on behalf of another. process_madvise() was covered in this January 2020 article but the API has changed since then; see this commit for the merged version.
Filesystems and block I/O
- The overlay filesystem has a new "volatile" mode that causes it to ignore all forms of fsync() calls. That is, of course, a dangerous mode to operate in, but it is evidently helpful for tasks like image builds where, should the system die in the middle, one can just start over. See this commit for some details.
- The zonefs filesystem has gained a new explicit-open mount option. If that option is present, opening a file for writing will force the associated zone to be made active on the device. This guarantees that a zone that is successfully opened can be successfully written to later. This documentation patch has a little more information.
- The XFS V4 filesystem format has been deprecated; users are expected to upgrade to the more-capable V5 format. That said, V4 will be supported until at least 2030, so users have some time to make this change.
- The ext4 filesystem has a new "fast commits" mode that can significantly reduce the latency of many file operations. The claimed performance improvements are large; see this patch posting for some benchmark results along with a discussion of the fast commits feature in general.
- The new nosymfollow mount option prevents path resolution from following symbolic links on the mounted filesystem. This option does not prevent applications from using readlink() and following symbolic links themselves, though.
Hardware support
- Clock: Micro Crystal RV3032 realtime clocks, R-Car V3U clocks, Allwinner A100 clock control units, MediaTek MT8167 clock controllers, and Qualcomm SM8150 and SM8250 display clock controllers.
- Miscellaneous: Vivaldi keyboards, Renesas RPC-IF HyperBus controllers, Ricoh RN5T618 charger/fuel gauges, TI BQ25980 battery chargers, Mellanox BlueField I2C controllers, joysticks connected via analog-to-digital converters, Zinitix touchscreens, Toshiba Visconti watchdog timers, and TI R5F remote processor subsystems.
- Networking: MediaTek MT7531 Ethernet switches, Marvell Prestera switches, and Microchip MCP25xxFD SPI CAN controllers.
Networking
- It is now possible to load a BPF program that can modify TCP header options on packets as they pass through the system. See the changelog in this commit for some information.
- The merging of multipath TCP support continues; 5.10 will have the ability to transmit data on multiple flows simultaneously.
- The IGMPv3/MLDv2 multicast protocol (RFC 4604) is now supported.
- The ISO 15765-2:2016 CAN transport protocol is now supported.
Security-related
- The SafeSetID security module has gained the ability to control group-ID changes as well.
Virtualization and containers
- The KVM hypervisor can now defer to a user-space process to handle accesses to unknown model-specific registers (MSRs). See this commit for some more information and this commit for a filtering mechanism that gives more control over MSR handling.
Internal kernel changes
- The contiguous memory allocator has gained optional NUMA awareness; using it requires setting the DMA_PERNUMA_CMA configuration option and booting with the cma_pernuma= command-line option to specify the size of the per-NUMA space. The DMA mapping layer has been updated to use this feature if it is enabled.
- There is a new API for allocating non-coherent DMA areas; see this documentation patch for more information. There is also a new function (dma_direct_alloc_pages()) for obtaining DMA-addressable memory directly from the page allocator.
- Changes have been made to prandom_u32(), as discussed in August, to address some theoretical security issues there. The new code uses a variant of the SipHash hashing function to generate pseudo-random numbers and has added some internal entropy sources.
Now the time has come to find and fix the remaining bugs in all of that code. That process will continue over the next seven or eight weeks, culminating in a final 5.10 release on December 13 or 20. As the final release for 2020, 5.10 will probably become the next long-term-support kernel as well, so, in a real sense, the work on 5.10 will only be beginning when that release happens in December.
Two address-space-isolation patches get closer
Address-space isolation is the technique of removing a range of memory from one or more address spaces as a way of preventing accidental or malicious access to that memory. Since the disclosure of the Meltdown and Spectre vulnerabilities, the kernel has used one form of address-space isolation to make kernel memory completely inaccessible to user-space processes, for example. There has been a steady level of interest in using similar techniques to protect memory in other contexts; two patches implementing new isolation mechanisms are getting closer to being ready for merging into the mainline kernel.
memfd_secret()
The first of these is the memfd_secret() patch set from Mike Rapoport, which has been covered here before, so this overview will be relatively brief; see that article for more background. The purpose of this work is to allow a user-space process to create a "secret" memory area that is as inaccessible as possible outside of the process. Intended users include cryptographic libraries, which can use a secret area to hold cryptographic keys and keep them safe from prying eyes.
This functionality has, in recent revisions of the patch set, been moved into a separate system call:
int memfd_secret(unsigned long flags);
The return value will be a file descriptor that can then be passed to mmap() to map an actual range of memory. For the most part, that memory will look (to the mapping process) like any other memory area, but there will be a couple of differences:
- Pages of memory in this range will be removed from the kernel's direct map — the portion of the address space that lets the kernel access (almost) any physical page in the system. This makes it much harder for the kernel to access this memory, either intentionally or by way of an exploit.
- If flags includes SECRETMEM_UNCACHED, then the memory will be mapped uncached if the underlying architecture supports it. Uncached memory will be far slower to access, but it is also immune to disclosure via many speculative-execution vulnerabilities.
Memory in a secret area is locked into RAM and cannot be swapped out; as such, it is counted against the owning process's locked-memory limit.
One ongoing problem with features like this is that removal of pages from the kernel's direct map is an expensive operation. The direct map uses huge pages, minimizing its impact on the system's translation lookaside buffer (TLB). Removing random pages from the map breaks up those huge pages, significantly increasing TLB pressure. In order to minimize this impact, the memfd_secret() patch set maintains a separate cache of physically contiguous pages to use for this purpose.
The rate of change for this patch set has been slowing for some time, so it may be close to being ready for inclusion. One never knows for sure with memory-management patches, though, until the patches are actually applied.
KVM protected memory
While memory-disclosure vulnerabilities are unwanted on any system, the stakes are often higher on systems that are running virtualized guests. Such machines may be running workloads from unrelated groups that are unwilling to share their secrets with each other in ordinary circumstances; the possibility of sharing a physical system with a guest that is under the control of an attacker makes memory protection an even more urgent problem. As a way of hardening these systems, CPU vendors have been adding memory-encryption mechanisms that make guest memory inaccessible to the kernel and to other guests. These features have their own cost, though, and support in hardware is far from universal at this point.
Kirill Shutemov has drawn an interesting conclusion from these technologies, though: the fact that systems using them still work means that access to that memory from the kernel or the hypervisor is not actually needed most of the time. So he has put together a patch set that takes a fully software-based approach. Rather than encrypt guest memory, systems running this code just unmap it. Using this feature requires support on the part of both the kernel and the guest.
Specifically, a KVM hypercall is added that allows guests to request that their memory be made inaccessible. The host kernel will respond by removing any memory allocated to the guest from the direct map, taking away its own ability to access that memory. In user space the approach is a bit different: any memory belonging to the guest remains mapped but is marked with PROT_NONE protections, again making it inaccessible. This will affect processes like the QEMU emulator, which will lose direct access to guest memory. The lack of mappings will naturally impede attacks coming from other guests as well. Within the guest, the guest kernel controls memory permissions as usual.
The resulting isolation protects guest memory from unwanted access by way of vulnerabilities in components like the kernel or QEMU. It is not a complete protection, though; if the host kernel is compromised to the level of arbitrary code execution, it can remap the pages and pillage them at leisure. For the wide range of vulnerabilities that depend on getting the kernel to access a stray pointer — or speculative-execution vulnerabilities — though, this unmapping should significantly raise the bar for any exploit attempt.
Of course, there are times when the kernel must access memory within guests to perform normal kernel functions. A second hypercall has been added for guests to indicate which memory they need to open up to the host kernel; those ranges will be mapped back into the host kernel's address space. DMA buffers for virtualized devices are one example of the type of memory that a guest would want to share with the host kernel in this way.
This work looks interesting, but there are a number of loose ends that need to be tied down before it can be considered ready. Unlike memfd_secret(), this work has no mechanism for avoiding direct-map fragmentation as pages are removed; since the amount of memory involved is rather larger in this case, the fragmentation problems are likely to be that much more severe. Unmapped guest memory cannot be migrated, which defeats the kernel's mechanisms for defragmenting memory. That is likely to cause all sorts of problems over time; Shutemov has acknowledged that this problem will need to be fixed before the patches can be merged. It is also currently not possible to reboot a guest with protected memory; Shutemov has suggested that this case could just be declared "unsupported", an idea that has already drawn complaints in the discussion.
The length of this list of issues implies that the KVM protected memory work is not something that will be seen in the mainline kernel in the near future. Both of these patch sets are a likely indicator of the direction things are going, though. Sharing as much as possible may improve performance, but it seems increasingly clear that the associated security problems are anything but easy to address. Separating address spaces as much as possible looks like a relatively straightforward way to sidestep many of those problems.
Rejuvenating Autoconf
GNU Autoconf, a widely used build tool that shines at compatibility with a variety of Unixes, has accumulated many improvements since its last release in 2012 — and there are patches awaiting review. While many projects have switched to other build systems, interest in Autoconf remains. Now, a small team (disclaimer: including me) is rejuvenating it, working through some deferred maintenance and code review. A testable beta is now out, a new stable release is due in early November, and interested parties can build on this momentum to further refresh the rest of the GNU Build System (also known as Autotools).
A widely used default
GNU Autoconf is a tool for producing configure scripts for building, installing and packaging software on POSIX systems. It is a core component of the GNU Build System. When a user installs a software package on the command line by compiling it from source, they are often instructed to run:
$ ./configure; make; make install
Those steps do the following:
- configure: test system features with attention to portability concerns, prepare and generate appropriate files (including a makefile) and directories, etc.
- make: use the makefile as instructions to build the package, performing any necessary compilation steps
- make install: place the built binary and any other needed files into the appropriate location
configure is a portable shell script that must run on many platforms. Writing a configure script by hand can be tedious and difficult, so Autoconf helps automate this process. A software developer writes a configure.ac file specifying system features the software will need (e.g. "is the X Window System installed, and if so, where?"). Each test for a system feature is a macro written in the GNU M4 language. Autoconf comes with many macros that developers will likely need, and a library of add-on macros ("autoconf-archive") (source) provides dozens more.
Thus, in the base case, a programmer wanting to distribute code to be built with the GNU Build System needs to write only a bit of M4 in configure.ac, and would likely only need to use one or two additional macros from autoconf-archive. They do need to learn more M4 if they need configure to detect a system feature for which there is not an existing macro.
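A minimal configure.ac can be quite short. The following hypothetical example (the project name, version, and file names are invented for illustration) uses only macros that ship with Autoconf, including one that answers the X Window System question above:

```m4
dnl Hypothetical configure.ac for a small C program.
AC_INIT([hello], [1.0], [bugs@example.org])
AC_CONFIG_SRCDIR([hello.c])
AC_PROG_CC
dnl Feature test: is the X Window System installed, and if so, where?
AC_PATH_X
dnl Fill in Makefile from a Makefile.in template using the results.
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
```

Running autoconf in the same directory then generates the portable configure script from this input.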
Autoconf has built-in support for various compiled languages: C, C++, Objective C, Objective C++, Fortran, Erlang, and Go. More crucially, it performs feature detection with knowledge of a wide variety of POSIX platforms. If you are building new software that has few arcane dependencies and your users are all on modern Linuxes plus FreeBSD, or if you want to make Ninja build files, perhaps you'd be better served using alternatives such as CMake, SCons, or Meson — and indeed many projects have switched away from the GNU Build System over the years, including GTK+ and KDE. Complaints that the GNU Build System is slow, complex, and hard to use have been aired (including in LWN's comment threads) for years. However, if your customers need to be able to build a shared library on Solaris, AIX, HP-UX, IRIX, and all the BSDs, then Autoconf will come in handy.
From 1991 to the present
Autoconf's founding in 1991 and its early history are chronicled in its manual and in the book The GNU Project Build Tools. Its function in the 1990s and early 2000s was to smooth over differences among the proliferating Unix variants. Autoconf's last big change was the version jump from 2.13 to 2.50 in 2001, which broke many existing configure scripts and required several follow-up point releases. Version 2.50 extensively overhauled several components, including autoupdate, and changed cross-compilation defaults; it was such a disruptive release that some users are still using 2.13 so as not to have to port their old scripts.
However, in recent years, Autoconf's star has faded. Linux's ascendance has made it easier for developers to get away with ignoring portability among Unixes — and the GNU Build System's balky Windows integration doesn't help those who need to deliver software to all three major desktop operating systems. But older, more complex projects include legacy code that already depends on Autoconf; converting it would be risky and expensive. In addition, competing build systems don't cover all of the edge cases that Autoconf does.
The rise of languages that use their own package management (such as Python, Perl, Go, and Rust) means that developers writing single-language code bases can avoid system-level build tools entirely. On the other hand, if you're writing software that combines C++, Fortran, Python, Perl, and Erlang, the GNU Build System can make those multiple languages play well together. It is more language-independent than, say, setup.py, and you can use the built-in macros plus the autoconf-archive macros to say: "I need to be able to use the 2011 dialect of C++, and I need this particular Python module installed".
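As a rough illustration of that last point, a configure.ac fragment along these lines could express both requirements; AX_CXX_COMPILE_STDCXX_11 and AX_PYTHON_MODULE come from autoconf-archive, AM_PATH_PYTHON from Automake, and the particular module name here is just an example:

```m4
AC_PROG_CXX
dnl Require a compiler that supports the 2011 C++ dialect.
AX_CXX_COMPILE_STDCXX_11([noext], [mandatory])
dnl Locate the Python interpreter, then require a module.
AM_PATH_PYTHON
AX_PYTHON_MODULE([yaml], [fatal])
```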
Users of the GNU Build System need stability, multi-language compilation, and cross-language compatibility, so the incremental improvements and bug fixes in post-2.50 versions of Autoconf have supported those goals. Autoconf's users have lived with version 2.69 since 2012; there have been no stable releases since then. However, development has not stopped; commits to the Git repository continued. Users also submit patches using the autoconf-patches mailing list and Savannah; by our estimation, as of mid-2020, there were hundreds of these patches awaiting review. (There are fewer now, but we'll get to that.) Maintainer Eric Blake had been aiming to make a release but hasn't had time; as he said in 2016: "The biggest time sink is digging through the mail archives to ensure that all posted patches that are still relevant have been applied".
Fresh momentum and work in progress
My involvement in Autoconf started when Keith Bostic emailed the autoconf mailing list in January, asking: "is there someone we could pay to do a new release of the autoconf toolset?" Zack Weinberg, an Autoconf contributor, forwarded the note to me.
Bostic was interested in Autoconf's future because one of his projects used it. He funded Weinberg and me to assess the work remaining; as we did that, we talked with Autoconf's maintainers (including Blake and Paul Eggert) and they agreed that we could do further release work. Then, starting a few months ago, Bostic — along with Bloomberg and the GNU Toolchain Fund of the FSF — has further funded our work so that we can work toward a 2.70 release in early November.
Weinberg released a testable beta version in July (even though this is a beta version of 2.70, the beta is labeled 2.69b) and a second beta, 2.69c, in September. We are now partway through our goals for this funded project:
- Along with other users, we've started testing the upcoming release against real Autoconf scripts for complex projects, but haven't yet put it through its paces with Emacs, GCC, and CPython.
- Since Autoconf has no continuous integration (CI) at present, we're going to set up a proper CI system to find regressions, at least on Linux, probably at sourcehut.
- We've filed a fraction of the hundreds of disorganized patches and bug reports so that Autoconf contributors can prioritize and assess the backlog; unfortunately, we don't have enough time to organize even half of them.
- We've reviewed several high-priority patches that downstream redistributors (such as Arch Linux and the Yocto Project) already carry and merged them into the mainline repository.
- We've started working with existing maintainers, contributors, and users to get the project on a more sustainable path.
These activities, fortunately, have gathered more momentum with testing and review help from existing maintainers and contributors, plus new volunteers. And the new scrutiny and testing have also led to fixes in related tools, such as the portability library Gnulib.
Speedups, bugfixes, and stricter parsing
The 2.70 release notes/NEWS file, which is in progress at the time of this writing, discusses speedups, several small new features, and many bug fixes that will be in the new release. The bug fixes alone are an appealing reason to upgrade. For instance, configure scripts generated by the new Autoconf will work correctly with the current generation of C and C++ compilers, and their contents no longer depend on anything outside the source tree (this is important for build reproducibility).
2.70 does, unfortunately, include a few places where backward compatibility with 2.69 could not be preserved. In particular, Autoconf is now more strict about M4 quotation syntax (a perennial headache for Autoconf users) and some macros do not perform as many checks as they used to, which speeds up the configuration process but can break configure scripts that assumed that some shell variable was set without actually calling the macro that sets it. In addition, more configure scripts now require the helper scripts config.sub, config.guess, and install-sh. (See the release notes for the complete list.)
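To give a flavor of the quotation issue, here is an illustrative (not exhaustive) configure.ac fragment; the bracket-quoted form protects macro arguments from unintended M4 expansion, while the underquoted form, which older releases often tolerated, is the kind of usage the stricter 2.70 parser may now reject:

```m4
dnl Properly quoted: the argument list is a single, protected token.
AC_CHECK_FUNCS([strdup strndup])
dnl Underquoted: fragile if an argument matches a defined macro name,
dnl and more likely to be flagged under 2.70's stricter parsing.
AC_CHECK_FUNCS(strdup strndup)
```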
Maintainers of complex Autoconf scripts will find it well worth their time to test the beta releases and report any problems encountered to the Autoconf mailing list.
Beyond this release: future resilience
In October and early November, Weinberg and I will likely use up the last of the funding we received. We intend to solicit more funding and to get more corporate contributors to commit to helping with testing, code review, and so on. After all, a big open question is: who will commit to serving as release manager for Autoconf 2.71? It might make sense to schedule that release for around 12-18 months from now. After Autoconf 2.50, a steady stream of people reported problems that contributors fixed in the next several releases. If we have someone motivated to triage bugs and prioritize and review patches, it may make sense to do that again, especially since, after 2.70, there will almost certainly be new bug reports, including for bugs introduced by the release but not found during beta testing.
There's also an open question as to who will work to organize the multiple backlogs of patches and bug reports, so that maintainers can properly assess, prioritize, and delegate work. Even once we finish the work that we've already received funds to perform, there will still remain scores of patches languishing in the various mailing lists and/or in patch sets currently carried by distributions (such as OpenEmbedded and the BSDs) but not yet merged back into the mainline. Getting all of those into Savannah, or the new GNU forge when that shows up, would help contributors, as would proper CI on multiple operating systems and environments. Gathering all of the submitted patches into one forge will also help downstream distributors cherry-pick specific fixes to carry in between Autoconf releases.
Autoconf has only been able to revive itself because of the funding from our sponsors. Conversations in the coming months will reveal whether and to what extent they and other enterprise users want to invest to keep Autoconf on a stable footing. This is certainly not the only piece of old software that free software depends on as infrastructure and that has significant deferred maintenance that needs doing; there are closely related projects that could also stand to be revitalized. Automake is one example; Libtool could be deprecated, and have its features refactored into the faster and more integrated functionality in Automake.
Regardless, in this case, it has been gratifying to help break a bottleneck so that users of a widely used, even crucial part of the open-source ecology can benefit from eight years' worth of improvements — and get Autoconf in better shape to make future release cycles better too.
[I would like to thank Zack Weinberg for reviewing this article.]
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: Linux 5.10-rc1; Arm32 page tables; Fedora 33; Ubuntu 20.10; GDB 10.1; Quotes; ...
- Announcements: Newsletters; conferences; security updates; kernel patches; ...