Leading items
Welcome to the LWN.net Weekly Edition for November 14, 2024
This edition contains the following feature content:
- Progress on toolchain security features: an update on compiler additions to improve the security of the kernel (and beyond).
- Pondering systemd-homed for Fedora: distribution developers look at a different approach to the management of home directories.
- The top open-source security events in 2024: an overview of this year's significant security incidents.
- The trouble with struct sockaddr's fake flexible array: an ancient Unix interface gets in the way of kernel-hardening efforts.
- Truly portable C applications: the Cosmopolitan Libc project seeks to build truly universal executables.
- Back In Time back from the dead: a user-friendly backup tool is restored to full maintenance.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Progress on toolchain security features
Over the years, there has been steady progress in adding security features to compilers and other tools to assist with hardening the Linux kernel (and, of course, other programs). In something of a tradition in the toolchains track at the Linux Plumbers Conference, Kees Cook and Qing Zhao have led a session on that progress and further plans; this year, they were joined by Justin Stitt (YouTube video).
Rust
Cook said that he would begin by talking about Rust, rather than sprinkling it throughout the talk, as he has in the past. It seemed easier, he said, to handle all of the Rust information on a single slide (slides). It is important to maintain parity between the security features of the GCC, Clang, and Rust compilers in order to avoid cross-language attacks. To that end, the arm64 software-based shadow call stack for Rust is getting close to being ready to merge into the kernel. Kernel control-flow-integrity (CFI) support for Rust is also in progress; it is forward-edge protection against subverting indirect calls.
There are several things that have not really been looked at yet, including zeroing registers used in calls and making structure layout randomization work with Rust. Cook said that the __counted_by() attribute, which is used to provide bounds information for flexible arrays, needs some investigation to see how it interacts with Rust code. He thinks that the information provided by __counted_by() will already be represented in the Rust bindings, so there will not be a need for any explicit handling on the Rust side, but that should be confirmed.
While Rust has native arithmetic-overflow handling, there is still not
parity with the behavior of the C code. When an overflow occurs, the undefined
behavior sanitizer (UBSAN) gives a different result than Rust does;
"it would be nice to have one result
".
Counted by
Moving on, the "big news from the last year
" was all of the work done for
__counted_by(), which identifies a structure member that is tracking
the size of a flexible array in the structure. Once the support for the
attribute landed in GCC and Clang, annotations needed to be added to the
kernel, which has been done for 391 structures. People are adding more of
these all the time, he said, and he hopes that it is becoming the default
for any new flexible arrays.
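As a rough illustration of what those annotations look like, here is a minimal, invented example; the structure is not from the kernel, and the fallback macro is a simplified version of the kernel's own fallback definition:

```c
/* Simplified fallback: the kernel defines __counted_by() to expand to the
 * counted_by attribute only when the compiler supports it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define __counted_by(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef __counted_by
# define __counted_by(member)
#endif

/* Invented structure: the compiler can now check accesses to entries[]
 * against the value stored in nr_entries. */
struct example_buf {
	unsigned int nr_entries;
	unsigned int entries[] __counted_by(nr_entries);
};
```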
![Kees Cook](https://static.lwn.net/images/2024/lpc-cook-sm.png)
Something that was "kind of a footnote" in last year's update (YouTube video) has been fixed in both GCC and Clang. Due to some odd differences between the C language specification and the compilers, unions could contain flexible arrays as long as they specified [0] for the size, because GCC and Clang have extensions to handle that case; "sort of accidentally", though, the specification does not allow flexible arrays in unions at all. Meanwhile, removing the zero from the size specification (leaving just []) in order to modernize the code has been part of the flexible-array cleanup work, but doing so in unions would not compile because of the specification. Now, modern flexible-array declarations in unions will be accepted by the extensions, which "will simplify some really horrible hacks" in the kernel to work around the problem.
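The difference being described looks roughly like this (illustrative declarations only, not kernel code):

```c
struct hdr {
	int type;
};

/* Old extension: a zero-sized array was the only way to get a flexible
 * array into a union, even though the specification allows neither form. */
union old_style {
	struct hdr h;
	char bytes[0];
};

/* Modern syntax: recent GCC and Clang now also accept this in a union,
 * which is what lets the kernel's cleanup work proceed. */
union new_style {
	struct hdr h;
	char bytes[];
};
```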
Next up was a slide showing the progress on getting flags for setting the
stack-protector-guard location added to GCC and Clang for five different
architectures; this will allow having different stack
canaries for each process. He had planned to talk about the lack of
progress for Clang on the RISC-V and PowerPC architectures over the last
four years, as all of the other architecture-compiler combinations have
been completed, but he had "accidentally nerd-sniped
" Keith Packard
into fixing those holes, so that work is now in progress. "My goal is
to not have to show this slide next year
", he said with a chuckle.
Control-flow integrity
There is work needed on the forward-edge CFI support, since no progress has been made on that over the last year. The support for hardware CFI protection is basically done, and has been for a while, but the more fine-grained, per-function-prototype software-based CFI protection needs to be added to GCC. Zhao said that she had talked with some RISC-V developers at GNU Tools Cauldron who are working on GCC support, starting from the arm64 patch set that Cook had mentioned; some of that work may be applicable beyond RISC-V. Packard suggested that the arm32 pointer authentication code (PAC) extension could be used for CFI protection on that architecture; Cook was not opposed, but arm32 is not a major area of focus for him.
Similarly, backward-edge CFI has stalled since last year. The hardware support exists for both x86 and arm64. Getting x86 hardware support for shadow call stacks to be used internally by the kernel looks difficult, though Peter Zijlstra said that the Intel Flexible Return and Event Delivery (FRED) feature might provide a mechanism for that. So far, Cook said, there has been no work on creating a software-based hash-checking scheme for backward-edge CFI, similar to KCFI that is used for forward-edge protection.
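For reference, forward-edge schemes like KCFI instrument indirect calls of the following sort; the structure and function here are invented for illustration:

```c
/* With KCFI, the compiler stores a hash of the expected function prototype
 * and checks it at the call site, so an attacker who overwrites ->transmit
 * with a pointer to a function of a different type causes a trap instead
 * of a successful call. */
struct net_ops {
	int (*transmit)(void *buf, unsigned int len);
};

static int do_transmit(struct net_ops *ops, void *buf, unsigned int len)
{
	return ops->transmit(buf, len);	/* forward edge protected here */
}
```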
Pointers and bounds
There is work needed in order to add __counted_by() for general pointer values; both GCC and Clang have started working on support. It is somewhat related to the -fbounds-safety proposal from Apple that he would go into more detail about later in the session.
Zhao came forward to talk about the work being done on pointer bounds in
GCC. There are two main cases, pointers inside structures, where the bound
is contained in another structure field, and pointers passed as arguments,
where the bound is also passed as an argument. The second case is already
handled in GCC by the access attribute, so GCC developers will
focus on the first case. She has discussed adding __counted_by()
for pointers in structures with the GCC maintainers, who are amenable to
that approach. Nick Alcock said that access is not currently used
to generate warnings for exceeding the bounds, however; it is, instead,
"a promise to the optimizer
" that the bounds will not be exceeded,
which is somewhat different. Zhao agreed that a different attribute might
be needed for checking the bounds on pointer arguments.
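Concretely, the two cases look something like the following sketch; the access attribute is existing GCC syntax, while applying __counted_by() to a pointer member is still a proposal, so that line appears only as a comment:

```c
#include <stddef.h>

/* Case two (already handled): a pointer argument whose bound is passed as
 * another argument, described with GCC's "access" attribute. */
__attribute__((access(write_only, 1, 2)))
void fill_buffer(char *buf, size_t len);

/* Case one (under discussion): a pointer member whose bound lives in
 * another member of the same structure; the annotation is speculative. */
struct message {
	int len;
	char *payload;	/* proposed: char *payload __counted_by(len); */
};
```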
![Qing Zhao](https://static.lwn.net/images/2024/lpc-zhao-sm.png)
Cook returned to talk about a problem that has been found as the __counted_by() annotations have been added to the kernel. In many cases, the structures holding a flexible array are allocated at run time, but there is a need to ensure that the field being used for bounds checking gets initialized at the same time that the allocation is done. He does not like to have manually repeated information that must be kept in sync in two places in the kernel, so he is working with the GCC and Clang developers to get another extension to __counted_by() so that the allocators can set the counter without knowing which field, if any, is being used for bounds.
The __builtin_counted_by_ref() intrinsic function will return a
pointer to the field used in __counted_by() or NULL if there is no
bounds-checking annotation. That allows wrappers to be written for
allocators that will initialize the count if there is one at the time the
allocation is done. So the __counted_by() annotations can be made
"without also having to go and check all of the allocation sites to make
sure that the counter has been set before we are accessing the array
".
That means a wrapper can be used for all structures that have flexible
arrays and when __counted_by() gets added to the structure, "it
magically gets the size added as well
"; both GCC and Clang are working
on adding the feature.
Zhao said that the GCC developers have been discussing the return type for
__builtin_counted_by_ref(), in particular for the case where it
returns NULL because there is no __counted_by() information.
Originally the idea was to return a NULL size_t pointer, but it was
decided that a (void *)NULL would ease the use of the feature,
since some counting fields may not be size_t.
That has implications for the example wrapper that he showed, Cook said,
because the void pointer cannot be used to set the count
when there is a __counted_by() attribute. So "a little bit more
trickery
" needs to be added, which has been done, and works; it
"generates better code but it's a little ugly to read
".
The Apple -fbounds-safety proposal that Cook mentioned earlier
has "a huge number of things covering all aspects of gaps in C's bounds
safety
", he said, including annotations for arrays that are counted by elements
or bytes, as well as those terminated by some constant (e.g. NUL-terminated
strings), and
more. The proposal came up
when the request was made to the Clang developers for a way to annotate
flexible arrays. It is much more ambitious than just that; Cook thinks the kernel and
GCC will want to adopt more of those annotations, but he has been focusing on
the low-hanging fruit.
There is also work needed to clarify the
-Warray-bounds warnings issued by GCC, so that the flag can be
used in kernel builds. The kernel "unintentionally constructs code that
the compiler sees as obviously incorrect
", but the warnings do not
really help clarify what the problem is. Zhao said that due to GCC value
tracking (which is done for optimization purposes), the array-bounds
checking can find real problems in the code, but the current diagnostics
make them look like false positives. She has done
some work to make the warnings more understandable for developers.
Arithmetic overflow
There is a question about what to do for unexpected arithmetic wraparound
(or overflow) in the kernel, Stitt said. If you filter CVEs based on overflow
and wraparound problems, there are multiple entries for the kernel; "if
we could turn on the overflow sanitizers, then we're increasing
protection
". But the kernel essentially makes signed integer overflow into
defined behavior with the
-fno-strict-overflow flag, so UBSAN does not flag overflows. In Clang 19, the
signed-integer-overflow sanitizer has
been changed so that it works with -fwrapv (which is
enabled by -fno-strict-overflow), though Zijlstra called that a
bug. Stitt said that one could argue that it is a bug, since overflow is
not an undefined behavior for the kernel, but the change was made with an
eye toward detecting unexpected overflows, which often cause problems.
![Justin Stitt](https://static.lwn.net/images/2024/lpc-stitt-sm.png)
The kernel has "what I'll call 'overflow-dependent code patterns'
",
such as code that is explicitly checking for overflow (e.g. a + b
< a) in order to handle it in some fashion. The sanitizer will
complain about that, even though it is not a real problem, so there are "idiom exclusions" that have been
added to Clang 19 to ignore certain types of code. For example,
-1UL "will always overflow
", so it would cause a complaint,
making the sanitizer useless for the kernel; the Clang overflow
pattern exclusions will tell the sanitizer to step out of the way for
three specific patterns.
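A representative (invented) example of such an overflow-dependent pattern:

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned wraparound is always defined in C, but an overflow sanitizer
 * would still flag the addition below as unexpected; the idiom exclusions
 * exist so that deliberate post-condition checks like this stay quiet. */
static size_t add_capped(size_t a, size_t b)
{
	if (a + b < a)		/* the sum wrapped around */
		return SIZE_MAX;
	return a + b;
}
```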
Zijlstra said that he would like to see a qualifier akin to const
or volatile that could be attached to variables that should not wrap.
Stitt said he agreed but that the compiler developers are moving away from
that solution. Cook noted that he and Zijlstra have a fundamental
disagreement on how to get coverage for unexpected overflow; he thinks
"we have to mark the expected places where we're wrapping so that all
the rest will get caught
", though Zijlstra disagrees with that approach.
One way to proceed that works around the (somewhat conflicting) objections from compiler and kernel developers would be to treat certain types, size_t for example, differently in the sanitizer. Zijlstra did not like turning certain types into "magic" types; having a qualifier would allow specifying that behavior more widely. Cook would like to try to make some progress given the current constraints and noted that the type filtering is not mutually exclusive with adding a qualifier later once the usefulness of the feature can be shown. Stitt pointed out that the Clang feature (possibly slated for Clang 20) is not hard-coded in the compiler; it will, instead, be some kind of configuration for the build system. So it will not appear at the source-code level, but will still be controlled by the kernel community.
Track organizer José Marchesi noted that the unexpected overflow items were
listed as "needed" for GCC, so he wondered if they had all been added to
the project's Bugzilla. Those items had not been added, Cook said, due to "some existing fundamental disagreements about the word 'undefined', because 'undefined' has a very well defined meaning", which drew chuckles. Some of that
needs to be resolved before progress can be made for GCC. He would also
like to get some proof of the
feature's usefulness, which will also help smooth the path.
Stitt said that GCC still needs an unsigned-integer-overflow
sanitizer. Zhao said that she has raised the overflow/sanitizer problems in
the GCC community, so the developers are aware of the issues. The major
problem is with the idiom exclusions; she talked about it at Cauldron and
there was a lot of resistance to the feature. She agrees: "I don't like
it, I think it's
a hack
". She has some ideas for other ways to approach the problem,
but needs to think some more on them. Cook closed the session by noting
that the purpose of these sessions is to get the discussion going; he
does not claim that the right solutions have been found, but hopes to
make some progress on addressing the known problem areas.
[ I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Vienna for the Linux Plumbers Conference. ]
Pondering systemd-homed for Fedora
Fedora Linux, as a rule, handles version upgrades reasonably well. However, there are times when users may want to do a fresh installation rather than an upgrade but preserve existing users and data under /home. This is a scenario that the Fedora installer, currently, does not address. Users can maintain a separate /home partition, of course, but the installer does not incorporate existing users into the new install—that is an exercise left to the user to handle. One solution might be to use systemd-homed, a systemd service for managing users and home directories. However, a discussion proposing the use of systemd-homed as part of Fedora installation uncovered some hurdles, such as trying to blend its approach to managing users with tools that centralize user management.
systemd-homed
Systemd-homed was introduced in the systemd v245 release in 2020, and is available in Fedora and other Linux distributions today, but is not integrated with the installer or other system-management tools. (User "richiedaze" supplied a brief guide to using it with Fedora last January on the Fedora Discussion forum.)
As the name suggests, it is a systemd service that is designed to manage the home directories of regular users (as opposed to accounts created for system services and the like), provide built-in encryption, and make home directories portable between systems.
Typically, regular users on Linux desktop systems like Fedora Linux are created with a utility such as useradd, and information about the user's shell, home directory, user ID (UID), and group ID (GID) is stored in /etc/passwd. That scheme has been around with little modification for decades, and it doesn't necessarily map well to the way that many Linux users work with their systems today. In addition to the upgrade use case, users might want to (for example) keep their home directory on an external drive and use it on more than one system. That is, of course, possible today—but it can be complicated with regard to managing UIDs/GIDs and such. (The systemd-homed man page explains how to migrate home directories to new hosts.)
With systemd-homed, users are created and managed using the homectl command, separately from standard Linux user accounts. When a user is created using "homectl create" there will be an identity file for the user, with its public and private key pair, that is stored in /var/lib/systemd/home/user.identity rather than creating an account in /etc/passwd. The homectl utility will also create the user's home directory using the storage mechanism specified with --storage, along with an ~/.identity file that follows the extensible JSON user records format. This allows the user record to store much more information than the standard /etc/passwd file, such as a user's public SSH key to allow login via SSH without access to the user's home directory.
Systemd-homed supports four storage mechanisms for home directories: a plain directory or Btrfs subvolume, an fscrypt-encrypted directory, a LUKS2 volume, or a home directory mounted via CIFS. With the LUKS2 and fscrypt storage mechanisms, a user's home directory is only decrypted and accessible when the user is logged in. The CIFS and plain directory storage mechanisms still provide the portability benefits of systemd-homed, but without the same security benefits. Some users, however, are already happy with full-disk encryption and would simply like to get the portability benefits.
Exploring systemd-homed for Fedora
Zbigniew Jędrzejewski-Szmek is one of them. He recently started
a discussion on the fedora-devel mailing list about the complexity
of retaining user directories after a reinstall. It is possible but
"only with some manual tinkering
", and even then the existing
user information is not preserved in /etc/passwd. He
suggested that systemd-homed could be a solution, with a little extra
work, and asked if it would be something worth looking into.
Neal Gompa pointed
out that systemd-homed use had been explored before, but it was
found to be incompatible with centralized login systems such as SSSD, FreeIPA, and Active
Directory (AD). "If this has changed, then it would make sense
to revisit.
" This led to a discussion of standard practices for
managing user authentication and information.
Gompa observed
that, historically, systems either had local users with local storage,
or centrally-managed users with remote storage. That no longer holds
true: laptops issued to employees by organizations often have user
accounts where everything is stored locally, but user login is managed
centrally. Jędrzejewski-Szmek said
that Fedora should simply continue to use normal local users for that
use case, "there would be no benefit from somehow shoehorning
remotely-defined users
" into systemd-homed. But Gompa wanted
an all-in approach, and Chris Murphy said
that it seemed like duplicated effort to implement local encrypted
user data for centralized authentication systems when it was a feature
of systemd-homed.
Merging local and centralized user management would require
systemd-homed to be something it isn't, Lennart Poettering said. "It's
about locally managed stuff, not about networked accounts.
" He
added
that there is a philosophical gap between how he thought home
directories should be managed versus the way it's done by FreeIPA,
SSSD, or AD:
One of the reason homed exists is to break with the concept of trying to manage UIDs organization‑wise. And there's a lot more, for example it's fundamental to homed that access to home dirs is unavailable unless the user is logged in, and authentication keys for the data itself are provided. That breaks with fundamental assumptions baked into much UNIX software, which however is stuff the classic centralized enterprise world really cares about.
So yes, from a distanced view you might think: both ipa and homed manage home dirs, so why not make that one. But conceptually, philosophically the two things are *so* different I really don't think one should attempt to make them one thing.
Systemd does have other components (systemd-logind and systemd-userdbd) for managing users and sessions that could be used with SSSD and FreeIPA to provide the same features that systemd-homed has, but there has been little interest in that, he said.
However, he followed
up later to say that he was interested in having support for
"Chromebook-like behavior
" where users would authenticate with
OAuth 2.0 or OpenID
Connect and automatically generate a home directory. That would be
a bigger project, though, and there were more pressing issues to
tackle before that.
A substantial part of the conversation focused on managing UIDs between various providers of user accounts. Specifically, the cases where UIDs/GIDs may overlap if a system has users managed by traditional Linux tooling, systemd-homed, and FreeIPA or another centralized provider. Everything in the range 1000 to 65533, and from 65536 to 4294967294, should be available to regular users, with 65534 reserved for the "nobody" user.
However, Poettering complained
that FreeIPA takes most of the available range and "leaves no space
for anyone else in the 16bit range really
". Simo Sorce said
that this was incorrect:
FreeIPA has a fixed range it picks from, but to allow *multiple* domains to interoperate it picks a subrange from that big one fixed range, which is high up in the "millions" (I forget the exact range but I think 1M-2M).
He acknowledged that FreeIPA could use the range below 65K, for compatibility reasons such as older NFS and Unix systems that were limited to 65535 IDs.
Sorce also had
questions about potential security issues with systemd-homed. For
example, he wondered if plugging in a disk with a home directory
managed by systemd-homed would
"suddenly allow a stranger to just login into the machine?
" or
if there could be UID and GID conflicts. The answer to the first
question is no, a user has to be enabled on a system by an
administrator before that user can log in. Poettering replied
about the potential for UID/GID conflicts, and said that the files on
disk were owned by the "nobody" user and then dynamically mapped to
the right UID/GID for the local machine using ID mapping.
Ultimately, Sorce suggested that systemd-homed needed a broader security analysis and buy-in from the Anaconda developers before any changes could be adopted.
Other methods?
David Cantrell said that he had always split the /home directory from the rest of the system, and would rather see Fedora's installer (Anaconda) modified to account for existing users than to adopt systemd-homed. That is a possible approach, Jędrzejewski-Szmek said:
But I think it's actually quite complicated to make this work reliably. Traditional UNIX accounts spread the information about the user over a bunch of files. Consistency must be maintained, UIDs and GIDs on disk must match, etc. We _could_ add the smarts to cover all that in Anaconda, but Anaconda developers are trying to simplify it, not add new complicated code.
OTOH, homed was created with the idea of self-contained "homes" from the beginning, and systemd upstream is dedicating resources to make it work. (E.g., currently, a full-time developer working on integration of systemd-homed and GNOME on a grant from [the] German [Sovereign Tech Fund].) So I think it's much more maintainable to just make use of this and let systemd upstream help with any bugs that we discover.
The discussion, at least for now, has tapered off without any concrete plans for using systemd-homed for retaining user directories during a reinstall as an official Fedora feature. Given the amount of work that would be required, and the fact that few users seem to be clamoring for portable home directories, it may be some time before we see any progress in that area.
Systemd-homed has a lot of interesting features, but Fedora developers don't seem to be sold on doing the lifting required to adopt it for now. It is, of course, still possible to use systemd-homed or some other custom solution (as many of us have for years) for retaining and migrating home directories, but it would be a nice addition to Fedora if it ever arrives.
The top open-source security events in 2024
What have been the most significant security-related incidents for the open-source community in 2024 (so far)? Marta Rybczyńska recently ran a poll and got some interesting results. At the 2024 Open Source Summit Japan, she presented those results along with some commentary of her own. The events in question are unlikely to be a surprise to LWN readers, but the overall picture that was presented was worth a look.
Fun with CVE numbers
A relatively low-scoring (but still significant) episode in her poll had to do with the handling of CVE numbers. Until earlier this year, she began, there was no machine-readable database of CVE numbers; instead, they have been managed as free-form text. She cited CVE-2009-1377 as an example, the entry for which contains a single sentence describing the vulnerability. There is no easy way to extract the relevant information — the vulnerable package, version numbers, the nature or severity of the vulnerability, etc. — from this entry. The National Vulnerability Database (NVD) was created in response to this problem; it was designed to absorb CVE data and augment it with additional metadata. Compare, for example, the NVD entry for CVE-2009-1377 with the original. The NVD is now heavily used by vulnerability scanners and other types of security software.
The "NVD crisis" hit in February of this year, she said, when the NVD
suddenly stopped adding new entries; it is still not clear why that
happened. CVE numbers were still being assigned, but they were not making
their way into the NVD. The process restarted a few months later, but the
addition of NVD entries remains slow, and the backlog is large. This has
created problems for software that relies on NVD data.
On the CVE side, the assignment of numbers is proceeding more quickly than ever. A number of prominent projects, including curl, the Linux kernel, and the Wikimedia Foundation, have set themselves up as CVE numbering authorities (CNAs) — but they are late to the party, since most large projects had already made that move. Some of these projects, such as the kernel, are issuing a lot of CVE numbers. There is a new JSON-based format for CVE entries that is being rolled out now; it should help with the automated processing of CVE numbers in the future.
As a response to the NVD crisis, security developers have been asking for the creation of an open security database that is not dependent on any single-vendor solution, she said. But this database will only be feasible to create and maintain if all CVE entries are machine readable. There is new legislation that will force automated processing of vulnerability information, and the increasing flow of CVE numbers from the new CNAs is creating scalability problems. Whether a new vulnerability database should be created or an existing one improved is an open question. There are various databases out there, including OSV and the GitHub Advisory Database, but they are single-vendor solutions. Another open question is whether vulnerability databases with a regional or national focus are needed.
Meanwhile, the NVD backlog is still high, and the CVE program is still working to get the CNAs to properly encode their entries. There is a full set of CVE data available from GitHub in JSON format, but it is a read-only database. Nobody can submit a pull request to fix or improve an entry; instead, all such changes have to go through the appropriate CNA.
Trends
Turning to trends that have made themselves felt in 2024, Rybczyńska mentioned the work to enable the use of Rust for writing kernel code. The kernel is not the only place where that sort of change is happening, though. Agencies like the US Cybersecurity and Infrastructure Security Agency have been pushing hard in that direction. The Android project has been moving toward Rust since 2019, she said, and that has already resulted in a significant reduction in bugs.
Meanwhile, compilers for languages like C and C++ are gaining new warnings that are intended to head off vulnerabilities. It has reached the point, she said, where it is difficult to write a buffer-overflow vulnerability in C without generating a warning. The static analyzer in the GCC 14 release has also gained the ability to detect and illustrate a number of types of bugs, including buffer overflows and infinite loops.
On the legislative front, the big news is the European Cyber Resilience Act (CRA), which adds mandatory security requirements for all products sold there. By default, she said, vendors must perform a self-assessment of their compliance with those requirements, though products deemed important or critical require a higher degree of scrutiny. Vendors must offer security updates, free of charge, for a minimum support period of five years. They are required to fix vulnerabilities, including those introduced by dependencies incorporated into their products. There is also a requirement to exercise due diligence with incorporated software and to report security incidents.
The CRA will likely be published in November, she said, and will go into full force three years later. The effect of this legislation on open-source projects has been the subject of a lot of conversation; that has resulted in numerous modifications to the CRA over time. There are protections in place for contributors to projects; the obligations land on those who monetize a project rather than those who contribute patches to it. "Stewards" that support open-source projects are also protected, but they are required to have security-related processes in place.
Open-source software, she said, is now far too big to be overlooked by
legislators. This can be seen in areas beyond the CRA; she mentioned the
recent removal of some Russian kernel
maintainers as an example. The CRA is one of the biggest examples,
though; it is "a big deal
", and three years is not a long time to
prepare for it.
There have been some initiatives for the funding of open-source security work, she said. The OpenSSF Alpha-Omega project and Sovereign Tech Agency are a couple of prominent examples; the latter has been explicitly directing grants toward maintainers. The economic climate is becoming more difficult, though, and it is not entirely clear that those resources will continue to be available.
With regard to software bills of materials (SBOMs), she noted that SPDX 3.0 was released in April; it is a complete rewrite of that standard that includes vulnerability reporting. There was a new release of the CycloneDX standard as well. The generation of SBOMs is growing, she said, but the actual use of that data is lagging behind.
The incidents
There were a couple of specific vulnerabilities that drew attention over the year as well. The XZ backdoor attempt was one of those; it was a two-year effort to insert malicious code into a piece of little-known (but omnipresent) software. This attack nearly succeeded; like the Log4j vulnerability, it highlights the risks that come with single-maintainer projects. In both cases, the problem arose in a project that does not normally look like a security risk. And it raises the question of just how we can develop and maintain trust in the developers who write and maintain our software.
XZ was not the most significant security event of the year, though, according to her poll; that was, instead, the CrowdStrike incident. But, she asked, who cares since it only affected Windows? There is a Linux version of CrowdStrike's software that didn't break; perhaps that is a result of a better architecture (including the use of BPF rather than kernel modules) on the Linux side. But it is also a matter of luck, which was in our favor this time around.
Had the XZ attack been detected later, she concluded, it would have resulted in a backdoor that affected something like half of the SSH servers on the net. That is easily an incident on the same scale as the CrowdStrike fiasco — or worse. The lesson to take away from these events is that a security failure of that magnitude can happen to the open-source community as well.
A video of this talk is available on YouTube.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event.]
The trouble with struct sockaddr's fake flexible array
Flexible arrays — arrays that are declared as the final member of a structure and which have a size determined at run time — have long drawn the attention of developers seeking to harden the kernel against buffer-overflow vulnerabilities. These arrays have reliably been a source of bugs, so anything that can be done to ensure that operations on them stay within bounds is a welcome improvement. While many improvements, including the recent counted-by work, have been made, one of the most difficult cases remains. Now, however, developers who are interested in using recent compiler bounds-checking features are trying to get a handle on struct sockaddr.
The many faces of struct sockaddr
The sockaddr structure dates back to the beginning of the BSD socket API; it is used to hold an address corresponding to one side of a network connection. The 4.2 BSD networking implementation notes from 1983 give its format as:

    struct sockaddr {
        short sa_family;
        char  sa_data[14];
    };

The sa_family field describes which address family is in use —
AF_INET for an IPv4 address, for example. sa_data holds
the actual address, the format of which will vary depending on the family.
The implementation notes say that: "the size of the data field,
14 bytes, was selected based on a study of current address
formats
". In other words, 14 bytes — much longer than the four
needed for an IPv4 address — should really be enough for anybody.
Need it be said that 14 bytes turned out not to be enough? As new protocols came along, they brought address formats that were longer than would fit in sa_data. But the sockaddr structure was firmly set in stone as user-space API and could not be changed. It appears in essentially the same form in any modern Unix-like system; on Linux the type of sa_family is now sa_family_t, but otherwise the structure is the same.
The result was one of the (many) historic kludges of the Unix API. New protocol implementations typically brought with them a variant of struct sockaddr that was suitably sized for the addresses in use; struct sockaddr_in6 for IPv6 addresses, for example, or struct sockaddr_ax25 for AX.25. All of the socket API interfaces still specified struct sockaddr, but implementations on both sides would use the appropriate structure for the protocol in use. Code on both sides of the API would cast pointers to and from struct sockaddr as needed.
Even now, the documented APIs for system calls like connect() and library functions like getaddrinfo() use struct sockaddr. As a result, both user-space programs and the kernel contain a whole set of casts between that type and the type they are (hopefully) actually using. Needless to say, these casts can be error prone; casting a pointer between different structure types is also deemed to be undefined behavior in current C. But that's the price we pay for API compatibility.
The advent of IPv6 also brought another type: struct sockaddr_storage; it is defined as starting with the same sa_family field, but being large enough to hold any of the other sockaddr variants. Code dealing with network addresses can allocate a structure of this type and be sure of having enough space to store any address. This structure is now what is often allocated, but it never appears explicitly in the system-call interface.
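In user space, the resulting pattern looks something like this invented example, with exactly the kind of casting that the kernel work described below aims to eliminate internally:

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill a sockaddr_storage with an IPv6 address, then cast to the generic
 * struct sockaddr * that connect() has always taken. */
static int connect_v6(int fd, const struct in6_addr *addr, in_port_t port)
{
	struct sockaddr_storage ss;
	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&ss;

	memset(&ss, 0, sizeof(ss));
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons(port);
	sin6->sin6_addr = *addr;

	return connect(fd, (struct sockaddr *)&ss, sizeof(*sin6));
}
```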
Making the flexible array explicit
The C language has accumulated a few idioms for the declaration of flexible arrays over the years; specifying a dimension of zero or one are both common (though deprecated) examples. The syntax blessed by the language standard, though, is to omit the dimension entirely:

    struct something {
        /* ... */
        int flex_member[];    /* A flexible array */
    };

This syntax makes it clear that a flexible array is in use and that the type declaration cannot be used, on its own, to check for overflows of that array. In no convention is it deemed reasonable to use a dimension of 14 for a flexible array, but that is exactly what now happens with struct sockaddr. The actual length of sa_data is not known, and has a good chance of being larger than the declared size. It is a flexible array disguised as an ordinary array.
That usage complicates checking of struct sockaddr usage for overflows, but the effects go beyond that; it makes detection of flexible arrays harder across the kernel. As Kees Cook noted in this 2022 patch:
One of the worst offenders of "fake flexible arrays" is struct sockaddr, as it is the classic example of why GCC and Clang have been traditionally forced to treat all trailing arrays as fake flexible arrays: in the distant misty past, sa_data became too small, and code started just treating it as a flexible array, even though it was fixed-size.
As long as this usage remains, the checking tools built into both compilers must treat any trailing array in a structure as if it were flexible; that can disable overflow checking on that array entirely.
It would be nice to change this usage but, as was noted above, the layout of struct sockaddr is wired deeply into the socket interface and cannot be changed without breaking applications. But that doesn't mean that the kernel must treat sa_data as anything but a flexible array. To enable that without changing the binary interface, Cook redefined struct sockaddr within the kernel to:

    struct sockaddr {
        sa_family_t sa_family;
        union {
            char sa_data_min[14];
            DECLARE_FLEX_ARRAY(char, sa_data);
        };
    };

(The DECLARE_FLEX_ARRAY() macro jumps through some hoops needed to declare a flexible array within a union). This change made it clear that sa_data is a flexible array, which helped, in turn, in the goal of allowing the compilers to treat trailing arrays as non-flexible unless they are explicitly declared as such.
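For the curious, the macro boils down to roughly the following; this is simplified from the kernel's definition:

```c
/* Simplified: wrapping the flexible member in an anonymous struct, next to
 * an empty member, is the hoop that makes a flexible array legal inside a
 * union while keeping the same layout. */
#define DECLARE_FLEX_ARRAY(TYPE, NAME)		\
	struct {				\
		struct { } __empty_ ## NAME;	\
		TYPE NAME[];			\
	}
```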
This patch was merged for the 6.2 release, and all seemed to be well. But,
as Gustavo A. R. Silva pointed out in this patch
series, there is a problem with this approach. There are many places
in the kernel where struct sockaddr is embedded within another
structure, usually not at the end. That has the result of placing a
flexible array in the middle of the embedding structure, which is
problematic for fairly obvious reasons; the compiler no longer knows what
the offsets to the members after struct sockaddr should be. That
has resulted in "thousands of warnings
" when the suitable check is
enabled in the compiler.
Silva's solution was to introduce yet another variant with a familiar form:

    struct sockaddr_legacy {
        sa_family_t sa_family;
        char        sa_data[14];
    };

This structure, which lacks the flexible-array member, was then embedded in the other structures, making the warning go away. Since the embedding cases did not use sa_data as a flexible array (otherwise things would never have worked in the first place), this change was deemed safe to make.
Networking maintainer Jakub Kicinski was not convinced
about this change, though. He suggested that perhaps Cook's patch should
be reverted instead, and a new type should be added for places where a
flexible array is actually needed. Cook acknowledged this
suggestion as "a pretty reasonable position
" and started to ponder
on alternatives. He concluded: "Now, if we want to get to a place with
the least ambiguity, we need to abolish sockaddr from the kernel
internally, and I think that might take a while.
"
Leaving struct sockaddr behind
In early November, Cook returned with a brief patch series meant to show what that approach would look like. It begins by reverting the 2022 patch, returning struct sockaddr to its original non-flexible form. There is a patch adding comments to places in the networking code that are known to use this structure within its original bounds; they do not need to be changed, and do not need sa_data to be flexible. But that still leaves many uses of struct sockaddr where the data area may, in reality, be larger than 14 bytes.
The solution for many of those places is just to use struct sockaddr_storage instead. Indeed, parts of the network stack already use that structure, but then cast pointers to struct sockaddr for functions that expect that type. One example is inet_addr_is_any(), which takes a struct sockaddr * argument, but is only called by functions using struct sockaddr_storage. In this case, the solution is to change the prototype of the function to match what is really being passed to it and remove the casts from the callers.
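Sketched out, that kind of change looks like the following; the exact kernel prototypes may differ in detail:

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Before: callers hold a struct sockaddr_storage but must cast it to the
 * type the function claims to want:
 *
 *	bool inet_addr_is_any(struct sockaddr *addr);
 *	... inet_addr_is_any((struct sockaddr *)&storage) ...
 */

/* After: the prototype matches what is actually being passed, and the
 * casts in the callers simply go away. */
bool inet_addr_is_any(struct sockaddr_storage *addr);
```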
Some changes will require more churn, even if they are conceptually simple.
The getname() callback (in the proto_ops
structure) has long expected a pointer to a sockaddr_storage
structure, but its prototype was never changed to match. The patch
eliminating the use of struct sockaddr for getname()
mostly consists of name changes and cast removal, but it touches
66 files. It also, as Cook noted in the cover letter, is still lying to
the compiler in cases where the backing structure is actually smaller than
struct sockaddr_storage, though "these remain just as safe as they
used to be. :)
"
This series shows that truly eliminating the use of this structure's
sa_data field as a flexible array in disguise will involve a fair
amount of work and code churn. Even so, Kicinski commented that it
"feels like the right direction
". So, while struct
sockaddr will likely remain part of the kernel's system-call API
forever, its use within the kernel can be expected to fade away over time.
A design miscalculation made over 40 years ago may finally stop
impeding the use of modern memory-safety tools.
Truly portable C applications
Note: This topic was chosen based on the technical merit of the project before we were aware of its author's political views and controversies. Our coverage of technical projects is never an endorsement of the developers' political views. The moderation of comments here is not meant to defend, or defame, anybody, but is in keeping with our longstanding policy against personal attacks. We could certainly have handled both topic selection and moderation better, and will endeavor to do so going forward.
Programming language polyglots are files that are valid programs in multiple languages, and do different things in each. While polyglots are normally nothing more than a curiosity, the Cosmopolitan Libc project has been trying to put them to a novel use: producing native, multi-platform binaries that run directly on several operating systems and architectures. There are still some rough edges with the project's approach, but it is generally possible to build C programs into a polyglot format with minimal tweaking.
Actually portable executables
Justine Tunney, the creator of the project, calls the polyglot format she put together "αcτµαlly pδrταblε εxεcµταblεs" (APEs). Every program compiled to this format starts with a header that can simultaneously be interpreted as a shell script, a BIOS boot sector, and macOS or Windows executables. This lets APEs run across Linux, macOS, FreeBSD, OpenBSD, NetBSD, Windows, and bare metal on both x86_64 and Arm64 chips. When interpreted as a shell script, the program detects which architecture and operating system it is running on, and (by default) overwrites the first few bytes of the program on disk with an ELF header pointing to the code for that architecture and system, before re-executing itself. This does mean that the binary is no longer portable, but it can be restored by overwriting the ELF header again.
By building separate versions of the program for different architectures and then combining them with this polyglot trick, the project promises portability between different architectures without the overhead of an emulator or bytecode virtual machine. This approach does have downsides — APEs are larger than normal C binaries, although still smaller than many produced by other languages — but for projects where the portability is a benefit, the tradeoff may be worth it.
Still, at the cost of increased compile time and binary size, APEs nearly hold up their promise of running across all major operating systems. There are some systems where they don't run correctly. The biggest source of problems is, ironically, one of Linux's tools for handling foreign binaries: binfmt_misc. In typical configurations, the Linux kernel knows how to run two kinds of binaries: ELF files and shell scripts. The binfmt_misc mechanism lets the user register additional types of program to be run with special helpers. When correctly configured, binfmt_misc can let users transparently run programs for other architectures under QEMU, or run Windows programs under Wine — which is precisely the problem. On Linux systems configured like that, APE programs get run under Wine, instead of running natively.
Luckily, the workaround is quite simple: just add a more specific binfmt_misc rule for APE binaries. This also has the advantage of saving the extra startup latency of first invoking the program as a shell script and then again as a binary. Another problem for APE programs is that some older shells (such as Zsh before version 5.9, released in 2022) don't handle binary data being part of shell scripts well. Because of these problems, the documentation actually recommends installing a binfmt_misc handler, even though it is not strictly necessary on most systems.
The tooling
Producing programs in APE format is relatively straightforward. Stock GCC and Clang can both be configured to do so, with the appropriate headers and linker scripts. But there are a number of configuration flags to get right, so the Cosmopolitan Libc project has also created cosmocc, a wrapper around both GCC and Clang that takes care of building for multiple architectures and linking them together into an APE.
Cosmocc is, itself, an APE. So interested readers can download it and experiment with compiling their own software. Like all APEs, it is statically linked, and so can just be unpacked and run in place. Most C programs should build relatively painlessly under cosmocc. One potential stumbling block is that any dependencies also need to be built for multiple architectures, so relying on distribution packages is unlikely to work.
The Cosmopolitan Libc project does include several common dependencies, such as ncurses, zstd, zlib, SQLite, and others in its repository. So if a project only uses common third-party dependencies, it is usually fairly straightforward to build the necessary libraries. For projects that use GNU Autotools, the superconfigure project provides tools for integrating Cosmopolitan Libc into the build process.
One obvious question when building programs that should run across multiple operating systems is how to translate system calls between platforms. Unfortunately, a complete solution is simply not possible — some operating systems support operations that just aren't available on others — but Cosmopolitan Libc does a good job of covering the common operations. The documentation lists exactly which functions are available on which platforms. Generally, most common POSIX operations are usable without issues. Any more recent, specialized, or obscure APIs may require the use of the C preprocessor to select the right implementation. That is, of course, the normal state of affairs for portable software. But at least with Cosmopolitan's polyfills, programs can focus on only doing that for the small portion of functions where it is truly necessary. These functions are where Cosmopolitan Libc gets the "Libc" portion of its name from; while the tooling does support linking programs with the GNU C library, musl libc, or any other local C library, the usual case is to link to Cosmopolitan's Libc.
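Where a preprocessor fallback is needed, it is the ordinary portable-C pattern; nothing in this sketch is specific to Cosmopolitan Libc:

```c
#include <unistd.h>

/* Select an implementation at build time when a specialized interface may
 * be missing; Cosmopolitan's polyfills simply shrink the number of places
 * where this kind of conditional code is necessary. */
static long get_page_size(void)
{
#if defined(_SC_PAGESIZE)
	return sysconf(_SC_PAGESIZE);
#else
	return 4096;	/* conservative fallback */
#endif
}
```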
I attempted to build several pieces of software (GNU Bash, TCC, and some software of my own) with the Cosmopolitan toolchain in the course of writing this article. Overall, it was remarkably straightforward. I ran into a few confusing compiler errors until I found the correct set of configuration flags, but after configuring the projects to expect static linking and to use cosmocc and its associated linker, everything did work as promised.
Additional facilities
The fact that Cosmopolitan has to be involved at every step of the building and linking process in order to produce APE files allows it to pull a few other tricks, as well. In addition to being an executable and a shell script, every APE file is also a Zip archive. This is due to the fact that Zip archives store their directory information at the end of the file, so the information can be appended to the APE binary without interfering with the already-overloaded first few bytes of the format.
A few of Tunney's other projects take advantage of this, including redbean, llamafile, and others. Redbean is a web server that serves dynamic web sites made using Lua and SQLite from its own Zip archive; the project lets programmers distribute these web sites as self-contained portable executables, suitable to be run by end users. Llamafile, which LWN covered earlier this year, is a project that packages LLM weights alongside the code required to run them, for ease of distribution and archiving.
Other projects have used the ability to treat APE executables as Zip archives to bundle data files like a copy of the timezone database as a sort of super-static-linking that ensures the program always has the resources that it needs to run. Data files aren't the only thing required to be distributed alongside programs, however — many licenses also require distributing copies of the license, or copyright attributions.
Cosmopolitan Libc's tooling automates compliance with those licenses by making sure that licenses and attributions for all of the Libc code or (supported) third-party libraries that are linked into the final APE file are embedded in a special linker section. Cosmopolitan Libc itself is licensed under the ISC license (a simplification of the MIT/BSD licenses), but some of the third-party components that Cosmopolitan either uses directly or makes available to compiled programs are licensed under the BSD, MIT, or Apache 2.0 licenses. The Cosmopolitan compiler itself includes GPL-licensed code from GCC, but all of the code linked into finished binaries is licensed under a permissive license (unless one chooses to statically link the GNU C library), so the final result is not covered under the GPL.
Cosmopolitan Libc is a fairly active project; it has regular, small releases. The most recent is v3.9.6. As might be expected for any project with such ambitious compatibility goals, there are usually a number of small bugs affecting different platforms open at any given time. The project is fairly stable, however, with the basic ability to compile programs across multiple systems unchanged since the initial release.
So, while the project does still have its rough edges, it's a promising tool for people who want to write highly portable software. The automated license compliance, static compilation, and easy bundling of non-software dependencies are nice additions to the already compelling idea of running the same binaries on multiple platforms.
Back In Time back from the dead
Back In Time is a GPL-2.0-licensed backup tool based on rsync and written in Python. It has both graphical and command-line interfaces, and supports backups to local disks or over SSH. Back In Time was originally written by Oprea Dan and released in 2009. The tool has been through some rough patches over the years, and is currently on its third set of maintainers. Christian Buhtz, one of the current maintainers, explained to me how he and his co-maintainers had revived the project, as well as why he thought Back In Time stood out from all of the existing backup solutions.
The tool
On first starting up, Back In Time offers to restore a previous configuration from an existing backup on another computer — a nice touch, since it means that restoring backups onto a new computer is as simple as starting the tool and pointing it at the right location. If one does not have an existing configuration, the tool prompts for the information needed to create a new profile. This includes all of the normal options that backup software offers — local and remote backups, encryption, folders to include and exclude, number of backups to keep, etc. — but it also includes some features that are less common, such as the ability to run a backup whenever a particular hard drive is connected. It's also possible, under "Expert Options", to change the priority given to the backup process, to restrict how much network bandwidth it uses, and so on. Despite its other features, Back In Time does not support backing up to cloud storage, or via any protocol other than SSH.
Once configured, Back In Time will (by default) automatically add itself to the user's crontab file, to run backups at the specified frequency. If the user disables crontab-handling, they can instead run Back In Time's command-line tool by some other means. Users who spend all their time in the terminal can avoid the GUI entirely by writing their own configuration to ~/.config/backintime/config, although the project's documentation doesn't really cover that case.
The actual process of making a backup is unremarkable, as one would hope.
All of the actual
transferring of files is done by rsync, but Back In Time handles the problem of
configuring it correctly. When I asked Buhtz what he thought made Back In Time
different from other backup software, he said that he thought the use of hard links to reduce the space used by snapshots was of critical
importance. "Conceptually
rsync does full backups. But technically, considering the storage space
used, these are differential backups.
" That wasn't the only thing he
emphasized, however:
The second main feature is that the resulting backup [...] is not a proprietary or project-specific container-like format. It is just a folder in the file system and can be explored and accessed with every other tool.
So restoring files from a Back-In-Time snapshot, browsing existing snapshots, and other related activities are easy to perform with just a file browser or from the command line. Back In Time does support encrypted backups, which complicates these activities slightly, but those use EncFS, so it is still possible to browse a mounted backup volume using the normal tools. Other backup solutions tend to use custom formats, which make this difficult.
Overall, Back In Time has a good deal of thought put into its features. For example, it can handle resuming backups that were interrupted by hibernation. It also tracks the owner of files by name, instead of by ID, so that restoring files to a new computer with different user IDs doesn't cause problems. The tool gives the impression that a lot of its sharp edges have been smoothed over by time. Back In Time is packaged for most major distributions. The latest release, version 1.5.2, is available in the project's GitHub repository.
Maintainership
Buhtz has been a user of Back In Time himself since approximately 2015. In the
normal course of using the software, he filed several bug reports. But at the
time, Germar Reitze, the then-maintainer "was not able to
improve [Back In Time] anymore. He was in maintenance mode and only tried to fix
some urgent bugs
", Buhtz said.
In 2022, Michael Büker opened
an issue asking about the
future of the project. In a comment on that issue, Reitze said
"I'm sorry to say, that I don't find the time for working on [Back in Time] anymore. And
I also kind of lost the interest on working on it, too.
"
At the end of the discussion, Jürgen Altfeld, Büker, and
Buhtz had all agreed to join the project as maintainers. It took them some time to contact
Reitze, but they were eventually able to begin triaging the open tickets and
gain commit and merge permissions on the project's repository.
When I asked about Buhtz's motivation for stepping up as maintainer, he said
"I am still not sure what my motivation is
", but thought that
self-improvement and just ensuring that Back In Time itself continued to operate
were part of it. "For me, and maybe the other two, there is no alternative to
BIT. This might be because of its features and also because of our
laziness migrating our backup strategy to an alternative tool.
"
Buhtz said that Büker, Altfeld, and he work well together. "We never met in
person but there was some kind of chemistry that kept us running.
" He
particularly values having someone else to help review his own code. Working
without that "feels like working without a
safety net or backup
". Buhtz feels a lot of responsibility for making sure
that the many users of Back In Time can rely on the software not to corrupt
their backups.
The three of them have been slowly working through the items in Back In Time's strategy outline for improving the code. Like many open-source projects, however, they could use some help.
We are not only open for contributors but we will try to mentoring them. Contributors don't need to be experts. We all started somewhere. Beside providing code there is also the possibility to improve the translation (47 languages) of the GUI and testing the latest release candidate or development version.
Since taking over maintenance of the project, Buhtz, Büker, and Altfeld have made seven releases (and, on October 20, a release candidate for an eighth). These releases have been larger and more comprehensive than the last few releases Reitze made. Altogether, it seems like Back In Time is in a better place now than it was a few years ago.
Page editor: Jonathan Corbet