Leading items
Welcome to the LWN.net Weekly Edition for February 11, 2021
This edition contains the following feature content:
- Visiting another world: Gemini as a simpler alternative to the World Wide Web.
- ioctl() for io_uring: bringing one of Unix's most disliked system calls to the io_uring subsystem.
- The imminent stable-version apocalypse: 255 minor releases should be enough for anybody.
- The burstable CFS bandwidth controller: adjusting the bandwidth controller to be nicer to bursty workloads.
- Python cryptography, Rust, and Gentoo: Rust offers some real benefits — except on architectures that don't support it.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Visiting another world
The world wide web is truly a wondrous invention, but it is not without flaws. There are massive privacy woes that stem from its standards and implementation; it is also so fiendishly complex that few can truly grok all of its expanse. That complexity affords enormous flexibility, for good or ill. Those who are looking for a simpler way to exchange information—or hearken back to web prehistory—may find the Gemini project worth a look.
At its core, Gemini provides a simple, text-based way to present information for others on the internet. It is positioned in between Gopher, from before the advent of the web, and the HTTP and HTML of the web that fairly quickly supplanted Gopher for most people. Gemini "explores the space inbetween gopher and the web, striving to address (perceived) limitations of one while avoiding the (undeniable) pitfalls of the other". The intent is that Gemini will coexist with the other two; Gopher has seen something of a resurgence of late, while the web just keeps growing, of course.
The goal of Gopher, which Gemini has also adopted, is to provide the means to create an interconnected library of information. It is similar to the concept of hypertext, which underlies Hypertext Transfer Protocol (HTTP) and Hypertext Markup Language (HTML), but far less flexible and malleable. In particular, Gemini lacks any support for inline images, which obviously has some downsides in terms of visualization, but provides a number of benefits, not least removing the possibility of image-based advertising.
Gemini is built on a foundation that one network request for a resource simply returns the entire resource or an error. There are no multi-request documents, where various pieces are collected up from (potentially) multiple servers a la HTML. Each Gemini document (a .gmi or .gemini file) is a collection of text in a Markdown-derived markup language, along with links to other resources, each of which appears as the only item on its own line. Other resources can be returned as well, identified by their Multipurpose Internet Mail Extension (MIME) media type; clients can display them or defer to another application, depending on the type.
One could imagine stripping down the web to a tiny subset of its features, instead of coming up with something entirely new, but there are problems with that approach as the Gemini FAQ describes:
Of course, this is possible. "The Gemini experience" is roughly equivalent to HTTP where the only request header is "Host" and the only response header is "Content-type" and HTML where the only tags are <p>, <pre>, <a>, <h1> through <h3>, <ul> and <li> and <blockquote> - and the https://gemini.circumlunar.space website offers pretty much this experience. We know it can be done.
The problem is that deciding upon a strictly limited subset of HTTP and HTML, slapping a label on it and calling it a day would do almost nothing to create a clearly demarcated space where people can go to consume only that kind of content in only that kind of way. It's impossible to know in advance whether what's on the other side of a https:// URL will be within the subset or outside it.
Beyond that, it is difficult to create a browser that only supports the limited subset and ignores all of the other content; a Gemini client is much easier to write. Even if the limited browser existed, it would be hard to know which web sites support it. Gemini has a clear philosophical mission that is best served by going its own way.
Experience
In order to take a peek into Geminispace, I grabbed the AV-98 client that was created by the founder of Gemini, "Solderpunk". AV-98 is a 1500-line, terminal-based, Python program with no required dependencies beyond the Python standard library. When run, it gives a command prompt; the AV-98 "lightning introduction" suggested "go gemini.circumlunar.space", so that's what I did. I was greeted with the project's home page in Geminispace, which is the counterpart to the web-based home page linked above.
It gives a terse bulleted list for a project overview, along with several other categories (Resources, Web proxies, Search engines, Geminispace aggregators, etc.), each of which has entries preceded by a number. Typing one of those numbers follows the corresponding link, thus retrieving the document described. For example, following the "Users with Gemini content on this server" link to the entry for "solderpunk" gives the following document:
Solderpunk's Gemini capsule

About

Howdy, I'm Solderpunk - founder and de facto BDFL of the Gemini protocol and the circumlunar.space universe, Gopher phlogger and general grumpy digital malcontent. This is where I "eat my own dogfood", as they say, with respect to Gemini.

Contact

• Email: solderpunk@posteo.net
• XMPP: solderpunk@xmpp.circumlunar.space
• Fediverse: @solderpunk@tilde.zone

Gemlogs (Gemini logs)

[1] My gemlog - verbose ramblings
[2] My "pikkulog" - shorter and less focussed

Non-Gemini presence

Before the arrival of Gemini, my main online presence was my gopherhole at the Mare Tranquilitatis People's Circumlunar Zaibatsu. Nowadays I cross-post most of my long form content to both my gemlog and my phlog, so people can consume it via whichever protocol they prefer. But there's three years of pre-Gemini content available only via Gopher - check it out!

[3] My gopherhole at the Zaibatsu
[4] My minimal website at the Circumlunar web outpost
From that document, entering a "1" leads to the gemlog, which contains a list of post titles and dates, each with their own number that can be used to read more. Incidentally, the "phlog" Solderpunk referred to is a Gopher log, which is the analogue to the gemlog in Gopherspace. The "Non-Gemini presence" section of Solderpunk's home page links to a Gopher document at "gopher://zaibatsu.circumlunar.space/1/~solderpunk/" and a web page, which will open an appropriate browser (if available) when followed. It is, obviously, an intensely textual experience.
Using Gemini feels like you have entered a strange, new world—to a certain extent, that's clearly true. But for those of us who are sufficiently old, it is highly reminiscent of various previous worlds, such as bulletin board systems (BBSes), Archie, Veronica, and, of course, Gopher itself. Beyond that, it might also remind one of text-based adventures, multi-user dungeons (MUDs), and so on. It is, thus, no real surprise that many Gemini and Gopher participants are retrocomputing enthusiasts of various sorts as well.
Protocol
While the Gemini specification describes an almost trivial request-response protocol, its requirement for TLS connections might well be unexpected. TLS is, of course, far from trivial, but it does provide encryption for more secure communication. TLS 1.2 is still grudgingly supported, since OpenSSL is the only major library with good TLS 1.3 support, as the best practices document points out: "requiring TLS 1.3 or higher would discourage the use of libraries like LibreSSL or BearSSL, which otherwise have much to recommend them over OpenSSL". The spec makes it clear that TLS 1.2 support will be phased out when that situation changes.
TLS can also be used to authenticate remote sites, though the recommended "trust on first use" (TOFU) model blunts that ability somewhat. TOFU means that whatever certificate a remote site presents the first time it is visited is accepted and stored away. If said certificate changes before its expiration date on a subsequent visit, the user is supposed to be alerted to that fact. That TOFU acceptance explicitly extends to self-signed certificates, which are considered first-class citizens by the spec. Clients can choose to go a different route, though; for example, AV-98 has modes for both TOFU and certificate validation using certificate authorities (CAs).
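A minimal sketch of the TOFU idea in Python might look something like the following (the known-hosts file name and overall structure are just illustrative assumptions; a real client, such as AV-98, also tracks expiration dates and asks the user what to do when a certificate changes):

    import hashlib, json, os, socket, ssl

    KNOWN_HOSTS = os.path.expanduser("~/.gemini_known_hosts")  # hypothetical cache file

    def tofu_check(host, port=1965):
        # Fetch whatever certificate the server presents; validation is done
        # by comparing fingerprints, not by consulting a CA.
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                fingerprint = hashlib.sha256(
                    tls.getpeercert(binary_form=True)).hexdigest()

        known = {}
        if os.path.exists(KNOWN_HOSTS):
            with open(KNOWN_HOSTS) as f:
                known = json.load(f)

        if host not in known:
            known[host] = fingerprint        # trust on first use
        elif known[host] != fingerprint:
            print(f"WARNING: certificate for {host} has changed")

        with open(KNOWN_HOSTS, "w") as f:
            json.dump(known, f)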
Gemini uses uniform resource identifiers (URIs) with the "gemini" scheme. URIs are closely related to the URLs used by the web, a superset in fact; the URI of the Gemini home page is, thus: "gemini://gemini.circumlunar.space/". The URI for the page shown above is: "gemini://gemini.circumlunar.space/users/solderpunk/".
A Gemini request is made by connecting to the server (on port 1965 by default), successfully negotiating a TLS connection, then sending a URI, followed by a carriage return and linefeed (CRLF). The response is a two-digit status code, a space character, a "meta" field of up to 1024 bytes, and a CRLF. The contents of the meta field are status-code-dependent. For success codes (those starting with a "2", so 2x), the meta field is a MIME media type that describes the type of the response, which directly follows the CRLF as the response body. The MIME "charset" parameter can be used to specify an encoding other than the UTF-8 default for the response body, but that is mainly meant for legacy documents; everything else in requests and responses is always in UTF-8.
The server closes the connection after it sends a response; the client can determine whether the response body is complete based on whether the TLS connection has been cleanly shut down or not. There are a few other status codes beyond the 2x success codes, including 4x and 5x error codes (temporary and permanent, respectively), 3x redirect, and 1x input required; the latter is how interactive Gemini applications prompt for user input (e.g. search terms). Beyond that, the 6x codes request a client TLS certificate (or indicate that the one provided was rejected). Using client certificates is a way to restrict access to a resource or to voluntarily establish a server-side session without requiring passwords, cookies, or the like.
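To make that exchange concrete, here is a minimal sketch of a Gemini request in Python; error handling, redirects, and the TOFU checks described above are all omitted, and the URI is simply the project home page used as an example:

    import socket, ssl

    def gemini_fetch(url, host, port=1965):
        # Gemini requires TLS; certificate checking is skipped here for brevity.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE

        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                # A request is just the URI followed by CRLF.
                tls.sendall((url + "\r\n").encode("utf-8"))
                # Read until the server closes the connection.
                data = b""
                while chunk := tls.recv(4096):
                    data += chunk

        # The response header is "<status> <meta>\r\n"; the body follows.
        header, _, body = data.partition(b"\r\n")
        status, _, meta = header.decode("utf-8").partition(" ")
        return status, meta, body

    status, meta, body = gemini_fetch("gemini://gemini.circumlunar.space/",
                                      "gemini.circumlunar.space")
    print(status, meta)
    if status.startswith("2"):
        # meta is the MIME type; the home page is text/gemini, so UTF-8 text.
        print(body.decode("utf-8"))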
Another interesting piece of the protocol is that it is explicitly designed to be non-extensible. There is no version number in the protocol, and the response layout was carefully constructed to make extending it hard.
As one might guess, there are no plans to smoothly upgrade to a "better" protocol down the road, so a version number was left out on purpose. That may strike folks as shortsighted or unwise, but Solderpunk and crew have looked at the history of Gopher and concluded that non-extensibility is the right path for Gemini.
Wrapping up
Geminispace is not for everyone, but it is an interesting niche. The goal that servers and clients can be written as a weekend project is certainly laudable—and, seemingly, doable. There is a small but growing community, so it is easy to have an impact with a new Gemini project of some sort. That has not been true of the web for quite some time at this point.
The name "Gemini" comes from the US space program. Gemini was the project in between Project Mercury, which put US astronauts into space for the first time in the early 1960s, and Apollo, which put the first (and so far only) humans on the moon. The NASA Gemini project bridged the gap between Mercury's proof-of-concept spacecraft and Apollo's moon-landing craft in an analogous way to how the Gemini protocol fits between Gopher and the web. Overloading the Gemini name, though, does make it somewhat difficult to search for information on the non-space project—at least on the web.
To get started exploring, one can start by getting a client program or by using a web proxy, such as the one at Mozz.us. In either case, visiting the Gemini home page (Mozz.us proxy link) gives a bit of a different view than the HTML version linked above; other links can be followed from there. There are search engines, Gemini Universal Search (GUS) and Houston, though the results can be a little maddening to work with, Gemini mirrors of web resources (e.g. CNN, Wikipedia), Gemini documentation, and more. It is quite possible to start exploring and get lost in a maze of twisty passages, all different.
In the final analysis, your interest level in Gemini is likely to depend on whether the current web is meeting your needs—or simply irritating the heck out of you. It is also dependent on the kinds of content available on Gemini and whether it appeals or not. A certain level of curmudgeonliness and/or tendency toward neo-Luddism may cause one to lean toward Gemini as well. None of that is meant to be pejorative, as it can certainly be self-applied; in truth, Gemini seems like something rather fun to mess with when time permits.
ioctl() for io_uring
Of all the system calls in the Unix tradition, few are as maligned as ioctl(). But ioctl() exists for a reason — for many reasons, in truth — and cannot be expected to go away anytime soon. It is thus unsurprising that there is interest in providing ioctl()-like functionality in the io_uring subsystem. A recent RFC patch set from Jens Axboe shows the form that this feature might take in the io_uring context.
The ioctl() name comes from "I/O control"; this system call was added as a way of performing operations on peripheral devices that went beyond reading and writing data. It could be used to rewind a tape drive, set the baud rate of a serial port, or eject a removable disk, for example. Over the years, uses of ioctl() have grown far beyond such simple applications, with some APIs (media, for example) providing hundreds of operations.
The criticism of ioctl() comes from its multiplexed and device-dependent nature; almost anything that can be represented by a file descriptor supports ioctl(), but the actual operations supported vary from one to the next. While system calls are (in theory, at least) closely scrutinized before being added to the kernel, ioctl() commands often receive close to no review at all. So nobody really knows everything that can be done with ioctl(). For added fun, there is some overlap in the command space, meaning that an ioctl() call made to the wrong file descriptor could have unexpected and highly unpleasant results. Attempts have been made to avoid this problem, but they have not been completely successful.
After dealing with these problems for years, some developers would like to see ioctl() disappear completely, but nobody has ever come up with a replacement that looks materially better. Adding a new system call for every function that might be implemented with ioctl() is a non-starter; having device drivers interpret command streams sent with write() is even worse. There probably is no better way to, for example, tell a camera sensor which color space to use.
It is natural to want to support ioctl() in io_uring; it is not uncommon to mix ioctl() calls with regular I/O, and it would be useful to be able to do everything asynchronously. But every ioctl() call is different, and none of them were designed for asynchronous execution, so an ioctl() implementation within io_uring would have no choice but to execute every call in a separate thread. That might be better than nothing, but it is not anywhere near as efficient as it could be, especially for calls that can be executed right away. Doing ioctl() right for io_uring essentially calls for reinventing the ioctl() interface.
Operations in io_uring are communicated from user space to the kernel via a ring buffer; each is represented as an instance of the somewhat complex io_uring_sqe structure. The new command mechanism is invoked by setting opcode in that structure to IORING_OP_URING_CMD; the fd field must, as usual, contain the file descriptor to operate on. The rest of the structure, though (starting with the off field) is overlaid with something completely different:
    struct io_uring_pdu {
        __u64 data[4];   /* available for free use */
        __u64 reserved;  /* can't be used by application! */
        __u64 data2;     /* available for free use */
    };
The reserved field overlays user_data in the original structure, which is needed for other purposes; thus, no data relevant to the command can be stored there. Applications are unlikely to see this structure, though; it will be overlaid yet again with a structure specific to the command to be executed. For block-subsystem commands, for example, this structure becomes:
    struct block_uring_cmd {
        __u16 op;
        __u16 pad;
        union {
            __u32 size;
            __u32 ioctl_cmd;
        };
        __u64 addr;
        __u64 unused[2];
        __u64 reserved;  /* can never be used */
        __u64 unused2;
    };
Deep down within this structure is ioctl_cmd, which the application should set to the ioctl() command code of interest; the op field should be BLOCK_URING_OP_IOCTL (for now; in the future there could be operations that are not tied to an ioctl() call). In the patch set, the only supported command is BLKBSZGET, which returns the block size of the underlying block device — something that can clearly be done without performing actual I/O or sleeping. The patch set also implements a couple of networking commands using a different structure.
Within the kernel, any subsystem that wants to support io_uring operations must add yet another field to the forever-growing file_operations structure. The operation itself is described by an io_uring_cmd structure, which is passed to the new uring_cmd() method:

    struct io_uring_cmd {
        struct file *file;
        struct io_uring_pdu pdu;
        void (*done)(struct io_uring_cmd *, ssize_t);
    };

    int (*uring_cmd)(struct io_uring_cmd *, enum io_uring_cmd_flags);
Needless to say, any handlers for io_uring IORING_OP_URING_CMD operations should not block. Instead, they can complete the operation immediately, return an error indicating that the operation would block, or run the operation asynchronously and signal completion by calling the given done() function.
This is an initial posting of a change that could have long-term implications, so it would not be surprising to see significant changes before it makes it into the mainline. Indeed, in response to a comment from Darrick Wong, Axboe tweaked the interface to provide eight more bytes of space in struct io_uring_pdu — something that Wong said would be highly useful to be able to submit the "millions upon millions of ioctl calls" created by the xfs_scrub utility.
Whether the addition of an ioctl()-like interface to io_uring — which is rapidly evolving into a sort of shadow, asynchronous system-call interface for Linux — will generate controversy remains to be seen; there has been none in response to the initial posting. Axboe expressed hope that the new commands will be "a lot more sane and useful" than the existing ioctl() commands, but there doesn't seem to be any way to enforce that. As with ioctl(), the addition of new io_uring commands will happen entirely within other subsystems, and the level of scrutiny those additions receive will vary. But io_uring needs this sort of "miscellaneous command" capability in the same way that the system as a whole needs ioctl(), so it would be surprising if this feature were not eventually merged in some form.
The imminent stable-version apocalypse
As has often been pointed out, the stable-kernel releases are meant to be stable; that means they should be even more averse to ABI breaks than mainline releases, if that is possible. This may be a hard promise to keep for the next set of stable kernels, though, for the most mundane of reasons: nobody thought that there would be more than 255 minor updates to any given kernel release.
For most of the existence of the kernel project, few developers within the project itself have maintained any given kernel release for more than a couple years or so, and maintenance releases were relatively rare. There were some exceptions; the 2.4 release happened at the beginning of 2001, and Willy Tarreau finally stopped maintaining it more than eleven years later. Even then, the final version was 2.4.37, though one could perhaps call it 2.4.48 after the final set of eleven small "fixup" releases. Releases for kernels maintained for the long term were relatively few and far apart.
In recent years, though, that situation has changed, with some older kernels receiving much more long-term-maintenance attention. Thus, February 3 saw the release of the 4.9.255 and 4.4.255 updates. Those kernels have received 18,765 and 16,986 patches, respectively, and there is no sign of things slowing down. The current posted plan is to maintain 4.9 through January 2023 and 4.4 through February 2022.
These kernel-release numbers are now a problem, as was pointed out by Jari Ruusu. There are a couple of macros defined within the kernel relating to version codes; these can be found in include/generated/uapi/linux/version.h in a built kernel:
    #define LINUX_VERSION_CODE 330496
    #define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
The first macro, LINUX_VERSION_CODE, is calculated in the top-level makefile; it is the result of:
(5 << 16) + (11 << 8) + 0
That number (which is 0x50b00) identifies this as a 5.11-rc kernel; it is the same result one gets from KERNEL_VERSION(5,11,0).
One does not have to look long to see that neither of these macros is going to generate the expected result once the minor version ("c" in the KERNEL_VERSION() macro) exceeds 255. Running that macro on a 4.9.255 kernel yields 0x409ff, but on 4.9.256 it will instead return 0x40a00 — which looks like 4.10.0. That might just cause some confusion in the user community.
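To make the arithmetic concrete, here is a quick illustration of the same shift-and-add computation in ordinary Python (not kernel code), showing how it misbehaves once the minor version reaches 256:

    def kernel_version(a, b, c):
        # The same arithmetic as the KERNEL_VERSION() macro shown above.
        return (a << 16) + (b << 8) + c

    print(hex(kernel_version(4, 9, 255)))   # 0x409ff: 4.9.255, as expected
    print(hex(kernel_version(4, 9, 256)))   # 0x40a00: indistinguishable from 4.10.0
    print(hex(kernel_version(5, 11, 0)))    # 0x50b00: the LINUX_VERSION_CODE above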
This problem does not come as a complete surprise to the stable-kernel maintainers; Sasha Levin posted this patch in mid-January in an attempt to fix it. It changes both LINUX_VERSION_CODE and KERNEL_VERSION() to use 16 bits for the minor version, thus eliminating the overflow. This patch got into linux-next, but seems unlikely to stay there; as Jiri Slaby noted, these macros are used by user space and constitute a part of the kernel's ABI. He added that both the GNU C Library and the GCC compiler (the BPF code in particular) use the kernel version code in its current form and would not handle a change well. There are also many other places in the kernel that exchange these version codes with user space; see this media ioctl() command, for example. Changing the kernel's idea of how KERNEL_VERSION() works will break programs compiled with the older macro, which is not something that is allowed.
So what is to be done? As of this writing that has not yet been worked out, but there are a couple of options on the table:
- Ruusu's note pointing out the problem suggested that stable releases could start incrementing the EXTRAVERSION field instead; this is the field that normally contains strings like -rc7 (for mainline test releases), or a Git commit ID. The minor version would presumably remain at 255. This would avoid breaking ABI, but would also make it harder for user-space code to distinguish between stable releases after 255. It might also create minor trouble for distributors who are using that field to identify their own builds.
- Stable maintainer Greg Kroah-Hartman suggested that he could "leave it alone and just see what happens". But, as Slaby pointed out, that will create the wrapping problem described above, which could confuse user space. If this is done, he said, it would be necessary to mask the minor version to eight bits, causing it to wrap back around to zero; whether that would cause confusion is another question. Version numbers are normally expected to increase monotonically.
The most likely outcome can be seen in the kernel's history, though. Once upon a time, mainline kernel releases had three significant numbers rather than two — 2.6.30, for example. In those days, the minor version field wasn't available for stable updates, so the EXTRAVERSION field was used instead. Looking at the 2.6.30.3 makefile, one sees:
    VERSION = 2
    PATCHLEVEL = 6
    SUBLEVEL = 30
    EXTRAVERSION = .3
    NAME = Man-Eating Seals of Antiquity
That solution worked for years, so there should be no real reason why it wouldn't work now as well. Most likely SUBLEVEL would remain stuck at 255, with EXTRAVERSION indicating the real release number.
It was evidently Leon Trotsky who once said that "old age is the most unexpected of all things that can happen to a man". Perhaps similar forces are at play here; running out of bits is the most unexpected of things that can happen to a kernel developer. This version-number overflow could have been foreseen some time ago, and the date of its occurrence forecast with reasonable certainty. But now some sort of solution has to be found before the next stable-kernel release can be made. Happily, the problem should be easier to resolve than that of old age.
Update: Kroah-Hartman appears to have chosen the "do nothing" option with the release of 4.9.256 and 4.4.256, both of which increment the version number but make no other change. "I'll try to hold off on doing a 'real' 4.9.y release for a week to give everyone a chance to test this out and get back to me. The pending patches in the 4.9.y queue are pretty serious, so I am loath to wait longer than that, consider yourself warned..."
Update 2: In the end, it appears that the clamping solution will be taken, with the minor number fixed at 255 going forward.
The burstable CFS bandwidth controller
The kernel's CFS bandwidth controller is an effective way of controlling just how much CPU time is available to each control group. It can keep processes from consuming too much CPU time and ensure that adequate time is available for all processes that need it. That said, it's not entirely surprising that the bandwidth controller is not perfect for every workload out there. This patch set from Huaixin Chang aims to make it work better for bursty, latency-sensitive workloads.
The bandwidth controller only applies to "completely fair scheduling" (CFS) tasks (otherwise known as "normal processes"); the CPU usage of realtime tasks is handled by other means. This controller provides two parameters to manage the limits applied to any given control group:
- cpu.cfs_quota_us is the amount of CPU time (in microseconds) available to the group during each accounting period.
- cpu.cfs_period_us is the length of the accounting period, also in microseconds.
Thus, for example, setting cpu.cfs_quota_us to 50000 and cpu.cfs_period_us to 100000 will enable the group to consume 50ms of CPU time in every 100ms period. Halving those values (setting cpu.cfs_quota_us to 25000 and cpu.cfs_period_us to 50000) allows 25ms of CPU time every 50ms. In both cases, the group has been empowered to consume 50% of one CPU, but in the latter case that time will come more frequently, in smaller chunks.
The distinction between those two cases is important here. Imagine a control group containing a single process that needs to run for 30ms. In the first case, 30ms is less than the allowed 50ms, so the process will be able to complete its task without being throttled. In the second case, the process will be cut off after running for 25ms; it will then have to wait for the next 50ms period to start before it can finish its job. If the workload is sensitive to latency, the bandwidth-controller parameters need to be set with care.
This mechanism works reasonably well for workloads that consistently require a specific amount of CPU time. It can be a bit more awkward, though, for bursty workloads. A given process may use far less than its quota during most periods, but occasionally a burst of work may come along that requires more CPU time than the quota allows. In cases where latency doesn't matter, making that process wait for the next period to finish its work may not be a problem; if latency does matter, though, this delay can be a real concern.
There are ways to try to work around this issue. One, of course, is to just give the process in question a quota that is large enough to handle the workload bursts, but doing that will enable the process to consume more CPU time overall. System administrators may not like that result, especially if there is money involved and only so much time is actually being paid for. The alternative would be to increase both the quota and the period, but that, too, can increase latency if the process ends up waiting for the next period anyway.
Chang's patch set enables a different approach: allow control groups to carry over some of their unused quota from one period to the next. A new parameter, cpu.cfs_burst_us, sets the maximum amount of time that can be accumulated that way. As an example, let's return to the group with a quota of 25ms and a period of 50ms. If cpu.cfs_burst_us is set to 40000 (40ms), then processes in that group can run for up to 40ms in a given period, but only if they have carried over the 15ms beyond their normal quota from previous periods. This allows the group to respond to a burst of work while still keeping it within the quota in the longer term.
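As a rough sketch of how such a group might be configured in Python (this assumes a cgroup v1 "cpu" controller mounted at /sys/fs/cgroup/cpu, a hypothetical group named "bursty", root privileges, and a kernel with this patch set applied; cpu.cfs_burst_us does not exist otherwise):

    import os

    # Hypothetical control group; creating and writing these files requires root.
    cg = "/sys/fs/cgroup/cpu/bursty"
    os.makedirs(cg, exist_ok=True)

    def set_param(name, value):
        with open(os.path.join(cg, name), "w") as f:
            f.write(str(value))

    set_param("cpu.cfs_period_us", 50000)  # 50ms accounting period
    set_param("cpu.cfs_quota_us", 25000)   # 25ms per period: 50% of one CPU
    set_param("cpu.cfs_burst_us", 40000)   # bursts of up to 40ms using banked time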
Another way of looking at this situation is that, when cpu.cfs_burst_us is in use, the quota is interpreted differently than before. Rather than being an absolute limit, the quota is an amount of CPU time that is deposited into the group's CPU-time account every period, with the burst value capping the value of that account. Bursty groups can save up a limited amount of CPU time in that account for when they need it.
By default, cpu.cfs_burst_us is zero, which disables the burst mechanism and preserves the previous behavior. There is a sysctl knob that can be used to disable burst usage across the entire system. Another knob (sysctl_sched_cfs_bw_burst_onset_percent) causes the controller to give each group a given percentage of their burst quota at the beginning of each period, regardless of whether that time has been accumulated in previous periods.
The patch set comes with some benchmark results showing order-of-magnitude reductions in worst-case latencies when the burstable controller is in use. This idea has been seen on the lists a few times at this point, both in its current form and as separate implementations by Cong Wang and Konstantin Khlebnikov. It looks as if the biggest roadblocks have been overcome at this point, so this change could find its way into the mainline as soon as the 5.13 merge window.
Python cryptography, Rust, and Gentoo
There is always a certain amount of tension between the goals of those using older, less-popular architectures and the goals of projects targeting more mainstream users and systems. In many ways, our community has been spoiled by the number of architectures supported by GCC, but a lot of new software is not being written in C—and existing software is migrating away from it. The Rust language is often the choice these days for both new and existing code bases, but it is built with LLVM, which supports fewer architectures than GCC supports—and Linux runs on. So the question that arises is how much these older, non-Rusty architectures should be able to hold back future development; the answer, in several places now, has been "not much".
The latest issue came up on the Gentoo development mailing list; Michał Górny noted that the Python cryptography library has started replacing some of its C code with Rust, which is now required to build the library. Since the Gentoo Portage package manager indirectly depends on cryptography, "we will probably have to entirely drop support for architectures that are not supported by Rust". He listed five architectures that are not supported by upstream Rust (alpha, hppa, ia64, m68k, and s390) and an additional five that are supported but do not have Gentoo Rust packages (mips, 32-bit ppc, sparc, s390x, and riscv).
Górny filed a bug in the cryptography GitHub repository, "but apparently upstream considers Rust's 'memory safety' more important than ability to actually use the package". As might be guessed, the developers of the library have a bit of a different way of looking at things. But the enormous comment stream on the bug made it clear that many were taken by surprise by the change that was made in version 3.4 of cryptography, which was released on February 7.
Christian Heimes, one of the developers of the library, made it clear that they would not be removing the dependence on Rust. He pointed to an FAQ entry on how to disable the Rust dependency for building the library, but noted that it will not work when cryptography 3.5 is released. He also pointed out that Rust is solely a build-time dependency; there are no run-time dependencies added.
But multiple people in the bug report complained about how the notice was given that the Rust dependency was being added; some thought that the project followed semantic versioning, which would mean that this kind of change should not come in a minor release. It turns out that the project has its own versioning scheme, which allows this kind of change (as does semantic versioning, actually). But Heimes did indicate that there may not have been sufficient communication about the change. He pointed to a pull request from July 2020 and a December 22 cryptography-dev mailing list announcement, both by Alex Gaynor, as places where the issue surfaced. Following links from those finds more discussion of the idea, but it is clear that news of the upcoming change did not reach far outside of the cryptography community. In part, that may be due to the usual way users get the library, as Heimes explained.
Many of the Alpine Linux users who were affected by the change, some of whom were loudly annoyed in comments on the bug, have continuous integration and deployment (CI/CD) systems that update and build relevant packages frequently. In this case, though, the missing Rust compiler broke many of them. The most recent Alpine versions do have Rust support, though, so the fix there is fairly straightforward, or should be.
But for architectures that currently do not, and in truth likely never will, support Rust, there is no way forward except perhaps forking cryptography and maintaining the C-based version going forward. Górny suggested that in the gentoo-dev thread and in the bug. Others were similarly inclined toward that, but it is unclear if there is really enough wherewithal to support such a fork. Python 3.8 and 3.9 release manager Łukasz Langa challenged Górny (and others) to proceed with a fork: "I invite you to do that. Please put your money and time where your mouth is. Report back in a year's time how it went."
Langa also pointed out that the cryptography maintainers are volunteers as well, which means they get to allocate their efforts in whatever direction they wish, even if it makes it inconvenient for other volunteers elsewhere. Beyond that, those changes are being made for a reason:
[...] You expect those volunteers to keep their security-focused project dependent on inherently insecure technology because it would make your own job easier. Your goals and requirements might not be matching the goals and plans of the maintainers of cryptography. It might be unfortunate for you but it really is as simple as that.
The bug comments went on at length; there were some real problems that needed addressing in the way the CI/CD systems were handling versions in dependencies like cryptography, for example. But there was plenty of heat directed at the developers for "forcing" their Rust choice on others, and for "breaking" various systems. For their part, the developers have tried to help those with systems that can run Rust, but have shrugged their shoulders about the others.
Eventually, things boiled over and commenting was disallowed from anyone other than project contributors. Gaynor, in particular, felt that the problems were unavoidable for these, largely ancient, platforms. Once the thread had closed, he summarized what had been discussed and reiterated that the cryptography developers are not going to be held back by platforms that do not support Rust.
Back in Gentoo-land, it turned out that the cryptography dependency for Portage came because it was using urllib3 and requests. Those two packages in Gentoo are dependent on cryptography, but it turns out that they do not actually need it. A pull request to fix that was merged, so the problem for Portage, which is pretty fundamental to the operation of a Gentoo system, was averted.
At least it was averted for now. Górny is concerned that the trustme TLS test-certificate generator, which is used in the distribution's tests, does need cryptography, so some platforms may not be able to be fully tested. On the other hand, the cryptography developers have decided to create a 3.3 LTS release that will maintain the pre-Rust version of the library until the end of 2021. Only fixes for CVEs will be made in that version, however.
But Górny has a bigger worry. He believes it is possible that some future version of Python itself will require Rust, though it is not entirely clear what he is basing that on. It would be devastating for Gentoo on the architectures that do not have Rust, since the distribution relies heavily on Python. It would seem likely to be problematic for other distributions as well, but the only real solution there is to get LLVM (thus Rust) working for those architectures—or for the gccrs GCC front-end for Rust (or a similar project) to make further progress.
While it may well be that Python itself does not go down that path, it is pretty clear that Rust is becoming more popular with every passing day. It would certainly be wonderful if it could be supported "everywhere", but it is going to take some real work to get there. The LLVM developers have been somewhat leery of taking on new architectures, unless they can be convinced there will be long-term support for them, which is understandable, but makes the problem even worse.
We saw a problem similar to Gentoo's back in 2018 with Debian and librsvg, and we are likely to see it recur—frequently—over the coming years. It is not unreasonable for projects to use new tools, nor for projects to be uninterested in supporting ancient architectures. It is most certainly unfortunate, but we find ourselves between the proverbial rock and its friend, the hard place. Perhaps, with luck, something will change with that predicament.
Page editor: Jonathan Corbet