
Leading items

Welcome to the LWN.net Weekly Edition for January 20, 2022

This edition contains the following feature content:

  • Python sets, frozensets, and literals: a python-ideas discussion on adding literal frozenset syntax.
  • Brian Kernighan on the origins of Unix: an LCA 2022 keynote on where Unix came from and what its legacy is.
  • Resurrecting fbdev: a surprise maintainership change for the framebuffer subsystem stirs up the graphics community.
  • The first half of the 5.17 merge window: what has been merged so far for the 5.17 kernel.
  • Struct slab comes to 5.17: separating the slab allocators' data from struct page.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Python sets, frozensets, and literals

By Jake Edge
January 18, 2022

A Python "frozenset" is simply a set object that is immutable—the objects it contains are determined at initialization time and cannot be changed thereafter. Like sets, frozensets are built into the language, but unlike most of the other standard Python types, there is no way to create a literal frozenset object. Changing that, by providing a mechanism to do so, was the topic of a recent discussion on the python-ideas mailing list.

Dictionaries, lists, tuples, and sets, which are called collections in Python, can all be created "on the fly" in the language using a variety of brackets:

    >>> a_dict = { 'a' : 42 }
    >>> a_set = { 'a', 42 }
    >>> a_list = [ 'a', 42 ]
    >>> a_tuple = ( 'a', 42 )

    >>> print(a_dict, a_set, a_list, a_tuple)
    {'a': 42} {42, 'a'} ['a', 42] ('a', 42)

The tuple is the only immutable type in there, as the rest can be changed by various means. In Python terms, that means the tuple is the only hashable object among them (provided its elements are themselves hashable); it can be used in places where a hashable is required, which includes dictionary keys and set members. Both of those mechanisms require a stable, unchanging value, which in turn requires an immutable object.
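That distinction can be seen directly at the interpreter (a quick sketch; any Python 3 interpreter should behave this way):

```python
# Tuples are hashable (when their elements are), so they can serve as
# dictionary keys and set members; mutable types like lists cannot.
d = {('a', 42): 'tuple keys work'}
s = {('a', 42)}

try:
    bad = {['a', 42]: 'lists are mutable'}
except TypeError as e:
    print(e)  # unhashable type: 'list'
```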

As with mathematical sets, Python sets only contain a single element of a given value; adding the same value multiple times does not change the set. Continuing on from the example above:

    >>> a_set.add('b')
    >>> a_set.add('a')
    >>> a_set
    {'b', 42, 'a'}

Meanwhile, a frozenset can only be created using the frozenset() constructor:
    >>> an_fset = frozenset(a_set)
    >>> an_fset
    frozenset({'b', 42, 'a'})
    >>> an_fset.add('c')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'frozenset' object has no attribute 'add'

As can be seen, no new elements can be added to the immutable frozenset; some of the operations that are defined for sets are not available for frozensets. One implication is that, since set members must be hashable, sets containing sets must actually contain frozensets:
    >>> new_set = { a_set }
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'set'
    >>> new_set = { an_fset }
    >>> new_set
    {frozenset({'b', 42, 'a'})}

And, as expected, a set created from two identical frozensets contains only one element:
    >>> fset2 = frozenset(a_set)
    >>> new_set = { fset2, an_fset }
    >>> new_set
    {frozenset({'b', 42, 'a'})}

Literal frozensets?

It is against that backdrop that Steven D'Aprano posted a query about adding a way to create frozensets on the fly in some fashion, as with the other built-in collection types. His jumping-off point was an enhancement request noting that the most recent Python compiler already takes a shortcut when building literal sets of constant values; it creates them as frozensets, rather than as mutable sets. D'Aprano used the dis module to disassemble and display the bytecode for a simple Python statement:

CPython already has some neat optimizations in place to use frozensets instead of sets, for example:
>>> import dis
>>> dis.dis("x in {1, 2, 3}")
  1           0 LOAD_NAME                0 (x)
              2 LOAD_CONST               0 (frozenset({1, 2, 3}))
              4 CONTAINS_OP              0
              6 RETURN_VALUE
and the compiler can build frozensets of literals as a constant.

The set literal "{1, 2, 3}" is built as a constant frozenset (since it cannot be changed) by the compiler and the bytecode loads it directly. But, as he demonstrated with another example, when the programmer wants a frozenset of constant values themselves, the compiler creates a frozenset constant, turns it into a set, and calls frozenset() at run time to create the frozenset it already had: "So to get the frozenset we want, we start with the frozenset we want, and make an unnecessary copy the long way o_O".
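The difference is easy to observe with a couple of compile() calls (a sketch; the exact bytecode varies between CPython versions, but the frozenset constant for the membership test has been present since at least 3.8):

```python
import dis

# The set literal in a membership test is folded into a frozenset
# constant at compile time...
code = compile("x in {1, 2, 3}", "<example>", "eval")
print([c for c in code.co_consts if isinstance(c, frozenset)])

# ...while an explicit frozenset(...) still looks up the name and calls
# it at run time, as the LOAD_NAME in the disassembly shows.
dis.dis(compile("frozenset({1, 2, 3})", "<example>", "eval"))
```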

It should be noted that his second example, and the one shown in the enhancement request, only occur in the under-development Python 3.11 version of the language. The example shown above, though, works in Python 3.8 (and likely earlier versions), so the compiler clearly has the ability needed to create frozenset constants. But all sets obviously cannot just be switched to frozensets. Beyond that, since the frozenset() call is made at run time, there is the possibility that the function has been shadowed or monkey-patched to do something subtly (or not-so-subtly) different. D'Aprano had a suggestion:

It seems to me that all of the machinery to make this work already exists. The compiler already knows how to create frozensets at compile-time, avoiding the need to lookup and call the frozenset() builtin. All we need is syntax for a frozenset display.

How does this work for you?

    f{1, 2, 3}

As might be guessed, the "spelling" of the syntax was not pleasing to some. Chris Angelico objected that it would look too much like other constructs that have a different interpretation:

While it's tempting, it does create an awkward distinction.
f(1, 2, 3) # look up f, call it with parameters
f[1, 2, 3] # look up f, subscript it with parameters
f{1, 2, 3} # construct a frozenset
And that means it's going to be a bug magnet.
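For reference, the proposed spelling is not currently valid syntax at all, so it is free to be assigned a meaning (a quick check; any recent CPython should behave this way):

```python
# Today, a name followed directly by a brace-enclosed list is simply a
# syntax error; the proposal would give this spelling a meaning.
try:
    compile("f{1, 2, 3}", "<example>", "eval")
except SyntaxError:
    print("SyntaxError")
```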

He suggested using angle brackets instead (e.g. <1, 2, 3>), if the parser could be made to handle it. D'Aprano thought that the switch to the PEG parser might enable using angle brackets, but he was not in favor of doing so:

Reading this makes my eyes bleed:
    >>> <1, 2, 3> < <1, 2, 3, 4>
    True

D'Aprano said that Python's f-strings provide another example of how "f" can be used in a potentially confusing way; beyond that, "r" can be used to prefix raw strings, but can also serve as a function or variable name. "I don't think that f{} will be any more of a bug magnet than f"" and r"" already are." Inevitably, other suggestions were made, including Rob Cliffe's for "fs{1, 2, 3}" to avoid the possible ambiguity of simply using "f" and Greg Ewing's joke suggestion of using the Unicode snowflake ("❄{1, 2, 3}"). Matthew Barnett (MRAB) suggested double curly brackets, which would open up another possibility:

How about doubling-up the braces:
    {{1, 2, 3}}
and for frozen dicts:
    {{1: 'one', 2: 'two', 3: 'three'}}
if needed?

At least currently, there is no built-in frozendict; it was rejected when proposed back in 2012. But both of those expressions currently raise exceptions, because sets and dictionaries are not hashable, which means that syntax could potentially be used. As Barnett pointed out, though, nested frozensets might lead to curly-brace overload.
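That the doubled braces fail today is easy to confirm (a sketch; this is a run-time TypeError rather than a syntax error, which is what leaves the spelling available):

```python
# A set cannot contain a set, so the doubled-brace spelling is
# currently an error rather than valid syntax with another meaning.
try:
    eval("{{1, 2, 3}}")
except TypeError as e:
    print(e)  # unhashable type: 'set'
```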

Oscar Benjamin wondered about whether frozenset comprehensions (analogous to list comprehensions) should be supported. A set comprehension like the following:

    >>> {x**2 for x in range(10)}
    {0, 1, 64, 4, 36, 9, 16, 49, 81, 25}

could be turned into the frozenset equivalent. Benjamin asked, should that work? In the abstract, at least, D'Aprano thought it should, since it "would be an obvious extension of the syntax". There could be technical hurdles that "might make it less attractive", however.
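For now, the closest spelling passes a generator expression to the frozenset() constructor, which produces the same elements but goes through a run-time call:

```python
# There is no frozenset comprehension today; the nearest current
# equivalent feeds a generator expression to frozenset().
squares = frozenset(x**2 for x in range(10))
print(squares == {x**2 for x in range(10)})  # True
```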

Advantages

There were questions about the advantages of adding some kind of literal frozenset syntax. In his initial message, D'Aprano said that the idea had come up before, most recently in 2018, where it was suggested for consistency's sake. Inada Naoki recognized the inconsistency of not having a way to specify a literal frozenset, but was only lukewarm on adding syntax for it unless improvements to existing code could be demonstrated. Christopher Barker also thought that consistency was behind the current effort, but D'Aprano said that there were other reasons for it:

CPython already has all the machinery needed to create constant frozensets of constants at compile time. It already implicitly optimizes some expressions involving tuples and sets into frozensets, but such optimizations are fragile and refactoring your code can remove them. Ironically, that same optimization makes the explicit creation of a frozenset needlessly inefficient.

He noted some uses of frozenset in the standard library that might benefit from the new syntax, but does not think the change will be earth-shattering by any means:

The benefit is not huge. This is not list comprehensions or decorator syntax, which revolutionized the way we write Python, it is an incremental improvement. If the compiler didn't already have the machinery in place for building compile-time constant frozensets, this might not be worth the effort.

D'Aprano also described some of the problems that can arise with the current state of affairs. Creating a frozenset is dependent on the frozenset() function, unlike, say, creating a list: "[1, 2, 3] is guaranteed to return a genuine list, even if the name 'list' is deleted, shadowed or replaced". In addition, there are current optimizations that are done, but that are somewhat fragile:

If you are writing if x in ("this", "that", "another", "more") then you probably should be using a frozenset literal, since membership testing in sets is faster than linear search of a tuple.

I think that the CPython peephole optimizer actually replaces that tuple with a frozenset, which is cool, but you can defeat that optimization and go back to slow linear search by refactoring the code and giving the targets a name:

    targets = ("this", "that", "another", "more")
    if x in targets: ...
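The refactored version can keep its constant-time membership test today by spelling the targets as an explicit frozenset, regardless of what the optimizer does (a sketch of the pattern D'Aprano describes):

```python
# Naming the targets defeats the tuple-to-frozenset peephole
# optimization, but an explicit frozenset keeps membership testing
# as a hash lookup rather than a linear scan.
targets = frozenset(("this", "that", "another", "more"))

def matches(x):
    return x in targets

print(matches("that"), matches("something else"))  # True False
```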

In the end, this "feature" would not be a big change, either in CPython, itself, or for the Python ecosystem, but it would remove a small wart that might be worth addressing. Consistency and avoiding needless work when creating a literal frozenset both seem like good reasons to consider making the change. Whether a Python Enhancement Proposal (PEP) emerges remains to be seen. If it does, no major opposition arises, and the inevitable bikeshed-o-rama over its spelling ever converges, it just might appear in an upcoming Python—perhaps even Python 3.11 in October.

Comments (48 posted)

Brian Kernighan on the origins of Unix

By Jonathan Corbet
January 17, 2022

LCA
Once again, the COVID pandemic has forced linux.conf.au to go virtual, thus depriving your editor of a couple of 24-hour, economy-class, middle-seat experiences. This naturally leads to a set of mixed feelings. LCA has always put a priority on interesting keynote talks, and that has carried over into the online event; the opening keynote for LCA 2022 was given by Brian Kernighan. Despite being seen as a founder of our community, Kernighan is rarely seen at Linux events; he used his LCA keynote to reminisce for a while on where Unix came from and what its legacy is.

He began by introducing Bell Labs, which was formed by US telecommunications giant AT&T to carry out research on how to improve telephone services. A lot of inventions came out of Bell Labs, including the transistor, the laser, and fiber optics. Such was the concentration of talent there that, at one point, Claude Shannon and Richard Hamming shared an office. Kernighan joined Bell Labs in 1967, when there were about 25 people engaged in computer-science research.

Early on, Bell Labs joined up with MIT and General Electric to work on a time-sharing operating system known as Multics. As one might have predicted, the attempted collaboration between a research lab, a university, and a profit-making company did not work all that well; Multics slipped later and later, and Bell Labs eventually pulled out of the project. That left two researchers who had been working on Multics — Ken Thompson and Dennis Ritchie — without a project to work on.

After searching for a machine to work on, Thompson eventually found an old PDP-7, which was already obsolete at that time, to do some work on filesystem design. The first Unix-like system was, in essence, a test harness to measure filesystem throughput. But he and Ritchie later concluded that it was something close to the sort of timesharing system they had been trying to build before. This system helped them to convince the lab to buy them a PDP-11/20 for further development. The initial plan was to create a system for document processing, with an initial focus on, inevitably, preparing patent applications. The result was "recognizably Unix" and was used to get real work done.

The first Unix versions were entirely done in assembly, but the Multics project had shown that higher-level languages could be used for operating-system development. Starting with the BCPL language used there, Thompson pared it down to a language called B; Ritchie then added niceties like a type system, creating C. The C language hit a "sweet spot" in language design that has proved hard for anybody else to hit since, Kernighan said.

The advent of C brought the development of a wide range of system tools, perhaps the most important of which was Steve Johnson's portable C compiler, which made the operating system itself portable. Around this time pipes were added as well. That was the genesis of one of the core Unix ideas: write small tools that can be combined to solve complicated problems.

Sixth-edition Unix, released in 1975, was the first widely used version of the operating system. It included a number of other core Unix concepts, including the hierarchical filesystem, devices represented as files, a programmable shell, regular expressions, and more. All of this was implemented in about 9,000 lines of C code. The system was small and comprehensible, which led to a lot of interesting things, including the highly influential Lions' Commentary on Unix.

The 1980s were the golden age of Unix, Kernighan said. Unix was everywhere and widely respected. Thompson and Ritchie were given the Turing Award for their work. The absolute peak, he said, was when Unix appeared in the film Jurassic Park. But then the fight between AT&T and Berkeley began, leading to a pointless lawsuit, and the beginning of the fragmentation of the system.

In 1991, Linus Torvalds announced his work on Linux, and "the rest is history".

Kernighan moved into his conclusion by asking what the technical legacy of Unix is. Some significant pieces of that legacy, including the hierarchical filesystem, use of high-level languages, and the programmable shell, have their origins in Multics. Others, including pipes, the whole tools concept, and regular expressions, are a direct result of the Unix work. And most importantly, he said, was that Unix brought a new philosophy on how to create software.

Almost all of this was in place by 1975 and, he said, it may well be true that there have been no great insights into operating-system design since. Certainly Unix has seen a lot of additions, including networking, multiprocessor support, graphical interfaces, Unicode, and more. But it's all built on the foundation created nearly 50 years ago.

The creation of Unix was the result of an accidental combination of factors, starting with the juxtaposition of two exceptionally creative people "with good taste". Kernighan gave a lot of credit to "benign management" at Bell Labs that allowed this work to go forward, naming Doug McIlroy in particular. It was also spurred by the arrival of cheap hardware (in which category he included the $50,000 PDP-11 used to develop the system early on). But a key part was the Bell Labs working environment, which included stable funding and a long-term view, which is something that is hard to find today.

Could something like Unix happen again? There are plenty of talented people around now, he said, and good management does exist. Hardware has never been cheaper, and most of the software is free. But, he said, good environments for the creation of this kind of work are hard to find. Even so, he concluded, think of all the great things that began with one or two people and a good idea; that can certainly happen again. Ritchie once said that Unix was created to provide a community where interesting things could happen; we should try to create such a community again.

In your editor's opinion, Kernighan missed an opportunity to evaluate the free-software community in these terms. Companies may not have long time horizons, but many free-software projects do. It is, still, a place where people with good ideas can come together and see where those ideas lead. It would have been good to hear his thoughts on whether the free-software community has become that place where interesting things can happen and, if not, what we should seek to change to get there.

[The video of this talk is available on YouTube.]

Comments (67 posted)

Resurrecting fbdev

By Jake Edge
January 19, 2022

The Linux framebuffer device (fbdev) subsystem has long languished in something of a purgatory; it was listed as "orphaned" in the MAINTAINERS file and saw fairly minimal maintenance, mostly driven by developers working elsewhere in the kernel graphics stack. That all changed, in an eye-opening way, on January 17, when Linus Torvalds merged a change to make Helge Deller the new maintainer of the subsystem. But it turns out that the problems in fbdev run deep, at least according to much of the rest of the kernel graphics community. By seeming to take on the maintainer role in order to revert the removal of some buggy features from fbdev, Deller has created something of a controversy.

Part of the concern within the graphics community is the accelerated timeline that these events played out on. Deller posted his intention to take over maintenance of the framebuffer on Friday, January 14, which received an ack from Geert Uytterhoeven later that day. Two days later, before any other responses had come in, Deller sent a pull request to Torvalds to add Deller as the fbdev maintainer, which was promptly picked up. On January 19, Deller posted reversions of two patch sets that removed scrolling acceleration from fbdev. In the meantime, those reversions had already been made in Deller's brand new fbdev Git tree.

The patch sets that were being targeted for reversion had been posted and merged some time ago. Daniel Vetter disabled accelerated scrolling for the framebuffer console (fbcon) back at the end of 2020. At the time, he added a "todo" item to garbage collect the code that supported that accelerated scrolling. Claudio Suarez posted a patch completing that todo item in September 2021, which was committed in October. On January 13, shortly before deciding to take on maintenance of fbdev, Deller asked for a reversion of the latter patch (or parts of it).

Once Monday January 17 rolled around, Vetter and others noticed the flurry of activity that had occurred over the weekend and weighed in. Vetter suggested that it might have been premature to make a maintainer change "without even bothering to get any input from the people who've been maintaining it before". In particular, he was concerned about moving fbdev and fbcon to a tree separate from the DRM tree; the subsystem may have been marked as orphaned but the situation is more complicated than that:

Because the status isn't entirely correct, fbdev core code and fbcon and all that has been maintained, but in bugfixes only mode. And there's very solid&important reasons to keep merging these patches through a drm tree, because that's where all the driver development happens, and hence also all the testing (e.g. the drm test suite has some fbdev tests - the only automated ones that exist to my knowledge - and we run them in CI [continuous integration]). So moving that into an obscure new tree which isn't even in linux-next yet is no good at all.

Now fbdev driver bugfixes is indeed practically orphaned and I very much welcome anyone stepping up for that, but the simplest approach there would be to just get drm-misc commit rights and push the oddball bugfix in there directly.

Beyond that, Jani Nikula was taken aback by the whirlwind pace of the changes. In particular, he was not happy to see the reversions being made in the new fbdev tree almost immediately, even though the objection was only made a few days earlier. "I'm heavily in favor of maintainers who are open, transparent, collaborative, who seek consensus through discussion, and only put their foot down when required." Deller said that he had just started going through the backlog of patches; "nothing has been pushed yet". He said that Nikula should simply ignore the state of the fbdev tree at this point.

In response to Vetter, Deller said that having a separate tree was not important. He listed four goals for maintaining fbdev going forward:

  1. to get fixes which were posted to fbdev mailing list applied if they are useful & correct,
  2. to include new drivers (for old hardware) if they arrive (probably happens rarely but there can be). I know of at least one driver which won't be able to support DRM.... Of course, if the hardware is capable to support DRM, it should be written for DRM and not applied for fbdev.
  3. reintroduce the state where fbcon is fast on fbdev. This is important for non-DRM machines, either when run on native hardware or in an emulator.
  4. not break DRM development

Vetter pointed Deller to the documentation for coming up to speed on DRM development and for getting commit rights in the drm-misc tree, which is the proper path for fbdev fixes, he said. After that:

I think once we've set that up and got it going we can look at the bigger items. Some of them are fairly low-hanging fruit, but the past 5+ years absolutely no one bothered to step up and sort them out. Other problem areas in fbdev are extremely hard to fix properly, without only doing minimal security-fixes only support, so fair warning there. I think a good starting point would be to read the patches and discussions for some of the things you've reverted in your tree.

Anyway I hope this gets you started, and hopefully after a minor detour: Welcome to dri-devel, we're happy to take any help we can get, there's lots to do!

Deller eventually decided to keep the fbdev tree, though he does plan to coordinate with the rest of the graphics development community:

I'm not planning to push code to fbdev/fbcon without having discussed everything on dri-devel. Everything which somehow would affect DRM needs to be discussed on dri-devel and then - after agreement - either pushed via the fbdev git tree or the drm-misc tree.

It is clear there are differences of opinion on how to proceed. The hardware-accelerated scrolling that was removed was dependent on the 2D bit-blit acceleration features of older hardware. But the code that used it in the fbdev drivers was apparently rather buggy; over the years, syzbot repeatedly found problems in that code, which is why it was eventually removed. The DRM subsystem does not have support for 2D acceleration, and will not, due to some serious technical difficulties in doing so.

On the other hand, Deller and others have graphics hardware that uses the fbdev drivers and, formerly, had reasonable performance using the hardware-accelerated scrolling. That scrolling performance went away when the code was removed, and they would like to get it back. But reverting the removals simply brings back the buggy code. From the perspective of the DRM developers, the right way forward is to create DRM-based drivers for these devices, but Deller and others disagree.

The larger issue is how the transition has been handled, Vetter said in the reversion thread:

The other side is that being a maintainer is about collaboration, and this entire fbdev maintainership takeover has been a demonstration of anything but that. [...] This entire affair of rushing in a maintainer change over the w/e [weekend] and then being greeted by a lot of wtf mails next Monday does leave a rather sour aftertaste. Plus that thread shows a lot of misunderstandings of what's all been going on and what drm can and cannot do by Helge, which doesn't improve the entire "we need fbdev back" argument.

Vetter strongly believes that if the removed features are to return, the fbdev code needs to be modernized to a point "where we can still tell distros that enabling it is an ok thing to do and not just a CVE subscription". In addition, he believes there is a more straightforward path toward improving the scrolling behavior without bringing back all of the problems that syzbot has found:

Also wrt the issue at hand of "fbcon scrolling": The way to actually do that with some speed is to render into a fully cached shadow buffer and upload changed areas with a timer. Not with hw accelerated scrolling, at least not if we just don't have full scale development teams for each driver because creating 2d accel that doesn't suck is really hard. drm fbdev compat helpers give you that shadow buffer for free (well you got to set some options).

But Deller sees things differently; there are existing drivers that need the support that was removed. He intends to try to restore that support, while also presumably fixing whatever problems syzbot or others find:

But in addition fbdev/fbcon is the kernel framework for nearly all existing graphic cards which are not (yet) supported by DRM. They need fbdev/fbcon to show their text console and maybe a simple X server. If you break fbdev for those cards, they are completely stuck. Hopefully those drivers will be ported to DRM, but that's currently not easily possible (or they would be so slow that they are [unusable]).

The DRM developers seem skeptical that the problems already identified can be addressed, but it would seem that they should be giving Deller some time to do so. The "orphaned" status of fbdev was perhaps not the right choice, though it is clear that the DRM community believed any change to that status would come by way of discussion and agreement, rather than via a surprise weekend takeover. Be that as it may, a new maintainer for a long-unloved part of the kernel should be seen as a good thing. We will have to wait to see how it all works out.

Comments (26 posted)

The first half of the 5.17 merge window

By Jonathan Corbet
January 13, 2022
As of this writing, just short of 7,000 non-merge commits have been pulled into the mainline kernel repository for the 5.17 release. The changes pulled thus far bring new features across the kernel; read on for a summary of what has been merged during the first half of the 5.17 merge window.

Architecture-specific

  • The arm64 architecture has gained support for the kernel concurrency sanitizer (KCSAN).
  • 32-bit Arm systems now support KFENCE.
  • The boot-time memtest memory tester is now available on the m68k architecture.
  • The new "AMD P-State" subsystem is a power-control mechanism for upcoming AMD processors that, it is said, offers significantly better performance. See this documentation commit for more information.

Core kernel

  • The bpf_loop() helper is an alternative way of implementing (some) loops in BPF programs; it can improve performance and ease the task of getting loops past the BPF verifier.
  • The "compile once/run everywhere" (CO-RE) mechanism, formerly implemented in user space, now runs within the kernel. This is a step toward the eventual implementation of signed BPF programs and also makes BPF functionality more readily available to languages like Go.
  • The scheduler now tracks forced-idle time — the time that an SMT sibling processor is forced into the idle state as the result of core-scheduling constraints. This information, which can be used to evaluate the cost of enabling core scheduling, can be found in /proc/PID/sched.
  • The RCU_FAST_NO_HZ configuration option, meant for advanced tweaking of the RCU algorithm on tickless CPUs, has been removed. It seems that no actual users of this feature could be found.

Filesystems and block I/O

Hardware support

  • Graphics: the direct rendering subsystem has gained support for electronic privacy screens, as found on various laptop models. Also: JDI R63452 Full HD DSI panels, Ilitek ILI9163 display panels, Novatek NT35950 DSI panels, Boe BF060Y8M-AJ0 panels, Sony Tulip Truly NT35521 panels, and R-Car DU MIPI DSI encoders.
  • Hardware monitoring: Texas Instruments INA238 power monitors, ASUS WMI B550/X570 and X370/X470/B450/X399 hardware monitoring interfaces, Delta AHE-50DC fan control modules, Renesas RZ/G2L thermal sensors, MPS MP5023 hardware monitoring interfaces, and NZXT fan controllers.
  • Media: STMicroelectronics STM32 Chrom-Art accelerators, Maxim MAX96712 quad GMSL2 deserializers, OmniVision OV5693 sensors, and various codecs with VP9 support.
  • Miscellaneous: Apple PMGR power-state controllers, R-Car Gen4 system controllers, Samsung Exynos universal serial interfaces, StarFive JH7100 clock generators, StarFive JH7100 reset controllers, Marvell CN10K performance-monitoring units, HiSilicon PCIe performance-monitoring units, Marvell CN10K random number generators, Letsketch WP9620N tablets, Maxim MAX77976 battery chargers, Lenovo Yoga Book tablets, Siemens Simatic LED controllers, Siemens Simatic IPC watchdogs, Asus TF103C 2-in-1 keyboard docks, Renesas R-Car Gen3 and RZ/N1 NAND controllers, TI TPS68470 PMIC regulators, and Maxim MAX20086-MAX20089 camera power protectors.
  • Networking: Engleder TSN endpoint Ethernet MACs, Microchip Lan966x network switches, Vertexcom MSE102x SPI interfaces, and Mellanox Spectrum-4 Ethernet switches.
  • Pin control: Qualcomm SDX65 and SM8450 pin controllers, StarFive JH7100 pin controllers, NXP IMXRT1050 pin controllers, and Intel Thunder Bay pin controllers.

Networking

  • The reference-count tracking infrastructure has been added. This mechanism should help developers track down the source of reference-count bugs. For now it is specific to the networking subsystem but should be relatively easily extended to other parts of the kernel.
  • The new "converged security and management engine" module allows communication with the Intel management engine (the separate processor lurking within Intel CPUs) via WiFi.
  • Support for offloading traffic-control actions to network devices has been added; some information can be found in this commit.
  • The management component transport protocol (MCTP) is now supported over serial devices. MCTP support over SMBus was also merged but subsequently reverted after the I2C maintainer complained about not having been involved in the necessary I2C core changes.

Security-related

  • The kernel's random-number generator has switched from the SHA1 hash algorithm to BLAKE2s, which is both faster and more secure.
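Both algorithms are conveniently available from Python's hashlib for a quick look at the primitives involved (an illustration only; the kernel's use of BLAKE2s is internal to its RNG):

```python
import hashlib

# SHA-1 produces a 160-bit digest; BLAKE2s defaults to 256 bits and is
# faster on the small inputs the RNG mixes in.
sha1 = hashlib.sha1(b"entropy").hexdigest()
b2s = hashlib.blake2s(b"entropy").hexdigest()
print(len(sha1), len(b2s))  # 40 64 hex characters
```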

Virtualization and containers

  • User-mode Linux can now be booted with a devicetree blob, facilitating testing of driver code.
  • The Xen USB virtual host driver allows access to USB devices to be passed through to Xen guests.

Internal kernel changes

  • The struct page fields used by the slab allocators have been moved into a separate structure. An early version of this work was covered in this article; it has since been pushed further by Vlastimil Babka.
  • KCSAN has gained the ability to detect some missing memory barriers. Some more information can be found in this documentation commit.
  • The new gpio-sim module creates simulated GPIO chips for testing; see this commit for more information.
  • The kernel is now built with -Wcast-function-type, which generates a warning when function pointers are cast to an incompatible type. This check is necessary to avoid setting off control-flow integrity alarms.

There are currently about 4,000 changesets sitting in linux-next, suggesting that activity will be a bit slower for the rest of the merge window. That said, there are undoubtedly some interesting changes yet to land in the mainline; that should happen by the time the merge window closes on January 23. Stay tuned for our second-half summary, which will arrive shortly after that date.

Comments (8 posted)

Struct slab comes to 5.17

By Jonathan Corbet
January 14, 2022
The ongoing memory folio work has caused ripples through much of the kernel and inspired a few side projects, one of which was the removal of slab-specific fields from struct page. That work has been pulled into the mainline for the 5.17 kernel release; it is thus a good time to catch up with the status of struct slab and why this work is important.

struct page and struct slab

The page structure is at the core of the memory-management subsystem. One of these structures exists for every page of physical memory in the system; they are used to track the status of memory as it is used (and reused) during the lifetime of the system. Physical pages can adopt a number of different identities over time; they can hold user-space data, kernel data structures, DMA buffers, and so on. Regardless of how a page is used, struct page is the data structure that tracks its state. These structures are stored in a discontiguous array known as the system memory map.
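In the simplest (flat) memory model, the mapping from a physical page to its page structure is plain array indexing by page-frame number. A minimal standalone sketch of the idea, with hypothetical names (the kernel's real memory map can be sparse and is accessed through macros like pfn_to_page()):

```c
#include <assert.h>

#define PAGE_SHIFT 12   /* 4KiB pages */

/* Simplified stand-in for struct page. */
struct page_sketch {
    unsigned long flags;
};

/* Flat memory map: one tracking structure per physical page frame. */
static struct page_sketch memmap_sketch[1024];

/* Page-frame number -> tracking structure: simple indexing. */
static struct page_sketch *pfn_to_page_sketch(unsigned long pfn)
{
    return &memmap_sketch[pfn];
}

/* Physical address -> page-frame number: drop the in-page offset. */
static unsigned long addr_to_pfn_sketch(unsigned long phys_addr)
{
    return phys_addr >> PAGE_SHIFT;
}
```

The real kernel supports discontiguous physical memory, so the lookup is more involved, but the principle is the same: given a page, its metadata can be found in constant time.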

There are a few problems that have arisen with this arrangement. The page structure was significantly reorganized for 4.18, but the definition of struct page is still a complicated mess of #ifdefs and unions with no mechanisms to ensure that the right fields are used at any given time. The unlucky developer who needs to find more space in this structure will be hard put to understand which bits might be safe to use. Subsystems are normally designed to hide their internal data structures, but struct page is heavily used throughout the kernel, making any memory-management changes more complicated. One possible change — reducing the amount of memory consumed by page structures by getting rid of the need for a structure for every page — is just a distant dream under the current organization.

So there are a lot of good reasons to remove information from struct page and hide what remains within the memory-management subsystem. One of the outcomes from the folio discussions has been a renewed desire to get a handle on struct page, but that is not a job for the faint of heart — or for the impatient. Many steps will be required to reach that goal. The merging of the initial folio patches for 5.16 was one such step; the advent of struct slab in 5.17 is another.

There are many memory allocators inside the kernel, but two sit at the core of the memory-management subsystem and are responsible for most allocations in a running system. The page allocator, as its name suggests, deals only in units of pages; it is used when larger amounts of memory are needed. The slab allocator, instead, efficiently handles allocations of smaller objects, including those done with functions like kmalloc(). The slab allocator will obtain blocks of one or more pages from the page allocator, then split those blocks up and hand out the pieces as needed. There are actually three slab allocators supported by the kernel (described below), but one of them must be chosen at configuration time.
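The core trick — carving a page-sized block into fixed-size objects and chaining the free ones through their own storage — can be illustrated with a toy, user-space sketch. This is not kernel code; all names are made up, and malloc() stands in for the page allocator:

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

/* A toy slab: one page carved into fixed-size objects.  Free objects
 * are linked through their own first bytes, so the freelist costs no
 * extra memory. */
struct toy_slab {
    void *page;           /* backing block from the "page allocator" */
    void *freelist;       /* first free object, or NULL when full */
    size_t obj_size;      /* must be >= sizeof(void *) */
    unsigned int inuse;   /* objects currently allocated */
};

static int toy_slab_init(struct toy_slab *s, size_t obj_size)
{
    char *p;
    size_t i, nr = TOY_PAGE_SIZE / obj_size;

    s->page = malloc(TOY_PAGE_SIZE);   /* "ask the page allocator" */
    if (!s->page)
        return -1;
    s->obj_size = obj_size;
    s->inuse = 0;

    /* Thread a freelist through the objects themselves. */
    p = s->page;
    s->freelist = p;
    for (i = 0; i < nr - 1; i++)
        *(void **)(p + i * obj_size) = p + (i + 1) * obj_size;
    *(void **)(p + (nr - 1) * obj_size) = NULL;
    return 0;
}

static void *toy_slab_alloc(struct toy_slab *s)
{
    void *obj = s->freelist;

    if (obj) {
        s->freelist = *(void **)obj;   /* pop the first free object */
        s->inuse++;
    }
    return obj;
}

static void toy_slab_free(struct toy_slab *s, void *obj)
{
    *(void **)obj = s->freelist;       /* push back onto the freelist */
    s->freelist = obj;
    s->inuse--;
}
```

Allocation and freeing are both O(1) pointer manipulations; the real allocators add per-CPU caching, NUMA awareness, and debugging machinery on top of this basic scheme.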

When the slab allocator allocates pages, those pages are marked inside the associated page structures as being slab pages, and the interpretation of numerous fields within those structures changes. The slab-specific information does not really need to be in struct page, and the slab allocators shouldn't need access to the other information in that structure, but it is all mixed together anyway.

Changes for 5.17

The separation of struct slab is a first step toward remedying this situation. For now, struct slab overlays the page structure and, thus, still uses the same memory, but the new slab structure hides struct page and constrains the slab allocators to using only the slab-specific data stored there. This work was originally done by Matthew Wilcox as part of the folio effort; it was later taken on and pushed to its conclusion by Vlastimil Babka.
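An overlay like this only works if the fields the two structures share sit at exactly the same offsets, so the kernel pins the layout down with build-time assertions. A simplified, standalone sketch of that technique (the structures and macro name here are illustrative, not the kernel's):

```c
#include <stddef.h>

/* Simplified stand-ins for the real structures. */
struct page_sketch {
    unsigned long flags;
    void *field_a;
    void *field_b;
    int refcount;
};

struct slab_sketch {
    unsigned long __page_flags;   /* must line up with page.flags */
    void *slab_cache;
    void *freelist;
    int __page_refcount;          /* must line up with page.refcount */
};

/* The overlay is only safe if corresponding fields sit at the same
 * offsets; a build-time check catches any drift immediately. */
#define MATCH(pf, sf) \
    _Static_assert(offsetof(struct page_sketch, pf) == \
                   offsetof(struct slab_sketch, sf), \
                   "offset mismatch: " #pf " vs " #sf)

MATCH(flags, __page_flags);
MATCH(refcount, __page_refcount);
_Static_assert(sizeof(struct slab_sketch) <= sizeof(struct page_sketch),
               "slab overlay must not outgrow struct page");
```

If anybody later rearranges either structure, the build fails on the spot rather than corrupting memory at run time.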

The kernel currently supports three slab allocators: SLAB (the original allocator), SLUB (a newer allocator, focused on scalability, that is normally used outside of embedded applications), and SLOB (a tiny allocator for highly memory-constrained systems). The allocator that any given kernel will use is chosen at build time using a configuration option. One of the changes Babka made to the patch set was to further narrow the definition of struct slab to only the fields needed for the chosen allocator. There is still only one definition with a set of #ifdef blocks, but it might make more sense to view the end result without them. If the SLAB allocator is selected, struct slab ends up looking like this:

    struct slab {
        unsigned long __page_flags;
        union {
            struct list_head slab_list;
            struct rcu_head rcu_head;
        };
        struct kmem_cache *slab_cache;
        void *freelist;    /* array of free object indexes */
        void *s_mem;    /* first object */
        unsigned int active;
        atomic_t __page_refcount;
    #ifdef CONFIG_MEMCG
        unsigned long memcg_data;
    #endif
    };

(The allocator-specific fields here are freelist, s_mem, and active.) If, instead, SLUB is configured, the structure becomes:

    struct slab {
        unsigned long __page_flags;
        union {
            struct list_head slab_list;
            struct rcu_head rcu_head;
    #ifdef CONFIG_SLUB_CPU_PARTIAL
            struct {
                struct slab *next;
                int slabs;    /* Nr of slabs left */
            };
    #endif
        };
        struct kmem_cache *slab_cache;
        /* Double-word boundary */
        void *freelist;        /* first free object */
        union {
            unsigned long counters;
            struct {
                unsigned inuse:16;
                unsigned objects:15;
                unsigned frozen:1;
            };
        };
        unsigned int __unused;
        atomic_t __page_refcount;
    #ifdef CONFIG_MEMCG
        unsigned long memcg_data;
    #endif
    };

And for SLOB it is:

    struct slab {
        unsigned long __page_flags;
        struct list_head slab_list;
        void *__unused_1;
        void *freelist;        /* first free block */
        long units;
        unsigned int __unused_2;
        atomic_t __page_refcount;
    #ifdef CONFIG_MEMCG
        unsigned long memcg_data;
    #endif
    };
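One detail worth noting in the SLUB variant above is the union of counters with the inuse/objects/frozen bitfields: packing all three into one machine word lets the allocator update them together with a single compare-and-exchange. A standalone sketch of the technique, using GCC-style atomic builtins (the union and function names here are made up):

```c
/* SLUB-style trick: per-slab bookkeeping packed into one word,
 * aliased by "counters", so all three fields can be read or
 * compare-and-exchanged in a single atomic operation. */
union counters_sketch {
    unsigned long counters;
    struct {
        unsigned inuse:16;
        unsigned objects:15;
        unsigned frozen:1;
    };
};

/* Bump inuse; succeed only if nobody changed the word meanwhile. */
static int try_bump_inuse(unsigned long *word)
{
    union counters_sketch old, upd;

    old.counters = __atomic_load_n(word, __ATOMIC_RELAXED);
    upd = old;
    upd.inuse++;     /* modify one bitfield in a local copy... */
    /* ...then publish all three fields in one compare-and-swap */
    return __atomic_compare_exchange_n(word, &old.counters,
                                       upd.counters, 0,
                                       __ATOMIC_RELAXED,
                                       __ATOMIC_RELAXED);
}
```

A lock-free update path like this is why the word must stay on a double-word boundary, as the comment in the SLUB listing notes.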

This organization helps to ensure that one slab allocator does not accidentally use fields belonging to another, yielding another increase in type safety.

The new structure lives in mm/slab.h; it is not under include, and thus it is not available to code outside of the memory-management subsystem. That created trouble for the x86 bootmem allocator and zsmalloc(), both of which were using the slab-specific fields in struct page even though they are not slab allocators. Those usages have been changed to other struct page fields; comments have also been added suggesting that this usage should be cleaned up someday.

Meanwhile, the code within the slab allocators has been changed to use the new structure, with the conversion from struct page happening at the beginning of the call chains. That isolates most of the slab code from struct page, paving the way for future work that could separate the two structures entirely and allow slab structures to be allocated dynamically as needed.
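Because the two structures overlay the same memory, the conversion itself is just a cast; the value comes from wrapping it in typed helpers so that only the entry points ever perform it. A minimal sketch of the pattern (the structure and helper names here are illustrative, not the kernel's exact API):

```c
/* Simplified stand-ins; the real types live in the kernel. */
struct page_sketch { unsigned long flags; };
struct slab_sketch { unsigned long __page_flags; };

/* The structures overlay the same memory, so conversion is a cast,
 * but typed helpers make the boundary explicit and type-checked. */
static struct slab_sketch *page_to_slab(struct page_sketch *page)
{
    return (struct slab_sketch *)page;
}

static struct page_sketch *slab_to_page(struct slab_sketch *slab)
{
    return (struct page_sketch *)slab;
}

/* Internal slab code takes struct slab_sketch * only; it cannot
 * reach non-slab page fields without an explicit conversion. */
static unsigned long slab_flags(struct slab_sketch *slab)
{
    return slab->__page_flags;
}
```

An entry point converts once at the top of the call chain and passes the slab pointer down; if the structures are ever given separate allocations, only the two helpers need to change.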

The end result is a view into the system memory map for slab allocators that begins to separate them from the lower-level memory-management details and increases type safety. Linux users, meanwhile, should see no changes other than, with luck, a reduction in the number of bugs going forward. Further in the future, there may come a time when struct slab can be dynamically allocated and separated entirely from the memory map. That change will be a while in coming, though; meanwhile, cleaning up the core memory-management types is a step in the right direction.

Comments (2 posted)

A note for LWN subscribers

By Jonathan Corbet
January 18, 2022
January 22, 2022 will be the 24th anniversary of the publication of the first LWN.net Weekly Edition. A lot has happened in the intervening years; the Linux community has grown immeasurably, and LWN has grown with it. Later this year will also be the 20th anniversary of the adoption of our subscription-based model, which has sustained LWN ever since. There is a change coming for our subscribers that will, with luck, help to set up LWN to thrive in the coming years.

The nominal price for an LWN subscription is $7 per month, a price that has remained unchanged since 2010. That $7 buys a lot less now than it did twelve years ago. Your editor is reliably informed by the Internet that inflation in the US has been just under 28% from 2010 until the middle of 2021; that rate doesn't include the last few months. Prices for some things, most notably health insurance in the US, have increased by rather more than that.

The bottom line is that the time has come to ask LWN subscribers to give us a little more. Indeed, perhaps it is past time, but we have been hesitant to ask more of a community that has been so supportive for the last 20 years. There comes a point, though, where things need to be brought back into balance, and this is that point.

Thus, starting February 1, the nominal rate for an LWN subscription will increase to $9/month, which is almost exactly in line with that mid-2021 inflation rate. There will be similar changes to subscriptions at the "starving hacker" and "project leader" levels, which will be $5 and $16 per month, respectively. We are leaving the price for our "maniacal supporter" subscriptions the same; we greatly appreciate the generosity of those of you who have chosen to support us at that level and do not feel like we could ask for more.

Subscriptions purchased prior to that date will, of course, remain good for the covered period. Subscribers paying monthly will continue to be billed at the old rate for the number of months of authorized charges they had remaining as of the posting of this article (up to one year), after which the new rate will apply. Rates for group subscriptions will also be changing; the new rates will be communicated at renewal time.

Our goal has always been for an LWN subscription to be both essential and affordable; we believe that this remains the case even at the new rates. Meanwhile, the increased revenue will help us as we work to increase staff and position LWN for the next 24 years. The Linux and free-software communities have a lot of work to do yet; our plan is for LWN to be part of that process.

LWN's subscription model was relatively rare at the time we adopted it, and it was far from clear whether it would succeed. The degree of its success has been somewhat uneven over the years but, in recent times, LWN has been on a solid (if understaffed) financial footing. The subscription model has indeed succeeded in keeping LWN around, and it ensures that our interests are aligned with those of our readers — and not with advertisers. It works because all of you have chosen to support us, and for that we are deeply grateful. Thank you, again, for your support; we would not have been here all this time without you.

Comments (55 posted)

Page editor: Jonathan Corbet
Next page: Brief items>>


Copyright © 2022, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds