LWN.net Weekly Edition for May 5, 2022
Welcome to the LWN.net Weekly Edition for May 5, 2022
This edition contains the following feature content:
- An overview of structural pattern matching for Python: a PyCon presentation on the history and use of the match operation.
- Modern Python performance considerations: what's being done — and what developers can do — to make Python code go faster.
- Printbuf rebuffed for now: a pretty-printing mechanism collides with code already in the kernel.
- The BPF allocator runs into trouble: how a dedicated memory allocator revealed problems in the memory-management subsystem.
- NUMA rebalancing on tiered-memory systems: when you have multiple types of memory, how does the kernel decide which pages go where?
- The 2022 Linux Storage, Filesystem, Memory-Management, and BPF Summit: the bare beginning of our LSFMM coverage:
- A memory-folio update: the current status and future direction of the folio project.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
An overview of structural pattern matching for Python
Python's match statement, which provides a long-sought C-like switch statement—though it is far more than that—has now been part of the language for more than six months. One of the authors of the series of Python Enhancement Proposals (PEPs) that described the feature, Brandt Bucher, came to PyCon 2022 in Salt Lake City, Utah to talk about the feature. He gave an overview of its history, some of its many-faceted abilities, a bit about how it was implemented, and some thoughts on its future, in a presentation on April 29, which was the first day of talks for the conference.
Bucher said that he was studying computer engineering when he encountered Python, which made him realize that he liked developing software a lot more than he liked designing hardware. He got involved in Python core development during that time and has been a core developer for the language for nearly two years at this point. He is now working for Microsoft on the Faster CPython team, though his biggest project to date has been the work on shepherding and implementing structural pattern matching (i.e. match), much of which was done while he was working for Research Affiliates.
History
The genesis of the PEP was a "nerd sniping email on a Wednesday during COVID" from Guido van Rossum to Bucher to see if he wanted to be part of the effort to try to add a match statement to the language. Bucher had been looking for a project, so he was definitely interested and joined the team of six authors working on what became PEP 622 ("Structural Pattern Matching"). It was, Bucher said, "a monster": a huge PEP that was not written with a clear audience in mind, which contained lots of extra bells and whistles that bloated the PEP (and feature).
The Python steering council diligently read the PEP, he said, and asked the authors to rewrite it so that the council could render a judgment on the feature. That resulted in three new PEPs to fully describe it:
- PEP 634: "Structural Pattern Matching: Specification"
- PEP 635: "Structural Pattern Matching: Motivation and Rationale"
- PEP 636: "Structural Pattern Matching: Tutorial"
Making that change improved the PEP, by splitting it into three pieces, each of which was aimed at a different audience. Bucher said that while PEP 634 is important, he hoped that the only people who read it were those who were implementing the feature. The other two are less dry and more approachable, he said; PEP 635 was specifically targeted at making the case to the council and other core developers. PEP 636 is one that everyone should read; it shows how to use match by walking through an example of a text-based adventure game.
The PEP authors used a dedicated GitHub repository for collaborating on the feature. That repository still exists and it makes a good record of the project going forward, he said. The decisions that were made and the reasons behind them are contained in the pull requests, issues, and so forth. During the process, several medium-sized applications using match were developed; as the feature changed during the process, that allowed the team to judge if a change truly improved the feature or not.
Not a switch statement
Bucher stressed, as he did multiple times in the discussions along the way, that the feature is not a switch statement. It looks a lot like a switch statement and can behave like one, but there is a lot more to it. By seeing it as only that, much of the power of structural pattern matching is lost. His goal in the talk was to convince attendees that the match feature is one that has far more uses and that it deserves a place in their code.
Structural pattern matching combines two features that Python developers are already familiar with: control flow and destructuring. Control flow is "just branching; we do this all the time". He showed two simple if-elif-else code snippets, one of which looked at the value of a particular object and the other that looked at the "shape" of the object by looking at its length. The control flow was changed based on tests of those two characteristics of the object.
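To make the contrast concrete, the two snippets looked something like the following (a reconstruction of the idea, not his exact slides):

# branching on the value of an object
if meal == "Spam":
    ...
elif meal == "eggs":
    ...

# branching on the "shape" (here, the length) of an object
if len(meal) == 2:
    ...
elif len(meal) == 3:
    ...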
Destructuring is simply pulling data out of an object, which can be done in various ways. One of his tests was accessing meal[0], which uses sequence unpacking, but there are other destructuring operations, such as extracting the key-value pairs from a dictionary or accessing arbitrary attributes on a Python object.
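For reference, each of those destructuring forms looks something like this (a sketch; order and meal_object are hypothetical objects):

entrée, side = meal        # sequence unpacking
first = meal[0]            # indexing a sequence
side = order["side"]       # pulling a value out of a dictionary
name = meal_object.name    # accessing an attribute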
So structural pattern matching is allowing the programmer to branch while they destructure or destructure while they branch. The feature allows a "very powerful declarative programming style that creates a new way of viewing a lot of the same problems that we have been struggling with for a very long time". As he pointed out, though, he was roughly one-third of the way through the talk and had not actually shown a match statement, so he was about to change that.
He put up a slide with a simple example of the feature:
match meal:
    case entrée, side:
        ...
The syntax is to have match followed by an expression to evaluate. Then there can be one or more case statements, each of which has a pattern following the word case. He wanted to emphasize that "entrée, side" is a pattern and not an expression. Instead of telling Python to create a tuple of two elements, the code is saying how it wants the language to pull apart a sequence of length two should it encounter one in meal. It is sort of like the pattern is the left-hand side of an assignment statement, Bucher said.
To clarify what the code is doing, he put up the equivalent code from Python 3.9:
from collections.abc import Sequence

if isinstance(meal, Sequence) and len(meal) == 2:
    entrée, side = meal
    ...
As with the match version, the code matches a sequence of length two and pulls out the two elements of the sequence into entrée and side. The match pattern used is a "sequence pattern" that will match any sequence of, in this instance, two elements. Inside the sequence pattern are two "capture patterns" that specify the variable names to destructure a matching sequence into. Sequence patterns can use either parentheses or square brackets if desired, so the following two patterns are equivalent to the original one above:
case (entrée, side):
case [entrée, side]:
The "_" is used as a wildcard pattern, which is like a capture pattern, in that it will match anything, but it does not bind any value, unlike a capture pattern. So, a pattern of "[_, side]" would match a two-element sequence but only bind side as if only meal[1] were extracted in the equivalent Python 3.9 code. A common idiom is to use "case _:" as the last case entry to serve as a catch-all. Each case is tried in turn; if none of the previous ones match, the catch-all will match and can, for example, raise an exception.
The other pattern type that Bucher described was literal patterns, which just look like a literal value:
case ["Spam", side]:
...
# equivalent to:
if isinstance(meal, Sequence) and len(meal) == 2 and meal[0] == "Spam":
side = meal[1]
...
Beyond just strings, literal patterns support byte strings, integers, floats, complex numbers, booleans, and None; they are effectively compared using the equality operator (True, False, and None are matched by identity). There are other pattern types, but he wanted to focus on these in the talk; he suggested reading PEP 636 to find out more. He did quickly show a few more examples, without dwelling on them, just to give attendees a taste of the other matching abilities:
case ["Spam" | "eggs", side]:
case ["Spam", side] if not self.has_tried(side):
case {"entrée": "Spam", "side": side}:
case {"meal": ["Spam", side]}:
case Meal(Food("Spam"), Food(side)):
The last one is a class pattern, which he finds to be the most exciting pattern type. The example will match a Meal object that contains two Food objects; it is not actually creating any instances of those classes, simply instructing the language how to pull one apart if it encounters it. The case compiles to a bunch of isinstance() checks and attribute accesses.
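As a rough sketch of what that equivalence looks like, assuming Meal and Food define __match_args__ so that the positional patterns map to attributes (hypothetically named first, second, and name here):

# case Meal(Food("Spam"), Food(side)):
# compiles to roughly:
if (isinstance(meal, Meal)
        and isinstance(meal.first, Food) and meal.first.name == "Spam"
        and isinstance(meal.second, Food)):
    side = meal.second.name
    ...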
Class patterns are what led to his "big 'a-ha' moment" with the feature. He was writing code for a red-black tree, which is a self-balancing binary search tree; it is a data structure that he never fully understood, even though he had implemented one before. Implementing the operations for the tree (insertion, rebalancing, deletion, and so on) using class patterns "really really clarified how the logic worked" because it allowed him to see "the shape of the data" and the pieces he was working with as they were being rearranged.
"We didn't invent really any of this", Bucher said. Structural pattern matching has been around, mostly in functional programming languages, for more than 50 years. He showed a short function to recursively calculate the factorial of a number using match:
def f(n: int) -> int:
    match n:
        case 0 | 1:
            return 1
        case _:
            return n * f(n - 1)
He showed the equivalent programs in Rust and Scala, using their match features, which look quite similar to the one above. The PEP authors definitely looked at the other languages with an eye to making Python programmers able to read (and, perhaps, write) structural pattern matching code for other languages—and vice versa.
Implementation
He turned to the implementation of the feature, which he said took him around nine months, all of it during COVID lockdown. The match feature was able to take advantage of a new capability that comes with the PEG parser that is now used for CPython. The "match" and "case" strings are "soft keywords"; the parser is able to recognize when they are being used as keywords versus when they are used as regular identifiers. He cannot take credit for that part of the feature as Van Rossum implemented it as one of the first things that was done.
Soft keywords allowed the feature to be added without interfering with existing code that uses those names as identifiers, which is likely in plenty of code in the ecosystem. So they were able to add a big new feature, with a lot of surface area, that completely maintained backward compatibility. He showed an example of converting some code that uses those names, which nicely demonstrates that:
...
match = re.match(PATTERN, "...")
if match is not None:
    case, status = match
    if status == "closed":
        ...

# could become

...
match = re.match(PATTERN, "...")
match match:
    case case, "closed":
        ...
It looks kind of odd, but it works, he said.
He described a bit about the structural pattern matching compiler, which he likes to call the "SPaM compiler" for brevity's sake. He showed and compared the bytecode output for both versions of the first example above, starting with the Python 3.9 version. Even though both versions do the same thing, the version using match was noticeably shorter because it takes advantage of a new opcode (MATCH_SEQUENCE). That opcode, which was implemented by Mark Shannon, does the same isinstance() check as the Python 3.9 code, but does so about as quickly as possible: it simply checks a flag on the CPython object, directly in the runtime. The bytecode also uses a new GET_LEN opcode that directly checks the length of the sequence in the CPython object, rather than pushing the name "len" onto the stack and then calling it.
That example shows the advantage of having dedicated syntax in the language for the match feature, Bucher said. The Python compiler has a lot of extra knowledge about exactly what you are trying to do within each case, so it can generate more efficient code. "This is the sort of thing that you only get from native syntax"; other techniques like transpiling to Python or using a Python package to achieve a similar effect cannot be as fast.
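The curious can see those opcodes for themselves using the standard dis module; on Python 3.10 or later, something like the following prints bytecode containing MATCH_SEQUENCE and GET_LEN (a minimal sketch; the exact output varies between CPython versions):

import dis

dis.dis("""
match meal:
    case [entrée, side]:
        pass
""")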
Future plans
He finished the talk by looking at some features that he is working on for the future; he has mostly working code for them, but they are not ready to be added quite yet. If he finds the time to finish them up, they may become part of Python 3.12 (which is due in October 2023). "The future is faster."
One area that he is "really excited about" is improved control flow. He gave an example:
match meal:
    case ["Spam", side]:
        print("Yay, Spam!")
    case ["eggs", side]:
        print("Oh, eggs?")
    case [_, side]:
        print("Hm, something else.")
He noted that the existing implementation steps through the cases in a fairly simplistic way. In particular, it checks whether meal is a sequence three times, checks the length of the meal sequence three times, and assigns side three times, but none of those operations really needs to be done more than once. So the new code will recognize that it only needs to check if meal is a sequence with a length of two and, if so, assign meal[1] to side and then branch based on the value of meal[0].
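In other words, the optimized version would behave more like the following hand-written equivalent (a sketch of the intent, not of the actual generated code):

from collections.abc import Sequence

# one sequence check, one length check, one assignment ...
if isinstance(meal, Sequence) and len(meal) == 2:
    side = meal[1]
    first = meal[0]
    # ... then a single branch on the value of meal[0]
    if first == "Spam":
        print("Yay, Spam!")
    elif first == "eggs":
        print("Oh, eggs?")
    else:
        print("Hm, something else.")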
This optimization uses a technique known as decision trees, which has a lot of prior art and research behind it for languages like OCaml. It allows merging overlapping checks for adjacent patterns. "It is not too difficult to do; Python throws a couple of curve balls your way that make it non-trivial", but he has a 98% working version of it. It can apply to many different kinds of patterns beyond just the one he showed.
Something that falls out of the decision-tree work is improved reachability checks. If a particular case cannot be reached because what it would match is already matched by previous entries, CPython can detect that and give the developer a warning. For example:
for number in range(100):
    match number % 5, number % 3:
        case _, 0: print("Spam!")
        case 0, _: print("Eggs?")
        case 0, 0: print("Spam and eggs.")
        case _, _: print(number)
With that code, the third case is not reachable because one of the two before it will catch that situation, but it may not be obvious when looking at the code. It is difficult to determine that with the current CPython compiler, but it is trivial to detect using the decision-tree analysis and to give a warning. The solution in this case is simple: just move the third case up to the top.
That completed his talk, but he could not resist a joke at the end: "I heard there was great hiking in Salt Lake City but I did not realize it was inside the convention center." That was met with laughter and applause from the large, standing-room-only crowd of attendees. It was indeed something like a ten-minute walk to most of the track rooms from the keynotes and expo floor.
But, after two years of virtual PyCons, everyone seemed happy to return to something approaching normalcy. The vaccination and mask requirements helped to give attendees some level of comfort with maintaining their health at the event as well. Overall, this year's return to an in-person PyCon seemed like it was a rousing success.
[I would like to thank LWN subscribers for supporting my trip to Salt Lake City for PyCon.]
Modern Python performance considerations
There is a lot of work going on right now on speeding up Python; Kevin Modzelewski gave a presentation at PyCon 2022 on some of that work. Much of it has implications for Python programmers in terms of how to best take advantage of these optimizations in their code. He gave an overview of some of the projects, the kinds of optimizations being worked on, and provided some benchmarks to give a general idea of how much faster various Python implementations are getting—and which operations are most affected.
Modzelewski works at Anaconda on the Pyston "optimized Python interpreter". He wanted to focus on "modern Python" in the talk; there are lots of tips about how to speed up Python code available, but many of those are "not quite as useful anymore". There are some new tips, however, that can be used with these up-and-coming optimized implementations, which he wanted to talk about.
Why Python is slow
The first topic he raised, "why Python is slow", is somewhat divisive, he said; everyone seems to have a different view on that, but he would be presenting his personal views on it. This is not the first time we have reported on a talk by Modzelewski on this subject; he spoke at the 2016 Python Language Summit on the question of why the language is slow, along with a bit about Pyston.
The most common reason given is that interpreted languages are slow and, since Python is interpreted, it is slow, but that is not what he has found. In his measurements of web servers, the overhead of interpretation is about 10%; that is "significant, and people don't want it, but it doesn't explain why Python can be ten to 100 times slower than C". In order to explain that, you need to look at the dynamic nature of Python.
"Python is a very dynamic language", in lots of different ways, but he wanted to focus on just a few of those in the talk. The first is perhaps the most obvious, the interpreter does not know the types of any of the variables. That means any operation on a variable needs to look up what type it is in order to figure out how to perform the operation on that type.
The second dynamic behavior is in variable lookups, though that is a less obvious one. For example, "print" is not a keyword in Python, so a simple print statement needs to look up the value for the "print" name in order to call it as a function. In particular, it has to determine whether the name has been overridden by anything in the scope and it has to do that every time print() is called. The same goes for things like len(), int(), and many more built-in functions; all of them require expensive lookups every time they are used.
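The problem is easy to demonstrate: a global or built-in name can be rebound at any time, so the interpreter cannot simply remember what it referred to (a contrived sketch):

def shout(line):
    # "print" must be looked up on every call; it may no longer
    # be the built-in by the time this runs
    print(line.upper())

shout("hello")              # uses the built-in print

print = lambda *args: None  # rebind the global name
shout("hello")              # now prints nothing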
The third aspect is Python's dynamic attribute lookup. In general, even in a single class, the interpreter does not know what attributes exist on an object of that class. That means it needs a dynamic representation for the object to track the attributes. Looking up the attributes in Python is pretty fast, but still much slower than it would be if the list of attributes were static.
There are three separate projects that are currently being worked on to try to speed up Python in various ways; all of them are "coming out in various forms either last year or this year". There is his project, Pyston, which is a fork of CPython, the Faster CPython project that is being worked on in the main CPython repository by a team at Microsoft, and there is Cinder, which is a fork of CPython that is being worked on at Instagram. All of them are available now in one form or another.
Benchmarks
The "controversial slide" of his talk did not look like much at the outset, as it was just an empty table. He would be filling it in with his benchmarks of various projects over the next part of the talk "with a lot of disclaimers". The first controversial piece of that is the choice of which benchmarks to use to analyze performance.
There is the well-established pyperformance benchmark, which is "nice in a lot of ways". It is a semi-standard that is used by a lot of people for reporting Python benchmark numbers. In his experience, though, it tends to overstate performance benefits, so he likes to look more at application code for benchmarks.
To that end, he wrote a benchmark using the Flask web framework. He chose Flask because it is one of the simpler Python web frameworks, so he thought he would be able to get more of the projects working with it. He would be showing results for both of those benchmarks for several different projects.
The "next controversial thing" is which CPython version to choose as a baseline to measure the others against. He chose Python 3.8 because that is what Pyston is based on; all of the comparison numbers he presented were in relation to that baseline. He used the Ubuntu version of Python 3.8 because it is one of the faster builds of that version of Python. He was surprised to find that different builds were significantly faster or slower than others; Ubuntu is fast, while the macOS and Windows builds are slow.
"I get to list my project first", he said with a grin. He measured Pyston 2.3.2 as 62% faster on pyperformance, but only 34% faster on the flask benchmark. Those numbers are quite different, obviously, and he was not claiming that one benchmark was more accurate than the other. It just shows that it is important to choose a benchmark that is more representative of the kinds of programs you will be running.
He moved on to Python 3.11a7 from early April, which includes most of the Faster CPython work. "They also show good improvements on both of these numbers." On pyperformance, it was 15% faster; 10% faster for flask. The Faster CPython folks are reporting a different number for pyperformance, 25%, but that is not what he measured; "I don't know exactly where the difference is".
Cinder does not have releases, so he just grabbed the code from GitHub and built it. He got strange numbers that showed a marked decrease in performance compared to Python 3.8 (-51% for pyperformance and -18% for flask). He put question marks next to those because he does not believe they are real numbers; Instagram is using it internally and he doubts they would be using something slower. He wondered if perhaps there were patches that were not yet released into the GitHub tree.
He also benchmarked two other projects, both of which use just-in-time (JIT) compilers, PyPy and Pyjion. PyPy is fairly well-known, while Pyjion is less so, but he was curious to see the measurements for them. PyPy 7.3.9 is not able to run pyperformance because it does not support all of its dependencies and it was 36% slower on flask, which he believes reflects the different set of tradeoffs that PyPy has made. Pyjion was effectively a bust since he measured it at 1000 times slower on flask. That number got a double question mark because he does not think that reflects the numbers that the project is getting, but he "did not have time to sort all these things out before the talk, unfortunately".
What is being done
In a variety of ways, these projects are addressing the problems he identified that contribute to making Python slow. The interpretation overhead is being addressed in projects like Pyston and Cinder by way of JIT compilers. They convert the Python code as it is running into assembly instructions; "this sort of definitionally gets rid of interpretation overhead". While JIT compilation is interesting from a technical perspective, he would not be talking much about it because its gains come for free; the 10% overhead is nice to get back, but programmers cannot affect it much. Changing your code will not really make much of a difference one way or the other to the performance gains that JIT compilation brings, he said.
In what he called a "sweeping generalization", Modzelewski said that the three main projects he was focusing on were applying the same "bread and butter technique" in a variety of ways. Those techniques are based on the idea that most code "does not use the full dynamic power that it could at any given time" and that Python can quickly check to see if they are using the dynamic features. If those features are not being used, the language can do something fast instead of following the slower path needed to handle them.
That is the source of most of the speedups shown in the benchmarks. It sounds great, he said: "Python has dynamic features but you are not paying for them if you are not using them anymore." You can turn that statement around, however: you are paying for the dynamic features if you use them. Those features used to come for free, because you paid for them whether you used them or not.
But that situation has changed. You do not need to avoid dynamic features, and code will still get performance improvements if you continue to use them. But if you want to get the best performance possible, thinking about these things, and avoiding those features where possible, will make your code even faster, Modzelewski said.
Examples
The penalty for looking up built-ins (and other global variables) that he described at the beginning of the talk is one of the areas that has been optimized. If the code is using lots of print() or len() calls, for example, these newer Pythons speculate that those names have not been reassigned since the last time the lookup was done. It is easy in CPython to know whether any global variable has been reassigned since the last time the lookup was done; if not, the value of the lookup has to be the same as it was the last time. He showed two function definitions to demonstrate what he meant by reassignment:
def f():
    global l
    l = []
    # slow:
    print()

def f():
    global l
    l.append(1)
    # not slow:
    print()
In the first function, an explicitly global variable has been reassigned, which means that the slower path needs to be used to look up print(). In the second function, l has simply been mutated, which does not affect the speed since no global reassignment has been done.
He showed his measurements of a benchmark both with and without reassignments. For Python 3.8, the times were the same (12.3ns), which indicates that the price is paid in either case. For Pyston, there was a sizable reduction for the case with reassignments (9.5ns) and a huge boost for the case without them (1.7ns). Python 3.11a7 had a nearly two-fold increase in speed even with reassignments (6.4ns), and a less dramatic drop from there without them (5.9ns).
He cautioned that the numbers should not be taken too literally as he thinks they will evolve rapidly. He was a bit surprised by the measurements and suspects that the Faster CPython team will get some ideas from them as well. But the overall conclusion is that, in modern Python, not reassigning global variables will make the rest of the code run faster. He suggested that any needed global mutable state be stored in an object if faster performance is the goal.
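A minimal sketch of that suggestion, with the mutable state held in an object attribute so that no global name is ever reassigned:

# reassigning a global slows down later lookups in the module:
count = 0
def bump():
    global count
    count += 1          # rebinds the global name on every call

# mutating an object does not; the global binding never changes:
class _State:
    count = 0

state = _State()
def bump_fast():
    state.count += 1    # attribute mutation only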
Attribute lookup is similar in some ways. In general, an object's attributes are stored in a dictionary, which has a fast hash table implementation in Python, but it is still slower than in C where a value can be retrieved using a pointer. An individual attribute lookup is not terribly slow, but Python programs do a lot of them so it adds up.
The technical details are rather complex, he said, but at a high level the same idea is being applied to attributes: speculating that if a lookup "looks the same" as the previous one, it can be executed the same way it was done before. He showed two ways that changing an object's "shape" will affect its run-time performance:
# different shape
class Cls: pass
obj1 = Cls()
obj2 = Cls()
obj1.x = 1
obj2.y = 2
# type mutated
class Cls: pass
obj = Cls()
obj.x = 1
Cls.y = 2
In the first case, attribute lookup on the two objects will be slow for the
rest of the program once those statements have been executed. There are a
lot of ways that a class can intercept the lookup of its attributes,
but they are not usually used; the interpreter can know that
those interceptions have not been used before, but once the class itself is
changed, that situation may have changed. In the
second case, changing the class means that the current fast path for
class attribute lookup has to be bypassed because the
interpreter cannot know whether the change affects attribute lookups.
He made a benchmark to measure the two cases above and the "happy case" where neither of those was done. He reported those numbers, which showed that both Pyston and Faster CPython improved things considerably in nearly all of the cases, with the happy case showing roughly 6x speedup for Pyston and 3x for Python 3.11a7. The baseline measurement showed that the cost was much the same for all three, which demonstrates that the price of doing these kinds of things is always being paid.
Once again, those numbers are going to change over time, but the general idea is that avoiding those kinds of changes will improve the performance of programs. Changing the shape of the object is the worst of the two and the code where he has seen that being done looks to him like it was meant to be a memory-saving technique. But doing so forces the interpreter to use a less-efficient representation for the object, so that savings is illusory. In general, code should set attributes with the same names on all objects of the same class, and do so in the same order, if it can. In passing, he noted that using __slots__ is now the fastest way to handle attributes on classes in Python.
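A minimal sketch of both pieces of advice, keeping the attribute layout identical for every instance and declaring it statically with __slots__:

class Point:
    # a fixed attribute layout; instances get no per-object __dict__
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        # same attributes, set in the same order, for every instance
        self.x = x
        self.y = y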
Method calls are a special case of attribute lookup where the attribute's value is immediately used to call a function. There is some old advice that, if you are repeatedly doing a method call, say in a loop, retrieving the method once before the loop and caching it to use inside the loop is a way to get better performance. For Python 3.8, there is a noticeable improvement of about 66% when doing so, but the newer Pythons actually see a performance degradation.
The reason is that method calling is one of the areas where optimizations have been focused and, in general, the new Pythons "want to see more of your code at once to optimize more of it, especially in this particular case". But caching the method outside of the loop will mean that those optimizations no longer apply. That is most true for built-in types; for methods on Python classes, there is still an improvement from caching the method lookup, but it is much smaller for all three of the interpreters measured.
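The old idiom, shown against the plain form it was meant to beat (a sketch; on the newer interpreters, the plain form is now the faster one for built-in types):

items = []

# old advice: hoist the bound-method lookup out of the loop
append = items.append
for i in range(100_000):
    append(i)

# the plain form, which the specializing interpreters can now
# optimize better than the cached lookup
for i in range(100_000):
    items.append(i)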
He also measured lookups for functions in modules. He chose math.sqrt() because it is effectively just a single instruction, so everything measured is the overhead of the lookup and call. There is an improvement, especially in Python 3.8 (86%), for caching that lookup, but it is fairly modest for the others (roughly 15% for both). Maybe that 15% is enough, he said, but it is quite a bit smaller than before. For more typical calls from modules that actually perform some work, the savings is even more modest in all cases.
As attribute lookups get faster, the benefits of this caching technique get smaller, Modzelewski said. His personal advice is to stop caching method and function lookups; it is not worth the mental overhead and readability hit. As these optimized Pythons get smarter, the savings will get even smaller. He did not think that kind of code should be removed from existing programs but thinks that particular piece of advice can be left by the wayside going forward.
Other considerations
There are dynamic features in Python that are expensive now, but will likely get even more expensive over time. He did not go into any detail about them, but pointed out that using some of them may have wide-ranging effects because they inhibit some of the optimizations that are being added. So they are not just expensive where you are using them, but they might affect the performance of other parts of the code.
A problem that the community needs to address is that attaching a profiler to a Python program effectively disables almost all of the optimizations. At least that is true for Pyston; he is not sure if it is the case for Faster CPython or Cinder. It is, in general, a hard problem, he said; developers may be profiling a different version of the code than will actually be running. There may need to be a different profiling API or some other way to solve that problem.
The last thing Modzelewski wanted to talk about was C extensions, which are generally used either for bindings to another language or for providing better performance. The common wisdom is to use Cython or some other mechanism to convert performance-critical code to C, "but this situation is getting pretty murky now". All of the optimizations that he had been talking about currently only apply to Python code, so C extensions have a certain set of optimizations, while the interpreter has a different set. So which is going to improve performance depends on which set is going to help your code the most.
It is hard to give a good rubric for when to choose one over the other, but he converted his attribute lookup benchmark to a C extension using Cython to see. It showed a good improvement over Python 3.8, but was far worse for Pyston. He apparently was unable to measure Faster CPython but did not say why that was. He noted that Cython does not do any of the optimizations he had been talking about, but there is no barrier to doing so that he is aware of, so those could be adopted by Cython over time.
His feeling is that object-oriented code is going to be helped more by the new interpreters, while numeric code will continue to be improved using C extensions. It is something that developers will need to verify for themselves, however, as the situation is rather complicated right now. Unfortunately there is not a lot of help available to guide developers toward writing more performant Python; using these tips, doing experiments, and benchmarking is the way forward at this point.
The overall goal of the new optimizations is to not make Python code pay for dynamic features that it is not using. That is great, he concluded, but it adds new complexity to the decisions programmers will need to make when they are trying to squeeze the best performance out of their code. Avoiding unneeded dynamic features, and finding other ways to accomplish the same goals, is generally the new path to follow, though.
[I would like to thank LWN subscribers for supporting my trip to Salt Lake City for PyCon.]
Printbuf rebuffed for now
There is a long and growing list of options for getting information out of the kernel but, in the real world, print statements still tend to be the tool of choice. The kernel's printk() function often comes up short, despite the fact that it provides a set of kernel-specific features, so there has, for some time, been interest in better APIs for textual output from the kernel. The "printbuf" proposal from Kent Overstreet is one step in that direction, but it will need some changes to make it work well with features the kernel already has.

A call to printk() works well when kernel code needs to output a simple line of text. It is not as convenient when there is a need for complex formatting or when multiple lines of output must be generated. It is possible to use multiple printk() calls for even a single line of text, just as it is with printf() in user space, but there is a problem: the kernel is a highly concurrent environment, and anything can happen between successive printk() calls, including printk() calls from other contexts. That results in intermixed output, often described with technical terms like "garbled", that can be painful to make sense of.
Printbuf
An answer to that problem is to assemble the complex output in memory, then to print it in a single operation. That is where printbufs come in. A printbuf is a simple structure containing a pointer to a char buffer and some housekeeping information, including the length of that buffer and how much of it contains valid data. Kernel code can set up a printbuf with something like:
#include <linux/printbuf.h>
struct printbuf buf = PRINTBUF;
PRINTBUF is a simple structure initializer that zeroes the entire thing. There is then a whole set of functions that will append text information to the buffer, including:
void pr_buf(struct printbuf *buf, const char *fmt, ...);
void pr_char(struct printbuf *buf, char c);
void pr_newline(struct printbuf *buf);
void pr_human_readable_u64(struct printbuf *buf, u64 v);
void pr_human_readable_s64(struct printbuf *buf, s64 v);
void pr_time(struct printbuf *buf, u64);
/* ... */
pr_buf() works like printk(), except that the resulting text ends up in buf rather than going directly to the system log. Many other functions exist for adding specific types of data to the buffer, some of which are shown above. At any time, the accumulated text can be found in buf.buf, which can be passed to printk() to output the whole buffer in a single call. When a printbuf is no longer needed, it should be passed to printbuf_exit() to free its resources.
Missing from this discussion so far is any mention of memory management. The printbuf code handles that; it allocates the string buffer, and reallocates it to a larger size whenever it threatens to overflow. Those allocations are done at the GFP_KERNEL priority, though printbuf can use GFP_ATOMIC if the atomic field in the structure is set to a true value. If an allocation fails, the code will make a note of it but will continue, dropping some output but preserving what it can.
When Overstreet first posted this code in mid-April, one of the first comments was a one-liner from Christoph Hellwig asking: "How does this use case differ from that of lib/seq_buf.c?" Overstreet, it seems, was unaware of the seq_buf mechanism and, as a consequence, had reimplemented much of it. His response was to propose replacing seq_buf entirely with his new implementation.
Seq_buf
Seq_buf was first added to the kernel for the 3.19 release in 2014. It is meant to solve essentially the same problem, though the approach taken is a little different. A seq_buf uses a static buffer allocated by the caller; initialization looks something like this:
#include <linux/seq_buf.h>
char buf[MY_BUFFER_SIZE];
struct seq_buf seq;
seq_buf_init(&seq, buf, MY_BUFFER_SIZE);
The process of generating output in a seq_buf is strikingly similar to the approach used for printbuf; there is a familiar-looking series of functions, including:
int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
extern int seq_buf_puts(struct seq_buf *s, const char *str);
extern int seq_buf_putc(struct seq_buf *s, unsigned char c);
extern int seq_buf_putmem(struct seq_buf *s, const void *mem,
unsigned int len);
/* ... */
Sending the contents of a seq_buf to the log is a simple matter of calling printk() with the previously allocated buffer. This API also includes functions like seq_buf_to_user(), which will copy the contents of a seq_buf into user space. On the other hand, it lacks some of the fancier formatting features provided by the printbuf mechanism. Arguably, though, the biggest difference between the two interfaces is the automatic memory management done by printbuf. A seq_buf can run out of space but, in the absence of allocation failures, a printbuf never will.
Reconciling the two
There would appear to be agreement that the printbuf submission brings some useful features, but there is little interest in having two subsystems in the kernel that do the same job. So it is not surprising that Overstreet was advised to set printbuf aside and, instead, add any needed capabilities to seq_buf. Steve Rostedt, who wrote the original seq_buf code, offered to help with that task.
Overstreet was not thrilled with that idea, though:
Printbuf is the more evolved, more widely used implementation, and you're asking me to discard it so the kernel can stick with its more primitive, less widely used implementation.
The "more widely used" claim raised some eyebrows, given that printbuf is not in the kernel and thus, with regard to the mainline, not used at all. He was, it seems, counting uses in his own, out-of-tree, bcachefs code — an argument that tends to carry little weight in the kernel community.
Meanwhile, a patch adding printbuf use in the memory-management subsystem drew questions from Michal Hocko, who was not convinced of the value of the new output that it generates. He later also raised concerns about the use of dynamic memory allocation for logging from the memory-management subsystem. When trying to log information about, for example, an out-of-memory situation, attempting to allocate more memory tends not to end well; at best it will dip into the final memory reserves that should be dedicated to the task of freeing memory.
The conversations continued over a few different thread branches, and got somewhat adversarial in a few of them. Overstreet made it clear, with references to "not-invented-here syndrome" and such, that he was not pleased with the reception given to his code. It began to look like one of those threads that leads to the developer involved walking away from the kernel community altogether.
Hopefully that is not how this discussion will end, though. The memory-management logging topic will have a session at the upcoming Linux Storage, Filesystem, and Memory-Management Summit. Meanwhile, Overstreet did eventually come to agree that implementing his features on top of the existing seq_buf code might be a viable path forward. Assuming that this direction works out, it could lead to the kind of resolution that the kernel community normally strives for: the incorporation of useful new features without duplicating mechanisms that the kernel already supports. The proof will be in the updated patch sets, if and when they are posted.
The BPF allocator runs into trouble
One of the changes merged for the 5.18 kernel was a specialized memory allocator for BPF programs that have been loaded into the kernel. Since then, though, this feature has run into a fair amount of turbulence and will almost certainly be disabled in the final 5.18 release. This outcome is partly a result of bugs in the allocator itself, but this work also had the bad luck to trip some older and deeper bugs within the kernel's memory-management subsystem.

In current kernels, memory space for BPF programs (after JIT translation) is allocated using the same code that allocates space for loadable kernel modules; this would seem to make sense since, in either case, that space will be used for executable code that runs within the kernel. But there is a key difference between those two use cases. Kernel modules are relatively static; they are almost never removed once they have been loaded. BPF programs, instead, can come and go frequently; there can be thousands of loading and unloading events over the life of the system.
That difference turns out to be important. Memory for executable code must, unsurprisingly, have execute permission set and thus must also be read-only. That requires this memory to have its own mapping in the page tables, meaning that it must be split out of the kernel's (huge-page) direct mapping. That breaks up the direct map into smaller pages. Over time, this has the effect of fragmenting the direct map, which can affect performance measurably. The main goal for the BPF allocator was to segregate these allocations into a set of dedicated huge pages and avoid this fragmentation.
Shortly after this code was merged, though, the regression reports, along with more general expressions of concern, started to roll in. That drew the attention of Linus Torvalds and other developers, and revealed a series of problems. While some of those problems were in the BPF allocator itself, the most disruptive issue came down to an older change made in an entirely different subsystem: the vmalloc() allocator.
vmalloc() and huge pages
vmalloc() (along with its inevitable variants) differs from other kernel memory-allocation interfaces in that it returns memory that is virtually contiguous, but which may be physically scattered. It is thus good for larger allocations where the memory need not be physically contiguous. Heavy use of vmalloc() was once discouraged due to its higher overhead and the shortage of available address space on 32-bit systems, but attitudes have changed over time. It is now reasonably common to use vmalloc() as a way of avoiding the possibility that a larger allocation might fail due to memory fragmentation. Functions like kvmalloc(), which will automatically fall back to vmalloc() if an ordinary allocation is not possible, have been added in recent years.
In 2021, Nick Piggin enhanced vmalloc() with the ability to allocate huge pages if the requested size is large enough. One might well wonder why this was useful, since vmalloc() is explicitly meant for cases when the memory need not be physically contiguous; the answer, of course, is that huge pages can give better performance by reducing pressure on the CPU's translation lookaside buffer. The kernel has a few larger allocations that can benefit from this improvement, so it was merged for the 5.13 kernel.
Even at the time, there were some caveats, though. There are places in the kernel that will be unpleasantly surprised by receiving huge pages in response to a vmalloc() call; these include the PowerPC module loader. So Piggin also added a flag, VM_NO_HUGE_VMAP, which requests that only base pages be used. Of course, vmalloc() takes no flags, so the ability to avoid huge-page allocations could only be accessed via the low-level __vmalloc_node_range() function until vmalloc_no_huge() was added later in the 5.13 cycle. Huge-page allocations were also not enabled for the x86 architecture at that time since nobody had put in the time to look for potential problems there.
The first patch in the BPF-allocator series enabled huge-page allocations in vmalloc() for the x86 architecture; that was needed to make huge pages available to the BPF allocator. It all seemed to work fine until wider testing started to turn up problems; it seems that enabling huge pages in vmalloc() on x86 might not have been the best idea. Except that the problem actually had little to do with the x86 architecture.
When vmalloc() (as it existed at the beginning of the 5.18 cycle) would allocate a huge page in response to a request, the result was a compound page — a set of contiguous base pages that behaves like a single, larger page. These pages are organized differently; most of the information regarding their use is stored in the page structure for the first ("head") base page. The page structures for the following ("tail") pages mostly just contain a pointer to the head page. It is important not to treat tail pages as being independent, or bad things will happen.
Bad things happened. It turns out that the kernel does not lack for code that assumes it can treat memory from vmalloc() as being made up of base pages; this code will tweak individual page structures without noticing that it is dealing with tail pages. That leads to corruption of the system memory map and a kernel oops once that corruption is noticed. One case where this is known to happen, which was first noticed by Rick Edgecombe, is driver code calling vmalloc_to_page() to obtain a page structure somewhere within a vmalloc() allocation (and, thus, possibly in the middle of a compound page). It turns out that there are quite a few drivers using vmalloc_to_page(); each of those is almost certainly broken if the memory involved is made up of compound pages.
This particular problem was eventually fixed by Piggin; the code now splits allocated huge pages back into base pages (while retaining the huge-page mapping), taking tail pages out of the picture. But there were some other surprises lurking within the vmalloc() subsystem as well; as the issues accumulated, Torvalds concluded that "HUGE_VMALLOC was badly misdesigned". It was, he said, buggy from the beginning; the problems only turned up now because enabling the feature on the x86 architecture resulted in far wider testing.
Resolutions for 5.18
Piggin's fix was merged for the 5.18-rc4 prepatch. Meanwhile, Song Liu, the author of the BPF allocator patches, was working to find a set of solutions that would allow that allocator to be used safely; the result was a four-part patch set that:
- Removed the VM_NO_HUGE_VMAP flag in favor of a new VM_ALLOW_HUGE_VMAP variant. That changes the sense of the flag, making huge-page allocations an opt-in feature rather than opt-out.
- Caused alloc_large_system_hash() (which is used to allocate space for large hash tables) to opt into huge-page allocations, since they are known to be safe there.
- Added a function called module_alloc_huge() which also enables huge-page allocations.
- Used module_alloc_huge() to allocate the space used by the BPF allocator.
This response might have been sufficient if the wider use of huge pages in vmalloc() were the only problem. Torvalds, however, didn't like what he saw in the BPF allocator code either. Among other things, he pointed out that it enabled execute permission on the allocated memory without initializing it first, adding a bunch of random executable text to the kernel's address space. He concluded: "I really don't think this is ready for prime-time".
Following through on that conclusion, he decided to apply just Liu's first patch, which had the effect of disabling huge-page allocations in vmalloc() entirely (since nothing used the new opt-in flag). Initially he intended to stop there, but later decided that the second patch was also safe to apply. Then he even went one step further, adding a patch of his own enabling huge-page allocations in kvmalloc(). The reasoning here is that memory returned from that function might have come from a slab allocator, so recipients should not be using low-level tricks with the underlying page structures in any case.
Liu has since fixed the uninitialized-memory problem in another patch series. BPF maintainer Alexei Starovoitov has tried to make the case that this work should be applied as well, making the BPF allocator available in the 5.18 release. Torvalds remains unconvinced, though, so this work seems more likely to be 5.19 (or possibly even later) material. BPF users will probably just have to wait one more cycle to have access to the specialized memory allocator.
There are a number of conclusions that can be drawn from this little episode. Tweaking low-level memory-management features is tricky and can create problems in surprising places. There is a lot of value in the widespread testing that comes with the more popular architectures; it will turn up bugs that can remain hidden on architectures with smaller user bases. But, perhaps most significantly, this is the kind of problem that lends credence to the claim that access to struct page should never have been allowed outside of the memory-management subsystem. Exposing such low-level details to the kernel as a whole was always going to lead to surprises of this type. Weaning the rest of the kernel off of struct page (which is just beginning to happen) will be a long and difficult task, but may well be worth the pain.
NUMA rebalancing on tiered-memory systems
The classic NUMA architecture is built around nodes, each of which contains a set of CPUs and some local memory; all nodes are more-or-less equal. Recently, though, "tiered-memory" NUMA systems have begun to appear; these include CPU-less nodes that contain persistent memory rather than (faster, but more expensive) DRAM. One possible use for that memory is to hold less-frequently-used pages rather than forcing them out to a backing-store device. There is an interesting problem that emerges from this use case, though: how does the kernel manage the movement of pages between faster and slower memory? Several recent patch sets have taken differing approaches to the problem of rebalancing memory on these systems.
Migration and reclaim
The kernel detects if a given page needs to be migrated using a technique called "NUMA hint-faulting". Ranges of a task's address space are periodically unmapped so that subsequent accesses to a page in the range will trigger a page fault. When a page fault occurs, the memory management subsystem can then use the location of the CPU that triggered the page fault to determine whether the page needs to be migrated to the node which contains that CPU. The absence of a fault altogether indicates that the page is getting colder, and may be migrated to a slow-tier node during reclaim. As workloads run and access patterns change, pages transition between hot and cold, and are migrated between fast and slow NUMA nodes accordingly.
Memory reclaim is driven by a "watermark" system that tries to keep at least a minimum number of free pages available. When an allocation is requested, the kernel compares the number of free pages in the node where the allocation is taking place to a zone watermark threshold. If the number of free pages in the node, after the allocation, is lower than the threshold specified by the watermark, then the kswapd kernel thread is awoken to asynchronously scan and reclaim pages from the node. This allows memory to be freed preemptively, before memory pressure in the node causes allocations to block and direct reclaim to occur.
Zone watermarks in the kernel are statically sized according to the memory profile of the host. Systems with less memory will have lower zone watermarks, while watermarks for larger systems will be higher. Intuitively, this scaling makes sense. If you have a machine with a huge amount of memory, reclaim should probably be triggered sooner than on a machine with very little memory, as the expectation is that an application will be more aggressive in requesting memory on a system that has more of it. Yet, having a static threshold also has drawbacks. In the context of tiered-memory systems, if a node's threshold is too low, fast nodes may not reclaim aggressively enough, and there will be no space available to promote hot pages from the slow-tier nodes.
Optimizing reclaim and demotion
A recent patch set by Huang Ying highlights and addresses this problem. The premise behind this work is that the working-set size of workloads on systems with multiple memory types will, in the common case, exceed the total amount of fast DRAM in the system. This makes sense. If a system wasn't overcommitting DRAM, there would be no need to use other memory types in the first place.
The implication of this insight is that, on tiered-memory systems, pages will be constantly moved between fast and slow memory nodes as they're accessed by the application. If the fast nodes are near capacity, the kernel won't be able to promote globally hot pages into those nodes during rebalancing, resulting in higher-than-necessary access latencies due to hot pages residing on slow-tier nodes. The trick is, therefore, ensuring that sufficient pages are reclaimed from fast nodes such that, in addition to making space for future allocations, there is also enough room in the fast nodes to promote hot pages from slower nodes.
Ying's patch set addresses this need by introducing a new WMARK_PROMO watermark that is larger than the (previously highest) WMARK_HIGH watermark. When a page is unable to be migrated to a faster node due to memory pressure, kswapd is woken up to reclaim memory up to the new WMARK_PROMO threshold. This slightly more aggressive reclaim strategy better ensures that there is sufficient space for hot pages to be promoted from the slow memory nodes to the fast memory nodes, and thus better accommodates the working sets that are common on tiered-memory systems.
The controversy of statically sized watermarks
While adding the WMARK_PROMO watermark improves the chances that there will be sufficient space on fast nodes to promote hot pages from slower nodes, one has to wonder whether the general notion of static watermarks should be revisited. Consider that, even if the chosen watermark threshold is sufficiently high to ensure that pages may be promoted to a fast node, a threshold that is higher than necessary will leave DRAM unused, and the application's performance will be negatively impacted. The fact that a new watermark was required in the first place is indicative of the nature of the problem, which is largely dependent on both the characteristics of the system itself, and the workloads it's running.
The drawbacks of using a static watermark were discussed in reviews of the patch. For example, with regard to an earlier version of Ying's patch, which hard-coded the number of additional pages required during reclaim to be 10MB larger than WMARK_HIGH, Zi Yan questioned whether such a value made sense:
Why 10MB? Is 10MB big enough to avoid creating memory pressure on fast memory? This number seems pretty ad-hoc and may only work well on your test machine.
Ying acknowledged that the 10MB value was hard to justify and that there was room for improvement beyond the current implementation. The threshold was subsequently changed into the separate WMARK_PROMO watermark, based on a suggestion by Johannes Weiner, who also pointed out that another option was to have promotions dynamically boost the kswapd watermarks on demand. This would avoid the problem of DRAM being under-utilized, though of course it would also come at the cost of increased complexity.
There is certainly nothing wrong with incremental improvements, nor with sticking with a simple approach until more complexity is required. It will be interesting to see, however, whether the kernel will eventually require a more dynamic and flexible framework for expressing decisions regarding reclaim and page migration.
Avoiding page ping-pong
In addition to requiring fast nodes to have sufficient space for promoted pages, there is another problem that is unique to tiered-memory systems. In conventional NUMA setups, application working sets are typically sized to fit into one or more nodes. Once the application has reached a steady state and most or all of the pages are correctly located on the nodes where they're locally accessed, migration should taper off. Applications on tiered-memory systems do not behave this way, though, since their working sets may not fit into the NUMA nodes they are running on. Rather than reaching a steady state, pages are instead continuously ping-ponged back and forth between slow and fast NUMA nodes as they're accessed by applications.
This is a related, but distinct, problem from the one solved by Ying's patch set. The new watermark ensures that there's sufficient space on a fast node for hot pages to be promoted from a slow node, but it doesn't prevent pages from being continuously and aggressively migrated between slow and fast nodes. If the overhead of performing the migration exceeds the performance gained from the improved access latencies of having a page on a local DRAM node, then the rate of page migration clearly needs to be adjusted. There have been multiple proposals for how to solve this issue.
One proposal by Ying involves recording the time that passes between a page in slow memory being unmapped to create a NUMA hint fault and the moment that fault is actually observed on a memory access. The shorter the time between unmapping and fault, the more likely it is that the page is actually hot. That time is compared to a threshold (tunable by a system administrator), and the page is only promoted if the elapsed time falls within that threshold.
While time-since-access feels like a natural way to quantify page hotness, the approach is also quite complex and requires adding a lot of new code. It is also unclear what the threshold for considering a page "hot" should be; tuning by a system administrator may be required. A follow-on patch proposed a method to dynamically tune the threshold based on the volume of migrations, but it, too, is quite complex.
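The core of the latency test can be sketched in a few lines; the structure and names below are hypothetical, and the real patches must store the timestamp far more compactly than a dedicated per-page field:

```c
/* A toy model of the unmap-to-fault latency test described above. */
#include <stdbool.h>
#include <stdint.h>

struct page_model {
    uint64_t unmapped_at_ns;    /* when the hint fault was armed */
};

/* Threshold below which a page is considered hot; admin-tunable. */
static uint64_t promote_threshold_ns = 1000000;

/*
 * Called from the NUMA hint-fault handler: the page was unmapped at
 * unmapped_at_ns to arm the fault, and is being touched again now.
 * A short unmap-to-fault latency means frequent access, so promote.
 */
static bool should_promote(const struct page_model *p, uint64_t now_ns)
{
    return now_ns - p->unmapped_at_ns <= promote_threshold_ns;
}

int main(void)
{
    struct page_model p = { .unmapped_at_ns = 100 };
    /* Faulting 0.5ms after unmap is within the 1ms threshold: hot. */
    return !should_promote(&p, 100 + 500000);
}
```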
An alternative approach was proposed in a patch set by Hasan Al Maruf. When a page is demoted, it is removed from the active LRU list and placed onto the inactive LRU list. Al Maruf's patch updates the NUMA hint-fault handler to check whether a page is in this inactive state and, if so, move it to the active state and defer promotion until a subsequent fault. If the page is once again accessed, it will be observed as present on the active LRU, and the promotion will occur. The advantage of this solution is that it uses an existing mechanism in the kernel for tracking page hotness. As memory pressure increases and more pages are reclaimed, more pages are moved to the inactive LRU list, thus causing page promotion to be throttled proportionally.
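The promotion-deferral logic reduces to a small state machine; in this sketch the LRU state is collapsed to a single flag, and all names are invented for illustration:

```c
/* A toy model of deferred promotion via the active/inactive LRU. */
#include <stdbool.h>
#include <stdio.h>

struct page_model {
    bool active;    /* on the active LRU list? */
};

/*
 * NUMA hint-fault handler for a page on a slow node: the first fault
 * merely activates the page; only a second fault, demonstrating
 * repeated access, results in promotion to a fast node.
 */
static bool hint_fault_should_promote(struct page_model *p)
{
    if (!p->active) {
        p->active = true;    /* defer; wait for another access */
        return false;
    }
    return true;             /* already active: promote now */
}

int main(void)
{
    struct page_model p = { .active = false };
    printf("first fault promotes:  %d\n", hint_fault_should_promote(&p));
    printf("second fault promotes: %d\n", hint_fault_should_promote(&p));
    return 0;
}
```

Under memory pressure, reclaim pushes more pages back to the inactive state, so promotions are throttled exactly when space on the fast nodes is scarce.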
A consensus has not yet been reached on which solution will be chosen, though Al Maruf's patch set will likely be accepted thanks to its simplicity and its use of existing mechanisms for tracking a page's hotness. While the solution is not expected to be controversial, there is always the Linux Storage, Filesystem, Memory-Management, and BPF Summit around the corner, where developers can discuss the merits of each approach in person.
The 2022 Linux Storage, Filesystem, Memory-Management, and BPF Summit
The Linux Storage, Filesystem, Memory-Management, and BPF Summit (LSFMM) has long been one of the key events for many core kernel developers. The last LSFMM event, though, was held in the innocent, pre-pandemic days of early 2019. After three years, it was finally possible to hold an in-person gathering, at the beginning of May 2022 in Palm Springs, California, USA. As usual, LWN was there.
Plenary sessions
There were no truly plenary sessions at LSFMM 2022, since the BPF group was always on its own schedule. The sessions listed below covered the storage, filesystem, and memory-management tracks.
- A memory-folio update: the current status and future direction of the folio project.
- Remote participation at LSFMM: a discussion with the remote participants on what worked—and didn't—for them.
Joint filesystem/memory-management sessions
- Coping with hardware-poisoned page-cache pages: What should the kernel do when memory errors corrupt a page in the page cache?
- Dealing with negative dentries: how the kernel can manage the cache of file-name lookup failures better.
- Recent RCU changes: a quick overview of read-copy-update (RCU) followed by descriptions of some of the bigger changes that have gone into it over the last few years.
- Page pinning and filesystems: this year's discussion on how to solve the problems with get_user_pages().
- Sharing memory for shared file extents: exploring a way to stop wasting memory with extents that are shared between multiple files.
Joint storage/filesystem sessions
- Unique identifiers for NFS: how to create and manage unique IDs needed by NFS.
- Challenges with fstests and blktests: testing for filesystems and the block layer is an area that needs more work—and collaboration.
- Adding an in-kernel TLS handshake: what is the best way to add support for initiating a TLS handshake from the kernel?
- Maintainers don't scale: a discussion on some of the challenges faced by Linux kernel maintainers.
- Best practices for fstests: how to collaborate on better testing infrastructure for filesystems.
- ioctl() forever?: a session looking at some of the problems with using the ioctl() system call along with a few alternatives to it.
- Zoned storage: the challenges with supporting zoned storage and sequential-write-only zones in particular.
- A discussion on readahead: how much data should the kernel be speculatively reading ahead into the page cache?
Memory-management sessions
- Ways to reclaim unused page-table pages: a lot of the memory used to hold page tables is not needed in that role and could be put to better uses.
- The ongoing search for mmap_lock scalability: the 2022 version of the perennial page-fault scalability topic, perhaps with some solutions on the horizon this time.
- Improving memory-management documentation: how to capture some of the "tribal knowledge" behind memory management and make it available to more developers.
- The state of memory-management development: Andrew Morton describes some changes to how memory-management patches are handled.
- Seeking an API for protection keys supervisor: protection keys can help to harden the kernel, but the best way of managing them is still unclear.
- Better tools for out-of-memory debugging: understanding out-of-memory problems is not easy; what can be done to improve the situation?
- Solutions for direct-map fragmentation: a number of new technologies need to carve pages out of the kernel's direct map; how can that functionality be supported without hurting performance?
- CXL 1: Management and tiering: the first of three sessions on the Compute Express Link and how it should be managed by the Linux kernel.
- Proactive reclaim: there are reasons to want to reclaim memory in a more proactive manner, but it is not clear how any such feature should be controlled.
- Merging the multi-generational LRU: an extended discussion concluded that the time has come to merge this huge change, but some open questions remain.
- Sharing page tables with mshare(): page tables are not normally shared between processes, which can lead to massive overhead in situations where memory is highly shared. This session discussed a proposal for a new system call to enable page-table sharing between cooperating processes.
- CXL 2: Pooling, sharing, and I/O-memory resources: CXL memory can be highly dynamic, making it hard to support in the kernel.
- Cleaning up dying control groups, 2022 edition: progress has been made with regard to getting dying memory control groups out of the way, but the problem is not yet fully solved.
- get_user_pages() and COW, 2022 edition: the long slog toward making page pinning work properly in the kernel.
- Fixing a race in hugetlbfs: using hugetlbfs can save a lot of page-table overhead for massively shared regions, but the implementation currently has an unpleasant and difficult-to-fix bug.
- Preserving guest memory across kexec: a proposed mechanism to allow a host kernel to be updated while minimally disturbing guest virtual machines.
Filesystem sessions
- Changing filesystem resize patterns: filesystems that are created small, but continually resized to be much larger, are causing some headaches.
- The netfslib helper library: a relatively new library to collect up common operations for network filesystems.
- Dynamically allocated pseudo-filesystems: a discussion on the right path for adding a general facility that would allow pseudo-filesystems (e.g. tracefs, debugfs) to reduce their memory footprint by allocating inodes and directory entries only when needed.
- Bringing bcachefs to the mainline: The bcachefs filesystem may be getting close to ready for merging.
- Snapshots, inodes, and filesystem identifiers: Filesystems that support snapshots, thus duplicate inode numbers, can be problematic.
- Change notifications for network filesystems: how to allow Linux clients to monitor additions, deletions, and other changes in network filesystems.
- Making O_TMPFILE atomic (and statx() additions): two half-sessions looking at features that could be added for filesystems.
- ID-mapped mounts: a discussion on the relatively new ID-mapped mounts feature for Linux filesystems.
- Filesystems, testing, and stable trees: the stable and, especially, LTS trees need more filesystem testing, but it is currently hard to do.
- Retrieving kernel attributes: a discussion on a proposed interface for getting information out of the kernel using the extended-attribute API.
- Disabling an extent optimization: filesystems do not necessarily maintain holes in files as holes, rather than just regions with zeroes, which can lead to some problems.
Other resources
- Christoph Hellwig has posted a summary of the action items from the BPF-track session on standardization.
The obligatory group photo

![[Group photo]](https://static.lwn.net/images/conf/2022/lsfmm/group-sm.png)
We would like to thank LWN subscribers for supporting our travel to and reporting from LSFMM 2022.
A memory-folio update
The folio project is not yet two years old, but it has already resulted in significant changes to the kernel's memory-management and filesystem layers. While much work has been done, quite a bit remains. In the opening plenary session at the 2022 Linux Storage, Filesystem, Memory-Management, and BPF Summit, Matthew Wilcox provided an update on the folio transition and led a discussion on the work that remains to be done.

Wilcox began with an overview of the folio work, a more complete description of which can be found in the above-linked article. In short, a folio is a way of representing a set of physically contiguous base pages. It is a response to a longstanding confusion in the memory-management subsystem, wherein a "page" can refer either to a base page or a larger compound page. Adding a new term disambiguates the term "page" and simplifies many memory-management interfaces.
Beyond terminology, there is another motivation for the folio work. The kernel really needs to manage memory in larger chunks than 4KB base pages. There are millions of those pages even on a typical laptop; managing memory in such small units is a pain in general and wastes a lot of time and energy. Better interfaces are needed to facilitate management of larger units, though; folios are meant to be that better interface.
Current status
A folio is represented by struct folio; it is essentially an alias for the head page of a compound page. Wilcox has been adding uses of folios into the kernel over the course of the last year; this project has come a long way but is not yet complete.
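The aliasing is easy to picture with a toy model; the real struct page is far more intricate (the head-page link, for one, is encoded in a tagged word rather than a plain pointer), but the pointer-compatibility idea is the same:

```c
/* A toy illustration of "a folio is an alias for the head page". */
#include <stdio.h>

struct page {
    unsigned long flags;
    struct page *head;    /* NULL for a head page; else the head */
};

struct folio {
    struct page page;     /* first member, so the pointers alias */
};

/* Resolve any page, head or tail, to its owning folio. */
static struct folio *page_folio(struct page *p)
{
    if (p->head)
        p = p->head;
    return (struct folio *)p;
}

int main(void)
{
    struct folio f = { .page = { .flags = 0, .head = NULL } };
    struct page tail = { .flags = 0, .head = &f.page };

    /* Both the head page and a tail page map back to the folio. */
    printf("%d\n", page_folio(&f.page) == &f);   /* 1 */
    printf("%d\n", page_folio(&tail) == &f);     /* 1 */
    return 0;
}
```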
One open question concerns when the kernel should allocate large folios — those containing more than one base page. Only the readahead code allocates them now; the filesystem write path still does everything in terms of base pages. If writes are done to large folios that were brought in via readahead, they will see and use those large folios. Appending to a file will always use base pages, though. There are almost certainly advantages to using large folios in the write path, but it will be necessary to figure out what the criteria for creating them will be.
Meanwhile, the process of converting filesystem code to folios continues. Wilcox encouraged filesystem developers to look for infrastructure that already exists when possible rather than reimplementing it themselves. He pointed out the support layer for network filesystems that was recently rewritten by David Howells. It would also be good for filesystems to move away from the old buffer-head APIs and use the relatively new iomap infrastructure whenever possible.
Ted Ts'o said that more guidance on conversion to iomap would be useful. Moving a filesystem over can be a daunting task, he said, but developers should understand that it can be done incrementally. For example, a filesystem's read path can be converted while leaving the write path unchanged for now. This can be useful, Wilcox agreed, especially since iomap is still missing some capabilities, such as support for features like fs-verity or compression. That lack is often more problematic on the write side than on the read side.
API complaints
Josef Bacik said that one particularly annoying problem for Btrfs is that the memory-management subsystem's page locks must be taken before filesystem-level locks. That ordering makes locking difficult at the filesystem level and gets in the way of needed features like range locking. He would love to see this issue addressed, but knows that it will not be easy. Wilcox admitted that this problem had not been on his radar at all, but it is something he will have to look into. Chris Mason noted that the problem is not specific to Btrfs; other filesystems have encountered similar difficulties over the years.
Bacik also said that page reclaim driven by the memory-management subsystem can be problematic, and the interface it presents to filesystems is not great. It would be good, he said, to be able to distinguish requests like "please free whatever memory you can now" from requests to free specific pages. Wilcox said that much of the kernel's reclaim machinery may not be relevant anymore; it was designed in the days when filesystems were far less capable than they are now. Good filesystems are already keeping all of their drives busy doing writeback; there is really little more that they can do if the memory-management code wants them to free specific pages. Perhaps the memory-management subsystem should simply stop requesting the reclaim of pages that reach the end of the least-recently-used (LRU) list, he suggested.
There is a possible way to test that idea, he said: perhaps filesystems should simply remove their implementation of the writepage() address-space operation. Howells said that he had done that in the AFS filesystem, with seemingly good results. Some other filesystems, including 9P, will be harder to convert, though.
The problem there, Ts'o said, is that the memory-management subsystem is trying to solve multiple problems at the same time. When responding to global memory pressure, it just needs some pages to be freed and will not be that picky about where they happen to be. Once control groups enter the picture, though, it becomes necessary to relieve memory pressure within a specific container; that requires reclaim to be more focused. When compaction is being performed to create huge pages, it comes down to freeing specific pages. These cases need to be thought about separately. Removing writepage() may help with the global problem, but the need to free specific pages doesn't go away.
Wilcox expressed a hope that widespread use of large folios will help with the compaction problem at least, since there should be far less fragmentation in the first place. In some benchmark runs he has seen the length of the LRU lists reduced by a factor of 1000, which is "just insane".
On the other hand, he said, one potential problem resulting from large folios may be a form of write amplification. Dirty state is tracked at the folio level, not at the level of the individual base pages contained therein; when the time comes to write out data, the entire folio will be written even if only one byte has changed. This will increase the write bandwidth used by the system, but should also help to reduce fragmentation on copy-on-write filesystems. He said that he didn't expect "serious trouble" though.
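To put a number on the worst case (the folio size here is chosen purely for illustration): if dirty state is tracked for a 16-page, 64KB folio as a whole, a single modified byte triggers a 64KB writeback where per-page tracking would have written 4KB, a 16-fold amplification relative to today's per-page behavior.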
Others were not so sure. Mason pointed out that Jens Axboe has been putting in a considerable amount of effort to make it easy to perform small operations in io_uring. This work is specifically motivated by write-bandwidth concerns. Axboe added that bandwidth is indeed a concern, but is more of a problem on the read side than with writes. There was some discussion on how big the problem actually is; one developer pointed out that the situation will vary depending on the filesystem in use. For a network filesystem with high latency, writing too much data may be better than doing multiple round trips with the server. There was a general agreement that better metrics are needed to understand the situation properly.
Longer-term goals
Moving on, Wilcox said that he is still in the process of converting the address-space operations provided by filesystems to folios; there are still a couple of them to be done. In many cases, this "conversion" is a matter of changing a function prototype to accept a pointer to struct folio rather than to struct page, then adding a line like:
struct page *p = (struct page *) folio;
This pattern is, he said, "a bad code smell"; it is a sign that the code in question needs further work. The plan is to eventually convert every filesystem to folios — but not necessarily to the point of using large folios.
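For illustration, an intermediate conversion of this kind might look as follows; the operation and helper here are hypothetical stand-ins rather than any particular filesystem's code:

```c
/* Minimal stand-ins so the sketch compiles; all hypothetical. */
struct page { unsigned long flags; };
struct folio { struct page page; };
struct file { int dummy; };

static int legacy_read_page(struct file *file, struct page *page)
{
    (void)file; (void)page;
    return 0;    /* pretend the read succeeded */
}

/*
 * Prototype converted to take a folio, body still thinking in pages:
 * the cast below is the "bad code smell" flagged in the talk, left in
 * place as a marker that this path still needs real folio-aware work.
 */
static int example_read_folio(struct file *file, struct folio *folio)
{
    struct page *page = (struct page *)folio;

    return legacy_read_page(file, page);
}

int main(void)
{
    struct file f = { 0 };
    struct folio fol = { { 0 } };
    return example_read_folio(&f, &fol);
}
```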
There is an underlying motivation behind this work: he hopes to eventually remove one of the big union members from struct page, once filesystems are no longer using that structure. Memory-management developers, he said, want to put a lot more information into struct page, but there are strong reasons to not make that structure any larger. So, instead, he would like to shrink it; perhaps, someday, it can be reduced (from 64 bytes) to a single pointer. Even better, that could be one pointer per folio, rather than one structure per page, allowing the kernel to get back the 1.6% of memory that is currently used to hold page structures.
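The arithmetic behind those numbers is straightforward: a 64-byte struct page for every 4096-byte base page consumes 64/4096 ≈ 1.6% of all memory. Shrinking the structure to a single 8-byte pointer would cut that to 8/4096 ≈ 0.2%; one pointer per folio of, say, 16 base pages (an illustrative size) would bring it down to 8/(16 × 4096) ≈ 0.012%.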
That, he said, will allow companies to save money on memory and use it to send their developers to more conferences.
Howells said that it would be good to eventually get rid of the write_begin() and write_end() address-space operations; Wilcox agreed, saying that they were originally designed for the needs of ext3, and later filesystems have had to fit into that model. Goldwyn Rodrigues pointed out that iomap is not currently using those callbacks.
Kent Overstreet complained about the practice of passing around structures full of callbacks, which he described as an "old model" of API design. Bacik said, though, that he doesn't really care about the API as long as it lets him focus on Btrfs and not have to worry about how memory management works. Wilcox answered that much of his work has been aimed at making filesystems easier to write in general, and he hopes that folios help in that regard. Nothing in filesystems should have to care about pages, he said, except for, possibly, the page-fault path.
Overstreet, though, objected that developers should care more about such things. Many of the kernel's internal interfaces have aged badly; developers should be talking about what the pain points are and how to remove them. Bacik said that the kernel needs developers who care about these interfaces specifically; he, personally, is on the edge of burnout and can't take on other tasks. So he is happy about the folio work; there is an owner who cares about the interface and is working to make it better. He said that this is hard, thankless work, and thanked Wilcox for taking it on.
Wilcox closed the session by acknowledging that the folio work is imposing costs on many other developers, and said that he feels the weight of that cost. Developers have made the costs clear to him, some more politely than others. He thanked Bacik for his comments, saying that he is glad that somebody, at least, sees the benefit of this work.
Brief items
Kernel development
Kernel release status
The current development kernel is 5.18-rc5, released on May 1. Linus said: "So if rc4 last week was tiny and smaller than usual, it seems to have been partly timing, and rc5 is now a bit larger than usual. But only a very tiny bit larger - certainly not outrageously so, and not something that worries me."
Stable updates: 5.15.37 and 4.19.241 were released on May 1.
Distributions
Fedora project leader Matthew Miller weighs in (TechRepublic)
TechRepublic has published an interview with Fedora project leader Matthew Miller.
Basically, every modern language provides a lot of building blocks that usually come from other smaller open-source projects. These are libraries, and they do things like format text, handle images, connect to databases and deal with talking across the internet. Projects like Fedora or Debian used to work to try to package up every such library in our own format, made to work nicely with everything else.

Now, every new language — Rust, for example — comes with its own tools to manage these, and they don’t work nicely together with our old way. The sheer scale is overwhelming — for Rust alone, as I checked just now there are 81,541 such libraries. We can’t keep up with repackaging all of that into our own format, let alone that plus all of the other languages. We need to approach this differently in order to still provide a good solution to software developers.
I think a lot of that will need machine learning and automation … we’ll need to keep adjusting so we can provide the value that Linux distributions give users in trust, security and coherent integration at an exponential scale.
Distributions quotes of the week
I think the primary issue is that the crafting of binary packages is 'fairly' manual. Someone has to put the src.rpm in the meat grinder (koji) in the right order with the right spices (flags) to make the sausage at the other end. We rely on the cook to remember how they did it the last 10 times and that the taster (functional and ci) says it works. This normally works well but then it turns out that something swapped out somewhere and once 'fully cooked' (composed) that the sausage explodes.— Stephen Smoogen
Stable releases of core components of a major desktop should never contain bugs like "deleting contacts sometimes doesn't work" or "you can't add photos to an album in the Photos application because the dialog where you're supposed to do it is completely broken and the list entries multiply like rabbits who've been dosed up on viagra". Distribution validation testing is not *for* finding bugs like this.— Adam Williamson
Development
Firefox 100 released
Version 100.0 of the Firefox browser has been released. New features include video caption display on various proprietary sites, multiple-language spell checking, invisible scrollbars, and more.

Hughes: fwupd 1.8.0 and 50 million updates
Richard Hughes announces the fwupd 1.8.0 release and notes that the associated Linux Vendor Firmware Service has now shipped a minimum of 50 million firmware updates.
Just 7 years ago Christian asked me to “make firmware updates work on Linux” and now we have a thriving client project that respects both your freedom and your privacy, and a thriving ecosystem of hardware vendors who consider Linux users first class citizens. Of course, there are vendors who are not shipping updates for popular hardware, but they’re now in the minority — and every month we have two or three new vendor account requests.
DeVault: Announcing the Hare programming language
Drew DeVault has announced the existence of a new programming language called "Hare".
Hare is a systems programming language designed to be simple, stable, and robust. Hare uses a static type system, manual memory management, and a minimal runtime. It is well-suited to writing operating systems, system tools, compilers, networking software, and other low-level, high performance tasks.
SystemTap 4.7 released
Version 4.7 of the SystemTap tracing system is out. "Enhancements to this release include: a new stap-profile-annotate tool, a new --sign-module module signing option, -d is now implied for processes specified with -c/-x".
Miscellaneous
Willis: Engaging with the OSI Elections 2022.1
Nathan Willis took a long look at the Open Source Initiative's 2022 board election and wasn't entirely pleased with what he saw.
So it’s a troubling ballot to look at. There’s an ostensibly non-profit organization that’s an official OSI affiliate trying to run its CEO as an individual candidate while also running a second member (a board director) on the appropriate, affiliate ballot in the same election. There’s also two financial sponsors running candidates on the individual ballot, one of them (Red Hat) running two candidates at the same time for the two open seats.
Page editor: Jake Edge
Announcements
Newsletters
Distributions and system administration
Development
Meeting minutes
Miscellaneous
Calls for Presentations
CFP Deadlines: May 5, 2022 to July 4, 2022
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
| Deadline | Event Dates | Event | Location |
|---|---|---|---|
| May 15 | July 21 | Icinga Camp Berlin | Berlin, Germany |
| May 26 | September 15 – September 18 | EuroBSDCon 2022 | Vienna, Austria |
| May 30 | September 13 – September 16 | Open Source Summit Europe | Dublin, Ireland |
| May 30 | August 23 – August 24 | Open Source Summit Latin America | Online |
| June 9 | October 19 – October 21 | ROSCon 2022 | Kyoto, Japan |
| June 10 | September 21 – September 22 | Open Mainframe Summit | Philadelphia, US |
| June 15 | July 22 – July 24 | TUG 2022 - Online Presentations Covering the World of TeX | Online |
| June 19 | October 1 – October 7 | Akademy 2022 | Barcelona, Spain |
| June 19 | September 12 – September 14 | Linux Plumbers Conference | Dublin, Ireland |
| June 23 | July 23 – July 24 | IndiaFOSS 2022 | Bangalore, India |
| June 25 | September 19 – September 21 | Open Source Firmware Conference | Gothenburg, Sweden |
| June 27 | October 25 – October 28 | PostgreSQL Conference Europe | Berlin, Germany |
| June 30 | September 21 – September 22 | DevOpsDays Berlin 2022 | Berlin, Germany |
| July 1 | September 15 – September 16 | Linux Security Summit Europe | Dublin, Ireland |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Events: May 5, 2022 to July 4, 2022
The following event listing is taken from the LWN.net Calendar.
| Date(s) | Event | Location |
|---|---|---|
| May 10 – May 11 | HOT - Heidelberg OSADL Talks | Online |
| May 10 – May 11 | Red Hat Summit 2022 | Boston, US |
| May 13 | PostgreSQL Conference Germany | Leipzig, Germany |
| May 13 – May 14 | Fedora Linux 36 Release Party | |
| May 17 – May 19 | Yocto Project Summit 2022.05 | Online |
| May 24 – May 27 | PGCon | Online |
| May 30 – May 31 | Embedded Recipes | Paris, France |
| May 31 – June 2 | sambaXP | Göttingen, Germany |
| June 1 – June 4 | BSDCan | Online |
| June 1 – June 3 | Kernel Recipes | Paris, France |
| June 2 – June 4 | openSUSE Conference 2022 | Nürnberg, Germany |
| June 2 | Devconf.CZ Mini IRL | Brno, Czech |
| June 7 – June 9 | Open Infrastructure Summit | Berlin, Germany |
| June 7 – June 9 | SUSECON 2022 | Online |
| June 8 – June 9 | Zephyr Developer Summit 2022 | Mountain View, CA, USA |
| June 10 – June 12 | OSFF Summer Hackathon 2022 | Darmstadt, Germany |
| June 10 – June 12 | South East Linux Fest | Charlotte, NC, USA |
| June 17 – June 18 | CentOS Dojo Summer Online 2022 | Online |
| June 21 – June 24 | Open Source Summit North America | Austin, TX, USA |
| June 21 – June 25 | The Perl and Raku Conference 2022 | Houston, Texas, USA |
| June 23 – June 24 | Linux Security Summit North America | Austin, TX, USA |
If your event does not appear here, please tell us about it.
Security updates
Alert summary April 28, 2022 to May 4, 2022
| Dist. | ID | Release | Package | Date |
|---|---|---|---|---|
| Debian | DSA-5125-1 | stable | chromium | 2022-04-27 |
| Debian | DSA-5126-1 | stable | ffmpeg | 2022-05-01 |
| Debian | DLA-2989-1 | LTS | ghostscript | 2022-05-01 |
| Debian | DLA-2985-1 | LTS | golang-1.7 | 2022-04-28 |
| Debian | DLA-2986-1 | LTS | golang-1.8 | 2022-04-28 |
| Debian | DLA-2990-1 | LTS | jackson-databind | 2022-05-02 |
| Debian | DSA-5127-1 | stable | kernel | 2022-05-02 |
| Debian | DLA-2987-1 | LTS | libarchive | 2022-04-30 |
| Debian | DSA-5128-1 | stable | openjdk-17 | 2022-05-03 |
| Debian | DLA-2992-1 | LTS | openvpn | 2022-05-03 |
| Debian | DLA-2988-1 | LTS | tinyxml | 2022-04-30 |
| Debian | DLA-2991-1 | LTS | twisted | 2022-05-03 |
| Fedora | FEDORA-2022-cc64b21327 | F34 | CuraEngine | 2022-05-02 |
| Fedora | FEDORA-2022-bc606b86f4 | F35 | CuraEngine | 2022-05-02 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | bettercap | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | bettercap | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | chisel | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | chisel | 2022-04-28 |
| Fedora | FEDORA-2022-17aa1c62da | F34 | chromium | 2022-05-03 |
| Fedora | FEDORA-2022-0f14e2308e | F35 | chromium | 2022-05-03 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | containerd | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | containerd | 2022-04-28 |
| Fedora | FEDORA-2022-05918f0838 | F34 | dhcp | 2022-04-29 |
| Fedora | FEDORA-2022-3a63897745 | F35 | doctl | 2022-04-28 |
| Fedora | FEDORA-2022-22b85a45cb | F34 | epiphany | 2022-04-30 |
| Fedora | FEDORA-2022-ad26447c98 | F35 | epiphany | 2022-04-30 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | gobuster | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | gobuster | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-contrib-opencensus-resource | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-contrib-opencensus-resource | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-appc-docker2aci | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-appc-docker2aci | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-appc-spec | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-appc-spec | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-containerd-continuity | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-containerd-continuity | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-containerd-stargz-snapshotter | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-containerd-stargz-snapshotter | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-coredns-corefile-migration | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-coredns-corefile-migration | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-envoyproxy-protoc-gen-validate | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-envoyproxy-protoc-gen-validate | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-francoispqt-gojay | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-francoispqt-gojay | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-gogo-googleapis | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-gogo-googleapis | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-gohugoio-testmodbuilder | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-gohugoio-testmodbuilder | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-google-containerregistry | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-google-slothfs | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-google-slothfs | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-googleapis-gnostic | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-googleapis-gnostic | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-googlecloudplatform-cloudsql-proxy | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-googlecloudplatform-cloudsql-proxy | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-grpc-ecosystem-gateway-2 | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-haproxytech-client-native | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-haproxytech-dataplaneapi | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-instrumenta-kubeval | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-instrumenta-kubeval | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-intel-goresctrl | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-intel-goresctrl | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-oklog | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-oklog | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-pact-foundation | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-pact-foundation | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-prometheus | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-prometheus | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-prometheus-alertmanager | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-prometheus-alertmanager | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-prometheus-node-exporter | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-prometheus-node-exporter | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-prometheus-tsdb | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-redteampentesting-monsoon | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-redteampentesting-monsoon | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-spf13-cobra | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-spf13-cobra | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-github-xordataexchange-crypt | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-github-xordataexchange-crypt | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-gopkg-src-d-git-4 | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-gopkg-src-d-git-4 | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-k8s-apiextensions-apiserver | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-k8s-apiextensions-apiserver | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-k8s-code-generator | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-k8s-code-generator | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-k8s-kube-aggregator | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-k8s-kube-aggregator | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-k8s-sample-apiserver | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-k8s-sample-apiserver | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-k8s-sample-controller | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-k8s-sample-controller | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-mongodb-mongo-driver | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-mongodb-mongo-driver | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-storj-drpc | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-storj-drpc | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | golang-x-perf | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | golang-x-perf | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | gopass | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | grpcurl | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | grpcurl | 2022-04-28 |
| Fedora | FEDORA-2022-6b512ae9e5 | F34 | gzip | 2022-04-30 |
| Fedora | FEDORA-2022-eeb6c686c7 | F36 | gzip | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | onionscan | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | onionscan | 2022-04-28 |
| Fedora | FEDORA-2022-c87047f163 | F35 | podman | 2022-04-29 |
| Fedora | FEDORA-2022-dbd2935e44 | F34 | rsync | 2022-04-29 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | shellz | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | shellz | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | shhgit | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | shhgit | 2022-04-28 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | snowcrash | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | snowcrash | 2022-04-28 |
| Fedora | FEDORA-2022-29327a4b98 | F34 | stb | 2022-04-28 |
| Fedora | FEDORA-2022-fe84314a8e | F35 | stb | 2022-04-28 |
| Fedora | FEDORA-2022-a2f0201723 | F34 | suricata | 2022-05-04 |
| Fedora | FEDORA-2022-585661c82c | F34 | thunderbird | 2022-04-28 |
| Fedora | FEDORA-2022-832689aa6b | F34 | usd | 2022-04-29 |
| Fedora | FEDORA-2022-61f6ee6353 | F35 | usd | 2022-05-01 |
| Fedora | FEDORA-2022-b605768c94 | F34 | vim | 2022-04-30 |
| Fedora | FEDORA-2022-64b2c02d29 | F34 | xen | 2022-05-02 |
| Fedora | FEDORA-2022-5cbd6de569 | F34 | xq | 2022-04-28 |
| Fedora | FEDORA-2022-3a63897745 | F35 | xq | 2022-04-28 |
| Fedora | FEDORA-2022-ec66ee6b59 | F34 | xz | 2022-05-02 |
| Fedora | FEDORA-2022-07cd35f6b8 | F36 | xz | 2022-05-02 |
| Mageia | MGASA-2022-0158 | 8 | chromium-browser-stable | 2022-05-02 |
| Mageia | MGASA-2022-0159 | 8 | curl | 2022-05-02 |
| Mageia | MGASA-2022-0156 | 8 | firefox/nss/rootcerts | 2022-04-29 |
| Mageia | MGASA-2022-0154 | 8 | kernel | 2022-04-28 |
| Mageia | MGASA-2022-0155 | 8 | kernel-linus | 2022-04-28 |
| Mageia | MGASA-2022-0157 | 8 | thunderbird | 2022-04-29 |
| Oracle | ELSA-2022-1566 | OL8 | container-tools:2.0 | 2022-04-28 |
| Oracle | ELSA-2022-1565 | OL8 | container-tools:3.0 | 2022-04-28 |
| Oracle | ELSA-2022-1537 | OL8 | gzip | 2022-04-27 |
| Oracle | ELSA-2022-1550 | OL8 | kernel | 2022-04-27 |
| Oracle | ELSA-2022-1556 | OL8 | mariadb:10.3 | 2022-04-28 |
| Oracle | ELSA-2022-1557 | OL8 | mariadb:10.5 | 2022-05-03 |
| Oracle | ELSA-2022-1541 | OL7 | maven-shared-utils | 2022-04-29 |
| Oracle | ELSA-2022-1546 | OL8 | polkit | 2022-04-27 |
| Oracle | ELSA-2022-9344 | OL7 | qemu | 2022-04-29 |
| Oracle | ELSA-2022-1552 | OL8 | vim | 2022-05-02 |
| Oracle | ELSA-2022-1643 | OL8 | xmlrpc-c | 2022-05-02 |
| Oracle | ELSA-2022-1642 | OL8 | zlib | 2022-04-28 |
| Red Hat | RHSA-2022:1645-02 | OSP16.2 | Red Hat OpenStack Platform 16.2 (python-twisted) | 2022-04-29 |
| Red Hat | RHSA-2022:1665-01 | EL8.2 | gzip | 2022-05-02 |
| Red Hat | RHSA-2022:1676-01 | EL8.4 | gzip | 2022-05-03 |
| Red Hat | RHSA-2022:1663-01 | RHSC | python27-python and python27-python-pip | 2022-05-02 |
| Red Hat | RHSA-2022:1662-01 | RHSC | rh-maven36-maven-shared-utils | 2022-05-02 |
| Red Hat | RHSA-2022:1664-01 | RHSC | rh-python38-python, rh-python38-python-lxml, and rh-python38-python-pip | 2022-05-02 |
| Red Hat | RHSA-2022:1643-01 | EL8 | xmlrpc-c | 2022-04-28 |
| Red Hat | RHSA-2022:1644-01 | EL8.4 | xmlrpc-c | 2022-04-28 |
| Red Hat | RHSA-2022:1642-01 | EL8 | zlib | 2022-04-28 |
| Red Hat | RHSA-2022:1661-01 | EL8.2 | zlib | 2022-05-02 |
| Slackware | SSA:2022-117-01 | | curl | 2022-04-27 |
| Slackware | SSA:2022-122-01 | | libxml2 | 2022-05-02 |
| Slackware | SSA:2022-120-01 | | pidgin | 2022-04-30 |
| SUSE | SUSE-SU-2022:1510-1 | MP4.0 MP4.1 MP4.2 SLE15 | amazon-ssm-agent | 2022-05-03 |
| SUSE | SUSE-SU-2022:1437-1 | MP4.2 SLE15 oS15.3 | buildah | 2022-04-27 |
| SUSE | SUSE-SU-2022:1430-1 | MP4.1 MP4.2 SLE15 SLE-m5.2 SES6 SES7 oS15.3 | cifs-utils | 2022-04-27 |
| SUSE | SUSE-SU-2022:1428-1 | OS8 SLE12 | cifs-utils | 2022-04-27 |
| SUSE | SUSE-SU-2022:1429-1 | OS9 SLE12 | cifs-utils | 2022-04-27 |
| SUSE | SUSE-SU-2022:14951-1 | SLE11 | cifs-utils | 2022-04-27 |
| SUSE | SUSE-SU-2022:14950-1 | SLE11 | cifs-utils | 2022-04-27 |
| SUSE | SUSE-SU-2022:1427-1 | SLE15 | cifs-utils | 2022-04-27 |
| SUSE | SUSE-SU-2022:1507-1 | SLE12 | containerd, docker | 2022-05-03 |
| SUSE | SUSE-SU-2022:1435-1 | MP4.1 MP4.2 MP4.3 SLE15 SLE-m5.1 SLE-m5.2 SES6 oS15.3 oS15.4 | firewalld, golang-github-prometheus-prometheus | 2022-04-27 |
| SUSE | SUSE-SU-2022:1484-1 | MP4.2 SLE15 | git | 2022-05-02 |
| SUSE | SUSE-SU-2022:1455-1 | MP4.2 SLE15 SLE-m5.0 SLE-m5.1 SLE-m5.2 oS15.3 oS15.4 | glib2 | 2022-04-28 |
| SUSE | SUSE-SU-2022:1479-1 | MP4.2 SLE15 oS15.3 oS15.4 | jasper | 2022-04-29 |
| SUSE | SUSE-SU-2022:1475-1 | SLE12 | jasper | 2022-04-29 |
| SUSE | SUSE-SU-2022:1513-1 | MP4.1 MP4.2 SLE15 SES6 SES7 | java-11-openjdk | 2022-05-03 |
| SUSE | SUSE-SU-2022:1474-1 | SLE12 | java-11-openjdk | 2022-04-29 |
| SUSE | SUSE-SU-2022:1436-1 | MP4.2 SLE15 oS15.3 oS15.4 | libaom | 2022-04-27 |
| SUSE | SUSE-SU-2022:1476-1 | MP4.2 SLE15 oS15.3 oS15.4 | libcaca | 2022-04-29 |
| SUSE | SUSE-SU-2022:1508-1 | SLE12 | libcaca | 2022-05-03 |
| SUSE | SUSE-SU-2022:1465-1 | MP4.2 SLE15 SLE-m5.1 SLE-m5.2 oS15.3 oS15.4 | libslirp | 2022-04-29 |
| SUSE | SUSE-SU-2022:1516-1 | SLE15 | libwmf | 2022-05-04 |
| SUSE | SUSE-SU-2022:0731-2 | oS15.4 | mariadb | 2022-04-29 |
| SUSE | SUSE-SU-2022:1478-1 | SLE12 | mutt | 2022-04-29 |
| SUSE | SUSE-SU-2022:1461-1 | MP4.1 MP4.2 SLE15 SES7 oS15.3 oS15.4 | nodejs12 | 2022-04-28 |
| SUSE | SUSE-SU-2022:1466-1 | SLE12 | nodejs12 | 2022-04-29 |
| SUSE | SUSE-SU-2022:1462-1 | MP4.1 MP4.2 SLE15 SES7 oS15.3 oS15.4 | nodejs14 | 2022-04-28 |
| SUSE | SUSE-SU-2022:1459-1 | SLE12 | nodejs14 | 2022-04-28 |
| SUSE | openSUSE-SU-2022:0123-1 | oS15.3 | opera | 2022-05-02 |
| SUSE | SUSE-SU-2022:1509-1 | oS15.3 oS15.4 | pcp | 2022-05-03 |
| SUSE | SUSE-SU-2022:1477-1 | MP4.2 SLE15 oS15.3 oS15.4 | python-Twisted | 2022-04-29 |
| SUSE | SUSE-SU-2022:1446-1 | MP4.2 SLE15 oS15.3 oS15.4 | python-paramiko | 2022-04-28 |
| SUSE | SUSE-SU-2022:1447-1 | SLE12 | python-paramiko | 2022-04-28 |
| SUSE | SUSE-SU-2022:1454-1 | MP4.2 SLE15 oS15.3 oS15.4 | python-pip | 2022-04-28 |
| SUSE | SUSE-SU-2022:1448-1 | SLE15 | python-requests | 2022-04-28 |
| SUSE | SUSE-SU-2022:1485-1 | MP4.2 SLE15 | python39 | 2022-05-02 |
| SUSE | SUSE-SU-2022:1512-1 | MP4.1 MP4.2 SLE15 SLE-m5.0 SES6 SES7 | ruby2.5 | 2022-05-03 |
| SUSE | SUSE-SU-2022:1515-1 | MP4.0 MP4.1 MP4.2 SLE15 | rubygem-puma | 2022-05-04 |
| SUSE | SUSE-SU-2022:1483-1 | SLE12 | subversion | 2022-05-02 |
| SUSE | SUSE-SU-2022:1431-1 | MP4.1 MP4.2 SLE15 SES7 oS15.3 oS15.4 | webkit2gtk3 | 2022-04-27 |
| SUSE | SUSE-SU-2022:1511-1 | SLE15 SES6 | webkit2gtk3 | 2022-05-03 |
| SUSE | SUSE-SU-2022:1506-1 | MP4.2 SLE15 SLE-m5.1 SLE-m5.2 | xen | 2022-05-03 |
| SUSE | SUSE-SU-2022:1505-1 | SLE12 | xen | 2022-05-03 |
| Ubuntu | USN-5397-1 | 18.04 20.04 21.10 22.04 | curl | 2022-04-28 |
| Ubuntu | USN-5396-1 | 18.04 | ghostscript | 2022-04-28 |
| Ubuntu | USN-5382-2 | 22.04 | libinput | 2022-05-02 |
| Ubuntu | USN-5398-1 | 14.04 16.04 18.04 21.10 | libsdl1.2, libsdl2 | 2022-04-28 |
| Ubuntu | USN-5399-1 | 18.04 20.04 21.10 | libvirt | 2022-05-02 |
| Ubuntu | USN-5390-2 | 22.04 | linux-raspi | 2022-05-03 |
| Ubuntu | USN-5392-1 | 16.04 18.04 20.04 21.10 22.04 | mutt | 2022-04-28 |
| Ubuntu | USN-5400-1 | 18.04 20.04 21.10 22.04 | mysql-5.7, mysql-8.0 | 2022-05-03 |
| Ubuntu | USN-5395-1 | 18.04 20.04 21.10 22.04 | networkd-dispatcher | 2022-04-28 |
| Ubuntu | USN-5371-2 | 22.04 | nginx | 2022-04-28 |
| Ubuntu | USN-5393-1 | 18.04 20.04 21.10 | thunderbird | 2022-04-27 |
| Ubuntu | USN-5394-1 | 20.04 21.10 | webkit2gtk | 2022-04-28 |
Kernel patches of interest
Kernel releases
Architecture-specific
Build system
Core kernel
Development tools
Device drivers
Device-driver infrastructure
Documentation
Filesystems and block layer
Memory management
Networking
Security-related
Virtualization and containers
Page editor: Jonathan Corbet
