
LWN.net Weekly Edition for November 11, 2021

Welcome to the LWN.net Weekly Edition for November 11, 2021

This edition contains the following feature content:

  • Late-bound argument defaults for Python: a proposal to evaluate function-argument defaults at call time rather than at definition time.
  • 5.16 Merge window, part 1: the first set of changes pulled for the next kernel release.
  • The balance between features and performance in the block layer: when does a new feature justify a cost in the I/O fast path?
  • Intel AMX support in 5.16: how support for Intel's Advanced Matrix Extensions found its way into the kernel, and how applications will use it.
  • Concurrency in Julia: a look at the language's facilities for parallel and distributed computation, including the new task-migration feature.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Late-bound argument defaults for Python

By Jake Edge
November 10, 2021

Python supports default values for arguments to functions, but those defaults are evaluated at function-definition time. A proposal to add defaults that are evaluated when the function is called has been discussed at some length on the python-ideas mailing list. The idea came about, in part, due to yet another resurrection of the proposal for None-aware operators in Python. Late-bound defaults would help with one use case for those operators, but there are other, stronger reasons to consider their addition to the language.

In Python, the defaults for arguments to a function can be specified in the function definition, but, importantly, they are evaluated in the scope where the function is defined. So default arguments cannot refer to other arguments to the function, as those are only available in the scope of the function itself when it gets called. For example:

    def foo(a, b = None, c = len(a)):
        ...

That definition specifies that a has no default, b defaults to None if no argument gets passed for it, and c defaults to the value of len(a). But that expression will not refer to the a in the argument list; it will instead look for an a in the scope where the function is defined. That is probably not what the programmer intended. If no a is found, the function definition will fail with a NameError.
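
In practice, the usual workaround is to use None (or some other sentinel) as the stated default and to compute the real value inside the function body at call time; a minimal sketch of that idiom:

    def foo(a, b=None, c=None):
        # The common workaround today: use None as a placeholder and
        # compute the intended default inside the body, at call time.
        if c is None:
            c = len(a)
        ...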

Idea

On October 24, Chris Angelico introduced his proposal for late-bound arguments. He used an example function, derived from the bisect.bisect_right() function in the standard library, to demonstrate the idea. The function's arguments are specified as follows:

def bisect(a, x, lo=0, hi=None):

He notes that there is a disparity between lo and hi: "It's clear what value lo gets if you omit it. It's less clear what hi gets." Early in his example function, hi is actually set to len(a) if it is None. Effectively None is being used as a placeholder (or sentinel value) because Python has no way to directly express the idea that hi should default to the length of a. He proposed new syntax to identify hi as a late-bound argument:

def bisect(a, x, lo=0, hi=:len(a)):

The "=:" would indicate that if no argument is passed for hi, the expression would be evaluated in the context of the call and assigned to hi before any of the function's code is run. It is interesting to note that the documentation for bisect.bisect_right() linked above looks fairly similar to Angelico's idea (just lacking the colon) even though the actual code in the library uses a default value of None. It is obviously useful to know what the default will be without having to dig into the code.

In his post, Angelico said that in cases where None is a legitimate value, there is another way to handle the default, but it also obscures what the default will be:

And the situation only gets uglier if None is a valid argument, and a unique sentinel is needed; this standard idiom makes help() rather unhelpful:
_missing = object()
def spaminate(thing, count=_missing):
    if count is _missing: count = thing.getdefault()

Proposal: Proper syntax and support for late-bound argument defaults.

def spaminate(thing, count=:thing.getdefault()):
    ...
[...]

The purpose of this change is to have the function header define, as fully as possible, the function's arguments. Burying part of that definition inside the function is arbitrary and unnecessary.

The first order of business in these kinds of discussions is the inevitable bikeshedding about how the operator is spelled. Angelico chose a "deliberately subtle" syntax, noting that in many cases it will not matter when the argument is bound. It is visually similar to the walrus operator (":="), but that is not legal in a function definition, so there should be no ambiguity, he said.

Ethan Furman liked the idea but would rather see a different operator (perhaps "?=") used because of the potential confusion with the walrus operator. Guido van Rossum was also in favor of the feature, but had his spelling suggestion as well:

I like that you're trying to fix this wart! I think that using a different syntax may be the only way out. My own bikeshed color to try would be `=>`, assuming we'll introduce `(x) => x+1` as the new lambda syntax, but I can see problems with both as well :-).

New syntax for lambda expressions has also been discussed, with most settling on "=>" as the best choice, in part because "->" is used for type annotations; some kind of "arrow" operator is commonly used in other languages for defining anonymous functions. Several others were similarly in favor of late-bound defaults and many seemed to be happy with Van Rossum's spelling, but Brendan Barnwell was opposed to both; he was concerned that it would "encourage people to cram complex expressions into the function definition". Since it would only truly be useful—readable—for a simpler subset of defaults, it should not be added, he said. Furthermore:

To me, this is definitely not worth adding special syntax for. I seem to be the only person around here who detests "ASCII art" "arrow" operators but, well, I do, and I'd hate to see them used for this. The colon or alternatives like ? or @ are less offensive but still too inscrutable to be used for something that can already be handled in a more explicit way.

But Steven D'Aprano did not think that the addition of late-bound defaults would "cause a large increase in the amount of overly complex default values". Angelico was also skeptical that the feature was some sort of bad-code attractant. "It's like writing a list comprehension; technically you can put any expression into the body of it, but it's normally going to be short enough to not get unwieldy." In truth, any feature can be abused; this one does not look to them to be particularly worse in that regard.

PEP 671

Later that same day, Angelico posted a draft of PEP 671 ("Syntax for late-bound function argument defaults"). In it, he adopted the "=>" syntax, though he noted a half-dozen other possibilities. He also fleshed out the specification of the default expression and some corner cases:

The expression is saved in its source code form for the purpose of inspection, and bytecode to evaluate it is prepended to the function's body.

Notably, the expression is evaluated in the function's run-time scope, NOT the scope in which the function was defined (as are early-bound defaults). This allows the expression to refer to other arguments.

Self-referential expressions will result in UnboundLocalError:

    def spam(eggs=>eggs): # Nope

Multiple late-bound arguments are evaluated from left to right, and can refer to previously-calculated values. Order is defined by the function, regardless of the order in which keyword arguments may be passed.

But one case, which had been raised by Ricky Teachey in the initial thread, was discussed at some length when Jonathan Fine asked about the following function definition:

def puzzle(*, a=>b+1, b=>a+1):
    return a, b

Angelico was inclined to treat that as a syntax error, "since permitting it would open up some hard-to-track-down bugs". Instead it could be some kind of run-time error in the case where neither argument is passed, he said. He is concerned that allowing "forward references" to arguments that have yet to be specified (e.g. b in a=>b+1 above) will be confusing and hard to explain. D'Aprano suggested handling early-bound argument defaults before their late-bound counterparts and laid out a new process for argument handling that was "consistent and understandable". In particular, he saw no reason to make some kinds of late-bound defaults into a special case:

Note that step 4 (evaluating the late-bound defaults) can raise *any* exception at all (it's an arbitrary expression, so it can fail in arbitrary ways). I see no good reason for trying to single out UnboundLocalError for extra protection by turning it into a syntax error.

Angelico noted that it was still somewhat difficult for even experienced Python programmers to keep straight, but, in addition, he had yet to hear of a real use case. Erik Demaine offered two examples, "though they are a bit artificial"; he said that simply evaluating the defaults in left-to-right order (based on the function definition) was reasonably easy to understand. Angelico said that any kind of reordering of the evaluation was not being considered; as he sees it:

The two options on the table are:

1) Allow references to any value that has been provided in any way
2) Allow references only to parameters to the left

Option 2 is a simple SyntaxError on compilation (you won't even get as far as the def statement). Option 1 allows everything all up to the point where you call it, but then might raise UnboundLocalError if you refer to something that wasn't passed.

The permissive option allows mutual references as long as one of the arguments is provided, but will give a peculiar error if you pass neither. I think this is bad API design.

Van Rossum pointed out that the syntax-error option would break new ground: "Everywhere else in Python, undefined names are runtime errors (NameError or UnboundLocalError)." Angelico sees the error in different terms, though, noting that mismatches in global and local scope are a syntax error; he gave an example:

>>> def spam():
...     ham
...     global ham
...
  File "<stdin>", line 3
SyntaxError: name 'ham' is used prior to global declaration

He also gave a handful of different function definitions that were subtly different using the new feature; he was concerned about the "bizarre inconsistencies" that can arise, because they "are difficult to explain unless you know exactly how everything is implemented internally". He would prefer to see real-world use cases for the feature to decide whether it should be supported at all, but was adamant that the strict left-to-right interpretation was easier to understand:

If this should be permitted, there are two plausible semantic meanings for these kinds of constructs:

1) Arguments are defined left-to-right, each one independently of each other
2) Early-bound arguments and those given values are defined first, then late-bound arguments

The first option is much easier to explain [...]

D'Aprano explained that the examples cited were not particularly hard to understand and fell far short of the "bizarre inconsistencies" bar. There is a clear need to treat the early-bound and late-bound defaults differently:

However there is a real, and necessary, difference in behaviour which I think you missed:
    def func(x=x, y=>x)  # or func(x=x, @y=x)

The x=x parameter uses global x as the default. The y=x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:

    def method(self, x=>self.attr)  # @x=self.attr

    def bisect(a, x, lo=0, hi=>len(a))  # @hi=len(a)

Without that difference in behaviour, probably fifty or eighty percent of the use-cases are lost. (And the ones that remain are mostly trivial ones of the form arg=[].) So we need this genuine inconsistency.

As can be seen, D'Aprano prefers a different color for the bikeshed: using "@" to prepend late-bound default arguments. He also said that Angelico had perfectly explained the "harder to explain" option in a single sentence; both are equally easy to explain, D'Aprano said. Beyond that, it does not make sense to "prohibit something as a syntax error because it *might* fail at runtime". In a followup message, he spelled that out further:

We don't do this:

    y = x+1  # Syntax error, because x might be undefined

and we shouldn't make this a syntax error

    def func(@spam=eggs+1, @eggs=spam-1):

either just because `func()` with no arguments raises. So long as you pass at least one argument, it works fine, and that may be perfectly suitable for some uses.
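
The "permissive" behavior being debated can be roughly approximated in today's Python with the sentinel idiom; the following is purely an illustrative sketch for comparison, not code from the PEP:

    _unset = object()

    def puzzle(*, a=_unset, b=_unset):
        # Defaults are filled in left to right, so a's default can only
        # see b if b was actually passed, mirroring the permissive
        # option 1 described by Angelico above.
        if a is _unset:
            a = b + 1   # fails (with a TypeError here) if b was not passed either
        if b is _unset:
            b = a + 1
        return a, b

    puzzle(b=10)   # (11, 10)
    puzzle(a=5)    # (5, 6)
    # puzzle()     # raises; the thread debated what kind of error this case should be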

Winding down

While many of the participants in the threads seem reasonably happy—or at least neutral—on the idea, there is some difference of opinion on the details as noted above. But several thread participants are looking for a more general "deferred evaluation" feature, and are concerned that late-bound argument defaults will preclude the possibility of adding such a feature down the road. Beyond that, Eric V. Smith wondered about how late-bound defaults would mesh with Python's function-introspection features. Those parts of the discussion got a little further afield from Angelico's proposal, so they merit further coverage down the road.

At first blush, Angelico's idea to fix this "wart" in Python seems fairly straightforward, but the discussion has shown that there are multiple facets to consider. It is not quite as simple as "let's add a way to evaluate default arguments when the function is called"—likely how it was seen at the outset. That is often the case when looking at new features for an established language like Python; there is a huge body of code that needs to stay working, but there are also, sometimes conflicting, aspirations for features that could be added. It is a tricky balancing act.

As with many python-ideas conversations, there were multiple interesting sub-threads, touching on language design, how to teach Python (and this feature), how other languages handle similar features (including some discussion of ALGOL thunks), the overall complexity of Python as it accretes more and more features, and, of course, additional bikeshedding over the spelling. Meanwhile, Angelico has been working on a proof-of-concept implementation, so PEP 671 (et al.) seems likely to be under discussion for some time to come.

Comments (22 posted)

5.16 Merge window, part 1

By Jonathan Corbet
November 4, 2021
As of this writing, Linus Torvalds has pulled exactly 6,800 non-merge changesets into the mainline repository for the 5.16 kernel release. That is probably a little over half of what will arrive during this merge window, so this is a good time to catch up on what has been pulled so far. There are many significant changes and some large-scale restructuring of internal kernel code, but relatively few ground-breaking new features.

Changes pulled in the first half of the 5.16 merge window include:

Architecture-specific

  • The Arm 8.6 timer extensions are now supported.
  • MIPS systems have a new just-in-time compiler for the BPF virtual machine.
  • KFENCE is now supported on PA-RISC machines.
  • Hitting the TOC button ("transfer of control") on a PA-RISC machine will now cause the kernel to enter the debugger.
  • RISC-V now has support for virtualization with KVM, an improvement that took rather longer than the developers would have liked.
  • The kernel has gained support for Intel's Advanced Matrix Extensions (AMX) feature. This required extensive discussion and a major reworking of the existing floating-point support code.

Core kernel

  • The futex_waitv() system call (described in this article) has been merged.
  • The CPU scheduler has gained an understanding of "clusters", a hardware arrangement where multiple cores share the same L2 cache. The cluster-aware scheduler will take pains to distribute tasks across all clusters in the system to balance the load on caches across the machine.
  • The tracefs interface to the tracing mechanism now supports the setting of owner and group permissions; this feature can be used to give a specific group access to tracing functionality. The "other" permission bits still cannot be set to allow access to the world as a whole, though.
  • As usual there is a pile of BPF changes. The new bpf_trace_vprintk() helper can output information without the three-argument limit of bpf_trace_printk(). Support for calling kernel functions in loadable modules from BPF has been added. There is a new bloom-filter map type. Unprivileged BPF is now disabled by default. There is also a new document describing the licensing of various BPF components and the requirements for users.

Filesystems and block I/O

  • The block layer continues to see a series of performance optimizations resulting in significantly better per-core operation rates.
  • There is now support for multi-actuator (rotating) disks that can access multiple sectors at the same time. This commit documents the sysfs interface for these drives.
  • There is a new ioctl() command (CDROM_TIMED_MEDIA_CHANGE) for detecting media-change events in CDROM drives. Evidently people are still using CDROM drives...
  • The EROFS filesystem has gained simple multiple-device support.

Hardware support

  • Media: OmniVision OV13B10 sensors, Hynix Hi-846 sensors, and R-Car image signal processors.
  • Miscellaneous: Microchip external interrupt controllers, Apple mailbox controllers, Ingenic JZ47xx SoCs SPI controllers, Cadence XSPI controllers, Maxim MAX6620 fan controllers, Intel Keem Bay OCS elliptic-curve encryption accelerators, ACPI WMAA backlight interfaces, Intel ISHTP eclite embedded controllers, Barco P50 GPIO, and Samsung S6D27A1 DPI panels.
  • Networking: Realtek RTL8365MB-VC Ethernet switches, Realtek 802.11ax wireless chips, Asix AX88796C-SPI Ethernet adapters, and Mellanox MSN4800-XX line cards.

Networking

  • There is a new, user-settable socket option called SO_RESERVE_MEM. It has the effect of reserving some kernel memory permanently for the relevant socket; that, in turn, should speed networking operations, especially when the system is under memory pressure. Note that the feature is only available when memory control groups are in use, and the reserved memory is charged against the group's quota.
  • The In-situ Operations, Administration, and Maintenance (IOAM) support has been enhanced to support the encapsulation of IOAM data into in-transit packets. This commit has a little bit of further information.
  • The ethtool netlink API has gained the ability to control transceiver modules; see this commit and this commit for more information.
  • The netfilter subsystem can now classify packets at egress time; see this commit for some more information.
  • Support for Automatic Multicast Tunneling (RFC 7450) has been added.
  • There are two new sysctl knobs controlling what happens to the ARP cache when a network device loses connectivity. arp_evict_nocarrier says whether ARP cache entries should be deleted when an interface loses carrier, while ndisc_evict_nocarrier does the same thing for the neighbor discovery table. Both exist to allow cache entries to be retained when a WiFi interface moves between access points on the same network. This commit and this one contain more information.

Security-related

  • Most of the work for strict memcpy() bounds checking has been merged. The actual patches enabling bounds checking across the kernel have not (yet) been merged, though, pending fixes for some remaining problems.
  • The io_uring subsystem has gained audit support.
  • The SELinux and Smack security modules can now apply security policies to io_uring operations.
  • Auditing will now record the contents of the open_how structure passed to openat2().
  • The integrity measurement architecture (IMA) can now apply rules based on the group ID of files and the users accessing them.
  • The default Spectre-mitigation behavior for seccomp() threads has changed, resulting in fewer mitigations being applied and a corresponding performance increase. One can read this commit for the reasoning behind this change, but the short version is that the extra mitigations weren't really buying any more security.

Internal kernel changes

  • The folio patch set, which has been the topic of much discussion over the past several months, was the very first thing to be merged for 5.16. This work adds a new "folio" type to represent pages that are known not to be tail pages, then reworks the internal memory-management APIs to use this type. The result is better type clarity and even a small performance increase — and a lot more work to do in the future.
  • There is a new internal function, cc_platform_has(), that provides a generic interface for kernel code to query the presence of confidential-computing features. Its first use is to replace mem_encrypt_active() for checking whether memory encryption is in use.

Assuming that the usual two-week schedule holds, the 5.16 merge window can be expected to close on November 14. Once that happens, we'll be back with a summary of the remaining changes pulled for the next kernel release.

Comments (none posted)

The balance between features and performance in the block layer

By Jonathan Corbet
November 5, 2021
Back in September, LWN reported on a series of block-layer optimizations that enabled a suitably equipped system to sustain 3.5 million I/O operations per second (IOPS). That optimization work has continued since then, and those 3.5 million IOPS would be a deeply disappointing result now. A recent disagreement over the addition of a new feature has highlighted the potential cost of a heavily optimized block layer, though; when is a feature deemed important enough to outweigh the drive for maximum performance?

Block subsystem maintainer Jens Axboe has continued working to make block I/O operations go faster. A recent round of patches tweaked various fast paths, changed the plugging mechanism to use a singly linked list, and made various other little changes. Each is a small optimization, but the work adds up over time; the claimed level of performance is now 8.2 million IOPS — well over September's rate, which looked good at the time. This work has since found its way into the mainline as part of the block pull request for 5.16.

So far, so good; few people will argue with massive performance improvements. But they might argue with changes that threaten to interfere, even in a tiny way, with those improvements.

Consider, for example, this patch set from Jane Chu. It adds a new flag (RWF_RECOVERY_DATA) to the preadv2() and pwritev2() system calls that can be used by applications trying to recover from nonvolatile-memory "poisoning". Implementations of nonvolatile memory have different ways of detecting and coping with data corruption; Intel memory, it seems, will poison the affected range, meaning that it cannot be accessed without generating an error (which then turns into a SIGBUS signal). An application can respond to that error by reading or writing the poisoned range with the new flag; a read will replace the poisoned data with zeroes (allowing the application to obtain whatever data is still readable), while a write will overwrite that data and attempt to clear the poisoned status. Either way, the application can attempt to recover from the problem and continue running.

Christoph Hellwig objected to this new flag on the grounds that it would slow down the I/O fast path:

Well, my point is doing recovery from bit errors is by definition not the fast path. Which is why I'd rather keep it away from the pmem read/write fast path, which also happens to be the (much more important) non-pmem read/write path.

Pavel Begunkov also complained, saying that each flag adds a bit of overhead that piles up over time: "default config kernels are already sluggish when it comes to really fast devices and it's not getting better". That caused Darrick Wong to ask: "So we can't have data recovery because moving fast [is] the only goal?". Begunkov denied saying that, but wasn't really clear on what he was saying.

The cost of this flag is tiny — perhaps not even measurable — in cases where it is not used. But even that overhead can look unacceptable to developers who are working to get the sustained IOPS numbers as high as possible. One flag leads to another and another, and someday in the future the performance cost becomes significant — or that is the argument, anyway. To avoid this kind of problem, the argument continues, niche features like nonvolatile memory poison recovery should be restricted to parts of the kernel that are not seen as being the fast path. In this case, adding the needed functionality to fallocate() has been suggested and tried, but it was eventually decided that fallocate() is not the right place for hardware-management features like poison-clearing.

Thus the current implementation, which has run into fast-path concerns. That, in turn, has provoked an extended outburst from Dave Chinner, who thinks that much of the current optimization work is misplaced:

The current approach of hyper-optimising the IO path for maximum per-core IOPS at the expense of everything else has been proven in the past to be exactly the wrong approach to be taking for IO path development. Yes, we need to be concerned about performance and work to improve it, but we should not be doing that at the cost of everything else that the IO stack needs to be able to do.

Optimization, he continued, should come after the needed functionality is present; "using 'fast path optimisation' as a blunt force implement to prevent new features from being implemented is just ... obnoxious".

The conversation stalled out shortly after that. This kind of disagreement over features and performance has been heard in the kernel community many times over the decades, though. Going faster is a popular goal, and the developers who are focused on performance have been known to get tired of working through performance regressions caused by feature additions that accumulate over time. But functionality, too, is important, and developers tire of having needed features blocked on seemingly insignificant performance concerns.

Often, performance-related pushback leads to revised solutions that do not have the same performance cost; along those lines, it's worth noting that Hellwig has some ideas for improved ways of handling I/O to nonvolatile memory. Other times, it just leads to the delay or outright blocking of needed functionality. What will happen in this case is not clear at this point, but debates like this will almost certainly be with us in the coming decades as well. It is, in the end, how the right balance between performance and functionality is (hopefully) found.

Comments (9 posted)

Intel AMX support in 5.16

By Jonathan Corbet
November 8, 2021
The x86 instruction set is large, but that doesn't mean it can't get bigger yet. Upcoming Intel processors will feature a new set of instructions under the name of "Advanced Matrix Extensions" (AMX) that can be used to operate on matrix data. After a somewhat bumpy development process, support for AMX has found its way into the upcoming 5.16 kernel. Using it will, naturally, require some changes by application developers.

AMX (which is described in this document) is meant to be a general architecture for the acceleration of matrix operations on x86 processors. In its initial form, it implements a set of up to eight "tiles", which are arrays of 16 64-byte rows. Programmers can store matrices of any dimensions that will fit into these tiles; a matrix of 16x16 32-bit floating-point values would work, but other geometries are supported too. The one operation currently supported multiplies the matrices stored in two tiles, then adds the result to a third tile. By chaining these operations, multiplication of matrices of any size can be implemented. Evidently other operations are meant to come in the future.

While AMX may seem like a feature aimed at numerical analysis, the real target use case would appear to be machine-learning applications. That would explain why 16-bit floating-point arithmetic is supported, but 64-bit is not.

The design of AMX gives the kernel control over whether these features can be used by any given process. There are a couple of reasons for this, one being that AMX instructions, as one might imagine, use a lot of processor resources. A process doing heavy AMX work on a shared computer may adversely affect other processes. But AMX also cannot be supported properly unless both the kernel and the user-space process are ready for it.

Development process

Support for AMX was first posted by Chang Bae in October 2020, but got relatively few review comments. By the time version 4 came out in February, more developers were paying attention, and they were not entirely pleased with how this feature was being integrated into the kernel's existing floating-point unit (FPU) code. Various versions followed, with the frustration level seeming to increase; at the end of September, Len Brown posted minutes from a conversation that, seemingly, showed a path forward.

Unfortunately, version 11, posted the very next day, seemed to ignore many of the decisions that had been made. This posting drew a sharp rebuke from Thomas Gleixner, who felt that the feature was being shoved into the kernel without listening to the complaints that were being raised. Things weren't looking great for AMX, but work was happening behind the scenes; in mid-October, Gleixner posted a massive reworking of the FPU code meant to ease the task of supporting AMX. A new AMX patch set followed shortly thereafter, and that, more or less, is what ended up in 5.16.

Gleixner's pull request for this code acknowledged its relatively unseasoned nature:

Note, this is relatively new code despite the fact that AMX support is in the works for more than a year now.

The big refactoring of the FPU code, which allowed to do a proper integration has been started exactly 3 weeks ago. Refactoring of the existing FPU code and of the original AMX patches took a week and has been subject to extensive review and testing. The only fallout which has not been caught in review and testing right away was restricted to AMX enabled systems, which is completely irrelevant for anyone outside Intel and their early access program. There might be dragons lurking as usual, but so far the fine grained refactoring has held up and eventual yet undetected fallout is bisectable and should be easily addressable before the 5.16 release. Famous last words...

The FPU code is relatively tricky, low-level stuff, so it would indeed be unsurprising to find a dragon or two lurking in the new work.

Using AMX

As noted above, the kernel is able to control which processes are able to use the AMX instructions. The first step for a user-space process would be to use a new arch_prctl() command (ARCH_GET_XCOMP_SUPP) to get a list of supported features; if the appropriate bit is set in the result, AMX is available. Then, another arch_prctl() command (ARCH_REQ_XCOMP_PERM) can be used to request permission to use AMX. Some checks are made (one to be described shortly), and there is an opportunity for security modules to express an opinion as well. Normally, though, the request will be granted. Permissions apply to all threads in a process and are carried over a fork; calling execve() will reset them, though.

One challenge presented by AMX is that processors can create a great deal of internal state while the AMX instructions are running. If the CPU is interrupted in the middle of an operation, that state must be saved somewhere or a lot of work could be lost. So, if a process is using AMX, the kernel must be prepared to save up to about 10KB of data in its interrupt handlers before doing much of anything else. This saving is done using the XSAVE instruction.

The kernel allocates memory for each process that can be used for this purpose. Allocating 10KB for every process in the system would waste a lot of memory, though; most processes will never use AMX instructions. Happily, the processor can be configured to trap into the kernel the first time any process executes an AMX instruction; the kernel can then check whether permission to use those instructions has been granted and, if so, allocate an appropriately sized buffer to hold the FPU state and allow the operation to continue.

One potential problem has to do with the sigaltstack() system call, which allows a thread to establish a new stack for signal handling. That stack, too, must be large enough to hold the FPU state if the process involved is using AMX. For years, developers have been told to use MINSIGSTKSZ as the minimum size for this stack; that size, which is 2KB, is nowhere near large enough for AMX-using processes. Indeed, it is not even large enough to use the AVX-512 extensions, a fact that has caused some corrupted-stack problems in the past.

To avoid this problem for AMX, the kernel will check to ensure that all signal stacks are sufficiently large. This check is done at each call to sigaltstack(), but a check of existing stacks is also done when a process requests permission to use AMX in the first place. Processes not using AMX will not need the larger stack and, thus, will not be broken by these checks. Processes that do want to use AMX will not be allowed to unless all of their signal stacks are big enough.

Once the infrastructure to perform these checks was put in place, the kernel also gained the ability to ensure that processes using AVX-512 have adequately sized signal stacks. Enforcing that condition, though, has the potential to break applications that seemingly work now, perhaps because their signal handlers are never actually called. To avoid this problem, there is a kernel configuration option (STRICT_SIGALTSTACK_SIZE) and a command-line option (strict_sas_size=), either of which can be used to control whether the strict checks are carried out when AVX-512 is in use.

Assuming that all of the pieces hold together, this is the form that AMX support will take in 5.16. Those wanting more information can look at the commits containing AMX test cases and some documentation on the arch_prctl() commands. Meanwhile, keep an eye out for dragons for the next nine weeks or so.

Comments (13 posted)

Concurrency in Julia

November 9, 2021

This article was contributed by Lee Phillips

The Julia programming language has its roots in high-performance scientific computing, so it is no surprise that it has facilities for concurrent processing. Those features are not well-known outside of the Julia community, though, so it is interesting to see the different types of parallel and concurrent computation that the language supports. In addition, the upcoming release of Julia version 1.7 brings an improvement to the language's concurrent-computation palette, in the form of "task migration".

Multithreading

Julia supports a variety of styles of concurrent computation. A multithreaded computation is a type of concurrency that involves simultaneous work on different processing units. Multithreading, and the related term parallel computation, usually imply that the various threads have access to the same main memory; therefore it involves multiple cores or processing units on a single computer.

When Julia is started without the -t flag it will use only one thread. To allow it to use all of the available hardware threads, it accepts the -t auto argument. This allows access to all of the logical threads on the machine. This is often not optimal, and a better choice can be -t n, where n is the number of physical cores. For example, the popular Intel Core processors provide two logical cores for each physical core, doubling up using a technique called hyperthreading, which can yield anything from a modest speedup to an actual slowdown, depending on the type of calculation.
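
As a quick check of what a given invocation provides (a hypothetical session, assuming Julia was started with -t 2), the thread count can be queried from the REPL:

    julia> Threads.nthreads()
    2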

To demonstrate multithreading in Julia, we will use the function below that tests whether a number is prime. As with all of the examples in the article, isprime() leaves many potential optimizations on the table in the interests of simplicity.

    function isprime(x::Int)
        if x < 2
            return false
        end
        u = isqrt(x) # integer square root
        for n in 2:u
            if x % n == 0
                return false
            end
        end
        return true
    end

    function isprime(x::Any)
        return "Integers only, please"
    end

The function should look only at integers (Int is an alias for Int64), so we've taken advantage of multiple dispatch to create two methods: one that tests for primality and one that complains if we give it anything besides an integer. The most specific method is always dispatched, so this lets us handle incorrect data types without writing explicit tests in the function.

A useful package for multithreading that performed well in my tests is Folds. This is not in the standard library so you must install it using Julia's package manager. The Folds package provides a small collection of high-level macros and functions for concurrency. The name derives from the use of "fold" to mean a reduction over an array. Folds provides multithreaded versions of map(), sum(), maximum(), minimum(), reduce(), collect(), and a handful of other automatically parallelized operations over collections.

Here we use map() to check a range of numbers for primality. If f() is a function of one variable and A is an array, the expression map(f, A) applies f() to each element of A, returning an array with the same shape as A. Replacing a map() operation with a multithreaded one is as easy as this:

    julia> @btime map(isprime, 1000:9999999);
      8.631 s (2 allocations: 9.54 MiB)

    julia> @btime Folds.map(isprime, 1000:9999999);
      5.406 s (45 allocations: 29.48 MiB)

In Julia, the syntax a:n:b defines an iterator called a "range" that steps from a to b by n (which defaults to 1). For the map() call, those ranges get turned into a vector.

Folds.map() divides the array into equal segments, one for each available thread, and performs the map() operation in parallel over each segment. As with all the experiments in this article, the timings were performed using Julia version 1.7rc1. For this test I started the REPL with julia -t2.

The @btime macro, from the BenchmarkTools package, runs the job several times and reports the minimum run time. The timings indicate a speedup not all that far from the ideal factor of two, on my two-core machine, at the cost of higher memory consumption. A CPU-usage meter confirms that only one core was working during the first calculation, while two were busy during the second.

Another form of multithreading is use of graphics processing units (GPUs). Originally designed to accelerate the inherently parallel graphics calculations in 3D games and other graphics-heavy applications, GPUs, with their hundreds or thousands of floating-point processors, are now used as array coprocessors for a wide variety of applications. The best candidates for GPU computing are inherently parallel calculations with a large ratio of arithmetic to memory operations. There is an organization called JuliaGPU that serves as an umbrella for Julia packages that implement or rely on this style of parallel processing.

Map operations are inherently parallel; they involve a function acting on each element of a collection in isolation, neither reading from nor writing to other elements. Because of this, conventional maps can be replaced with multithreaded versions without having to change anything else in the program. The complications that come with simultaneous access to the same data don't arise.

Real-life programs will contain parts that can be easily parallelized, others where greater care must be taken, and parts that can not be parallelized at all. Consider a simplistic simulation of a galaxy, with bodies moving under the influence of their mutual gravitational forces. The force on each body must be calculated by looking at the positions of all the others (out to some distance, depending on the accuracy sought); although this part of the calculation can be parallelized, the pattern is more complicated than the map() patterns considered above. But after all the forces are stored, each body's change in velocity can be calculated independently of the others, in an easily parallelizable procedure with the structure of a simple map().

Julia has all the facilities for the more complex forms of concurrent computation. A for loop can be made multithreaded using the @threads macro this way:

    Threads.@threads for i = 1:N
	<do something>        
    end

In this case, if, for example, we have two threads, one thread will get the loop from 1 to N/2 and the other from N/2 to N. If more than one thread needs access to the same data, that data must be protected by locking, which is provided by the lock() function. Julia prevents some race conditions through the use of an Atomic data type. Programs are not allowed to change data of this type except through a set of atomic operations such as Threads.atomic_add!() and Threads.atomic_sub!() instead of normal addition and subtraction.
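
As a minimal illustrative sketch (not taken from the article's benchmarks), the isprime() function defined earlier can be combined with @threads and an Atomic counter to count primes without a data race:

    function count_primes(n)
        count = Threads.Atomic{Int}(0)        # shared, race-free counter
        Threads.@threads for i in 2:n
            if isprime(i)
                Threads.atomic_add!(count, 1) # atomic update instead of count += 1
            end
        end
        return count[]                        # read the final value
    end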

Distributed processing

A different style of concurrency involves the cooperatively multitasking asynchronous coroutines that Julia calls Tasks. These coroutines can share a single thread, be distributed among threads, or be assigned to different nodes in a cluster or to geographically distributed machines communicating over the internet. All of these forms of distributed computing share the characteristic that processors do not have direct access to the same memory, so the data must be transported to the places where computation is to happen.

Tasks are well suited to the situation where a program needs to start a process that may take a while, but the program can continue because it doesn't need the result immediately. Examples are downloading data from the internet, copying a file, or performing a long-running calculation whose result is not needed until later.

When Julia is invoked with the -p n flag, n worker processes are initialized, in addition to the main executive process. The two multiprocessing flags can be combined, as in julia -p2 -t2. This would start up with two worker processes, each of which can compute with two threads. For the examples in this section, I used julia -p2.

The "manual" way to launch a task is by using the macro call @spawnat :any expr. This assigns a task to compute the expression expr on a worker process chosen by the system. The worker processes are identified by integers starting with 1; using an integer in place of :any spawns the task in that specific process.

[Leibniz sum equation: π = 4 (1 - 1/3 + 1/5 - 1/7 + ...)]

More common is the use of a distributed version of map() called pmap(), which gets imported by the -p flag. pmap() performs the same calculation, but first breaks the array into pieces, sending an equally-sized piece to each worker process where the maps are performed in parallel, then gathers and assembles the results. For an example calculation we'll estimate π using the notoriously slowly converging Leibniz sum (shown above).

The function leibniz(first, last, nterms) returns an array with the results for every nterms terms between the first two arguments:

    @everywhere function leibniz(first, last, nterms)
        r = [] 
        for i in first:nterms:last # Step from first to last by nterms
            i1 = i
            i2 = i+nterms-1
            # Append the next partial sum to r:
            push!(r, (i1, i2, 4*sum((-1)^(n+1) * 1/(2n-1) for n in i1:i2)))
        end  
        return r  
    end

The @everywhere macro, also imported by the -p flag, broadcasts function and variable definitions to all workers, which is necessary because the worker processes do not share memory with the main process. The leibniz() function is a direct translation of the Leibniz sum, using the sum() function to add up the terms in a generator expression. After every group of nterms terms, it pushes the cumulative partial result into the r array, which it returns when done.

In order to test the performance of pmap(), we can use it to simultaneously calculate the first 2 × 10^8 terms:

    julia> @btime r = pmap(n -> leibniz(n, n+10^8-1, 1000), [1, 10^8+1]);
      7.127 s (1600234 allocations: 42.21 MiB)

The version using the normal map, which uses a single process, takes almost twice as long (but uses 23% of the memory):

    julia> @btime r = map(n -> leibniz(n, n+10^8-1, 1000), [1, 10^8+1]);
      13.630 s (200024 allocations: 9.76 MiB)

The figure below shows the results of summing the first 2 × 10^8 terms of the Leibniz series.

[Leibniz sum convergence]

The distributed version consumed significantly more memory than the normal map(), something that I observed in most cases. pmap() seems best suited to coarse-grained multiprocessing; it's an effective way to speed up an expensive operation over a small collection of inputs.

Julia's machinery for distributed processing also works across machines. When given a file of internet addresses, Julia will spawn jobs over the internet using SSH and public-key authentication, allowing distributed code developed locally to run with little modification on a network of suitably equipped computers.

Tasks and task migration

Another type of concurrency that Julia supports can be described as asynchronous multithreading. It computes with Julia tasks, but assigned to multiple threads within a single process. Therefore all tasks can access the same memory and cooperatively multitask. For the examples in this section I started Julia with julia -t2, so I had one thread for each CPU core.

We can start a task, and automatically schedule it on an available thread, with a macro: a = Threads.@spawn expr. This returns a Task object; we can wait for the Task and read the return value of expr with fetch(a).
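
A trivial sketch of that pattern (not one of the experiments below):

    t = Threads.@spawn sum(rand() for _ in 1:10^7)  # schedule on an available thread
    fetch(t)   # wait for the Task and read its result (about 5.0e6)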

For calculations involving a set of tasks with roughly equal run times, we should expect a close-to-linear speedup with respect to the number of CPU threads. In previous versions of Julia, once a task was scheduled on a thread, it was stuck there until it finished. This could create scheduling problems in cases where some tasks take longer to run than others; the threads hosting the lighter tasks may become idle while the heavy tasks continue to compete for time on their original threads. In such cases involving tasks with very unequal run times, we might no longer observe linear speedups; there are resources going to waste.

In version 1.7 of Julia, tasks are allowed to migrate among threads. Tasks can be rescheduled to available threads rather than remaining stuck where they were originally assigned. This feature is still largely undocumented, but I verified that tasks launched with Threads.@spawn can migrate upon calling yield(), which is the function that suspends a task and allows another scheduled task to run.

In order to directly observe the effect of thread migration, I needed to be able to turn it on and off. It's possible to compile a version of Julia without the feature enabled, but there is an easier way. The ThreadPools package provides a set of useful macros for concurrency. I used its @tspawnat macro in my experiments. Tasks spawned with this macro can migrate, just as tasks spawned using Threads.@spawn, but using @tspawnat allowed me to easily make a modified version that spawns tasks that are not allowed to migrate.

Monte Carlo physics simulations or probabilistic modeling are ideal types of calculations for asynchronous multithreading. This approach involves running a set of simulations that are identical aside from the series of pseudo-random numbers that control the details of each task's calculation. All the tasks are independent of each other, and when they are complete, various averages and their distributions can be examined.

For my experiments I wrote a simple program to look at the statistical properties of Julia's pseudo-random number generation function rand(), which returns a uniformly distributed random number between 0 and 1 when called with no arguments. The function below, when passed a number n, generates a random number n times, reporting the mean for each group of n/10 numbers:

    function darts(n)
        if n % 10 != 0 || n <= 0
            return nothing
        end
        a = Threads.threadid()
        means = zeros(10)
        for cyc in 1:10
            yield()
            s = 0
            for i in 1:Int(n/10)
                s += rand()
            end
            means[cyc] = s/(n/10)
        end
        return (a, Threads.threadid(), time() - t0, n, means)
    end

The program only works with an n that is a multiple of 10. It has a few additional lines for keeping track of which thread it ran on and to record the completion time, which it calculates as an elapsed time from a global t0. Before each one of the 10 cycles, it calls yield(), which allows the scheduler to switch to another task on the thread, and, with thread migration, to move the task to another thread.

As an example, if we run darts(1000), we'll get 10 mean values back, each one the result of picking 100 random numbers. According to the central limit theorem these mean values should have a normal distribution with a mean that is the same as that of the underlying (uniform) distribution.

In order to get a sample and plot its distribution, I launched 20 tasks, each running the darts() program with n = 10^7. This ran 200 random trials, each one picking one million random numbers, then calculating and recording their mean. We also need a highly accurate estimate of the distribution's mean, so I launched one additional task with n = 10^9 to calculate this. We know that the theoretical mean should be 0.5, so this will check for bias in rand(). With a trivial modification the procedure can be used with other underlying distributions whose means we may not know analytically.

I launched the tasks with the following:

    begin
        t0 = time()
        tsm = [@tspawnat rand(1:2) darts(n) for n in [repeat([1e7], 20); 1e9]]
    end

All of the tasks have access to t0, so they can all report the time at which they finished. The call rand(1:2) assigns each task initially to one thread or the other with equal probability. I repeated this experiment five times, and ran another experiment five times, identical except for the use of my modified @tspawnat macro that disables thread migration.

To look at the distribution from any one of the experiments we can collect the sample means into one array and plot it with the histogram() function that comes with the Plots package; an overview of the plotting system explains how to use histogram() and the other plotting functions used in this article. The result is shown in the next figure, with a bar superimposed showing the mean from the task using 109 numbers.

[Random number distribution]

The distribution is approximately normal with the correct mean.

The experiment consists of 20 short tasks and one task that takes 100 times as long. I was optimistic that this would exhibit the effects of thread migration, as the scheduler would have the opportunity to move tasks off of the thread containing the long-running calculation. This is the type of situation that thread migration was intended to improve. Calculations involving a group of tasks with similar resource requirements would not be expected to benefit from migration.

The average of the 21 times for completion, over five experiments, with thread migration enabled was 0.71 seconds, and 1.5 seconds with migration disabled. A detailed look at the timings shows how thread migration allows the job to complete twice as quickly. The next figure shows the return times for all 10 experiments, with a line drawn to guide the eye through the times for the fastest one.

[Task migration plot]

A separate timing showed that single runs take 0.024 seconds for darts(1e7) and 2.4 seconds for darts(1e9), the ratio of run times that we would expect. Trial number 21 is the task calling darts(1e9), which takes about 2.5 seconds when it is run in parallel. The plot makes it clear that, without migration, about half the tasks get stuck sharing the thread with the single heavy task. The other half return quickly, after which an available thread goes to waste. In contrast, thread migration moves tasks onto the free thread, leading to better load balancing and a quicker overall completion. Looking at the values returned by the Threads.threadid() calls confirms that about half the tasks migrate.

This experiment shows the new thread migration feature is working as intended. It can speed up asynchronous multithreaded calculations in cases where tasks have significantly unequal run times, or where some tasks block waiting for I/O or other events.

In calculations like the ones in this and the previous section, the tasks proceeded completely independently of each other; no coordination was required. If a program has locations where it must wait for all the asynchronous tasks to complete before moving on to the next stage, it can use the @sync macro. This will wait until all tasks spawned in the scope where the macro appears have finished.
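
A small sketch of that pattern, reusing the darts() function from above:

    @sync begin
        for n in [10^6, 10^7]
            Threads.@spawn darts(n)   # results discarded here; the point is the waiting
        end
    end
    # Execution only continues past the @sync block once both tasks have finished.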

Conclusion

Concurrency is inherently difficult. Concurrent programs are harder to debug and to reason about. Julia doesn't erase this difficulty; the programmer still needs to beware of race conditions, consistency, and correctness. But Julia's macros and functions for tasks and data parallelism make simple things easy in many cases, and more complex problems approachable.

This is also an area where development is active, from packages such as Folds to implementation details like the operation of the scheduler, which has most recently led to the task migration feature. We can expect Julia's concurrency situation to continue to improve. We can also expect some of the best parts of the concurrency ecosystem to remain outside of the standard library and the official documentation; the programmer who wants to take the best advantage of these offerings will need to make an effort to keep up with the changing landscape.

Comments (14 posted)

Page editor: Jake Edge

Brief items

Security

Conill: an inside look into the illicit ad industry

Ariadne Conill shares some experience of working in the online advertising industry.

The cycle of patching on both sides is ongoing to this day. A friend of mine on Twitter referred to this tug-of-war as “core war,” which is an apt description: all of the involved actors are trying to patch each other out of being able to commit or detect subterfuge, and your browser gets slower and slower as more mitigations and countermeasures are layered on. If you’re not using an ad blocker yet, stop reading this, and install one: your browser will suddenly be a lot more performant.

Comments (87 posted)

GitLab servers are being exploited in DDoS attacks (The Record)

The Record is reporting on massive exploitation of an oldish vulnerability in GitLab instances.

While the purpose of these attacks remained unclear for HN Security, yesterday, Google’s Menscher said the hacked servers were part of a botnet comprising of “thousands of compromised GitLab instances” that was launching large-scale DDoS attacks.

The vulnerability was fixed in April, but evidently a lot of sites have not updated.

Comments (22 posted)

Samba 4.15.2, 4.14.10, 4.13.14 security releases available

There is a set of new Samba releases out there. They fix a long and intimidating list of security issues and seem worth upgrading to for any but the most protected of Samba servers.

Full Story (comments: 6)

Security quotes of the week

Here's the thing: the City of Stockholm should have scrutinized any third party app that touched its systems for privacy breaches. That's its job. But the way it proceeded shows that its primary concern wasn't safeguarding private data – it was safeguarding its reputation. By blocking a third-party app that succeeded where its app had failed, the City was able to maintain the fiction that the billion kroners Skolplattformen cost to produce was money well-spent. By slandering the volunteers who discovered security defects in its billion-kroner app, the City was able to maintain the fiction that it had exercised good oversight in public spending.

There's a name for this conduct: privacywashing, when legitimate adaptation, investigation and modification is blocked in the name of preserving privacy.

Cory Doctorow

Also, the name of the bill is based on the idea of "filter bubbles" and many of the co-sponsors of the bill claim that these websites are purposefully driving people deeper into these "filter bubbles." However, as we again just recently discussed, new research shows that social media tends to expose people to a wider set of ideas and viewpoints, rather than more narrowly constraining them. In fact, they're much more likely to face a "filter bubble" in their local community than by being exposed to the wider world through the internet and social media.

So, in the end, we have a well-hyped bill based on the (false) idea of filter bubbles and the (false) idea of algorithms only serving corporate profit, which would require websites to give users a chance to turn off an algorithm -- which they already allow, and which would effectively kill off other useful tools like mobile optimization. It seems like the only purpose this legislation actually serves to accomplish is to let these politicians stand up in front of the news media and claim they're "taking on big tech!" and smile disingenuously.

Mike Masnick

Comments (none posted)

Kernel development

Kernel release status

The 5.16 merge window is open; it can be expected to close on November 14.

Stable updates: 5.15.1, 5.14.17, 5.10.78, 5.4.158, and 4.19.216 were released on November 6.

Comments (none posted)

ksmbd: a new in-kernel SMB server (SAMBA+ blog)

Over at the SAMBA+ blog, the performance of the new ksmbd in-kernel SMB server is compared with that of Samba running in user space. ksmbd claims performance improvements on a wide range of benchmarks: the graphs on this page show a doubling of performance on some tests. There was also the notion that an in-kernel server is likely an easier place to support SMB Direct, which uses RDMA to transfer data between systems.

Clearly, those numbers are impressive, but at the same time recent improvements in Samba's IO performance put this into perspective: by leveraging the new "io_uring" Linux API Samba is able to provide roughly 10x the throughput compared to ksmbd.

Time will tell whether it's better to reside in kernel-space like ksmbd or in user-space like Samba in order to squeeze the last bit of performance out of the available hardware.

There are two graphs that show some impressive results for Samba.
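For readers who have not encountered the io_uring API mentioned in that quote, the following is a minimal sketch, not drawn from Samba's code and using a placeholder file name, of the basic submit-and-complete pattern with the liburing helper library: queue an asynchronous read, hand it to the kernel, then wait for the completion.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <liburing.h>

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        char buf[4096];

        if (io_uring_queue_init(8, &ring, 0) < 0) {
            perror("io_uring_queue_init");
            return 1;
        }
        int fd = open("/etc/hostname", O_RDONLY);   /* placeholder file */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Queue one asynchronous read and submit it to the kernel. */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);

        /* Wait for the completion; res holds the byte count or -errno. */
        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
            printf("read returned %d\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
    }

Building it requires linking against liburing (with -luring).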

Comments (5 posted)

Ryabitsev: lore+lei: part 1, getting started

Konstantin Ryabitsev introduces the "local email interface" (lei) functionality for the lore archive of kernel mailing lists.

Even though it started out as merely a list archival service, it quickly became obvious that lore could be used for a lot more. Many developers ended up using its search features to quickly locate emails of interest, which in turn raised a simple question — what if there was a way to “save a search” and have it deliver all new incoming mail matching certain parameters straight to the developers' inbox?

You can now do this with lei.

Comments (16 posted)

Development

Horgan: Linux x86 Program Start Up

Patrick Horgan explains the process of starting a program on Linux in great detail.

Well, __libc_start_main calls __libc_init_first, who immediately uses secret inside information to find the environment variables just after the terminating null of the argument vector and then sets a global variable __environ which __libc_start_main uses thereafter whenever it needs it including when it calls main. After the envp is established, then __libc_start_main uses the same trick and surprise! Just past the terminating null at the end of the envp array, there's another vector, the ELF auxiliary vector the loader uses to pass some information to the process. An easy way to see what's in there is to set the environment variable LD_SHOW_AUXV=1 before running the program.
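The auxiliary vector described in that excerpt can also be inspected from within a running program: glibc exposes it through the getauxval() function. As a small illustration (not taken from Horgan's article), the following C snippet prints a few of the entries; setting LD_SHOW_AUXV=1, as noted above, dumps the whole vector instead.

    #include <stdio.h>
    #include <sys/auxv.h>

    int main(void)
    {
        /* getauxval() looks up entries in the ELF auxiliary vector that
           the kernel places just past the environment strings. */
        unsigned long pagesz = getauxval(AT_PAGESZ);
        const char *execfn = (const char *) getauxval(AT_EXECFN);
        const char *platform = (const char *) getauxval(AT_PLATFORM);

        printf("page size:  %lu\n", pagesz);
        printf("executable: %s\n", execfn ? execfn : "(not available)");
        printf("platform:   %s\n", platform ? platform : "(not available)");
        return 0;
    }

AT_EXECFN and AT_PLATFORM are string entries, so their values are cast back to pointers before printing.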

Comments (3 posted)

LXD 4.20 released

The LXD team has announced the release of version 4.20 of the LXD system container and virtual machine manager.
This is one very busy release with a lot of new features.

VM users will be happy to see the initial implementation of live migration and core scheduling support. Container users are getting new configuration keys to set sysctls.

Then the bulk of the new features are all network related with peer network relationships, network zones for auto-generated DNS and SR-IOV accelerated OVN networks.

And lastly, on the clustering front, it’s now possible to better control what servers will be receiving new workloads.

Comments (13 posted)

Page editor: Jake Edge

Announcements

Newsletters

Distributions and system administration

Development

Emacs News November 8
This Week in GNOME November 5
LLVM Weekly November 8
OCaml Weekly News November 9
Perl Weekly November 8
Racket News November 5
Weekly Rakudo News November 8
Ruby Weekly News November 4
This Week in Rust November 3
Wikimedia Tech News November 8

Meeting minutes

Calls for Presentations

CFP Deadlines: November 11, 2021 to January 10, 2022

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline      Event Dates               Event                      Location
December 10   April 5 - April 7         Cephalocon 2022            Portland, OR, US
December 15   March 19 - March 20       LibrePlanet 2022           Online
January 9     February 3 - February 4   CentOS Dojo, FOSDEM 2022   online, online

If the CFP deadline for your event does not appear here, please tell us about it.

Upcoming Events

Events: November 11, 2021 to January 10, 2022

The following event listing is taken from the LWN.net Calendar.

Date(s)                      Event                             Location
November 10 - November 12    Meeting C++ 2021                  Online
November 17                  SQLite & Tcl virtual conference   Online
November 27 - November 28    EmacsConf                         Online
November 30 - December 2     Yocto Project Summit 2021-11      virtual
December 1 - December 4      FlaskCon                          Online
December 2 - December 3      PGConf NYC                        New York, USA
December 3 - December 4      OLF Conference                    Columbus, OH, USA

If your event does not appear here, please tell us about it.

Security updates

Alert summary November 4, 2021 to November 10, 2021

Dist. ID Release Package Date
Arch Linux ASA-202111-2 firefox 2021-11-09
Arch Linux ASA-202111-5 grafana 2021-11-09
Arch Linux ASA-202111-1 jenkins 2021-11-09
Arch Linux ASA-202111-4 opera 2021-11-09
Arch Linux ASA-202111-3 thunderbird 2021-11-09
Debian DLA-2812-1 LTS botan1.10 2021-11-08
Debian DLA-2813-1 LTS ckeditor 2021-11-09
Debian DSA-5002-1 stable containerd 2021-11-06
Debian DLA-2814-1 LTS openjdk-8 2021-11-09
Debian DLA-2808-1 LTS python3.5 2021-11-05
Debian DLA-2810-1 LTS redis 2021-11-05
Debian DSA-5001-1 stable redis 2021-11-05
Debian DSA-5003-1 stable samba 2021-11-09
Debian DLA-2811-1 LTS sqlalchemy 2021-11-06
Debian DLA-2809-1 LTS udisks2 2021-11-05
Fedora FEDORA-2021-71ff867094 F33 ansible 2021-11-04
Fedora FEDORA-2021-0e7910e389 F34 ansible 2021-11-04
Fedora FEDORA-2021-5093f11905 F33 chromium 2021-11-04
Fedora FEDORA-2021-70dd0b9f5d F33 community-mysql 2021-11-10
Fedora FEDORA-2021-f74148c6d4 F34 community-mysql 2021-11-10
Fedora FEDORA-2021-46dc82116b F35 community-mysql 2021-11-10
Fedora FEDORA-2021-52398e2ff4 F33 firefox 2021-11-10
Fedora FEDORA-2021-4fed2b55c4 F33 kernel 2021-11-04
Fedora FEDORA-2021-4320606094 F34 kernel 2021-11-04
Fedora FEDORA-2021-bdd146e463 F34 kernel 2021-11-06
Fedora FEDORA-2021-ed8c2e1098 F35 kernel 2021-11-04
Fedora FEDORA-2021-a093973910 F35 kernel 2021-11-06
Fedora FEDORA-2021-e1d8a99caa F34 mupdf 2021-11-04
Fedora FEDORA-2021-e1d8a99caa F34 python-PyMuPDF 2021-11-04
Fedora FEDORA-2021-d2d7341538 F35 radeontop 2021-11-06
Fedora FEDORA-2021-5b8d0d36bf F33 rpki-client 2021-11-08
Fedora FEDORA-2021-335bf19f5d F34 rpki-client 2021-11-08
Fedora FEDORA-2021-6a91085462 F35 rpki-client 2021-11-08
Fedora FEDORA-2021-0578e23912 F34 rust 2021-11-04
Fedora FEDORA-2021-7ad3a01f6a F35 rust 2021-11-05
Fedora FEDORA-2021-a5e55a9e02 F33 vim 2021-11-10
Fedora FEDORA-2021-58ab85548d F35 vim 2021-11-10
Fedora FEDORA-2021-131360fa9a F33 webkit2gtk3 2021-11-07
Fedora FEDORA-2021-483d896d1d F34 webkit2gtk3 2021-11-07
Fedora FEDORA-2021-e1d8a99caa F34 zathura-pdf-mupdf 2021-11-04
openSUSE openSUSE-SU-2021:3616-1 15.3 binutils 2021-11-04
openSUSE openSUSE-SU-2021:3643-1 15.3 binutils 2021-11-10
openSUSE openSUSE-SU-2021:1462-1 15.2 chromium 2021-11-08
openSUSE openSUSE-SU-2021:1455-1 15.2 java-1_8_0-openj9 2021-11-06
openSUSE openSUSE-SU-2021:3615-1 15.3 java-1_8_0-openj9 2021-11-04
openSUSE openSUSE-SU-2021:1460-1 15.2 kernel 2021-11-08
openSUSE openSUSE-SU-2021:3641-1 15.3 kernel 2021-11-09
openSUSE openSUSE-SU-2021:1451-1 15.2 libvirt 2021-11-05
openSUSE openSUSE-SU-2021:3619-1 15.3 libvirt 2021-11-05
openSUSE openSUSE-SU-2021:1452-1 oSB SLE15 mailman 2021-11-05
openSUSE openSUSE-SU-2021:1461-1 15.2 qemu 2021-11-08
openSUSE openSUSE-SU-2021:3604-1 15.3 qemu 2021-11-03
openSUSE openSUSE-SU-2021:3605-1 15.3 qemu 2021-11-03
openSUSE openSUSE-SU-2021:3614-1 15.3 qemu 2021-11-04
openSUSE openSUSE-SU-2021:3634-1 15.3 rubygem-activerecord-5_1 2021-11-09
openSUSE openSUSE-SU-2021:3639-1 15.3 tinyxml 2021-11-09
openSUSE openSUSE-SU-2021:1458-1 oSB SLE15 transfig 2021-11-08
openSUSE openSUSE-SU-2021:1454-1 15.2 webkit2gtk3 2021-11-06
openSUSE openSUSE-SU-2021:3603-1 15.3 webkit2gtk3 2021-11-03
Oracle ELSA-2021-4116 OL7 firefox 2021-11-04
Oracle ELSA-2021-4116 OL7 firefox 2021-11-04
Oracle ELSA-2021-4123 OL8 firefox 2021-11-04
Oracle ELSA-2021-9541 OL7 httpd 2021-11-04
Oracle ELSA-2021-9541 OL7 httpd 2021-11-04
Oracle ELSA-2021-4134 OL7 thunderbird 2021-11-05
Oracle ELSA-2021-4134 OL7 thunderbird 2021-11-05
Oracle ELSA-2021-4130 OL8 thunderbird 2021-11-05
Red Hat RHSA-2021:4381-01 EL8 GNOME 2021-11-09
Red Hat RHSA-2021:4361-01 EL8 NetworkManager 2021-11-09
Red Hat RHSA-2021:4593-01 EL8 annobin 2021-11-10
Red Hat RHSA-2021:4599-01 EL8.1 annobin 2021-11-10
Red Hat RHSA-2021:4600-01 EL8.2 annobin 2021-11-10
Red Hat RHSA-2021:4598-01 EL8.4 annobin 2021-11-10
Red Hat RHSA-2021:4519-01 EL8 autotrace 2021-11-09
Red Hat RHSA-2021:4201-01 EL8 babel 2021-11-09
Red Hat RHSA-2021:4384-01 EL8 bind 2021-11-09
Red Hat RHSA-2021:4364-01 EL8 binutils 2021-11-09
Red Hat RHSA-2021:4595-01 EL8 binutils 2021-11-10
Red Hat RHSA-2021:4602-01 EL8.1 binutils 2021-11-10
Red Hat RHSA-2021:4601-01 EL8.2 binutils 2021-11-10
Red Hat RHSA-2021:4596-01 EL8.4 binutils 2021-11-10
Red Hat RHSA-2021:4432-01 EL8 bluez 2021-11-09
Red Hat RHSA-2021:4319-01 EL8 compat-exiv2-026 2021-11-09
Red Hat RHSA-2021:4221-01 EL8 container-tools:2.0 2021-11-09
Red Hat RHSA-2021:4222-01 EL8 container-tools:3.0 2021-11-09
Red Hat RHSA-2021:4154-01 EL8 container-tools:rhel8 2021-11-09
Red Hat RHSA-2021:4393-01 EL8 cups 2021-11-09
Red Hat RHSA-2021:4511-01 EL8 curl 2021-11-09
Red Hat RHSA-2021:4464-01 EL8 dnf 2021-11-09
Red Hat RHSA-2021:4153-01 EL8 dnsmasq 2021-11-09
Red Hat RHSA-2021:4198-01 EL8 edk2 2021-11-09
Red Hat RHSA-2021:4173-01 EL8 exiv2 2021-11-09
Red Hat RHSA-2021:4374-01 EL8 file 2021-11-09
Red Hat RHSA-2021:4179-01 EL8 file-roller 2021-11-09
Red Hat RHSA-2021:4116-01 EL7 firefox 2021-11-03
Red Hat RHSA-2021:4123-01 EL8 firefox 2021-11-03
Red Hat RHSA-2021:4607-01 EL8.1 firefox 2021-11-10
Red Hat RHSA-2021:4605-01 EL8.2 firefox 2021-11-10
Red Hat RHSA-2021:4386-01 EL8 gcc 2021-11-09
Red Hat RHSA-2021:4587-01 EL8 gcc 2021-11-10
Red Hat RHSA-2021:4592-01 EL8 gcc-toolset-10-annobin 2021-11-10
Red Hat RHSA-2021:4589-01 EL8.4 gcc-toolset-10-annobin 2021-11-10
Red Hat RHSA-2021:4588-01 EL8.4 gcc-toolset-10-binutils 2021-11-10
Red Hat RHSA-2021:4585-01 EL8 gcc-toolset-10-gcc 2021-11-10
Red Hat RHSA-2021:4591-01 EL8 gcc-toolset-11-annobin 2021-11-10
Red Hat RHSA-2021:4594-01 EL8 gcc-toolset-11-binutils 2021-11-10
Red Hat RHSA-2021:4586-01 EL8 gcc-toolset-11-gcc 2021-11-10
Red Hat RHSA-2021:4385-01 EL8 glib2 2021-11-09
Red Hat RHSA-2021:4358-01 EL8 glibc 2021-11-09
Red Hat RHSA-2021:4451-01 EL8 gnutls and nettle 2021-11-09
Red Hat RHSA-2021:4156-01 EL8 go-toolset:rhel8 2021-11-09
Red Hat RHSA-2021:4226-01 EL8 grafana 2021-11-09
Red Hat RHSA-2021:4256-01 EL8 graphviz 2021-11-09
Red Hat RHSA-2021:4339-01 EL8 grilo 2021-11-09
Red Hat RHSA-2021:4257-01 EL8 httpd:2.4 2021-11-09
Red Hat RHSA-2021:4537-01 EL8 httpd:2.4 2021-11-10
Red Hat RHSA-2021:4235-01 EL8 jasper 2021-11-09
Red Hat RHSA-2021:4135-01 EL8 java-17-openjdk 2021-11-10
Red Hat RHSA-2021:4382-01 EL8 json-c 2021-11-09
Red Hat RHSA-2021:4356-01 EL8 kernel 2021-11-09
Red Hat RHSA-2021:4140-01 EL8 kernel-rt 2021-11-09
Red Hat RHSA-2021:4404-01 EL8 kexec-tools 2021-11-09
Red Hat RHSA-2021:4122-01 EL8 kpatch-patch 2021-11-03
Red Hat RHSA-2021:4597-01 EL8.1 kpatch-patch 2021-11-10
Red Hat RHSA-2021:4325-01 EL8 lasso 2021-11-09
Red Hat RHSA-2021:4326-01 EL8 libX11 2021-11-09
Red Hat RHSA-2021:4409-01 EL8 libgcrypt 2021-11-09
Red Hat RHSA-2021:4288-01 EL8 libjpeg-turbo 2021-11-09
Red Hat RHSA-2021:4513-01 EL8 libsepol 2021-11-09
Red Hat RHSA-2021:4408-01 EL8 libsolv 2021-11-09
Red Hat RHSA-2021:4387-01 EL8 libssh 2021-11-09
Red Hat RHSA-2021:4241-01 EL8 libtiff 2021-11-09
Red Hat RHSA-2021:4231-01 EL8 libwebp 2021-11-09
Red Hat RHSA-2021:4321-01 EL8 linuxptp 2021-11-09
Red Hat RHSA-2021:4510-01 EL8 lua 2021-11-09
Red Hat RHSA-2021:4526-01 EL8 mingw-glib2 2021-11-09
Red Hat RHSA-2021:4181-01 EL8 mutt 2021-11-09
Red Hat RHSA-2021:4426-01 EL8 ncurses 2021-11-09
Red Hat RHSA-2021:4251-01 EL8 openjpeg2 2021-11-09
Red Hat RHSA-2021:4368-01 EL8 openssh 2021-11-09
Red Hat RHSA-2021:4424-01 EL8 openssl 2021-11-09
Red Hat RHSA-2021:4373-01 EL8 pcre 2021-11-09
Red Hat RHSA-2021:4142-01 EL8 pcs 2021-11-09
Red Hat RHSA-2021:4213-01 EL8 php:7.4 2021-11-09
Red Hat RHSA-2021:4161-01 EL8 python-jinja2 2021-11-09
Red Hat RHSA-2021:4158-01 EL8 python-lxml 2021-11-09
Red Hat RHSA-2021:4149-01 EL8 python-pillow 2021-11-09
Red Hat RHSA-2021:4455-01 EL8 python-pip 2021-11-09
Red Hat RHSA-2021:4324-01 EL8 python-psutil 2021-11-09
Red Hat RHSA-2021:4151-01 EL8 python27:2.7 2021-11-09
Red Hat RHSA-2021:4399-01 EL8 python3 2021-11-09
Red Hat RHSA-2021:4150-01 EL8 python36:3.6 2021-11-09
Red Hat RHSA-2021:4162-01 EL8 python38:3.8 and python38-devel:3.8 2021-11-09
Red Hat RHSA-2021:4160-01 EL8 python39:3.9 and python39-devel:3.9 2021-11-09
Red Hat RHSA-2021:4172-01 EL8 qt5 2021-11-09
Red Hat RHSA-2021:4139-01 EL8 resource-agents 2021-11-09
Red Hat RHSA-2021:4489-01 EL8 rpm 2021-11-09
Red Hat RHSA-2021:4270-01 EL8 rust-toolset:rhel8 2021-11-09
Red Hat RHSA-2021:4590-01 EL8 rust-toolset:rhel8 2021-11-10
Red Hat RHSA-2021:4315-01 EL8 spamassassin 2021-11-09
Red Hat RHSA-2021:4396-01 EL8 sqlite 2021-11-09
Red Hat RHSA-2021:4292-01 EL8 squid:4 2021-11-09
Red Hat RHSA-2021:4236-01 EL8 tcpdump 2021-11-09
Red Hat RHSA-2021:4134-01 EL7 thunderbird 2021-11-04
Red Hat RHSA-2021:4130-01 EL8 thunderbird 2021-11-04
Red Hat RHSA-2021:4133-01 EL8.1 thunderbird 2021-11-04
Red Hat RHSA-2021:4132-01 EL8.2 thunderbird 2021-11-04
Red Hat RHSA-2021:4413-01 EL8 tpm2-tools 2021-11-09
Red Hat RHSA-2021:4517-01 EL8 vim 2021-11-09
Red Hat RHSA-2021:4191-01 EL8 virt:rhel and virt-devel:rhel 2021-11-09
Red Hat RHSA-2021:4316-01 EL8 zziplib 2021-11-09
Scientific Linux SLSA-2021:4116-1 SL7 firefox 2021-11-04
Scientific Linux SLSA-2021:4134-1 SL7 thunderbird 2021-11-05
SUSE SUSE-SU-2021:3637-1 OS8 OS9 SLE12 binutils 2021-11-09
SUSE SUSE-SU-2021:3616-1 SLE15 SES6 binutils 2021-11-04
SUSE SUSE-SU-2021:3643-1 SLE15 SES6 binutils 2021-11-10
SUSE SUSE-SU-2021:3640-1 SLE15 kernel 2021-11-09
SUSE SUSE-SU-2021:3641-1 SLE15 kernel 2021-11-09
SUSE SUSE-SU-2021:3642-1 SLE15 kernel 2021-11-09
SUSE SUSE-SU-2021:3619-1 SLE15 libvirt 2021-11-05
SUSE SUSE-SU-2021:3635-1 OS9 SLE12 qemu 2021-11-09
SUSE SUSE-SU-2021:3605-1 SLE15 qemu 2021-11-03
SUSE SUSE-SU-2021:3604-1 SLE15 qemu 2021-11-03
SUSE SUSE-SU-2021:3613-1 SLE15 qemu 2021-11-04
SUSE SUSE-SU-2021:3614-1 SLE15 SES6 qemu 2021-11-04
SUSE SUSE-SU-2021:3634-1 SLE15 rubygem-activerecord-5_1 2021-11-09
SUSE SUSE-SU-2021:3611-1 SLE12 systemd 2021-11-04
SUSE SUSE-SU-2021:3602-1 OS9 SLE12 tomcat 2021-11-03
SUSE SUSE-SU-2021:3603-1 SLE15 webkit2gtk3 2021-11-03
Ubuntu USN-5134-1 18.04 20.04 21.04 21.10 docker.io 2021-11-09
Ubuntu USN-5131-1 18.04 20.04 21.04 21.10 firefox 2021-11-03
Ubuntu USN-5133-1 14.04 16.04 18.04 icu 2021-11-05
Ubuntu USN-5130-1 14.04 kernel 2021-11-08
Ubuntu USN-5135-1 20.04 21.04 21.10 linux, linux-aws, linux-aws-5.11, linux-azure, linux-azure-5.11, linux-gcp, linux-gcp-5.11, linux-hwe-5.11, linux-kvm, linux-oem-5.13, linux-oracle, linux-oracle-5.11 2021-11-08
Ubuntu USN-5137-1 18.04 20.04 linux, linux-aws, linux-aws-5.4, linux-azure, linux-azure-5.4, linux-gcp, linux-gcp-5.4, linux-gke, linux-gkeop, linux-gkeop-5.4, linux-hwe-5.4, linux-ibm, linux-kvm 2021-11-08
Ubuntu USN-5136-1 14.04 16.04 18.04 linux, linux-aws, linux-aws-hwe, linux-azure, linux-azure-4.15, linux-dell300x, linux-gcp-4.15, linux-hwe, linux-kvm, linux-oracle, linux-raspi2, linux-snapdragon 2021-11-08
Ubuntu USN-5132-1 21.10 thunderbird 2021-11-03
Full Story (comments: none)

Kernel patches of interest

Kernel releases

Greg Kroah-Hartman Linux 5.15.1 Nov 06
Greg Kroah-Hartman Linux 5.14.17 Nov 06
Greg Kroah-Hartman Linux 5.10.78 Nov 06
Greg Kroah-Hartman Linux 5.4.158 Nov 06
Greg Kroah-Hartman Linux 4.19.216 Nov 06
Clark Williams 4.19.215-rt94 Nov 05

Architecture-specific

Core kernel

Device drivers

Sander Vanheule Add Realtek Otto WDT support Nov 04
Martin Kaistra Add PTP support for BCM53128 switch Nov 04
amirmizi6@gmail.com Add tpm i2c ptp driver Nov 04
Antoniu Miclaus Add support for AD7293 Power Amplifier Nov 05
Bo Jiao add mt7916 support Nov 05
Bryan O'Donoghue Add pm8150b TPCM driver Nov 05
Mike Christie vhost: multiple worker support Nov 04
Cosmin Tanislav Add AD74413R driver Nov 05
Anand Ashok Dumbre Add Xilinx AMS Driver Nov 08
Antoniu Miclaus Add support for ADMV8818 Nov 09
Sunil Muthuswamy PCI: hv: Hyper-V vPCI for arm64 Nov 09
Apeksha Gupta drivers/net: add NXP FEC-UIO driver Nov 10

Filesystems and block layer

Chaitanya Kulkarni block: add support for REQ_OP_VERIFY Nov 03
Jane Chu Dax poison recovery Nov 05
Matthew Wilcox (Oracle) iomap/xfs folio patches Nov 08
Christoph Hellwig decouple DAX from block devices Nov 09

Memory management

Networking

Maciej Machnikowski Add RTNL interface for SyncE Nov 05

Virtualization and containers

Sean Christopherson KVM: Scalable memslots implementation Nov 04
Atish Patra Add SBI v0.2 support for KVM Nov 05

Miscellaneous

Andrii Nakryiko Future-proof more tricky libbpf APIs Nov 07
David Sterba Btrfs progs release 5.15 Nov 05
Simon Ser libdrm 2.4.108 Nov 08
Michal Kubecek ethtool 5.15 released Nov 09

Page editor: Rebecca Sobol


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds