LWN.net Weekly Edition for September 24, 2020
Welcome to the LWN.net Weekly Edition for September 24, 2020
This edition contains the following feature content:
- OpenPGP in Thunderbird: a 21-year-old feature request is finally satisfied.
- Python 3.9 is around the corner: what's coming in the next major Python release.
- Removing run-time disabling for SELinux in Fedora: turning off SELinux may get a little harder.
- The seqcount latch lock type: an introduction to an obscure kernel concurrency mechanism.
- Four short stories about preempt_count(): a change to how preemption is tracked in the kernel has wide implications.
- Accurate timestamps for the ftrace ring buffer: a detailed look at what was required to fix a timestamping problem in the kernel's tracing subsystem.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
OpenPGP in Thunderbird
It is a pretty rare event to see a nearly 21-year-old bug be addressed—many projects are nowhere near that old for one thing—but that is just what has occurred for the Mozilla Thunderbird email application. An enhancement request filed at the end of 1999 asked for a plugin to support email encryption, but it has mostly languished since. The Enigmail plugin did come along to fill the gap by providing OpenPGP support using GNU Privacy Guard (GnuPG or GPG), but was never part of Thunderbird. As part of Thunderbird 78, though, OpenPGP is now fully supported within the mail user agent (MUA).
The enhancement request actually asked for Pretty Good Privacy (PGP) support; PGP is, of course, the progenitor of OpenPGP. The standards effort that resulted in OpenPGP started in 1997. Back in 1999, PGP was the only real choice for email encryption, though the initial version of GnuPG had been released a few months before the request.
Early on, the main concerns expressed in the bug tracker were about the legality of shipping cryptographic code. The US government's attempts to restrict the export of cryptographic systems, known as the "crypto wars", were still fresh in the minds of many. It was not entirely clear that adding "munitions-grade crypto" to a MUA like Thunderbird was legal or wise. Early in 2000, the US revised its export-control regulations, which removed that particular concern.
There was work done toward adding support for OpenPGP and Secure/Multipurpose Internet Mail Extensions (S/MIME), which is another email encryption standard, over 2000 and 2001, but the code never actually landed. Thunderbird (called "mailnews" in those days) was in fire-fighting mode; fixing bugs and getting basic functionality working took precedence over new features like encryption. There was also a need to design a reasonable plugin mechanism.
Eventually, Enigmail showed up, which took some of the pressure off the Mozilla developers. Enigmail could be used on all of the supported platforms for Thunderbird to encrypt and decrypt PGP-style email (either inline or PGP/MIME) using GnuPG. Its initial maintainer, Ramalingam Saravanan, updated the bug with new information about Enigmail several times.
In the bug, multiple people suggested that Enigmail be incorporated into Thunderbird and the Enigmail developers were not opposed. In 2003, Patrick Brunschwig, who was a new maintainer for the plugin, said that doing so would help in getting rid of some of the "hacks" that were done to make Enigmail work with Thunderbird. But nothing like that ever happened.
Thunderbird itself has had something of a checkered past with regard to its parent, Mozilla. On two separate occasions Thunderbird has been spun out from the Mozilla nest. In 2007, it left to allow Mozilla to focus on Firefox. That led to the creation of Mozilla Messaging as the new home for Thunderbird, which was reabsorbed in 2011. But in 2012, support for Thunderbird from Mozilla was reduced and in 2015 Thunderbird was given its walking papers again. Then, in 2017, it was determined that the right place for "Thunderbird's legal, fiscal and cultural home" was the Mozilla Foundation.
All of that upheaval was likely not entirely conducive to focused development, but plenty of good work was done on the MUA over the intervening years, including adding S/MIME support along the way. However, integrating Enigmail or otherwise supporting OpenPGP never quite made the list. People would periodically pop up in the bug report to ask that it be resolved and occasionally Brunschwig would note that the decision was in the hands of the Thunderbird developers. That went on for many years, until an October 2019 blog post announcing the project's plans with respect to OpenPGP.
The announcement said that Thunderbird would be releasing a version in (northern hemisphere) summer 2020 with support for OpenPGP built right in. That support would not be based on Enigmail, which will not be updated to the new Thunderbird plugin (or add-on) interface; Enigmail will effectively be in maintenance mode. It will continue to be supported on the then-current Thunderbird 68 release until that reaches end of life, six months after 78 is released. Brunschwig, though, will be working on the OpenPGP support for Thunderbird, and the plan was to help ensure that Enigmail keys and settings could make the transition.
In addition, the project plans to leave GnuPG behind, as explained by Kai Engert on the tb-planning mailing list. It comes down to licensing, at least in part. GnuPG is available under GPLv3, which means that shipping it as part of Thunderbird, which is under the Mozilla Public License 1.1, could be tricky to do right. But there is also a complexity factor:
If Thunderbird decided to distribute GnuPG software, the situation might get even more complicated. If users already have a copy of GnuPG installed on their system, we'd have to be careful to avoid any potential conflicts that might occur by having two competing copies of GnuPG installed on a computer.
It may be possible, eventually, to use GnuPG for Thunderbird cryptographic operations, but that is not a priority—except to support OpenPGP smartcards. The RNP library for OpenPGP, which is what is being used for Thunderbird, does not support smartcards, at least not yet. In the interim, Thunderbird will support using GnuPG for smartcards.
The OpenPGP wiki page lays out the overall vision for the feature. As planned, OpenPGP support was released as part of Thunderbird 78 in early September. It comes with a migration tool to help Enigmail users make the switch. In addition, the Mozilla Open Source Support program provided a grant to fund a security audit of both RNP and the related Thunderbird code. "We are happy to report that no critical or major security issues were found, all identified issues had a medium or low severity rating, and we will publish the results in the future."
There is an extensive HOWTO and FAQ document, a wiki status page, and a discussion forum for the "end-to-end encryption" (e2ee) feature in Thunderbird. The e2ee feature covers both OpenPGP and S/MIME in Thunderbird, though a support document for the feature only covers OpenPGP at the time of this writing.
The main difference, from a user perspective, between OpenPGP and S/MIME is the matter of keys. As with everything in the cryptography world, it seems, key management for email is a difficult problem. S/MIME takes a certificate approach to keys, as with TLS keys for HTTPS; keys are signed by certificate authorities, which can be done in-house or by third parties. OpenPGP depends on the decentralized web of trust, where keys are verified and signed by other users' keys. A key that is signed by a trusted key may also be trusted, and those trust relationships can be extended in a transitive fashion if desired.
Existing users of Enigmail will encounter some changes. For example, Enigmail "junior mode", which was added by the p≡p foundation, is not supported. Also, OpenPGP in Thunderbird does not support the web of trust directly.
It has been a long time coming, but it seems that OpenPGP has made its way into Thunderbird proper. It would be nice to believe that it will help broaden the adoption of email encryption, but that is probably a forlorn hope. Adding the feature will serve to highlight encryption, however, which may eventually pay dividends. But the key-management problem, in particular, is difficult and is likely the largest barrier to widespread adoption of email encryption.
Python 3.9 is around the corner
Python 3.9.0rc2 was released on September 17, with the final version scheduled for October 5, roughly a year after the release of Python 3.8. Python 3.9 will come with new operators for dictionary unions, a new parser, two string operations meant to eliminate some longstanding confusion, as well as improved time-zone handling and type hinting. Developers may need to do some porting for code coming from Python 3.8 or earlier, as the new release has removed several previously-deprecated features still lingering from Python 2.7.
Python 3.9 marks the start of a new release cadence. Up until now, Python has done releases on an 18-month cycle. Starting with Python 3.9, the language has shifted to an annual release cycle as defined by PEP 602 ("Annual Release Cycle for Python").
A table provided by the project shows how Python performance has changed in a number of areas since Python 3.4. It is interesting to note that Python 3.9 is worse than 3.8 on almost every benchmark in that table, though it generally performs better than 3.7. That said, it is claimed that several Python constructs such as range, tuple, list, and dict will see improved performance in Python 3.9, though no specific performance benchmarks are given. The boost is credited to the language making more use of a fast-calling protocol for CPython that is described in PEP 590 ("Vectorcall: a fast calling protocol for CPython").
As the PEP explains, Vectorcall replaces the existing tp_call convention, which has poor performance because it must create intermediate objects for a call. While CPython has special-case optimizations to speed up this process for calls to Python and built-in functions, those do not apply to classes or third-party extension objects. Additionally, tp_call does not provide a function pointer per object (only per class), again requiring the creation of several intermediate objects when making calls to classes. Vectorcall is faster because it does not have the same intermediate-object inefficiencies that are found in tp_call. Vectorcall was introduced in Python 3.8, but starting with version 3.9 it is used for the majority of the Python calling conventions.
New operators and methods
Python 3.9 includes new dictionary union operators, | and |=, which we have previously covered; they are used to merge dictionaries. The | operator evaluates as a union of two dictionaries, while the |= operator stores the result of the union in the left-hand side of the operation:
>>> z = {'a' : 1, 'b' : 2, 'c' : 3}
>>> y = {'c' : 'foo', 'd' : 'bar' }
>>> z | y
{'a': 1, 'b': 2, 'c': 'foo', 'd': 'bar'}
>>> z |= y
>>> z
{'a': 1, 'b': 2, 'c': 'foo', 'd': 'bar'}
There are many ways dictionaries can be merged in Python, but Andrew Barnert said that the operator is designed to address the "copying update":
The problem is the copying update. The only way to spell it is to store a copy in a temporary variable and then update that. Which you can’t do in an expression. You can do _almost_ the same thing with {**a, **b}, but not only is this ugly and hard to discover, it also gives you a dict even if a was some other mapping type, so it’s making your code more fragile, and usually not even doing so intentionally.
In situations where the two dictionaries share a common key, the last-seen value for a key "wins" and is included in the merge as shown above for key c. While the standard union operator | only allows unions between dict types, the assignment variety |= can be used to update a dictionary with new key-value pairs from an iterable object:
>>> z = {'a' : 'foo', 'b' : 'bar', 'c' : 'baz'}
>>> y = ((0, 0), (1, 1), (2, 8))
>>> z |= y
>>> z
{'a': 'foo', 'b': 'bar', 'c': 'baz', 0: 0, 1: 1, 2: 8}
PEP 584 ("Add
Union Operators To dict
") provides complete documentation of the new
operators.
Two new string methods have also been added in version 3.9: removeprefix() and removesuffix(). These convenience methods make it easy to remove an unwanted prefix or suffix from string data. As described in PEP 616 ("String methods to remove prefixes and suffixes"), these functions are being added to address user confusion regarding the str.lstrip() and str.rstrip() methods, which are often mistaken as a means to remove a prefix or suffix from a string. The confusion around str.lstrip() and str.rstrip() comes from their optional string parameter.
According to the PEP, the confusion stems from the fact that the parameter passed to str.lstrip() and str.rstrip() is interpreted as a set of individual characters to remove, rather than as a single substring. With the additions, the project hopes to provide a "cleaner redirection of users to the desired behavior." Using these new methods is straightforward, as shown below:
>>> a = "PEP-616"
>>> a.removeprefix("PEP-")
'616'
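The set-of-characters behavior that motivated the new methods is easy to demonstrate; this short example (in the spirit of the one in the PEP) shows how str.lstrip() can eat into the data itself, while removeprefix() only removes an exact substring match:

```python
s = "Arthur: three!"

# lstrip() removes *any* leading characters found in the set
# {'A', 'r', 't', 'h', 'u', ':', ' '}, so it strips part of "three" too:
print(s.lstrip("Arthur: "))        # ee!

# removeprefix() removes the argument only if it matches as a substring:
print(s.removeprefix("Arthur: "))  # three!
```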
Deprecation and porting
Developers should be aware of some features that are being deprecated and removed in 3.9, as well as some more deprecations that are coming in 3.10. Many Python 2.7 functions that emit a DeprecationWarning in version 3.8 "have been removed or will be removed soon" starting with version 3.9. The project recommends testing applications with the -W default command-line option, which will show these warnings, before upgrading. As we previously covered, certain backward-compatibility layers, such as the aliases to Abstract Base Classes in the collections module, will remain for one last release before being removed in Python 3.10. The complete listing of removals in version 3.9 is available for interested readers.
Further, the release includes numerous new deprecations of language features that will be removed in a future release. An additional recommendation is to run tests in Python Development Mode using the -X dev option to prepare code bases for future changes.
Other goodies
As we reported, Python 3.9 ships with a new parsing expression grammar (PEG) parser to replace the LL(1) parser used through version 3.8. According to PEP 617 ("New PEG parser for CPython"), which describes the change, the switch to the PEG parser will eliminate "multiple 'hacks' that exist in the current grammar to circumvent the LL(1)-limitation." This should help the project substantially reduce the maintenance cost for the parser.
Python introduced type hinting in version 3.5; the 3.9 release allows types like List and Dict to be replaced with the built-in list and dict varieties. Type hints in Python are mostly for linters and code checkers, as they are not enforced at run time by CPython. PEP 585 ("Type Hinting Generics In Standard Collections") provides a listing of collections that have become generics. Note that, with version 3.9, importing the types (from typing) that are now built-in is deprecated. It sounds like developers will have plenty of time to update their code, however, as according to the PEP: "the deprecated functionality will be removed from the typing module in the first Python version released 5 years after the release of Python 3.9.0."
Thanks to flexible function and variable annotations, as described in PEP 593 ("Flexible function and variable annotations"), Python 3.9 has a new Annotated type. This allows the decoration of existing types with context-specific metadata:
charType = Annotated[int, ctype("char")]
This metadata can be used in either static analysis or at run time; it is ignored entirely if it is unused. It is designed to enable tools like mypy to perform static type checking and provides access to the metadata at run time via get_type_hints(). To provide backward compatibility with version 3.8, a new include_extras parameter has been added to the get_type_hints() function with a default value of False, retaining the same behavior as existed in version 3.8. When include_extras is set to True, get_type_hints() will return the defined Annotated type for use.
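A short sketch of how that plays out at run time (ValueRange is a hypothetical metadata class of our own, not part of the standard library):

```python
from typing import Annotated, get_type_hints

class ValueRange:
    """Hypothetical metadata describing the valid range of a value."""
    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi

def set_volume(level: Annotated[int, ValueRange(0, 11)]) -> None:
    pass

# By default (include_extras=False), metadata is stripped, as in 3.8:
print(get_type_hints(set_volume)["level"])   # <class 'int'>

# With include_extras=True, the full Annotated type is returned and
# the metadata is reachable through __metadata__:
hints = get_type_hints(set_volume, include_extras=True)
print(hints["level"].__metadata__[0].hi)     # 11
```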
Various other language changes can be expected in Python 3.9. __import__() now raises ImportError instead of ValueError when a relative import goes past the top-level package. Decorators have also been improved as described in PEP 614 ("Relaxing Grammar Restrictions On Decorators"), allowing any valid expression (defined as "anything that's valid as a test in if, elif, and while blocks") to be used as a decorator. In Python 3.8 and earlier, the expressions available for use as a decorator were limited. While the decorator grammar limitations "were rarely encountered in practice", according to the PEP, they occurred often enough over the years to be worth fixing in 3.9. The PEP has an example showing how PyQt5 currently works around the limitations.
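A minimal sketch of the relaxed grammar, with a made-up Signal class standing in for something like PyQt's signal objects; the subscript expression used as a decorator here was a syntax error before Python 3.9:

```python
class Signal:
    """Hypothetical stand-in for a GUI toolkit's signal object."""
    def __init__(self):
        self.handlers = []
    def connect(self, func):
        self.handlers.append(func)
        return func

signals = {"clicked": Signal()}

# A subscript expression as a decorator: rejected by the grammar
# before Python 3.9, accepted under PEP 614:
@signals["clicked"].connect
def on_click():
    return "clicked"

print(signals["clicked"].handlers[0]())   # clicked
```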
Two new modules are provided as part of the Python 3.9 standard library: zoneinfo and graphlib. The zoneinfo module, which we have previously covered, provides support for the IANA time zone database and includes zoneinfo.ZoneInfo, a concrete datetime.tzinfo implementation that allows users to load time-zone data identified by an IANA name. The graphlib module provides graphlib.TopologicalSorter, a class that implements topological sorting of graphs. In addition to these two new modules, many existing modules were improved in various ways. One notable change involves the asyncio module, which no longer supports the reuse_address parameter of asyncio.loop.create_datagram_endpoint() due to "significant security concerns." The bug report describes a problem when using SO_REUSEADDR on UDP in Linux environments: setting SO_REUSEADDR allows multiple processes to listen on sockets for the same UDP port, with incoming packets distributed randomly between them; setting reuse_address to True in a Python script would enable this behavior.
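A brief taste of both modules (the zoneinfo example assumes the system's IANA database, or the tzdata package, is available; the dependency graph is a made-up example):

```python
from datetime import datetime
from zoneinfo import ZoneInfo
from graphlib import TopologicalSorter

# zoneinfo: attach an IANA time zone, looked up by name, to a datetime
dt = datetime(2020, 10, 5, 12, 0, tzinfo=ZoneInfo("America/New_York"))
print(dt.utcoffset())    # -1 day, 20:00:00 (EDT is UTC-4)

# graphlib: topologically sort a dependency graph; each key depends
# on the items in its set
graph = {"link": {"compile"}, "compile": {"fetch"}}
print(list(TopologicalSorter(graph).static_order()))   # ['fetch', 'compile', 'link']
```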
There are a lot of interesting things worth checking out in Python 3.9, and the project's "What's new in Python 3.9" document is recommended for all the details. Additionally, the changelog provides an itemized list of changes between release candidates. Since no more release candidates of Python 3.9 are expected before the final version, developers may want to start testing their existing code to get a head start on the final release.
Removing run-time disabling for SELinux in Fedora
Disabling SELinux is, perhaps sadly in some ways, a time-honored tradition for users of Fedora, RHEL, and other distributions that feature the security mechanism. Over the years, SELinux has gotten easier to tolerate due to the hard work of its developers and the distributions, but there are still third-party packages that recommend or require disabling SELinux in order to function. Up until fairly recently, the kernel has supported disabling SELinux at run time, but that mechanism has been deprecated—in part due to another kernel security feature. Now Fedora is planning to eliminate the ability to disable SELinux at run time in Fedora 34, which sparked some discussion in its devel mailing list.
SELinux is a Linux Security Module (LSM) for enforcing mandatory access control (MAC) rules. But the "module" part of the LSM name has been a misnomer since a 2007 change to make the interface static and remove the option to load LSMs at run time. So kernels are built with a list of supported LSMs, and they can be enabled or disabled at boot time using kernel command-line options. Certain architectures had bootloaders that made it difficult for users to add parameters to the command line, though, so the SELinux developers added a way to disable it at run time. The need for that functionality has faded, and removing it will allow another kernel hardening feature to be used.
The post-init read-only memory feature provides a way to mark certain kernel data structures as read-only after the kernel has initialized them. The idea is that various data structures are prime targets for kernel exploits; function-pointer structures, like those used by the LSM hooks, are of particular interest. So the LSM hooks were protected that way. However, that hardening is only enabled if the ability to disable SELinux at run time is not present in the kernel. The presence of the SELinux feature is governed by the CONFIG_SECURITY_SELINUX_DISABLE kernel build option.
In order to get that hardening feature, Ben Cotton posted a proposal for Fedora 34 to remove the support for disabling SELinux at run time. The proposal is owned by Petr Lautrbach and Ondrej Mosnacek; it would migrate users to the selinux=0 command-line option if they are currently disabling SELinux via the SELINUX=disabled setting in /etc/selinux/config. The proposal, which has been updated on the Fedora wiki based on feedback, would not change the ability to switch SELinux between enforcing and permissive modes at run time using setenforce.
The 5.6 kernel deprecated the run-time-disable feature for SELinux. The kernel currently prints a message to that effect, but there are plans to make using the feature even more painful by sleeping for five seconds when it is used. It may get even more obnoxious over time; eventually the plan is to remove it altogether. Red Hat distributions (Fedora, CentOS, RHEL) are the only known users of the feature at this point, so once they have all moved away, it can be removed from the kernel. RHEL and CentOS systems will stick around for a lot longer than Fedora systems, since a Fedora release is only supported for a bit over a year. But Red Hat will just continue to maintain the feature in the RHEL/CentOS kernels; removing the run-time disable from Fedora presumably means that the next RHEL/CentOS major release will no longer support it either.
The proposal seeks to smooth the path for users who upgrade but have SELinux already disabled via the config file. The kernel command-line parameter will not be automatically added, but it will be documented as the only real way to disable SELinux. Systems without selinux=0 at boot, but that disable it in the config file, will simply get a system that has SELinux in it, but without any policy loaded, so the run-time impact should be minimal. In addition, the SELinux filesystem (selinuxfs) will not be mounted at /sys/fs/selinux.
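Concretely, the deprecated setting and its replacement look like this (the grubby invocation is one Fedora-specific way to edit the kernel command line; other bootloaders differ):

```
# /etc/selinux/config — the run-time disable being removed:
SELINUX=disabled

# The supported alternative: pass selinux=0 on the kernel command line,
# for example with grubby on Fedora:
#     grubby --update-kernel=ALL --args="selinux=0"
```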
In general, the reaction was positive, though there were some concerns. James Cassell wondered about forcing systems that have SELinux disabled in user space, but not in the kernel, to loudly fail. He was concerned that the performance impact of SELinux without any policy loaded could affect "certain real-time use cases", as he had found with SELinux in permissive mode. But Mosnacek said that the impact should be minimal. He did think it would be useful to alert the system administrator to that situation, but was not sure how to go about it beyond just documenting it in the release notes. Lautrbach suggested possibly adding a systemd unit that would warn users once if that situation was found, but he also pointed out that the existing description in the comments of /etc/selinux/config accurately describes the situation:
# disabled - No SELinux policy is loaded.

^^ this is exactly what will happen when this change is accepted. SELinux will be enabled internally in kernel but no policy will be loaded and as it was already explained for [users] this situation should be almost the same as SELinux disabled.
There were also some suggestions for improvements to the proposal document from Vít Ondruch and Michal Schorm. In particular, Schorm wanted to see more clarity in the document that the change does not affect the ability to switch to SELinux permissive mode at run time. Those changes have been made and the Fedora Engineering Steering Committee (FESCo) ticket shows five "+1" votes at the time of this writing, so its adoption is all-but-assured.
Silent denials
An interesting sidelight came up in the discussion, though. Richard Hughes asked about SELinux denials that do not generate audit messages. Typically, when SELinux denies an operation, it puts out an audit log entry (known as an access vector cache, or AVC, denial), but sometimes that does not happen.
Whilst I'm of course in favour of fixing the lockdown issue, would it also be fair to say that any selinux regression not triggering an AVC (which is fixed using selinux=0) would block this kind of proposal?
It turns out that there is a class of SELinux denials that are marked as "dontaudit", Mosnacek said.
It should not require rebooting to turn off SELinux in order to test whether that is occurring; switching to permissive mode should be sufficient. He also noted that the semodule command can be used to disable the dontaudit rules (and re-enable them). That should at least provide a useful error message, which should help Hughes (and others) figure out mysterious failures from SELinux.
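The debugging workflow Mosnacek described looks roughly like this (the commands require root; -D disables dontaudit rules and -B rebuilds the policy):

```
# Rebuild the policy with dontaudit rules disabled so that normally
# hidden denials appear in the audit log:
semodule -DB

# ... reproduce the mysterious failure, then look for AVC denials:
ausearch -m avc -ts recent

# Restore the default behavior (re-enable dontaudit rules):
semodule -B
```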
Neal Gompa helped fill in some of the background for how these dontaudit rules get added. Based on conversations that he has witnessed, Red Hat customers are often the driving force.
"Hiding" denials is certainly unfortunate—and counterproductive, at least for Fedora. The distribution should perhaps consider its developer audience and disable the dontaudit rules by default. That may risk swamping the audit log with harmless denials, as described by Mosnacek and in this section of the Red Hat SELinux documentation, but it can be seriously maddening to try to debug a problem where SELinux is quietly denying some action. Raising the profile of dontaudit rules is clearly called for; that may help avoid more sudden baldness in frustrated users.
The ability to disable SELinux at run time would seem to be something that is unlikely to be missed. There are other, better ways to accomplish the same goals; removing it paves the way for Fedora systems to get more hardening in the form of read-only protection on the LSM hooks.
The seqcount latch lock type
The kernel contains a wide variety of locking primitives; it can be hard to stay on top of all of them. So even veteran kernel developers might be forgiven for being unaware of the "seqcount latch" lock type or its use. While this lock type has existed in the kernel for several years, it is only being formalized with a proper type declaration in 5.10. So this seems like a good time to look at what these locks are and how they work.
Seqcounts and seqlocks
Seqcounts (and seqlocks, which are built on top of seqcounts) are among the many primitives used to reduce locking overhead in specific situations; their use is most indicated when reads to protected data far outnumber writes, and updates to the data are quick when they do happen. Rather than preventing concurrent access to data, seqcounts and seqlocks work by detecting when a reader and a writer collide and forcing readers to retry in such situations. They were first introduced for the 2.5.60 development kernel in 2003, and have grown considerably in complexity since then.
Seqcounts are the lowest-level piece of this mechanism; at their core, they are a simple counter that is incremented whenever the protected data is modified. Indeed, the counter is incremented twice, once before the process of modifying the data begins with a call to:
static inline void raw_write_seqcount_t_begin(seqcount_t *s)
{
	s->sequence++;
	smp_wmb();
}
and once after modification is complete by calling:
static inline void raw_write_seqcount_t_end(seqcount_t *s)
{
	smp_wmb();
	s->sequence++;
}
(Some debugging instrumentation has been removed from the above). As can be seen, write-side seqcount operations come down to incrementing the counter, plus some carefully placed write barriers (the calls to smp_wmb()) to ensure the correct ordering between changes to the counter and to the protected data. One key point to note here is that the counter, which starts at zero, will be odd while modification is taking place, and even otherwise.
Before a reader can access the protected data, it must enter the critical section with a call to:
static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
{
	unsigned ret;

repeat:
	ret = READ_ONCE(s->sequence);
	if (unlikely(ret & 1)) {
		cpu_relax();
		goto repeat;
	}
	return ret;
}
(Again, debugging code has been removed; note also that real users will call higher-level functions built on the above). This function starts by checking whether modification is currently taking place (as indicated by the sequence counter having an odd value); if so, it will spin until the sequence count is incremented again (the cpu_relax() call serves a few functions, including inserting a compiler barrier and potentially letting an SMT sibling run). Then the current counter value is returned and the caller can provisionally read the protected data. Once that has been done, the section is exited with a call to:
static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
{
	return unlikely(READ_ONCE(s->sequence) != start);
}
The return value from this function tells the caller whether modification of the data has occurred while it was being read; if __read_seqcount_t_retry() returns true, the caller must go back to the beginning and try again. For this reason, access to seqcount-protected data is normally coded as a do..while loop that repeats until the data has been successfully read.
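Put together, a typical read-side critical section looks something like this kernel-style sketch (mydata and mydata_seq are hypothetical names; real code uses the higher-level read_seqcount_begin()/read_seqcount_retry() helpers rather than the raw functions shown above):

```
	struct mydata snapshot;
	unsigned int seq;

	do {
		seq = read_seqcount_begin(&mydata_seq);
		snapshot = mydata;	/* provisional read of the protected data */
	} while (read_seqcount_retry(&mydata_seq, seq));

	/* snapshot now holds a consistent copy of mydata */
```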
Upon this simple foundation has been built a whole array of variants for specific use cases. Many callers in the kernel use the higher-level seqlock_t type, which, among other things, handles details like concurrency among writers. See include/linux/seqlock.h for lots of details.
The seqcount latch type
While the above interface works in most situations, there is one important case where things fall apart: if a reader ever preempts a writer on the same CPU. For example, if a writer is preempted by an interrupt handler, and that handler attempts to enter a read section for the same data, the CPU will deadlock while the reader spins waiting for an update that will never complete. This situation is normally avoided by disabling preemption and interrupts while the write is taking place; that is one of the many things handled by the higher-level seqlock interfaces.
There are times, though, when it is not possible to completely block interrupts; in particular, code that might be called within a non-maskable interrupt is, as the name suggests, not maskable. Blocking preemption and interrupts also tends to be unwelcome in realtime kernels; another solution must be found for those cases. One such solution, introduced by Mathieu Desnoyers in 2014, is the seqcount latch. It avoids the possibility of an infinite spin at the cost of maintaining two copies of the protected data.
In particular, if a structure of type struct mydata is to be protected with a seqcount latch, that structure will need to be declared as:
struct mydata data[2];
At any given time, one entry in that array will be considered live and available, while the other is reserved for modifications by a writer. The least-significant bit in the sequence counter indicates which element should be read at any given time. Code for the read side now looks something like this:
do {
	seq = raw_read_seqcount_latch(&seqcount);
	index = seq & 0x01;
	do_something_with(data[index]);
} while (read_seqcount_retry(&seqcount, seq));
There is still a loop here, which detects concurrent modification of the data. But if a writer has been interrupted by a reader, the count will not change and there will be no need to retry the access.
To update the protected data, the writer simply makes any modifications to the entry in the data array that is not currently being used by the readers. Nobody should be looking at that entry, so there should be no need for any particular protection (unless concurrent writers are a possibility, of course). When the new data is ready, the writer calls:
static inline void raw_write_seqcount_t_latch(seqcount_t *s)
{
    smp_wmb();  /* prior stores before incrementing "sequence" */
    s->sequence++;
    smp_wmb();  /* increment "sequence" before following stores */
}
After this call, readers will be directed to the new version of the data. For an example of how seqcount latches are used, see the handling of timekeeping data (read side and write side) in kernel/time/sched_clock.c.
The 5.10 kernel will see the merging of a patch series from Ahmed Darwish that formalizes the seqcount latch API. Since it was first introduced, the seqcount latch has been implemented as a sort of "off-label" use of the seqcount type, changing its semantics in ways that, one hopes, all users understand. Darwish, instead, has concluded that the seqcount latch is a separate type of lock that should be handled independently of seqcounts.
Thus, his patch set introduces a new seqcount_latch_t type and changes the prototypes of the relevant functions to expect parameters of that type. That helps to nail down the actual semantics of the seqcount latch and ensures that callers won't mix locks of that type up with ordinary seqcounts. The interface still lives in <linux/seqlock.h>, but it could logically be moved elsewhere at this point.
None of this is likely to make the use of seqcount latch locks more popular; the situations where they are needed are rare indeed. There are only four users in the 5.9 kernel, and one of those is removed in Darwish's patch set as an "abuse" of the type (though, if one counts users of the latch tree type, the number goes up slightly). If a kernel developer is wondering if a seqcount latch is needed in a given situation, the answer is almost certainly "no". But it is illustrative of the lengths to which kernel developers must go in order to provide safe-but-fast access to critical system data in all situations.
Four short stories about preempt_count()
The discussion started out as a straightforward patch set from Thomas Gleixner making a minor change to how preemption counting is handled. The resulting discussion quickly spread out to cover a number of issues relevant to core-kernel development in surprisingly few messages; each of those topics merits a quick look, starting with how the preemption counter itself works. Sometimes a simple count turns out to not be as simple as it seems.
preempt_count()
In a multitasking system like Linux, no thread of execution is guaranteed exclusive access to the processor for as long as it would like to run; the kernel can (almost) always preempt a running thread in favor of one that has a higher priority. That new thread might be a different process, but it might also be a hardware interrupt or other outside event. In order to properly coordinate the running of all of a system's tasks, the kernel must keep track of the current execution state, including anything that has been preempted or which might prevent a thread from being preempted.
One piece of the infrastructure for that tracking is the preemption counter that is stored with every task in the system. That counter is accessed via the preempt_count() function which, in its generic form, looks like this:
static __always_inline int preempt_count(void)
{
    return READ_ONCE(current_thread_info()->preempt_count);
}
The purpose of this counter is to describe the current state of whatever thread has been running, whether it can be preempted, and whether it is allowed to sleep. To do so, it must track a number of different states, so it is split into several sub-fields:
- The least-significant byte tracks the nesting of preempt_disable() calls — the number of times that preemption has been disabled so far.
- The next three fields track the number of times the running thread has been interrupted by software, hardware, and non-maskable interrupts; they are all probably oversized for the number of interruptions that is likely to ever happen in real execution, but bits are not in short supply here.
- Finally, the most-significant bit indicates whether the kernel has decided that the current process needs to be scheduled out at the first opportunity.
A look at this value tells the kernel a fair amount about what is going on at any given time. For example, any non-zero value for preempt_count indicates that the current thread cannot be preempted by the scheduler; either preemption has been disabled explicitly or the CPU is currently servicing some sort of interrupt. In the same way, a non-zero value indicates that the current thread cannot sleep, since it is running in a context that must be allowed to run to completion. The "reschedule needed" bit tells the kernel that there is a higher-priority process that should be given the CPU at the first opportunity. This bit cannot be set unless preempt_count is non-zero; otherwise the kernel would have simply preempted the process rather than setting the bit and waiting.
Code throughout the kernel uses preempt_count to make decisions about which actions are possible at any given time. That, as it turns out, can be a bit of a problem for a few reasons.
Misleading counts
It is worth noting that preempt_disable() only applies when a thread is running within the kernel; user-space code is always preemptible. In the distant past, the kernel did not support preemption of kernel-space code at all; when that feature was added (as a way of improving latency), it was also made configurable. There are, as a result, still systems out there that are running without kernel preemption at all; it is a configuration that might make sense for some throughput-oriented workloads.
If kernel code cannot be preempted, there is little value in tracking calls to preempt_disable(); preemption is always disabled. So the kernel doesn't waste its time maintaining that information; in such kernels, the preempt-disable count portion of preempt_count is always zero. The preemptible() function will always return false, since the kernel is indeed not preemptible. It all seems to make sense.
There are some problems that result from this behavior, though. One is that functions like in_atomic(), which indicates whether the kernel is currently running in atomic context, do not behave in the same way. On a kernel with preemption configured in, calling preempt_disable() will cause in_atomic() to return true; if preemption is configured out, preempt_disable() is a no-op and in_atomic() will return false. This can cause in_atomic() to return false when, for example, spinlocks are held — a situation that is indeed an atomic context.
Gleixner, in his patch set, points out some other problems that result from this inconsistency, arguing that the overall behavior is a problem that needs fixing.
His solution is to remove the conditional compilation for the preemption-disable tracking, causing that counter to be maintained even in kernels that do not support kernel preemption. There is a cost in terms of increased execution time and code size on machines running those configurations, but Gleixner says that his benchmark testing "did not reveal any measurable impact" from the change.
Linus Torvalds was not convinced about the value of this change, noting that the code generation for spinlocks is indeed better when preemption is not possible. Gleixner reiterated that the effect is not measurable, and Torvalds conceded that the patch set does make the code simpler and "has its very clear charm".
Using preempt_count
Torvalds's larger complaint, though, was about code that uses preempt_count to change its behavior depending on the context. Such code, he said, is "always simply fundamentally wrong". Code that changes its behavior depending on the context should have that context passed in as a parameter, he said, so that callers know what to expect. Thus, the GFP_ATOMIC flag to the memory-allocation functions is acceptable, but changing behavior based on the return value from in_atomic() is not.
To an extent, there is general agreement with this position. Gleixner's patch posting included a section with future plans to audit and fix callers of functions like in_atomic() where, he says, "the number of buggy users is clearly the vast majority". Daniel Vetter added that, in his experience, "code that tries to cleverly adjust its behaviour depending upon the context it's running in is harder to understand and blows up in more interesting ways".
Paul McKenney, instead, argued that some code has to be able to operate properly in different contexts; the alternative would be an explosion of the API.
In response, Torvalds clarified that he sees core-kernel code as having different requirements than the rest. Core code has to deal with multiple contexts and should always do the right thing; code in drivers, instead, should not be changing its behavior based on its view of the context.
No hard conclusions were reached in this branch of the discussion. It does seem likely, though, that code with context-dependent behavior will be looked at more closely in the future.
Questioning high memory
One example of questionable use of preempt_count, in the crypto code, was pointed out early in the discussion by Gleixner; it changes a memory allocation mode in strange ways if it thinks that it's not currently preemptible. After some discussion, it turned out, according to Ard Biesheuvel, that the real purpose had been to avoid using kmap_atomic() if possible.
For those who are not immediately familiar with kmap_atomic(), a look at this article on high memory might be helpful. In short: 32-bit machines can only map a limited amount of memory into the kernel's address space; that amount is a little under 1GB on most architectures and configurations. Any memory that is not directly mapped is deemed "high memory"; any page in high memory must be explicitly (and temporarily) mapped into the kernel before the kernel can access its contents. The functions kmap() and kmap_atomic() exist to perform this mapping.
There are a few differences between those two functions, starting with the fact that only kmap_atomic() is callable in atomic context. Beyond that, though, kmap_atomic() is more efficient and is thus seen as being strongly preferable in any situation where it can be used, regardless of whether the caller was running in atomic context before the call (the CPU will always be running in atomic context while the mapping is in place). As Biesheuvel pointed out, though, the documentation doesn't reflect this preference and encourages the use of kmap() instead, so that is what he did.
There is another reason to prefer kmap(), he added; a call to kmap_atomic() disables preemption even on 64-bit architectures, where high memory does not exist and no temporary mapping need be made. Using it would have resulted in much of the WireGuard VPN code running with preemption disabled, entirely unnecessarily. Torvalds pointed out that there is a reason for this behavior: it is there to cause code to fail on 64-bit machines if it does things that would not work on 32-bit machines where high memory does exist. It is essentially a debugging aid that is making up for the fact that few developers run on 32-bit machines.
One way to optimize kmap_atomic() on 64-bit systems, Gleixner said, would be to make kmap_atomic() sections be preemptible — no longer atomic, in other words. This approach has been taken in the realtime kernels, he said, and "it's not that horrible". The cost would be to make kmap_atomic() a little slower on systems where high memory is in use.
That, it seems, is a cost that the development community is increasingly willing to pay; Torvalds replied that he would like to start removing kmap() support entirely. 32-bit systems will be around for some time yet, but they are increasingly unlikely to be used in situations where lots of memory is needed. Or, as Torvalds put it: "It's not that 32-bit is irrelevant, it's that 32-bit with large amounts of memory is irrelevant". Every time that the cost of supporting high memory (which adds a significant amount of complexity to the memory-management subsystem) makes itself felt, the desire to take it out grows.
That said, nobody will be removing high-memory support right away. But a change that penalizes high-memory systems in favor of the systems that are being deployed now, such as making kmap_atomic() no longer be atomic, is increasingly likely to be accepted. Meanwhile, the other issues around preempt_count remain mostly unresolved, but it seems likely that, in the end, changes that bring correctness and reduce complexity will win out.
Accurate timestamps for the ftrace ring buffer
The function tracer (ftrace) subsystem has become an essential part of the kernel's introspection tooling. Like many kernel subsystems, ftrace uses a ring buffer to quickly communicate events to user space; those events include a timestamp to indicate when they occurred. Until recently, the design of the ring buffer has led to the creation of inaccurate timestamps when events are generated from interrupt handlers. That problem has now been solved; read on for an in-depth discussion of how this issue came about and the form of its solution.
The ftrace ring buffer
The ftrace ring buffer was added in 2008 and, a little less than a year later, it became completely lockless. The design of the ring buffer split it into per-CPU buffers; each per-CPU buffer has a series of sub-buffers, the size of which happens to be the architecture's page size. This sizing was not a requirement of the design, but it is a convenient size for the splice() system call. Each sub-buffer begins with a header that includes, among other things, a full timestamp for the first event stored there.
Writes to a specific per-CPU buffer can only happen on the CPU for that buffer. That ensures that any contention between writers will always be in stack order. That is: a write being done in normal context could only have a contending writer running in an interrupt context, and that write must completely finish before returning back to normal context. There is no need to worry about parallel writers that are executing on other CPUs. Interrupted writes are thus always strictly nested within the writes they interrupt.
The design of the ring buffer depends on the fact that writers that interrupt other writers will completely finish before the interrupted writer may continue. This allows for some flexibility in how the writers can remain lockless. Although this simplified the coordination between writes, it added extra complexity to the tracking of time.
Before going into timestamp management, an understanding of how space is reserved on the ring buffer is necessary. An index is used to denote where the last event in the sub-buffer was written. The length of each new event is added to the index with local_add_return() (which can be used since this is a per-CPU index) and the location for the new event is simply the returned value minus the length of the event.
Obviously, if the value returned is greater than the size of the sub-buffer, it means there's no more room on the sub-buffer for this event, and the logic to move to the next page in the ring buffer is invoked.
Timestamps
A 64-bit timestamp requires eight bytes to store. The bigger an event is, the longer it takes to write it and the fewer of them a ring buffer may hold. To keep the event header small, the ring buffer code tries to avoid storing the full timestamp. An event on the ring buffer looks like this:
struct ring_buffer_event {
    u32    type_len:5, time_delta:27;
    u32    array[];
};
The first five bits of the event header determine its type and size: a value of 29 means it is a padding event, 30 is a time-extend event, and 31 is an absolute timestamp. If the value is between one and 28, it represents an event with a data payload that starts at the array field, and the total event size is the type_len times four. If the total event size is greater than 112 (or 4*28) bytes, then type_len is set to zero, the 32-bit array field will hold the length of the event, and the data payload starts immediately after the array field. With most events having a size of 112 bytes or less, this helps keep the events compact. Note that all events are four-byte aligned.
The next 27 bits of the first integer of the event are the time_delta. This field holds the delta of time since the last event (or zero if it's the first event on the sub-buffer, which holds a full eight-byte timestamp in its header). If the timestamp is in nanoseconds, the largest delta that can be stored is 134.217728 milliseconds (2^27 nanoseconds). If an event comes in after 134.217728 milliseconds, a time-extend event is added, which uses both the time_delta and the 32 bits of the array field to create a delta of up to 18 years (2^59 nanoseconds).
Tom Zanussi needed full timestamps from the events at the time they were recorded to get histograms to work. As the events only held deltas, a new event was created to store 59 bits of the full timestamp since boot, allowing the histograms to record the exact timestamp used for an event. The type 31 was used to denote this new event, which has the same makeup as a time extend but, instead of holding a delta, holds the time since boot. In fact, this new timestamp event could replace time extends entirely, since it could only fail if a machine were to run for more than 18 years without a reboot.
The problem with nested writes
Using a delta from the previous event proved to be a troublesome design; it requires saving the timestamp of the last event written into the ring buffer for use in calculating the delta stored in the next event. This put several actions in play that need to be atomic but cannot be:
- Reading the timestamp to use for the current event.
- Reserving a spot on the ring buffer to store the current event.
- Calculating the delta of the timestamp of the current event from the timestamp of the previous event.
- Saving the timestamp used for the current event to calculate the delta for the next event.
Any of the above steps can be interrupted by another context, such as an interrupt or non-maskable interrupt (NMI). This makes it difficult to know if the delta stored for the current event was really the delta since the timestamp of the previous event. After the last timestamp is retrieved for the delta calculation, an interrupt may occur and several events may be injected into the ring buffer before storage is allocated for the current event.
The timestamp for the new event must be taken before the allocation, so that it can be used to calculate deltas for events that may come in via an interrupt that occurs right after the storage was allocated. Even if a full timestamp were written for the interrupt events, the timestamp used for the interrupted event, if retrieved after the space allocation, would be later than the interrupt-event timestamps, even though the interrupted event itself happened first.
Regardless of whether the timestamp is taken before or after the allocation is performed, the interrupt situations described above will cause time to appear to go backward in the ring buffer. That is considered unacceptable because it breaks the merge sort used when all of the per-CPU buffers are shown together as a single output.
The approach taken to avoid this problem was simply to write a zero delta for events that interrupt the writing of another event. Unfortunately, this makes it look as if time stood still. The obvious problem with this approach is that you lose the time between events when they interrupted the writing of another event. The output will look like all the events happened instantaneously. This approach has been satisfactory for the last 12 years, but it was a design flaw that needed to be fixed.
To see this problem in real use, try running a command like:
trace-cmd record -p function
for a while and then running:
trace-cmd report --debug -l -t --ts-diff --cpu 4
on the output file. Here, --debug shows where the sub-buffer breaks are, -l shows latency information (interrupt context), -t keeps the timestamps in nanosecond format (otherwise it will truncate to microseconds), --ts-diff shows the delta between events, and --cpu 4 is used just because I found what I was looking for on CPU 4 (I searched for the time delta of zero). This gives a good idea of the impact of what happens when events occur after interrupting the writing of another event.
trace-cm-1724   4.... 137.210588990: (+84)    function: kfree [84:0xf44:24]
trace-cm-1724   4.... 137.210589078: (+88)    function: wakeup_pipe_writers [88:0xf60:24]
trace-cm-1724   4d.h. 137.210589709: (+631)   function: __sysvec_apic_timer_interrupt [631:0xf7c:24]
trace-cm-1724   4d.h. 137.210589709: (+0)     function: hrtimer_interrupt [0:0xf98:24]
trace-cm-1724   4d.h. 137.210589709: (+0)     function: _raw_spin_lock_irqsave [0:0xfb4:24]
trace-cm-1724   4d.h. 137.210589709: (+0)     function: ktime_get_update_offsets_now [0:0xfd0:24]
CPU:4 [SUBBUFFER START] [137210590461:0x27c53000]
trace-cm-1724   4d.h. 137.210590461: (+752)   function: __hrtimer_run_queues [0:0x10:24]
trace-cm-1724   4d.h. 137.210590461: (+0)     function: __remove_hrtimer [0:0x2c:24]
trace-cm-1724   4d.h. 137.210590461: (+0)     function: _raw_spin_unlock_irqrestore [0:0x48:24]
trace-cm-1724   4d.h. 137.210590461: (+0)     function: tick_sched_timer [0:0x64:24]
trace-cm-1724   4d.h. 137.210590461: (+0)     function: ktime_get [0:0x80:24]
trace-cm-1724   4d.h. 137.210590461: (+0)     function: tick_sched_do_timer [0:0x9c:24]
trace-cm-1724   4d.h. 137.210590461: (+0)     function: tick_sched_handle.isra.0 [0:0xb8:24]
trace-cm-1724   4d.h. 137.210590461: (+0)     function: update_process_times [0:0xd4:24]
[...]
trace-cm-1724   4d.s. 137.210590461: (+0)     function: rcu_segcblist_pend_cbs [0:0x940:24]
trace-cm-1724   4d.s. 137.210590461: (+0)     function: rcu_disable_urgency_upon_qs [0:0x95c:24]
trace-cm-1724   4d.s. 137.210590461: (+0)     function: rcu_report_qs_rnp [0:0x978:24]
trace-cm-1724   4d.s. 137.210590461: (+0)     function: _raw_spin_unlock_irqrestore [0:0x994:24]
trace-cm-1724   4..s. 137.210590461: (+0)     function: rcu_segcblist_ready_cbs [0:0x9b0:24]
trace-cm-1724   4d.s. 137.210590461: (+0)     function: irqtime_account_irq [0:0x9cc:24]
trace-cm-1724   4.... 137.210590461: (+0)     function: kill_fasync [0:0x9e8:24]
trace-cm-1724   4.... 137.210605019: (+14558) function: pipe_unlock [14558:0xa04:24]
trace-cm-1724   4.... 137.210606026: (+700)   function: __x64_sys_splice [700:0xa58:24]
Looking at this output, I can tell that the call to __sysvec_apic_timer_interrupt() happened from an interrupt that came in as the call to kill_fasync() started to be recorded but before it reserved space on the ring buffer. I know this because __sysvec_apic_timer_interrupt() has a time delta; thus it was able to reserve space on the ring buffer before kill_fasync() was able to, but after the processing of the event for kill_fasync() started. Once the processing of nested events happens, only the first event to get on the ring buffer will have a delta timestamp; all events after that (including the one that was interrupted, because its storage comes later) get a zero delta.
The --debug option for trace-cmd report is what caused the extra data to show in the output, which includes this line:
CPU:4 [SUBBUFFER START] [137210590461:0x27c53000]
This output indicates that the trace crossed over a sub-buffer page at this point. As each sub-buffer stores an absolute timestamp, the first event on the sub-buffer will also have a delta as shown above.
Over the years, this flaw really bothered me; I would spend countless hours thinking about how to find a way to reliably make the nested timestamps meaningful. The fact that we only needed to worry about stacked writes and not concurrent writes made me believe there was a solution. As there are only realistically four levels of the stack to worry about, I thought I could make a state for each level and use the above and below states to synchronize the timestamps. Those four levels are: normal context, software-interrupt (softirq) context, interrupt context, and NMI context.
Theoretically, you could have a machine check during an NMI, making a fifth level, but the odds are extremely low: a softirq would have to interrupt the writing of an event and write an event of its own that gets interrupted by an interrupt, which then writes an event during which an NMI triggers and also writes an event; the chance of a machine check on top of all that is lower still. Even with running function tracing, which traces every function in all contexts, I had trouble finding one nested level, let alone four. And since the nesting level can be detected, the worst that could happen is that zero is stored for the delta upon detecting it. This turned out not to be a worry, as my solution does not need to know about the levels.
Avoiding cmpxchg()
In all my prior attempts to solve this problem, I tried hard to avoid the use of local_cmpxchg() (the local CPU version of cmpxchg()). cmpxchg() is an architecture-specific function that will atomically read a value from a given location, compare it with a given value and, if the two are equal, it will write a third value back to that location. If the values do not match, then the location is not updated. The original value read from the location is the return value of the cmpxchg(); it can be used to determine if the cmpxchg() succeeded in updating the location or not.
When I first started working on the ring buffer, all of my benchmarks would show a slight but noticeable overhead when using local_cmpxchg() over local_add_return(). The goal was thus to not use a cmpxchg() and have, instead, a timestamp that would be used for each level of nesting. Starting with a four-element array of timestamps, I tried various approaches of a nesting counter and storing timestamps in each level. Upon detecting nesting, I thought that a context that interrupted another context could fix up the timestamps of the contexts that were interrupted without needing local_cmpxchg(). But this became much more complex and ran into some thorny synchronization issues.
Having to deal with an array of timestamps just added one more variable that needed to be synchronized with the other variables.
Consider a case where an interrupt comes in right after the timestamp was taken and the storage was allocated for the first event, but before the event is actually stored. Then an NMI comes in after the timestamp and storage is allocated for an event happening in interrupt context. At this point, because the allocation during the NMI would not be the first event in the commit, and because two other contexts were interrupted below, it is difficult to know if it should update the timestamp of the event that happened in the interrupt context or not; the timestamp may have already been updated. On top of this, another event is recorded in interrupt context after the NMI added an event, and the state for this event would have to deal with an event injected from another context since the previous event recorded in the interrupt context. The number of states that are added by keeping track of four levels of context and how they relate to inserting events into the ring buffer grew so numerous that it became obvious this was not going to be a viable solution.
The twelve-year-old puzzle solved
Julia Lawall reported a bug where she recorded a trace with trace-cmd and found that time went backward. Looking into it, I discovered that it was due to the addition of the full timestamp used by Zanussi's histograms; the change allowed the time extensions to not be reset to zero if they occurred in a nested event. Writing the fix for that issue triggered another idea for solving the nested timestamp issue.
All my previous attempts tried to avoid using cmpxchg(). While debugging the issue that Lawall reported, I realized that nested events were extremely uncommon and, because they can be detected, it should be possible to separate the slow path from the fast path. A fast path is the common case, which is when an event being written did not interrupt another event, and also was not interrupted itself. Otherwise the slow path is run. cmpxchg() should not be a performance problem if it were only to be used in the slow path. Not restricting what can be done in the slow path allowed me to think about other possible solutions. This gave me new hope, and inspired me to look for a solution in this direction.
While incorporating cmpxchg() back into the solution, I found that the array of four states still added too much complexity. I looked into whether it would be possible to consolidate the array and only care about an event that interrupts another event, or the event being interrupted. Upon interrupting an event in a lower context, it is known that the interrupted event is, in essence, "frozen in time": it will not proceed until the current context returns to it. From the interrupted event's perspective, there are only two states: before being interrupted and after being interrupted. Once processing resumes after an interruption, everything that happened in the interrupt will have run to completion. With these characteristics, a defined set of states can be calculated for every step of the algorithm by keeping track of two different timestamps: one that is written before allocating storage on the ring buffer, and one that is written afterward.
Thus, the solution deals with three players:
- write_tail: the index used to reserve space on the buffer for the event.
- before_stamp: a timestamp saved by all events as they start the recording process.
- write_stamp: a timestamp updated after an event has successfully reserved space.
The following code is run in this order to determine the next decisions to be made:
w = local_read(write_tail);
before = local_read(before_stamp);
after = local_read(write_stamp);
ts = clock();
Before doing anything else, this code saves the current value of write_tail for later use. At this point, we can decide whether this event needs to go into the slow path or not. If before does not equal after, one of two possibilities is indicated: this event interrupted another event while it was updating its timestamps, or this event was interrupted by another context after reading before_stamp and before reading the write_stamp. In either case, the code would fall into the slow path.
if (before != after) {
    event_length += RB_LEN_TIME_EXTEND;
    add_timestamp = RB_ADD_STAMP_FORCE | RB_ADD_STAMP_EXTEND;
} else {
One part of this solution requires injecting absolute timestamps instead of using a delta. For this slow path, the event length is increased by the size of the absolute timestamp event (which is the same size as a time extend). The ADD_STAMP_FORCE and ADD_STAMP_EXTEND flags are saved for later use in the algorithm.
Even if this event did not interrupt another event, a check still must be made to see if the delta since the last event stored can fit in the time_delta portion of the event. Otherwise a time extend is required.
    delta = ts - after;
    if (delta & ~((1 << 27) - 1)) {
        event_length += RB_LEN_TIME_EXTEND;
        add_timestamp = RB_ADD_STAMP_EXTEND;
    }
}
Now write to the before_stamp and allocate storage on the ring buffer by adding to the write_tail.
local_set(before_stamp, ts);
write = local_add_return(event_length, write_tail);
tail = write - event_length;
The start of the event can be found by subtracting its length from write, which is the index of the end of the event. This is stored in tail. Now compare the saved write_tail from the start of this algorithm with the calculated value of the start of the event. If they match, we know that no event interrupted this algorithm between the saving of write_tail and the allocation of the storage for the event. This is the fast path. But we are not out of the woods yet. We still need to update the write_stamp. Note that the before_stamp has already been updated, making it different from the write_stamp. Any nested event that interrupts this event will now fall into the slow path and use an absolute timestamp.
The next step is to simply update the write_stamp:
local_set(write_stamp, ts);
But wait! What if an interrupt came in just before the write to write_stamp, and that interrupt wrote an event? Wouldn't write_stamp then be incorrect, since it would not contain the timestamp of the last event written to the ring buffer? The answer is yes, but we don't care. The reason is that write_stamp is not used in any calculation unless it equals before_stamp; because the two now differ, any nested events will not use it for their calculations.
This is how stacked interrupting events (where all interrupting events finish before this event can continue) help the algorithm. before_stamp is always updated by all events, including nested events that interrupted this one, so before_stamp now contains the timestamp of the last event stored in the ring buffer, which is also the value that write_stamp needs to be set to. Updating write_stamp still needs some care, but it is easy to detect whether this event was interrupted by another; if so, the slow path is entered and cmpxchg() can be taken advantage of:
save_before = local_read(before_stamp);

if (add_timestamp & RB_ADD_STAMP_FORCE)
        delta = ts; // will use the full timestamp
else
        delta = ts - after; // remember, not force means not nested

if (ts != save_before) {
        after = local_read(write_stamp);
        if (save_before > after)
                local_cmpxchg(write_stamp, after, save_before);
}
The above code first re-reads before_stamp; it runs after write_stamp was updated. If another event came in between the reservation of buffer space and the update of write_stamp, then before_stamp will not equal the timestamp read earlier (ts). If it still equals ts, then write_stamp was updated without racing against any interrupting event. At this point, the delta for the event is also calculated: if the ADD_STAMP_FORCE flag is set, this event interrupted another event and an absolute timestamp is required; otherwise, it is safe to calculate the delta from write_stamp and the clock value that was read.
If before_stamp is not equal to the clock value read earlier (ts), then an event came in and updated both before_stamp and write_stamp sometime after the storage for this event was allocated (the update of write_tail). As there is no way of knowing exactly when that happened, it must be assumed that it could have happened before the update to write_stamp. To solve this, write_stamp is re-read and a simple cmpxchg() is performed: if write_stamp is less than the last-read before_stamp, it must be updated. If write_stamp is greater than or equal to the last-read before_stamp, or the cmpxchg() fails, there is nothing to be done; that can only happen if this event was interrupted by another event after the update to write_stamp, and that nested event will have taken care of the correctness of write_stamp.
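The repair step just described can be sketched with ordinary C11 atomics; the kernel uses local_t operations instead, and the function name here is hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative sketch of the write_stamp repair described above,
 * using C11 atomics in place of the kernel's local_t operations.
 * Returns true if write_stamp was moved forward to save_before. */
static bool repair_write_stamp(_Atomic uint64_t *write_stamp,
                               uint64_t ts, uint64_t save_before)
{
        uint64_t after;

        if (ts == save_before)
                return false;   /* not interrupted; write_stamp is correct */

        after = atomic_load(write_stamp);
        if (save_before > after)
                /* if this fails, a nested event already fixed things up */
                return atomic_compare_exchange_strong(write_stamp,
                                                      &after, save_before);
        return false;
}
```

A failed compare-and-exchange is deliberately ignored, for the reason given above: failure means a nested event ran after our update and has already left write_stamp correct.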
This is the end of the algorithm for the case of not being interrupted between taking the timestamps and allocating space on the ring buffer. But what happens if this event was interrupted before the allocation of its space on the ring buffer?
The case of an event interrupted before allocating storage
In this path, an interrupt came in and other events were injected into the ring buffer somewhere between the first read of the write_tail and reserving space on the ring buffer for this event. At this moment, nothing can be trusted. Some work needs to be done to get back to some kind of known state.
after = local_read(write_stamp);
ts = clock();
As this event was interrupted and nested events made it into the ring buffer, the original recording of the clock (ts) is useless. Also, because this is the path of being interrupted by another event, the nested event (or events) would make sure that the write_stamp is the timestamp of the last event added to the ring buffer. Thus we reread both the clock and write_stamp to get into some kind of known state.
if (write == local_read(write_tail) && after < ts) {
        delta = ts - after;
        local_cmpxchg(write_stamp, after, ts);
} else {
        delta = 0;
}
If the value returned by local_add_return() does not match the re-read write_tail, then an interrupt came in between the allocation of this event and the re-reading of write_tail. In that case, this event was sandwiched between two interrupts that injected nested events: one before its storage was allocated, and one after. As there is now no way to know what timestamp to use for calculating its delta, there is no choice but to fall back to a zero delta, but this is actually the best thing to do: if this event was sandwiched between two sets of events, its exact timestamp really does not matter for any use case, as long as it is shown to have happened between the two sets of nested events.
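Put together, the interrupted path amounts to something like the following sketch, again with C11 atomics and a hypothetical function name standing in for the kernel's local_t operations:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the interrupted path: after nested events have run, the
 * clock (ts) and write_stamp (via atomic_load) have been re-read. A
 * meaningful delta is possible only if no further event allocated
 * space (write_tail is unchanged since our allocation) and the clock
 * moved forward; otherwise fall back to a zero delta. */
static uint64_t interrupted_path_delta(_Atomic uint64_t *write_stamp,
                                       _Atomic uint64_t *write_tail,
                                       uint64_t write, uint64_t ts)
{
        uint64_t after = atomic_load(write_stamp);

        if (write == atomic_load(write_tail) && after < ts) {
                uint64_t delta = ts - after;

                atomic_compare_exchange_strong(write_stamp, &after, ts);
                return delta;
        }
        return 0;   /* sandwiched between nested events */
}
```

As in the real code, a failed compare-and-exchange is ignored: it means yet another nested event has already updated write_stamp.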
One might think that the above code is a little ambitious; why not simply use a zero delta whenever an interrupt happens between the start of processing and the allocation of the event? The reason is that this case is not that uncommon to hit; while tracing several hackbench runs, it happened a few times. The problem with just using a zero delta is that, if the event recorded in the interrupt happened at the start of the interrupt, and the interrupt itself ran for some time before returning, then the zero delta would make the interrupt seem much shorter than it actually was.
But doesn't that reasoning make any zero delta a problem? Unfortunately, yes. But the case of being interrupted by two different interrupts, just before and just after the storage is allocated, is highly unlikely. It may still happen but, as stated, there is not much to be done about it. After running several traces of hackbench, I could not find a single occurrence of it; the only way I was able to test this last code path was by artificially injecting an event in a "fake" context and seeing whether the algorithm performed as expected.
At this point the problem has been solved — on 64-bit systems. It turns out that there was an additional obstacle to overcome for the 32-bit case; those looking for the details can find them in this supplemental article.
Conclusion
For several years I was afraid that correct timestamps for ftrace ring-buffer events would end up being impossible for a Turing machine to achieve. But as I agonized over the zero-delta flaw, I knew I had only two options to make the pain go away: prove that a solution is impossible and walk away with my tail between my legs, or find a solution that actually works. The first was not really an option, as I also know that impossible problems can have possible solutions if restrictions can be put on the requirements. For instance, we still have one zero-delta path, but that path is so uncommon, and affects only a single event, that it is not worth agonizing over.
What I found most interesting about this experience was that my solution was the least complex of those I tried. That should not be surprising; a lot of problems never get solved because people overthink the solutions. All it took was debugging something slightly related to the issue to keep me from overthinking it, and everything fell into place after that.
Brief items
Security
Security quote of the week
I remember concluding that the most likely, if still rather improbable, explanation was that the 9-less messages were dummy fill traffic and that the random number generator used to create the messages had a bug or developed a defect that prevented 9s from being included. This would be, to say the least, a very serious error, since it would allow a listener to easily distinguish fill traffic from real traffic, completely negating the benefit of having fill traffic in the first place. It would open the door to exactly the kind of traffic analysis that the system was carefully engineered to thwart. The 9-less messages went on for almost ten years. (If I were reporting this as an Internet vulnerability, I would dub it the "Nein Nines" attack; please forgive the linguistic muddle). But I was resigned to the likelihood that I would never know for sure.
And this brings us to the second observation from [Peter] Strzok's book.
Compromised doesn't say anything about missing nueves, but it does mention that the FBI exploited a serious error on the part of the sender: the FBI was able to tell when messages were and weren't being sent during the weekly timeslot when the suspect couple was observed in the room where they copied traffic. Even worse (for the illegals), empty message slots correlated perfectly with times that the suspect couple was traveling and not able to copy messages. This observation helped confirm the FBI's suspicions and ultimately led to their arrest and expulsion (along with the rest of the Russian illegals network).
[...] So remember this story next time someone tries to sell you their super-secure one-time-pad-based crypto scheme. If actual Russian spies can't use it securely, chances are neither can you.
Kernel development
Kernel release status
The current development kernel is 5.9-rc6, released on September 20. "The one thing that does show up in the diffstat is the softscroll removal (both fbcon and vgacon), and there are people who want to save that, but we'll see if some maintainer steps up. I'm not willing to resurrect it in the broken form it was in, so I doubt that will happen in 5.9, but we'll see what happens."
Stable updates: 5.8.10, 5.4.66, and 4.19.146 were released on September 17, followed by 5.8.11, 5.4.67, 4.19.147, 4.14.199, 4.9.237, and 4.4.237 on September 23.
Bottomley: Creating a home IPv6 network
James Bottomley has put together a detailed recounting of what it took to get IPv6 fully working on his network. "One of the things you’d think from the above is that IPv6 always auto configures and, while it is true that if you simply plug your laptop into the ethernet port of a cable modem it will just automatically configure, most people have a more complex home setup involving a router, which needs some special coaxing before it will work. That means you need to obtain additional features from your ISP using special DHCPv6 requests."
Cook: Security things in Linux v5.7
Kees Cook catches up with the security-related changes in the 5.7 kernel. "The kernel’s Linux Security Module (LSM) API provide a way to write security modules that have traditionally implemented various Mandatory Access Control (MAC) systems like SELinux, AppArmor, etc. The LSM hooks are numerous and no one LSM uses them all, as some hooks are much more specialized (like those used by IMA, Yama, LoadPin, etc). There was not, however, any way to externally attach to these hooks (not even through a regular loadable kernel module) nor build fully dynamic security policy, until KP Singh landed the API for building LSM policy using BPF. With this, it is possible (for a privileged process) to write kernel LSM hooks in BPF, allowing for totally custom security policy (and reporting)."
Quote of the week
/*
 * The worst case is that all tasks preempt one another in a migrate_disable()
 * region and stack on a single CPU. This then reduces the available bandwidth
 * to a single CPU. And since Real-Time schedulability theory considers the
 * Worst-Case only, all Real-Time analysis shall revert to single-CPU
 * (instantly solving the SMP analysis problem).
 */
Development
Firefox 81.0
Firefox 81.0 is out. This version allows you to control media from the keyboard or headset, introduces the Alpenglow theme, adds AcroForm support to fill in, print, and save supported PDF forms, and more. See the release notes for details.
GNOME's new versioning scheme
The GNOME Project has announced a change to its version-numbering scheme; the next release will be "GNOME 40". "After nearly 10 years of 3.x releases, the minor version number is getting unwieldy. It is also exceedingly clear that we're not going to bump the major version because of technological changes in the core platform, like we did for GNOME 2 and 3, and then piling on a major UX change on top of that. Radical technological and design changes are too disruptive for maintainers, users, and developers; we have become pretty good at iterating design and technologies, to the point that the current GNOME platform, UI, and UX are fairly different from what was released with GNOME 3.0, while still following the same design tenets."
Precursor: an open-source mobile hardware platform
Andrew "bunnie" Huang has announced a new project called "Precursor"; it is meant to be a platform for makers to create interesting new devices. "Precursor is unique in the open source electronics space in that it’s designed from the ground-up to be carried around in your pocket. It’s not just a naked circuit board with connectors hanging off at random locations: it comes fully integrated—with a rechargeable battery, a display, and a keyboard—in a sleek, 7.2 mm (quarter-inch) aluminum case." You can't get one yet, but the crowdfunding push starts soon.
Development quotes of the week
Miscellaneous
Linux Journal is Back
Linux Journal has returned under the ownership of Slashdot Media. "As Linux enthusiasts and long-time fans of Linux Journal, we were disappointed to hear about Linux Journal closing its doors last year. It took some time, but fortunately we were able to get a deal done that allows us to keep Linux Journal alive now and indefinitely. It's important that amazing resources like Linux Journal never disappear."
Page editor: Jake Edge
Announcements
Newsletters
Distributions and system administration
- DistroWatch Weekly (September 21)
- Lunar Linux Weekly News (September 18)
- openSUSE Tumbleweed Review of the Week (September 18)
- Ubuntu Weekly Newsletter (September 19)
Development
- Emacs News (September 21)
- These Weeks in Firefox (September 19)
- These Weeks in Firefox (September 23)
- What's cooking in git.git (September 16)
- What's cooking in git.git (September 18)
- What's cooking in git.git (September 22)
- LLVM Weekly (September 21)
- LXC/LXD/LXCFS Weekly Status (September 21)
- OCaml Weekly News (September 22)
- Perl Weekly (September 21)
- Python Weekly Newsletter (September 17)
- Weekly Rakudo News (September 21)
- Ruby Weekly News (September 17)
- Wikimedia Tech News (September 21)
Meeting minutes
- openSUSE board meeting minutes (September 15)
Miscellaneous
- Free Software Foundation Europe Newsletter (September)
Calls for Presentations
CFP Deadlines: September 24, 2020 to November 23, 2020
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
Deadline | Event Dates | Event | Location |
---|---|---|---|
October 1 | December 1 - December 3 | Open Source Firmware Conference | online |
October 7 | November 28 - November 29 | EmacsConf 2020 | Online |
October 11 | November 7 - November 8 | OpenFest 2020 | online |
October 12 | October 19 - October 23 | EPICS collaboration meeting 2020 | Virtual |
October 14 | October 28 - October 29 | eBPF Summit | online |
October 25 | November 5 - November 7 | Ohio LinuxFest | Online |
October 30 | November 21 - November 22 | MiniDebConf - Gaming Edition | Online |
October 31 | February 6 - February 7 | FOSDEM 2021 | Online |
November 1 | November 14 - November 15 | Battlemesh v13 | online |
November 5 | November 10 | S&T 2020 (SQLite & TCL) | Online |
November 6 | January 23 - January 25 | linux.conf.au 2021 | Online |
November 6 | November 16 - November 22 | Guix Days | Online |
November 6 | February 18 - February 20 | DevConf.CZ | Online |
November 10 | December 3 | Live Embedded Event | Online |
November 11 | March 20 - March 21 | LibrePlanet 2021 | Online |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Events: September 24, 2020 to November 23, 2020
The following event listing is taken from the LWN.net Calendar.
Date(s) | Event | Location |
---|---|---|
September 22 - September 24 | Linaro Virtual Connect | online |
September 29 - October 1 | ApacheCon 2020 | Online |
October 2 - October 5 | PyCon India 2020 | Virtual |
October 2 - October 3 | PyGotham TV | Online |
October 3 - October 4 | Handmade Seattle 2020 | Online |
October 6 - October 8 | 2020 Virtual LLVM Developers' Meeting | online |
October 8 - October 9 | PyConZA 2020 | Online |
October 10 - October 11 | Arch Linux Conf 2020 Online | Online |
October 13 - October 15 | Lustre Administrator and Developer Workshop 2020 | Online |
October 15 - October 17 | openSUSE LibreOffice Conference | Online |
October 19 - October 20 | [Virtual] All Things Open 2020 | Virtual |
October 19 - October 23 | EPICS collaboration meeting 2020 | Virtual |
October 19 - October 23 | Open Infrastructure Summit | Virtual |
October 20 - October 23 | [Canceled] PostgreSQL Conference Europe | Berlin, Germany |
October 24 - October 25 | [Cancelled] T-Dose 2020 | Geldrop (Eindhoven), Netherlands |
October 26 - October 29 | Open Source Summit Europe | online |
October 28 - October 29 | [Canceled] DevOpsDays Berlin 2020 | Berlin, Germany |
October 28 - October 29 | eBPF Summit | online |
October 28 - October 30 | [Virtual] KVM Forum | Virtual |
October 29 - October 30 | [Virtual] Linux Security Summit Europe | Virtual |
November 5 - November 7 | Ohio LinuxFest | Online |
November 7 - November 8 | OpenFest 2020 | online |
November 7 - November 8 | RustFest Global | Online |
November 10 | S&T 2020 (SQLite & TCL) | Online |
November 12 - November 14 | Linux App Summit | Online |
November 14 - November 15 | Battlemesh v13 | online |
November 16 - November 22 | Guix Days | Online |
November 21 - November 22 | MiniDebConf - Gaming Edition | Online |
If your event does not appear here, please tell us about it.
Event Reports
Netdev 0x14: slides and papers posted
The slides and papers from the recent Netdev conference have been posted and are available through the schedule.
Security updates
Alert summary September 17, 2020 to September 23, 2020
Dist. | ID | Release | Package | Date |
---|---|---|---|---|
Arch Linux | ASA-202009-6 | | chromium | 2020-09-17 |
Arch Linux | ASA-202009-7 | | netbeans | 2020-09-17 |
Debian | DLA-2375-1 | LTS | inspircd | 2020-09-20 |
Debian | DSA-4764-1 | stable | inspircd | 2020-09-18 |
Debian | DSA-4765-1 | stable | modsecurity | 2020-09-18 |
Fedora | FEDORA-2020-9b9e8e5306 | F32 | chromium | 2020-09-19 |
Fedora | FEDORA-2020-5ed5af6275 | F31 | cryptsetup | 2020-09-19 |
Fedora | FEDORA-2020-e2deb72e0f | F32 | dotnet3.1 | 2020-09-16 |
Fedora | FEDORA-2020-30cd8d9ad6 | F31 | gnutls | 2020-09-19 |
Fedora | FEDORA-2020-5920a7a0b2 | F31 | kernel | 2020-09-16 |
Fedora | FEDORA-2020-3c6fedeb83 | F32 | kernel | 2020-09-16 |
Fedora | FEDORA-2020-48a1ae610c | F31 | mbedtls | 2020-09-16 |
Fedora | FEDORA-2020-7dd29dacad | F31 | mingw-libxml2 | 2020-09-19 |
Fedora | FEDORA-2020-b60dbdd538 | F32 | mingw-libxml2 | 2020-09-19 |
Fedora | FEDORA-2020-16167a66a2 | F32 | python35 | 2020-09-16 |
Fedora | FEDORA-2020-3813e1317b | F31 | seamonkey | 2020-09-20 |
Fedora | FEDORA-2020-15999f707a | F32 | seamonkey | 2020-09-18 |
Mageia | MGASA-2020-0368 | 7 | libraw | 2020-09-17 |
Mageia | MGASA-2020-0369 | 7 | mysql-connector-java | 2020-09-21 |
openSUSE | openSUSE-SU-2020:1183-2 | | ark | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1310-2 | | ark | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1048-1 | | chromium | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1181-1 | | chromium | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1215-1 | | chromium | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1322-1 | | chromium | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1032-1 | | chromium | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1499-1 | 15.1 15.2 | chromium | 2020-09-22 |
openSUSE | openSUSE-SU-2020:1192-1 | | claws-mail | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1494-1 | 15.2 | curl | 2020-09-21 |
openSUSE | openSUSE-SU-2020:1433-1 | | docker-distribution | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1478-1 | 15.1 15.2 | fossil | 2020-09-20 |
openSUSE | openSUSE-SU-2020:1438-1 | | hylafax+ | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1427-1 | | inn | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1232-1 | | knot | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1505-1 | | libetpan | 2020-09-22 |
openSUSE | openSUSE-SU-2020:1454-1 | 15.2 | libetpan | 2020-09-19 |
openSUSE | openSUSE-SU-2020:1458-1 | 15.2 | libjpeg-turbo | 2020-09-19 |
openSUSE | openSUSE-SU-2020:1500-1 | | libqt4 | 2020-09-22 |
openSUSE | openSUSE-SU-2020:1452-1 | 15.1 | libqt4 | 2020-09-19 |
openSUSE | openSUSE-SU-2020:1501-1 | 15.2 | libqt4 | 2020-09-22 |
openSUSE | openSUSE-SU-2020:1428-1 | | librepo | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1455-1 | 15.2 | libvirt | 2020-09-19 |
openSUSE | openSUSE-SU-2020:1465-1 | 15.2 | libxml2 | 2020-09-19 |
openSUSE | openSUSE-SU-2020:1506-1 | | lilypond | 2020-09-22 |
openSUSE | openSUSE-SU-2020:1453-1 | 15.2 | lilypond | 2020-09-19 |
openSUSE | openSUSE-SU-2020:1439-1 | | mumble | 2020-09-16 |
openSUSE | openSUSE-SU-2020:1439-2 | | mumble | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1459-1 | 15.2 | openldap2 | 2020-09-19 |
openSUSE | openSUSE-SU-2020:1509-1 | | otrs | 2020-09-23 |
openSUSE | openSUSE-SU-2020:1475-1 | 15.1 15.2 | otrs | 2020-09-20 |
openSUSE | openSUSE-SU-2020:1055-1 | | pdns-recursor | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1101-1 | | pdns-recursor | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1502-1 | 15.1 | perl-DBI | 2020-09-22 |
openSUSE | openSUSE-SU-2020:1483-1 | 15.2 | perl-DBI | 2020-09-20 |
openSUSE | openSUSE-SU-2020:1423-1 | | python-Flask-Cors | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1446-1 | | python-Flask-Cors | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1100-1 | | singularity | 2020-09-18 |
openSUSE | openSUSE-SU-2020:1497-1 | 15.1 15.2 | singularity | 2020-09-22 |
openSUSE | openSUSE-SU-2020:1468-1 | 15.2 | slurm_18_08 | 2020-09-19 |
openSUSE | openSUSE-SU-2020:1486-1 | 15.2 | virtualbox | 2020-09-20 |
Oracle | ELSA-2020-3732 | OL8 | mysql:8.0 | 2020-09-17 |
Oracle | ELSA-2020-3631 | OL7 | thunderbird | 2020-09-17 |
Red Hat | RHSA-2020:3803-01 | EL7.4 | bash | 2020-09-22 |
Red Hat | RHSA-2020:3804-01 | EL7.4 | kernel | 2020-09-22 |
Red Hat | RHSA-2020:3810-01 | MRG2 | kernel-rt | 2020-09-22 |
Slackware | SSA:2020-266-01 | | seamonkey | 2020-09-22 |
SUSE | SUSE-SU-2020:2715-1 | SES5 | grafana | 2020-09-22 |
SUSE | SUSE-SU-2020:2690-1 | SLE12 | jasper | 2020-09-21 |
SUSE | SUSE-SU-2020:2689-1 | SLE15 | jasper | 2020-09-21 |
SUSE | SUSE-SU-2020:2687-1 | SLE12 | less | 2020-09-21 |
SUSE | SUSE-SU-2020:2711-1 | SLE12 | libmspack | 2020-09-22 |
SUSE | SUSE-SU-2020:2660-1 | OS8 OS9 SLE12 SES5 | libsolv | 2020-09-16 |
SUSE | SUSE-SU-2020:0079-2 | OS8 OS9 SLE12 SES5 | libzypp | 2020-09-16 |
SUSE | SUSE-SU-2020:2712-1 | SLE15 | openldap2 | 2020-09-22 |
SUSE | SUSE-SU-2020:2714-1 | OS9 SLE12 | ovmf | 2020-09-22 |
SUSE | SUSE-SU-2020:2691-1 | SLE15 | ovmf | 2020-09-21 |
SUSE | SUSE-SU-2020:2713-1 | SLE15 | ovmf | 2020-09-22 |
SUSE | SUSE-SU-2020:2718-1 | OS8 | pdns | 2020-09-23 |
SUSE | SUSE-SU-2020:2661-1 | OS7 OS8 OS9 SLE12 SES5 | perl-DBI | 2020-09-16 |
SUSE | SUSE-SU-2020:2698-1 | OS6 OS7 SLE12 | python-pip | 2020-09-21 |
SUSE | SUSE-SU-2020:2699-1 | OS7 OS8 OS9 SLE12 SES5 | python3 | 2020-09-21 |
SUSE | SUSE-SU-2020:2710-1 | SLE15 | rubygem-actionpack-5_1 | 2020-09-22 |
SUSE | SUSE-SU-2020:2686-1 | OS6 OS7 | rubygem-actionview-4_2 | 2020-09-21 |
SUSE | SUSE-SU-2020:2678-1 | OS7 | rubygem-rack | 2020-09-18 |
SUSE | SUSE-SU-2020:2724-1 | OS7 SLE12 | samba | 2020-09-23 |
SUSE | SUSE-SU-2020:2721-1 | OS8 OS9 SLE12 SES5 | samba | 2020-09-23 |
SUSE | SUSE-SU-2020:2673-1 | SLE12 | samba | 2020-09-17 |
SUSE | SUSE-SU-2020:2720-1 | SLE12 | samba | 2020-09-23 |
SUSE | SUSE-SU-2020:2719-1 | SLE15 | samba | 2020-09-23 |
SUSE | SUSE-SU-2020:2722-1 | SLE15 SES6 | samba | 2020-09-23 |
Ubuntu | USN-4513-1 | 16.04 | apng2gif | 2020-09-17 |
Ubuntu | USN-4531-1 | 18.04 20.04 | busybox | 2020-09-22 |
Ubuntu | USN-4528-1 | 16.04 18.04 | ceph | 2020-09-22 |
Ubuntu | USN-4530-1 | 18.04 | debian-lan-config | 2020-09-22 |
Ubuntu | USN-4529-1 | 18.04 | freeimage | 2020-09-22 |
Ubuntu | USN-4516-1 | 18.04 | gnupg2 | 2020-09-17 |
Ubuntu | USN-4533-1 | 20.04 | ldm | 2020-09-22 |
Ubuntu | USN-4534-1 | 12.04 14.04 16.04 18.04 | libdbi-perl | 2020-09-23 |
Ubuntu | USN-4509-1 | 14.04 | libdbi-perl | 2020-09-16 |
Ubuntu | USN-4517-1 | 16.04 18.04 | libemail-address-list-perl | 2020-09-18 |
Ubuntu | USN-4523-1 | 16.04 | libofx | 2020-09-21 |
Ubuntu | USN-4521-1 | 16.04 18.04 20.04 | libpam-tacplus | 2020-09-21 |
Ubuntu | USN-4505-1 | 18.04 | libphp-phpmailer | 2020-09-16 |
Ubuntu | USN-4514-1 | 16.04 18.04 20.04 | libproxy | 2020-09-17 |
Ubuntu | USN-4526-1 | 14.04 16.04 18.04 | linux, linux-aws, linux-aws-hwe, linux-azure, linux-azure-4.15, linux-gcp, linux-gcp-4.15, linux-gke-4.15, linux-hwe, linux-oem, linux-oracle, linux-raspi2, linux-snapdragon | 2020-09-21 |
Ubuntu | USN-4525-1 | 20.04 | linux, linux-azure, linux-gcp, linux-oracle | 2020-09-21 |
Ubuntu | USN-4506-1 | 16.04 | mcabber | 2020-09-16 |
Ubuntu | USN-4507-1 | 16.04 | ncmpc | 2020-09-16 |
Ubuntu | USN-4532-1 | 18.04 | netty-3.9 | 2020-09-22 |
Ubuntu | USN-4522-1 | 16.04 | novnc | 2020-09-21 |
Ubuntu | USN-4504-1 | 16.04 18.04 | openssl, openssl1.0 | 2020-09-16 |
Ubuntu | USN-4519-1 | 16.04 | pulseaudio | 2020-09-17 |
Ubuntu | USN-4515-1 | 16.04 | pure-ftpd | 2020-09-17 |
Ubuntu | USN-4511-1 | 16.04 18.04 20.04 | qemu | 2020-09-17 |
Ubuntu | USN-4520-1 | 16.04 | sa-exim | 2020-09-18 |
Ubuntu | USN-4510-2 | 14.04 | samba | 2020-09-17 |
Ubuntu | USN-4510-1 | 16.04 18.04 | samba | 2020-09-17 |
Ubuntu | USN-4508-1 | 16.04 18.04 20.04 | storebackup | 2020-09-16 |
Ubuntu | USN-4524-1 | 16.04 | tnef | 2020-09-21 |
Ubuntu | USN-4512-1 | 18.04 | util-linux | 2020-09-17 |
Ubuntu | USN-4518-1 | 16.04 | xawtv | 2020-09-17 |
Kernel patches of interest
Kernel releases
Architecture-specific
Build system
Core kernel
Development tools
Device drivers
Device-driver infrastructure
Filesystems and block layer
Memory management
Networking
Security-related
Virtualization and containers
Miscellaneous
Page editor: Rebecca Sobol