
Leading items

Welcome to the LWN.net Weekly Edition for September 8, 2022

This edition contains the following feature content:

  • Lazy imports for Python: PEP 690 would add an opt-in lazy-import mechanism to CPython.
  • A framework for code tagging: a kernel patch set for marking code locations, with memory-allocation tracking as a first application.
  • What's in a (type) name?: should the man pages use standard C type names for kernel-defined structures?
  • Concurrent page-fault handling with per-VMA locks: a new attempt at the mmap_lock scalability problem.
  • A look at Linux Mint 21: a review of the "Vanessa" release.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Lazy imports for Python

By Jake Edge
September 7, 2022

Starting a Python application typically results in a flurry of imports as modules from various locations (and the modules they import) get added into the application process. All of that occurs before the application even gets started doing whatever it is the user actually launched it for; that delay can be significant—and annoying. Beyond that, many of those imports may not be necessary at all for the code path being followed, so eagerly doing the import is purely wasted time. A proposal back in May would add a way for applications to choose lazy imports, where the import is deferred until the module is actually used.

PEP 690

The lazy imports proposal was posted to the Python discussion forum by one of its authors, Germán Méndez Bravo. He noted that the feature is in use in the Cinder CPython fork at Meta, where it has demonstrated "startup time improvements up to 70% and memory-use reductions up to 40%" on real-world Python command-line tools. So he and Carl Meyer teamed up on PEP 690 ("Lazy Imports") to propose the feature for CPython itself; since neither of them is a core developer, Barry Warsaw stepped up as the PEP's sponsor. The PEP has changed somewhat since it was posted, based on feedback in the discussion; some of those changes will be covered below. The May 3 version can be found at GitHub, along with other historical versions.

The core of the idea is the concept of a lazy reference that is not visible to Python programs; it is purely a construct in the C code of the interpreter. When run with lazy imports enabled, a statement like import foo will simply add the name foo to the global namespace (i.e. globals()) as a lazy reference; any access to that name will cause the import to be executed, so the lazy reference acts like a thunk. Similarly, from foo import bar will add bar to the namespace, such that when it is used it will be resolved as foo.bar, which will import foo at that time.
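Although the proposed lazy references live entirely in the interpreter's C code and are invisible to Python programs, the behavior can be roughly approximated in pure Python; in this sketch, the LazyModule class is invented for illustration and is not part of the proposal:

    import importlib

    class LazyModule:
        """Defer the real import until the first attribute access."""
        def __init__(self, name):
            self._name = name
            self._module = None

        def __getattr__(self, attr):
            if self._module is None:
                self._module = importlib.import_module(self._name)
            return getattr(self._module, attr)

    json = LazyModule("json")          # no import work happens here
    print(json.dumps({"lazy": True}))  # the real import happens here

The proposal's lazy references would do this transparently at the namespace level, with no wrapper object visible to the program.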

The original proposal enabled lazy imports by way of a command-line flag (-L) to the interpreter or via an environment variable. But Inada Naoki pointed out that it would be better to have an API to enable lazy imports. For an application like Mercurial, which already uses a form of lazy importing but might want to switch to the new mechanism, setting an environment variable just for the tool is not sensible and adding a command-line argument to the "#!/usr/bin/env python" shebang (or hashbang) line of a Python script is not possible on Linux. Meyer agreed that an API (e.g. importlib.set_lazy_imports()) should be added.

The PEP specifically targets application developers as the ones who should choose lazy imports and test their applications to ensure that they work. The PEP says:

Since lazy imports are a potentially-breaking semantic change, they should be enabled only by the author or maintainer of a Python application, who is prepared to thoroughly test the application under the new semantics, ensure it behaves as expected, and opt-out any specific imports as needed (see below). Lazy imports should not be enabled speculatively by the end user of a Python application with any expectation of success.

It is the responsibility of the application developer enabling lazy imports for their application to opt-out any library imports that turn out to need to be eager for their application to work correctly; it is not the responsibility of library authors to ensure that their library behaves exactly the same under lazy imports.

The environment variable for enabling the feature fell by the wayside, but the -L option remains and an explicit API was added. There are also ways for developers to opt out of lazy imports; the PEP proposes a few different mechanisms. To start with, there are several types of imports that will never be processed lazily, regardless of any settings. For example:

    import importlib     # not lazy (aka "eager")
    
    # make subsequent imports lazy
    importlib.set_lazy_imports()

    import foo           # lazy
    from bar import baz  # lazy
    from xyz import *    # star imports are always eager

    try:
        import abc       # always eager in try except block
    except ImportError:
        import defg      # eager

    with A() as a:
        import ghi       # always eager inside with block

Imports that are not at the top level of a module (i.e. those inside a class or function definition) are also always eager. If a developer knows that an import needs to be done eagerly, presumably because the side effects from importing it need to happen before the rest of the code is executed—or perhaps because the module does not work correctly when lazily imported—the import can be done in a try block or the proposed new context manager can be used:

    from importlib import eager_imports

    with eager_imports():
        import foo       # eager

For third-party code that cannot (or should not) be modified, there is an exclude list available, which will force modules on the list to be eagerly imported when they are encountered:

    from importlib import set_lazy_imports

    set_lazy_imports(excluding=['foo', 'bar.baz'])

In that example, foo and bar.baz will be eagerly imported, though bar is still lazily imported, as are all of the imports contained in foo and bar.baz.

Libraries

Several library authors expressed concerns that they would effectively be forced to support (and test) lazy imports of their library, which is an added burden for maintainers. Thomas Kluyver put it this way:

Realistically, we won't get to tell everyone that if they want to use our library they can't use this new lazy import thing that Python just added. Especially as it's meant to make startup faster, and performance tricks always get cargo-culted to people who don't want to think about what they mean (one weird trick to make your Python scripts start 70% faster!). Within a year or so of releasing a version of Python with this option, we'll probably have to ensure our libraries and examples work with and without it. I'm sure we'd manage, but please remember that opt-in features for application developers aren't really optional for library developers.

Marc-Andre Lemburg said that it makes more sense to explicitly choose which imports are done lazily. Enabling it globally for a large code base is potentially dangerous; instead something like "lazy import foo" should be used for imports that are only accessed from some subset of the program. He acknowledged that can already be done, by placing the import where the functionality is being used, but thought that explicitly calling out the lazy imports was a better approach. Gregory P. Smith disagreed: "The startup time benefit of lazy imports only comes from enabling them broadly, not by requiring all code in your application's entire transitive dependencies to be modified to declare most of their imports as lazy."

Meyer wondered about real world examples of the kinds of problems that library authors might encounter. There is a persistent idea in the discussion about libraries opting into being imported lazily, but he does not think that makes sense. However, library authors may not really want to determine whether their library can be imported that way, Paul Moore said:

My concern is more that as a library developer I have no intention of even thinking about whether my code is "lazy import safe". I just write "normal" Python code, and if my test suite passes, I'm done. I don't particularly want to run my test suite twice (with and without lazy imports) and even if I did, what am I supposed to do if something fails under lazy imports? The fact that it works under "normal" imports means it's correct Python, so why should I make my life harder by avoiding constructs just to satisfy an entirely theoretical possibility that someone might want to import my code lazily?

Meyer said that was a reasonable position for a library developer to take, but also recognized that the maintainer "might get user complaints about it, and this is a significant cost of the PEP". He also pointed out that most of the concerns being raised also apply to the existing importlib.util.LazyLoader class, which provides a more limited kind of lazy imports. Beyond that, there is no real way to decide that a module is "safe" for lazy import:

What I think the discussions of "library opt-out" are missing is that "safe for lazy imports" is fundamentally not even a meaningful or coherent property of a single module or library in isolation. It is only meaningful in the context of an actual application codebase. This is because no single module or library can ever control the ordering of imports or how the import-time code path flows: it is an emergent property of the interaction of all modules in the codebase and their imports.

[...] I think the nature of the opt-out in PEP 690 is not well understood. It is not an exercise in categorizing modules into neatly-defined objective categories of "safe for lazy import" and "not safe for lazy import." (If it were, the only possible answer would be that no module is ever fully lazy import safe.) Rather, it is a way for an application developer to say "in the context of my specific entire application and how it actually works, I need to force these particular imports to be eager in order for the effects I actually need to happen in time."

Warsaw agreed with that; library authors "can't declare their modules safe for lazy import because they have no idea how their libraries are consumed or in what order they will be imported". On the other hand, application authors are in a position to work all of that out:

As an application author though, I know everything I need to know about what modules I consume, how they are imported, and whether they are safe or not. At least theoretically. Nobody is in a more advantageous position to understand the behavior of my application, and to make declarations about what modules can and cannot be safely lazily imported. And nobody else is in a position to actually test that assumption.

To me, the PEP gives the application author end-consumer the tools they need to build a lazy-friendly application.

As one data point on the real-world prevalence of lazy-import problems, Meyer said: "the Instagram Server codebase is multiple million lines of code, uses lazy imports applied globally, and has precisely five modules opted out". Méndez Bravo published a lengthy blog post that described the process of converting that code base to use lazy imports. For the most part, the problems encountered were not due to importing libraries; even third-party and standard library modules largely just worked when lazy imports were enabled globally.

Toward the end of June the discussion picked back up when Matplotlib developer Thomas A Caswell reiterated the concerns about the feature's impact on libraries and their authors, though he is "still enthusiastic about this proposal". Matplotlib and other SciPy libraries have lengthy import times and have tried various ways of deferring imports but "at every step discovered a subtle way that a user was relying on a side-effect". He expects that PEP 690 will "produce a stream of really interesting bugs across the whole ecosystem", though he would be happy to be wrong about that.
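A minimal sketch of the sort of side-effect reliance Caswell describes (file and function names here are hypothetical): a module registers a handler in a global table when it is imported, and other code depends on that registration having happened:

    # registry.py
    HANDLERS = {}

    # csv_handler.py
    import registry

    def handle(data):
        return data.split(",")

    registry.HANDLERS["csv"] = handle   # import-time side effect

    # main.py
    import registry
    import csv_handler   # eager: registers immediately; lazy: does nothing yet

    print(registry.HANDLERS)   # under lazy imports this is empty, a subtle bug

Under eager imports, the registration happens as soon as the import statement runs; under lazy imports, it happens only when csv_handler is first touched, which may be after the table has already been consulted.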

David Lord, who helps maintain Flask, Jinja, and other libraries, focused on the push from users to support lazy imports in libraries. He said that other features added to Python over the years (asyncio and typing) had created a lot of extra work when users clamored for them to be supported. "I really hope this doesn't add a third huge workload to my list of things to juggle as a maintainer." Moore is worried that users will perceive lazy imports as a magic button they can press for better performance:

My fear is that most users will get the impression that "enable lazy imports" is essentially a "go_faster=True" setting, and will enable them and have little or no ability to handle the (inevitable) problems. They will therefore push the issue back to library maintainers.

He is in favor of improving startup time and reducing the cost of imports but would prefer to see it done with some form of opt-in for library authors. Meyer reminded everyone that a form of the feature already exists in the language "in a very similar global opt-in way" with LazyLoader. The PEP makes the feature "more usable and more effective and faster", however, which may make it more popular, thus library developers may see more user requests. Furthermore, the existence of LazyLoader has not led to the problems envisioned: "The Python ecosystem doesn't seem to have been overwhelmed by people trying it 'just to see if it makes things faster.'" But it may be that LazyLoader is not all that well-known so it has not been (ab)used much.
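For comparison, the LazyLoader approach looks like this; the lazy_import() helper follows the example in the importlib documentation:

    import importlib.util
    import sys

    def lazy_import(name):
        spec = importlib.util.find_spec(name)
        loader = importlib.util.LazyLoader(spec.loader)
        spec.loader = loader
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        loader.exec_module(module)   # defers the actual loading
        return module

    json = lazy_import("json")   # nearly free
    json.dumps([1, 2, 3])        # the module is really loaded here

Unlike PEP 690, this must be applied explicitly to each module, and the import work is only deferred, not skipped, for modules that are never used by a given code path unless the program avoids touching them entirely.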

The discussion has wound down at this point, though in early August, Mark Shannon argued that the PEP "just feels too magical" because it does not use an explicit mechanism to mark lazy imports (e.g. lazy import foo). He said that "explicit approaches have been rejected with insufficient justification". Warsaw disagreed and thought that the PEP did justify its choices; he encouraged those who want to see an explicit approach to create a competing PEP. Méndez Bravo recounted the process he went through when converting the Instagram code, which started out with an explicit approach. As he worked through it, he realized that nearly all of the imports could be done lazily so he switched to the global approach. All in all, it worked well:

There are many different types of uses of Python and some communities have different patterns, but all the evidence we do have is that the percentage of modules we tried and worked without any issues out of the box with lazy imports enabled was high, and that just enabling lazy imports in a few modules doesn't yield many benefits at all. The true power comes when you enable laziness in whole systems. We've saved terabytes of memory in some systems and reduced start times from minutes to just a few seconds, just by making things lazy.

Opinions are split on PEP 690, but it seems clear that it provides a useful tool for some. Python creator Guido van Rossum is in favor: "I am eager to have this available even if there are potential problems", but others are less enthusiastic even though the underlying problem is widely acknowledged. The PEP is targeted at the 3.12 release, which does not have a feature freeze until next May, so there is still plenty of time. One might guess that the next step is to ask the steering council to decide on the PEP. The outcome of that is not obvious, though if more people start using lazy imports in Cinder without major problems, it might help sway the decision. Time will tell.

Comments (31 posted)

A framework for code tagging

By Jonathan Corbet
September 1, 2022

Kernel code can, at times, be quite inward looking; it often refers to itself. To enable this introspection, the kernel has evolved several mechanisms for identifying specific locations in the code and carrying out actions related to those locations. The code-tagging framework patch set, posted by Suren Baghdasaryan and Kent Overstreet, is an attempt to replace various ad hoc implementations with a single framework, and to add some new applications as well.

There are a number of reasons for the kernel to need to identify specific locations within the code. For example, kernel code is not normally allowed to incur page faults, but the functions that access user-space memory will often do just that. To do the right thing in that situation, the kernel build process makes a note of the location of every user-space access operation; when a page fault happens, that list is checked and, if the fault happened in an expected location, it is handled normally. The kernel's dynamic debugging mechanism is another example; each debugging print statement is tracked and can be enabled independently.

The usual trick for implementing this kind of mechanism is to create a special ELF section in the kernel binary; that section is then populated with structures recording the points of interest within the kernel. At run time, the kernel can locate that section, where it will find an array of structures with the needed information. At its core, the tagging framework is a set of functions and macros that make the creation of and access to this special section easier.
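The section trick itself is not kernel-specific; here is a minimal user-space sketch of the same pattern (the section and structure names are invented), relying on the fact that, for a section whose name is a valid C identifier, the GNU linker provides __start_SECTION and __stop_SECTION symbols bounding it:

    #include <stdio.h>

    struct tag {
        const char *file;
        int line;
    };

    /* Each use drops a static struct tag into the "my_tags" section;
     * the tags exist in the binary whether or not the code ever runs. */
    #define TAG_HERE()                                          \
        do {                                                    \
            static struct tag _t                                \
                __attribute__((used, section("my_tags"),        \
                               aligned(8)))                     \
                = { __FILE__, __LINE__ };                       \
            (void)_t;                                           \
        } while (0)

    extern struct tag __start_my_tags[];   /* provided by the linker */
    extern struct tag __stop_my_tags[];

    static void f(void) { TAG_HERE(); }
    static void g(void) { TAG_HERE(); }

    int main(void)
    {
        f();
        g();
        for (struct tag *t = __start_my_tags; t < __stop_my_tags; t++)
            printf("tag at %s:%d\n", t->file, t->line);
        return 0;
    }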

A code tag denotes a location within the code itself; that location is represented by a new structure:

    struct codetag {
	unsigned int flags;
	unsigned int lineno;
	const char *modname;
	const char *function;
	const char *filename;
    };

This structure tracks a location but has no other information; it is meant to be embedded within another structure specific to the tagging application. For example, a large part of the patch set is dedicated to the creation of a mechanism to track memory allocations; it can record how much memory is allocated and freed at each call site, and thus be used to track down memory leaks. To do this, it will create a tag at each allocation location with a structure like:

    struct alloc_tag {
	struct codetag			ct;
	unsigned long			last_wrap;
	struct raw_lazy_percpu_counter	call_count;
	struct raw_lazy_percpu_counter	bytes_allocated;
    };

The raw_lazy_percpu_counter is a new counter type that is also added by the patch set. At this point we have a structure that can associate these counters with the location stored in the codetag structure.

One of these structures is placed into the special alloc_tags ELF section with a bit of macro magic:

    #define DEFINE_ALLOC_TAG(_alloc_tag)				\
	static struct alloc_tag _alloc_tag __used __aligned(8)		\
	__section("alloc_tags") = { .ct = CODE_TAG_INIT }

A bit more macro trickery is then used to replace the existing alloc_pages() function with a version that places the tag and remembers allocation calls:

    #define alloc_tag_add(_ref, _bytes)					\
    do {								\
	DEFINE_ALLOC_TAG(_alloc_tag);					\
	if (_ref && !WARN_ONCE(_ref->ct, "alloc_tag was not cleared"))	\
	    __alloc_tag_add(&_alloc_tag, _ref, _bytes);			\
    } while (0)

    #define pgtag_alloc_pages(gfp, order)				\
    ({									\
	struct page *_page = _alloc_pages((gfp), (order));		\
									\
	if (_page)							\
	    alloc_tag_add(get_page_tag_ref(_page), PAGE_SIZE << (order));\
	_page;								\
    })

    #define alloc_pages(gfp, order) pgtag_alloc_pages(gfp, order)

The end result is that each call to alloc_pages() is changed to create a static alloc_tag structure that records the location of the call site; this structure is placed in the alloc_tags section. When an allocation call is made, the two counters in that structure are incremented accordingly (in the not-shown __alloc_tag_add() function). Behind the scenes, the code also makes a note (in the page_ext structure for the allocated pages) of the tag location for the allocation call site; this lets the kernel track which call site allocated each page. When the allocated pages are later freed, that information can be used to decrement the counts for that call site.
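The patch set's __alloc_tag_add() is not shown in the article; given the structures above, a plausible shape for it would be something like the following sketch, where the codetag_ref structure and the counter_add() helper are stand-ins for the real patch-set types and functions:

    /* Hypothetical sketch only; the real patch-set code differs in detail. */
    struct codetag_ref {
        struct codetag *ct;   /* which call site allocated these pages */
    };

    static void __alloc_tag_add(struct alloc_tag *tag,
                                struct codetag_ref *ref, size_t bytes)
    {
        ref->ct = &tag->ct;   /* note the call site in the page_ext data */
        counter_add(&tag->call_count, 1);
        counter_add(&tag->bytes_allocated, bytes);
    }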

What comes out of all this work is an array of alloc_pages() call sites, each of which tracks the amount of memory that was allocated there and which has not yet been freed. The framework also includes infrastructure for iterating through this array and for presenting its contents in the debugfs filesystem. It is not hard to see how this information could be useful for a developer trying to track down a memory leak. Other patches in this series add similar tracking to the slab allocator and the ability to store the call stack for each allocation, giving more information on where the real source of a memory leak might be.

An entirely different application of this framework is dynamic fault injection. Driver code could, for example, include a sequence like:

    if (dynamic_fault("foo-driver-init"))
        return -EIO;  /* Simulate a failure */

The dynamic_fault() function, once again, places a code tag at the call site. It normally returns false, so the simulated failure code is not run. There is a knob that will appear under /sys/kernel/debug/dynamic_faults, though, that can be used to enable this fault site and test whether the driver's error handling works correctly.

There is even more in the patch series, including a latency-tracking mechanism and a reimplementation of the dynamic debugging facility. The point that is being made is that the code-tagging framework makes it relatively easy to add this sort of feature to the kernel in a way that has a minimal performance impact.

Most of the early discussion around this patch set has been inspired by Peter Zijlstra's question about just what this facility adds that is not already provided by the kernel's tracepoint mechanism. Overstreet responded, somewhat defensively, that there are a number of advantages to the code-tagging mechanism. They include capturing all activity from boot rather than just from when tracing was started, better performance, better ease of use, and no problems with dropped events. He said that the question should be asked the other way around: tracing proponents should show how that subsystem could be used to provide a similar capability with comparable performance and ease of use.

In response, Zijlstra pointed out that use of ftrace is not necessary to attach to tracepoints; attaching custom handlers to tracepoints would address concerns about performance and dropped events. Mel Gorman added that the tracepoint approach is more flexible, works with older kernels, and is more widely available. He also pointed to a patch set from Oscar Salvador implementing a different approach to memory-leak detection. Michal Hocko worried about the difficulties of reviewing and maintaining a patch set of this size.

This is a new and large patch set; it is likely to be under discussion for some time. The code-tagging part itself seems like it should be a relatively uncontroversial cleaning up of the code; it can, in theory, replace a number of independent implementations in the kernel with a single framework. Each of the add-on changes is likely to require additional discussion, though; one doesn't just walk into the memory-management subsystem and change the core allocator code without having to answer some questions. Chances are that this patch set will end up being split into its various components somewhere along the way so that each can be considered on its own merits.

Comments (none posted)

What's in a (type) name?

By Jonathan Corbet
September 2, 2022

The kernel's manual pages are in a bit of an interesting position. They are managed as a separate project, distinct from the kernel's documentation, and have the task of documenting both the kernel's system-call interface and the wrappers for that interface provided by the C library. Sometimes the two objectives come into conflict, as can be seen in a discussion that has been playing out over the course of the last year on whether to use C standard type names to describe kernel-defined structures.

The C <stdint.h> header file defines a number of types for developers who need to specify exactly how they need an integer variable to be represented. For example, int16_t is a 16-bit, signed type, while uint64_t is a 64-bit, unsigned type. This level of control is needed when defining data structures that are implemented by hardware, are exchanged through communications protocols — or are passed between user and kernel space.

The kernel, though, does not use these types to define its system-call interface. Instead, the kernel has its own types defined internally. Rather than use uint64_t, for example, the kernel's API definitions use __u64. That has been the situation for a long time — since before the standard C types existed — and is simply part of how the kernel project does things.
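On Linux the two spellings have the same representation; the difference is one of namespace. Double-underscore names are reserved for the implementation, so kernel headers can define them unconditionally without clashing with <stdint.h>. A small user-space illustration (the structure is invented for the example):

    #include <stdint.h>        /* the standard fixed-width types */
    #include <linux/types.h>   /* the kernel's __u32, __u64, ... */

    struct example {
        __u32    map_fd;       /* kernel-style spelling */
        uint64_t flags;        /* standard spelling; both work here */
    };

    _Static_assert(sizeof(__u64) == sizeof(uint64_t),
                   "same representation, different namespaces");

    int main(void) { return 0; }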

As a general rule, the man pages reflect the kernel's definition of data types. So, for example, the bpf() man page defines one piece of the bpf_attr union as:

    struct {    /* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY
                   commands */
	__u32         map_fd;
	__aligned_u64 key;
	union {
	    __aligned_u64 value;
	    __aligned_u64 next_key;
	};
	__u64         flags;
    };

These types are familiar to kernel developers, but they may look a bit strange to user-space developers. Back in April of 2021, man-pages co-maintainer Alejandro Colomar decided to make things look more familiar by rewriting the man pages to use the standard C types instead. Perhaps out of love for a challenge, Colomar started with the bpf() man page; after applying the patch, the above structure was defined as:

    struct {    /* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY commands */
        uint32_t                     map_fd;
        uint64_t [[gnu::aligned(8)]] key;
        union {
            uint64_t [[gnu::aligned(8)]] value;
            uint64_t [[gnu::aligned(8)]] next_key;
        };
        uint64_t                     flags;
    };

This patch was immediately vetoed by BPF maintainer Alexei Starovoitov, who said: "The man page should describe the kernel api the way it is in .h file". Colomar answered that the actual types used are the same either way, and that his change was better for users:

If we have a standard syntax for fixed-width integral types (and for anything, actually), the manual pages should probably follow it, whenever possible. Any deviation from the standard (be it C or POSIX) should have a very good reason to be; otherwise, it only creates confusion.

Starovoitov stood firm in his opposition, though, saying that the man pages should describe the types as they will be defined when code includes the associated kernel header file.

Colomar returned in May 2021 with a new version of the patch that was little changed from its predecessor. Also unchanged was the reception it got. This time, Greg Kroah-Hartman also expressed his opposition, saying that the types involved "are not the same, they live in different namespaces, and worlds, and can not always be swapped out for each other on all arches". GNU C Library developer Zack Weinberg disagreed, though:

Manpage documentation of C structs is *not* expected to match the actual declaration in the headers. The documented field type is usually assignment-compatible with the actual type, but not always. There's no guarantee whatsoever that the fields are in the same order as the header, or that the listed set of fields is complete.

This argument failed to convince the kernel community, though, which remained strongly against the change. This discussion then died down for over a year.

Colomar returned with a new patch converting many more files in August 2022; he included the Nacked-by tags he had received from three different developers. Unsurprisingly, those developers had not become more sympathetic toward the idea during the pause. Starovoitov repeated his opposition and asked Colomar to stop sending the patch.

In response, Colomar went ahead and applied the patch to the man-pages repository. A kernel patch that had encountered such opposition would almost certainly never have been applied, but the man pages are not a kernel project. Colomar appears to be the only active man-pages maintainer at the moment; longtime maintainer Michael Kerrisk has seemingly vanished from the scene since the man pages 5.13 release in August 2021. So there is nobody who is in a position to overrule Colomar when it comes to decisions in this area.

Much of the discussion covered the same ground as with the previous versions, but this time Linus Torvalds jumped in as well. He pointed out that the kernel's types simply cannot be the same as the standard C types without creating namespace problems: the kernel cannot include <stdint.h> to define those types, but also cannot define those types itself in files used by user space without creating conflicts there. Torvalds agreed with the others that the documentation should match the actual types used.

Honestly, I don't think it makes a *huge* amount of difference, but documentation that doesn't actually match the source of the documentation will just confuse somebody in the end. Somebody will go "that's not right", and maybe even change the structure definitions to match the documentation.

This message, along with a request from Kroah-Hartman to revert the change, was enough to convince Colomar to back down. His concluding words were:

You convinced me. The man-pages will document the types exactly as they are in kernel. It's just simpler.

As the patch was recently reverted after Greg asked me to do, I'll keep it that way. I guess this closes the man-pages discussion.

The interesting thing, of course, is that the kernel does, indeed, define many of the standard types internally, and there are thousands of variables defined using those types. Using standard C types in the kernel is not, itself, a problem; only using them in the user-space API definitions is. With sufficient will, this might well be a problem that could be overcome, but it would not be a small job. Meanwhile, it seems that the man pages will continue to document the types that are actually used in the kernel's user-space API header files.

Comments (42 posted)

Concurrent page-fault handling with per-VMA locks

By Jonathan Corbet
September 5, 2022

The kernel is, in many ways, a marvel of scalability, but there is a longstanding pain point in the memory-management subsystem that has resisted all attempts at elimination: the mmap_lock. This lock was inevitably a topic at the 2022 Linux Storage, Filesystem, Memory-Management and BPF Summit (LSFMM), where the idea of using per-VMA locks was raised. Suren Baghdasaryan has posted an implementation of that idea — but with an interesting twist on how those locks are implemented.

The mmap_lock (formerly called mmap_sem) is a reader/writer lock that controls access to a process's address space; before making changes there (mapping in a new range, for example), the kernel must acquire that lock. Page-fault handling must also acquire mmap_lock (in reader mode) to ensure that the address space doesn't change in surprising ways while a fault is being resolved. A process can have a large address space and many threads running (and incurring page faults) concurrently, turning mmap_lock into a significant bottleneck. Even if the lock itself is not contended, the constant cache-line bouncing hurts performance.

Many attempts at solving the mmap_lock scalability problem have taken the form of speculative page-fault handling, where the work to resolve a fault is done without taking mmap_lock in the hope that the address space doesn't change in the meantime. Should concurrent access occur, the speculative page-fault code drops the work it has done and retries after taking mmap_lock. Various implementations have been shown over the years and they have demonstrated performance benefits, but the solutions are complex and none have managed to convince enough developers to be merged into the mainline kernel.

An alternative approach that has often been considered is range locking. Rather than locking the entire address space to make a change to a small part of it, range locking ensures exclusive access to the address range of interest while allowing accesses to other parts of the address space to proceed concurrently. Range locking turns out to be tricky as well, though, and no implementation has gotten close to being considered for merging.

VMA locking

A process's address space is described by a sequence of virtual memory areas (VMAs), represented by struct vm_area_struct. Each VMA corresponds to an independent range of address space; an mmap() call will normally create a new one, for example. Consecutive VMAs with the same characteristics can be merged; VMAs can also be split if, for example, a process changes the memory protections on a portion of the range. The number of VMAs varies from one process to the next, but it can grow to be quite large; the Emacs process within which this article is being written has over 1,100 of them, while gnome-shell has over 3,100.

At LSFMM this year, Matthew Wilcox suggested that the range-locking problem could be simplified by turning it into a VMA-locking problem. Since each VMA covers a range of the address space, locking the VMA would be equivalent to locking that range. The result would have much coarser resolution than true range locking, but it might still be good enough to be worth the effort.

Baghdasaryan's patch set is the attempt to find out if that is the case. But, of course, it immediately ran into the complexities of memory-management subsystem locking. There are two distinct types of locks that need to be taken on a VMA:

  • Page-fault handling needs to ensure that the VMA remains present while a fault is being resolved and that it doesn't change in problematic ways. This work can be done concurrently with the handling of other faults or a number of other tasks, though. So the page-fault handler needs to take what is essentially a read lock.
  • Address-space changes will need exclusive access to one or more VMAs; while (for example) a VMA is being split, no other part of the kernel can be allowed to do anything with any of the parts. So these types of changes require a write lock.

The original idea had been to use a reader/writer lock for this task, but that led to another problem: write locks often need to be applied to multiple VMAs at once. It would be possible to implement this with reader/writer locks but, as Baghdasaryan pointed out in the cover letter: "Tracking all the locked VMAs, avoiding recursive locks and other complications would make the code more complex". There is surprisingly little desire for more complexity in the core memory-management code, so he went in search of a different solution.

The implementation

The scheme that emerged was a combination of a reader/writer lock and a sequence number that is added to every VMA, but also to the mm_struct structure that describes the address space as a whole. If the sequence number in a given VMA is equal to the mm_struct sequence number, then that VMA is considered locked for modification and inaccessible for concurrent page-fault handling. If the two numbers disagree, no lock exists and concurrent access is possible.

When a page fault occurs, the handler will first attempt to read-lock the per-VMA lock; if that fails then it falls back to acquiring the full mmap_lock as is done now. If the read lock succeeds, though, the handler must also check the sequence numbers; if the sequence number for the relevant VMA matches that in the mm_struct (which cannot change as long as mmap_lock is held), then other changes are afoot and handling must, once again, fall back to taking mmap_lock. Otherwise the VMA is available and the fault can be handled without locking the address space as a whole. The read lock will be released once that task is complete.

When the memory-management system must make address-space changes, instead, it must lock each of the VMAs that will be affected. The first step is to take a write lock on mmap_lock, then, for each VMA, it will acquire the reader/writer lock in write mode (potentially waiting for any existing readers to let go of it). That lock is only held for long enough to set the VMA's sequence number equal to the mm_struct sequence number, though. Once that change has been made, the VMA is locked even after the reader/writer lock is released.

Another way to describe this is to say that the per-VMA reader/writer lock really only exists to protect access to the per-VMA sequence number, which is the real per-VMA lock.

After the kernel has locked all of the relevant VMAs, whatever changes need to be made can proceed. It will not be possible to handle page faults within those VMAs during this time (as is the case now), but other parts of the address space will be unaffected. Once the work is complete, all of those VMAs can be unlocked by simply increasing the mm_struct sequence number. There is no need to go back to each locked VMA — or even to remember which ones they are.
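Put in code form, the scheme looks something like this user-space sketch using POSIX reader/writer locks (all structure and function names are illustrative; the real patch set is kernel code with many more details):

    #include <pthread.h>
    #include <stdbool.h>

    struct mm {
        unsigned long seq;            /* bumping this unlocks all VMAs */
        pthread_rwlock_t mmap_lock;   /* stands in for the real mmap_lock */
    };

    struct vma {
        unsigned long seq;            /* == mm->seq means write-locked */
        pthread_rwlock_t lock;        /* only protects vma->seq */
    };

    /* Page-fault path: true means the fault may proceed under the
     * per-VMA lock; false means "fall back to taking mmap_lock". */
    static bool vma_read_trylock(struct mm *mm, struct vma *vma)
    {
        if (pthread_rwlock_tryrdlock(&vma->lock) != 0)
            return false;
        if (vma->seq == mm->seq) {    /* a writer has marked this VMA */
            pthread_rwlock_unlock(&vma->lock);
            return false;
        }
        return true;   /* the caller drops the read lock when done */
    }

    /* Change path: the caller already holds mm->mmap_lock for writing. */
    static void vma_write_lock(struct mm *mm, struct vma *vma)
    {
        pthread_rwlock_wrlock(&vma->lock);  /* waits out current readers */
        vma->seq = mm->seq;    /* the VMA stays locked after release */
        pthread_rwlock_unlock(&vma->lock);
    }

    /* One increment releases every VMA locked above, with no need to
     * remember which ones they were. */
    static void vma_write_unlock_all(struct mm *mm)
    {
        mm->seq++;
    }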

There are, of course, plenty of other details that have been glossed over here, including the need to bring VMAs under read-copy-update protection so that they can be looked up without holding mmap_lock. But the locking scheme is the core that makes it all work. According to Baghdasaryan, the resulting performance increase is about 75% of that achieved with the speculative page-fault patches, so it's still leaving some performance on the table. But, he said: "Still, with lower complexity this approach might be more desirable".

This work is deemed to be a proof-of-concept at this point. Among other things, it only handles faults on anonymous pages and, even then, only those that are not in swap. Support for swapped and file-backed pages can be added later, he said, if the approach seems worth pursuing. Answering that question may take a while; core memory-management patches tend not to be merged quickly, and this discussion is just beginning. But if it works out, this patch set could be a step in the direction of the long-wished-for range-locking mechanism for process address spaces.

Comments (18 posted)

A look at Linux Mint 21

September 6, 2022

This article was contributed by Sam Sloniker

Linux Mint 21 "Vanessa" was released on July 31. There are no real headline-grabbing features that come with the new release, as the project generally seeks to make incremental changes, rather than larger, potentially disruptive ones. Changes in this release include a new Bluetooth manager that brings several improvements, driverless printing and scanning by default, a process monitor to inform the user about resource-intensive background tasks, new functionality for the Timeshift system backup tool, and several major under-the-hood improvements to the Cinnamon desktop environment.

Like previous releases, Linux Mint 21 is available in editions for the Cinnamon, MATE, and Xfce desktop environments. Cinnamon is based on GNOME 3, but with major changes to make it more like other desktop environments with a bottom panel and menu similar to that used in Windows 7 and earlier, rather than the heavily redesigned interface of GNOME 3 and later. Cinnamon was created by the Linux Mint developers due to criticisms of changes in GNOME 3 that were seen as unnecessary. MATE is a continuation of GNOME 2 that was originally forked by an Arch Linux user for the same reason, and Xfce is a lightweight desktop environment designed for computers that may not run other environments well. All of the editions in Linux Mint use X11; Wayland support is not currently on the development roadmap for Cinnamon. MATE and Xfce are working on Wayland support separately from the Mint project, and a future release of Linux Mint is likely to include that work in those editions.

[Cinnamon desktop]

For many users who have suitable systems (which includes most computers from the last several years, except for some low-end netbooks), especially those new to Linux, Cinnamon is probably the best choice because its primary focus is user-friendliness. MATE and Xfce are specifically designed to be lightweight, which sometimes comes at the expense of features and user-friendliness. If a change could improve usability at the cost of system resources, Cinnamon would likely make it, while MATE and Xfce might not. Cinnamon is also the most popular option, making it easier to find help with problems. For these reasons, the Cinnamon edition is the focus of this article; some of the changes mentioned apply only to it.

Mint 21 is based on Ubuntu 22.04 LTS, which was released in April. This will be the package base for all 21.x Mint releases as well, so upgrades to those future versions will be straightforward. As with what shipped in Ubuntu 22.04, Mint 21 is based on the 5.15 Linux kernel.

Desktop tools

Mint 21 replaces Blueberry, the Bluetooth manager used in previous versions, with Blueman, which has more features and provides more information than Blueberry. Blueman also has better headset compatibility and improved audio profile support. Additionally, while Blueberry is a wrapper around the GNOME-specific gnome-bluetooth, Blueman is specifically designed to be cross-desktop.

[Timeshift XApp]

Several of the applications included in Linux Mint are part of the distribution's "XApps" project, which develops desktop-independent GTK software. These programs are especially useful for distributions like Mint that offer several desktop environments, because developing an application for each desktop environment for a given function would lead to significant duplication of effort.

Cinnamon has relatively few tools of its own, other than things like the settings program that cannot really be separated from the desktop environment; it relies on XApps for other tools. Mint uses XApps in the other editions when necessary to fill gaps in features provided by those desktops, although desktop-specific tools are generally used by default if they are available.

XApps are also designed to work on any distribution, not just Mint. They are available, along with Cinnamon, for many distributions, including Ubuntu, Fedora, Arch Linux, and Debian. I have used Cinnamon and XApps extensively on my Arch Linux laptop, and it works quite well; I have also used several XApps in GNOME on Arch, and they work just as well as they do in Cinnamon.

The Timeshift system backup tool, which was formerly an independent project, is now developed by Linux Mint as an XApp. It has a new feature to completely avoid filling disks with backups when it is used in rsync mode; it now skips making automated backups that would leave less than 1GB of free space on the target filesystem. Timeshift creates system snapshots using rsync on all filesystems except Btrfs, which supports snapshots directly. Mint uses ext4 by default, so most users will benefit from the rsync mode improvement.

[Xviewer thumbnails]

Linux Mint 21 adds support for displaying thumbnails for several previously unsupported file types, thanks to a new XApps project called "xapp-thumbnailers". AppImages now show the application icon, EPUBs show the book's front cover, MP3s show the album cover, while WebP and most RAW images show a typical image thumbnail. WebP support was also added to Xviewer, the XApp image viewer.

The new version of Mint also includes a process monitor panel applet in the Cinnamon edition. Mint has several resource-intensive processes that can run automatically, such as system updates and Timeshift snapshots. If these run while the computer is in use, they can slow the computer down. The process monitor does not solve this problem, but it does notify the user that the tasks are running. The computer is still slower while they run, but the user is at least not left to figure out why.

Starting with this release, Mint uses the Internet Printing Protocol (IPP) for driverless printing and scanning by default. IPP has been supported since Mint 20, but drivers were still used by default in that release. The majority of printers and scanners will now work out of the box without the need to install drivers. Traditional drivers are still supported, though, so users with devices that do not support IPP can still use them.

Cinnamon/Muffin

Several major internal changes were made to the Cinnamon desktop and Muffin, its window manager. Muffin is based on Mutter, GNOME's window manager, but the code bases have diverged significantly over time. Muffin forked from Mutter 3.2 eleven years ago, and both projects have made changes over time, making it increasingly difficult to merge Mutter updates into Muffin. With Cinnamon 5.4, the new version used in Mint 21, Muffin has been rebased on Mutter 3.36, and has fewer differences from the upstream code. This release also moves the handling of display settings from Cinnamon to Muffin.

Window decoration handling was also improved. GTK applications can have their header bars rendered in two different ways: by the client application (client-side decorations, or CSD) or by the window manager (server-side decorations, or SSD). Traditionally, server-side decorations have been used, but many applications are switching to client-side decorations. Advantages of CSD include better use of screen space and the ability for applications to put buttons for functions other than window controls in their header bars. Of course, these windows' header bars will look different because of the extra information shown, but some desktops also theme them differently, leading to an inconsistent user interface.

To reduce this inconsistency, the Mint developers made CSD and SSD window decorations as similar as possible with the Mint-X and Mint-Y themes. However, prior to this release, the title bars were rendered differently; CSD windows' decorations were rendered with GTK, while Muffin used Metacity to render those of SSD windows. The use of different rendering engines prevented the themes from making the decorations identical, and because GTK's antialiasing and rendering of rounded corners are smoother than Metacity's, SSD windows' decorations looked slightly more pixelated than those of CSD windows. Starting with Cinnamon 5.4, GTK is used for both CSD and SSD windows' decorations, making them look identical.

Upgrading

Because Mint 21 switches to a new package base, the upgrade from 20.3 to 21 is significantly more complicated than, for example, that between 20.2 and 20.3. Previous major version upgrades, such as upgrading from Mint 19.3 to 20, required a lengthy command-line process that could go wrong in several places. However, a new graphical tool called mintupgrade makes the procedure much simpler for this upgrade, although it is still more complicated than a minor version upgrade. The instructions do require the use of the terminal, but no knowledge of the Linux command line is needed.

Only Mint 20.3 can be upgraded directly to 21; other releases must be upgraded to one or more intermediate releases. The Mint user guide has more information on available upgrade paths. As with previous releases, users are not required to upgrade right away. Several older versions are still supported; 19.x releases are supported through April 2023, while 20.x releases are supported through April 2025. Linux Mint 21 is a long-term support (LTS) release that will be supported until April 2027.

Conclusion

While Mint 21 does not have many major, attention-grabbing changes, this release makes several relatively small but still significant improvements. Things like the replacement of Blueberry with Blueman, the addition of the process monitor, or the switch to driverless printing are relatively minor changes, but put together, they improve the user experience. The Linux Mint project tends to avoid large, needless changes, as shown by the original motives for developing Cinnamon; the project continues this philosophy by focusing on small improvements even in major-version updates.

Comments (14 posted)

Page editor: Jonathan Corbet


Copyright © 2022, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds