
LWN.net Weekly Edition for March 21, 2024

Welcome to the LWN.net Weekly Edition for March 21, 2024

This edition contains the following feature content:

  • Cranelift code generation comes to Rust: a look at a code-generation backend that trades peak output quality for compilation speed.
  • The first half of the 6.9 merge window: the most significant changes merged so far for the 6.9 kernel.
  • Toward a real "too small to fail" rule: should small kernel memory allocations be guaranteed to succeed?
  • Managing Linux servers with Cockpit: a web-based administration tool that deserves more attention.
  • "Real" anonymous functions for Python: another round of a perennial language-enhancement discussion.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Cranelift code generation comes to Rust

By Daroc Alden
March 15, 2024

Cranelift is an Apache-2.0-licensed code-generation backend being developed as part of the Wasmtime runtime for WebAssembly. In October 2023, the Rust project made Cranelift available as an optional component in its nightly toolchain. Users can now use Cranelift as the code-generation backend for debug builds of projects written in Rust, making it an opportune time to look at what makes Cranelift different. Cranelift is designed to compete with existing compilers by generating code more quickly than they can, thanks to a stripped-down design that prioritizes only the most important optimizations.

Fast compilation is one of the many things that users want from their programming languages. Compile times have been a source of complaints about Rust (and other languages that use LLVM) for some time, despite continuing steady progress by the Rust and LLVM projects. Additionally, a compiler that produces code quickly enough is potentially viable in applications where it currently makes more sense to use an interpreter. All of these factors are cause to think that a compiler that focuses on speed of compilation, rather than the speed of the produced code, could be valuable.

Cranelift's first use was as the backend of Wasmtime's just-in-time (JIT) compiler. Many languages now come equipped with JIT compilers, which often use specialized tricks to quickly compile isolated functions. For example, Python recently added a copy-and-patch JIT that works by taking pre-compiled sections of code for each Python bytecode and stitching them together in memory. JIT compilers often use techniques, such as speculative optimizations, that make it difficult to reuse the compiler outside its original context, since they encode so many assumptions about the specific language for which they were designed.

The developers of Cranelift chose to use a more generic architecture, which means that Cranelift is usable outside of the confines of WebAssembly. The project was originally designed with use in Wasmtime, Rust, and Firefox's SpiderMonkey JavaScript interpreter in mind. The SpiderMonkey project has since decided against using Cranelift for now, but the Cranelift project still has a design intended for easy incorporation into other programs.

Cranelift takes in a custom intermediate representation called CLIF, and directly emits machine code for the target architecture. Unlike many other JIT compilers, Cranelift does not generate code that relies on being able to fall back to using an interpreter in case an assumption is invalidated. That makes it suitable for adopting into non-WebAssembly-related projects.

Cranelift's optimizations

Despite its focus on fast code generation, Cranelift does optimize the code it generates in several ways. Cranelift's optimization pipeline is based on equality graphs (or E-graphs), a data structure for representing sets of equivalent intermediate representations efficiently. In a traditional compiler, the optimizer works by taking the representation of the program produced by parsing and then applying a series of passes to it to produce an optimized version. The order in which optimization passes are performed can have a large impact on the quality of code produced, since some passes require simplifications made by other passes in order to apply. Choosing the correct order in which to apply optimizations is called the phase-ordering problem, and has been the source of a considerable amount of academic research.

In Cranelift, the part of each optimization that recognizes a simpler or faster way to represent a particular construct is separated from the part that chooses what representation should ultimately be used. Each optimization works by finding a particular pattern in the internal representation, and then annotating it as being equivalent to some simplified version. The E-graph data structure represents this efficiently, by allowing two copies of the internal representation to share the nodes that they have in common, and to allow nodes in CLIF to refer to equivalency classes of other nodes, instead of referring to specific other nodes. This produces a dense structure in which adding an alternate form of a particular section of the program is cheap.

Because optimizations run on an E-graph only add information in the form of new annotations, the order of the optimizations does not change the result. As long as the compiler continues running optimizations until they no longer have any new matches (a process known as equality saturation), the E-graph will contain the representation that would have been produced by the optimal ordering of an equivalent sequence of traditional optimization passes — along with many less efficient representations. E-graphs are more efficient than directly storing every possible alternative (taking O(log n) space on average), but they still take more memory than a traditional intermediate representation. Depending on the program in question and the set of optimizations employed, a fully saturated E-graph could be arbitrarily large. In practice, Cranelift sets a limit on how many operations are performed on the graph to prevent it from becoming too large.

E-graphs pay for this simplicity and optimality when it comes time to extract the final representation from the E-graph to use for code generation. Extracting the fastest representation from an E-graph is an NP-complete problem. Cranelift uses a set of heuristics to quickly extract a good-enough representation.

Trading one NP-complete problem (selecting the best order for a set of passes) for another may not seem like a large benefit, but it does make sense for a smaller project. The order of optimization passes is largely set by the programmers who write the optimizations, because it requires domain knowledge to pick a reasonable sequence. Extracting an efficient representation from an E-graph, on the other hand, is a generic search problem that can have as much or as little computer time applied to it as the application permits. Cranelift's heuristics don't extract the most efficient representation, but they do a good job of quickly extracting a decent one.

Representing optimizations in this way also makes it easier for Cranelift maintainers to understand and debug existing optimizations and their effects, and makes writing new optimizations somewhat simpler. Cranelift has a custom domain-specific language (ISLE) that is used internally to specify optimizations.

While Cranelift does not organize its optimizations in phases, it does have ten different sets of related optimizations defined in their own ISLE files, which allows for a rough comparison with GCC and LLVM. LLVM lists 96 optimization passes in its documentation, while GCC has 372. The optimizations that Cranelift does have include constant propagation, bit operation simplifications, vectorization, floating-point operation optimizations, and normalization of comparisons. Dead-code elimination is done implicitly by extracting a representation from the E-graph.

A paper from 2020 showed that Cranelift was an order of magnitude faster than LLVM, while producing code that was approximately twice as slow on some benchmarks. Cranelift was still slower than the paper's authors' custom copy-and-patch JIT compiler, however.

Cranelift for Rust

Cranelift may have been designed with the aim of being an alternate backend for Rust, but actually making it usable has taken significant effort. The Rust compiler has an internal representation (IR) called mid-level IR that it uses to represent type-checked programs. Normally, the compiler converts this to LLVM IR before sending it to the LLVM code-generation backend. In order to use Cranelift, the compiler needed another library that takes mid-level IR and emits CLIF.

That library was largely written by "bjorn3", a Rust compiler team member who contributed more than 3,000 of the approximately 4,000 commits to Rust's Cranelift backend. He wrote a series of progress reports detailing his work. Development began in 2018, and kept pace with Rust's own rapid development. In 2023, the backend was considered stable enough to ship as part of Rust nightly as an optional toolchain component.

People can now try the Cranelift backend using rustup and cargo:

    $ rustup component add rustc-codegen-cranelift-preview --toolchain nightly
    $ export CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift
    $ cargo +nightly build -Zcodegen-backend

The given rustup command adds the Cranelift backend's dynamic library to the set of toolchain components to download and keep up to date locally. Setting the CARGO_PROFILE_DEV_CODEGEN_BACKEND environment variable instructs cargo to use Cranelift for debug builds, and the final cargo invocation builds whatever Rust project lives in the current directory with the alternate code-generation backend feature turned on. The latest progress report from bjorn3 includes additional details on how to configure Cargo to use the new backend by default, without an elaborate command-line dance.
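The persistent setup that bjorn3's report describes amounts to a few lines in the project's .cargo/config.toml. This sketch reflects the documented keys as of this writing; they may change while the feature remains unstable:

```toml
# Enable the unstable codegen-backend feature without passing
# -Zcodegen-backend on every cargo invocation.
[unstable]
codegen-backend = true

# Use Cranelift for debug builds only; release builds keep LLVM.
[profile.dev]
codegen-backend = "cranelift"
```

With this in place, a plain `cargo +nightly build` uses Cranelift for debug builds.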

Cranelift is itself written in Rust, making it possible to use as a benchmark to compare itself to LLVM. A full debug build of Cranelift itself using the Cranelift backend took 29.6 seconds on my computer, compared to 37.5 with LLVM (a reduction in wall-clock time of 20%). Those wall-clock times don't tell the full story, however, because of parallelism in the build system. Compiling with Cranelift took 125 CPU-seconds, whereas LLVM took 211 CPU-seconds, a difference of 40%. Incremental builds — rebuilding only Cranelift itself, and none of its dependencies — were faster with both backends: 66ms of CPU time with Cranelift compared to 90ms with LLVM.

Whether Cranelift will ameliorate users' concerns about slow compile times in Rust remains to be seen, but the initial signs are promising. In any case, Cranelift is an interesting showcase of a different approach to compiler design.

Comments (59 posted)

The first half of the 6.9 merge window

By Jonathan Corbet
March 14, 2024
As of this writing, just over 4,900 non-merge changesets have been pulled into the mainline for the 6.9 release. This work includes the usual array of changes all over the kernel tree; read on for a summary of the most significant work merged during the first part of the 6.9 merge window.

Architecture-specific

  • Support for the Intel Flexible Return and Event Delivery (FRED) mechanism has been merged. FRED improves low-level event delivery, allowing for simpler and more reliable code; see this changelog and this documentation commit for more information.
  • The core kernel has gained support for running AMD Secure Nested Paging (SNP) guests, part of AMD's confidential-computing solution. Full support requires KVM changes which, evidently, have been deferred until 6.10.
  • The kernel can now make use of GCC's named address spaces feature to optimize access to per-CPU data.
  • The latest x86 hardware vulnerability is "register file data sampling", which affects Intel Atom CPUs. The mitigations have been merged; see this documentation commit for details.

Core kernel

  • The kernel is now able to create and use pidfds that refer to individual threads, rather than just the thread-group leader. This feature can be accessed using the new PIDFD_THREAD flag with pidfd_open(), or CLONE_PIDFD|CLONE_THREAD with clone() or clone3(). The semantics of thread-level pidfds vary a bit from those of process pidfds; see this merge changelog for details.
  • There is a new kernel virtual filesystem for pidfds; see this article for details.
  • The BPF arena subsystem, which manages shared memory between BPF programs and user space, has been merged. To support use of this memory, the cond_break macro changes have been merged as well.

Filesystems and block I/O

  • The zonefs and hugetlbfs filesystems now support ID-mapped mounts.
  • The new RWF_NOAPPEND flag for pwritev2() allows an offset to be supplied for the write even if the file has been opened in append-only mode. This flag makes pwritev2() behave as POSIX had intended with pwritev().
  • The old ntfs filesystem implementation has been removed in favor of ntfs3.
  • The new FS_IOC_GETUUID ioctl() command will retrieve the UUID for a given filesystem; it is a generic version of the (previously) ext4-only feature. Another new command, FS_IOC_GETFSSYSFSPATH, will retrieve the location for a mounted filesystem under /sys/fs.
  • The io_uring subsystem has gained the ability to truncate a file.
  • Administrators of NFSv4 servers now have the ability to revoke open and lock states on files.

Hardware support

  • Miscellaneous: StarFive JH8100 external interrupt controllers.
  • Networking: RENESAS FemtoClock3 PTP clocks, Qualcomm QCA807x PHYs, Marvell Octeon PCI Endpoint NICs, esd electronics gmbh CAN-PCI(e)/402 controllers, and Freescale QUICC multichannel controllers.

Miscellaneous

  • The kernel now uses version 1.76.0 of the Rust language. This update stabilizes two features used by the kernel (const_maybe_uninit_zeroed and byte_sub, the latter of which can replace the unstable ptr_metadata), making the list of needed unstable features that much shorter. There have been various other Rust-related changes as well; see this merge changelog for details.
  • Anybody interested in the details of how to use bisection to track down a kernel regression may want to have a look at this new document that covers the topic extensively.

Networking

  • There has been a lot of work done to reduce the use of the networking subsystem's core RTNL lock and reduce contention overall.

Security-related

  • The BPF token mechanism, which allows the delegation of fine-grained permissions for BPF operations, has been merged. This merge was attempted for 6.8 but ended up being reverted; this time it seems likely to stick. This changelog gives an overview of the current state of this work.

Internal kernel changes

  • The BH workqueue mechanism, intended as an eventual replacement for tasklets, has been merged.
  • The timer subsystem has been extensively reworked to better choose the CPU on which an expired timer runs. See this merge changelog for some more information about this change.
  • The UBSAN signed-overflow sanitizer has been restored with the idea of helping to drive compiler development toward better signed wraparound warnings.

The 6.9 merge window can be expected to remain open through March 24. There is still a lot of work waiting to be pulled into the mainline; as usual, that work will be summarized here once the merge window closes.

Comments (none posted)

Toward a real "too small to fail" rule

By Jonathan Corbet
March 18, 2024
Kernel developers have long been told that any attempt to allocate memory might fail, so their code must be prepared for memory to be unavailable. Informally, though, the kernel's memory-management subsystem implements a policy whereby requests below a certain size will not fail (in process context, at least), regardless of how tight memory may be. A recent discussion on the linux-mm list has looked at the idea of making the "too small to fail" rule a policy that developers can rely on.

The kernel is unable to use virtual memory, so it is strictly bound by the amount of physical memory in the system. Depending on what sort of workload is running, that memory could be tied up in various ways and unavailable for allocation elsewhere. Allowing allocation requests to fail gives the kernel the freedom to avoid making things worse when memory pressure is high.

There are some downsides to failing an allocation request, of course. Whatever operation needed that memory will also be likely to fail, and that failure will probably propagate out to user space, resulting in disgruntled users. There is also a significant chance that the kernel will not handle the allocation failure properly, even if the developers have been properly diligent. Failure paths can be hard to test; many of those paths in the kernel may never have been executed and, as a consequence, many are likely to have bugs. Unwinding an operation halfway through can be a complex business, which is not the kind of task one wants to see entrusted to untested code.

Recently, Neil Brown started a sub-thread in a wide-ranging discussion on memory-management policies by suggesting a reconsideration of the rules around GFP_KERNEL allocations. Currently, programmers have to be prepared for those calls to fail, even if, in fact, the kernel will not fail small allocations. Brown proposed to make the "too small to fail" behavior a documented rule, at least for allocations below a predefined size. GFP_KERNEL allocations are allowed to sleep, he said, and thus have access to all of the kernel's machinery for freeing memory. In the worst case, the out-of-memory (OOM) killer can be summoned to remove a few processes from the system. If this code is unable to create some free memory, he said, "the machine is a goner anyway". If, instead, GFP_KERNEL allocations would always succeed, he concluded, it "would allow us to remove a lot of untested error handling code".

Kent Overstreet objected to this idea, though. It is common, he said, for kernel code to attempt to allocate memory to carry out a task efficiently, but to be able to fall back to a slower approach if the memory is unavailable; such mechanisms will not work if memory requests do not fail. Even worse, the kernel's efforts to satisfy such requests may worsen performance elsewhere in the system. Without allocation failure, there is no signal to indicate that memory is tight; the implementation of memory overcommit for user space has, he said, made it impossible to use memory efficiently there.

The real solution, he said, is proper testing of all those error paths; "relying on the OOM killer and saying that because [of] that now we don't have to write and test your error paths is a lazy cop out". James Bottomley disagreed, pointing out that the OOM killer only runs in extreme situations, and that error paths are a problem: "Error legs are the least exercised and most bug-, and therefore exploit-, prone pieces of code in C. If we can get rid of them, we should." Overstreet was unimpressed: "Having working error paths is _basic_, and learning how to test your code is also basic. If you can't be bothered to do that you shouldn't be writing kernel code."

Dave Chinner, instead, was enthusiastically supportive of the idea. The XFS filesystem, he said, was originally developed for a kernel (IRIX) that provided a guarantee for allocations. "A simple change to make long standing behaviour an actual policy we can rely on means we can remove both code and test matrix overhead - it's a win-win IMO."

Brown later modified his proposal slightly, noting that changing the semantics of GFP_KERNEL might cause problems for existing code. Instead, perhaps, GFP_KERNEL could be deprecated entirely in favor of a new set of allocation types. He later suggested this hierarchy:

  • GFP_NOFAIL would explicitly request the "cannot fail" behavior and could, as a result, wait a long time for an allocation request to be fulfilled.
  • GFP_KILLABLE would be the same as GFP_NOFAIL, with the exception that requests will fail in the presence of a fatal signal.
  • GFP_RETRY would make multiple attempts to satisfy an allocation request, but would eventually fail if no progress is made.
  • GFP_NO_RETRY would only allow a single attempt (which could still sleep) at allocating memory, after which the request would fail.
  • GFP_ATOMIC would not sleep at all (which is the current behavior).

Given these options, he said, GFP_KERNEL could go:

I don't see how "GFP_KERNEL" fits into that spectrum. The definition of "this will try really hard, but might fail and we can't really tell you what circumstances it might fail in" isn't fun to work with.

Overstreet responded, once again, that these changes were not needed: "We just need to make sure error paths are getting tested - we need more practical fault injection, that's all." Chinner, instead, commented that GFP_KILLABLE and GFP_RETRY were essentially the same thing; Brown responded that, perhaps, the key distinguishing feature of those allocation types is that they would not invoke the OOM killer; perhaps both of them could be replaced with a single GFP_NOOOM type. "We might need a better name than GFP_NOOOM :-)".

Matthew Wilcox raised a different sort of objection. The proper allocation policy for any given request depends on the context in which the request is made; a function called from an interrupt handler has fewer options available than one running in process context. Sometimes, the code that knows about that context is several steps back in the call chain from the function doing the allocation. The way to set the allocation type, he said, is through the use of context flags applied to the current thread.

Brown, though, pointed out that this context is not the full picture. If code has been written assuming GFP_NOFAIL behavior, it would be incorrect to allow the context to change an allocation into one that could fail: "context cannot add error handling".

Vlastimil Babka worried that deprecating GFP_KERNEL would be an unending task. Instead, guaranteeing "too small to fail" could be done quickly, and modifying specific call sites to allow allocation failure would be a relatively easy task, so he suggested taking that path. Brown, though, answered that removing the big kernel lock also took a long time: "I don't think this is something we should be afraid of". Since redefining GFP_KERNEL also implies removing error-handling code, he said, it should still be handled one call site at a time.

The discussion wound down at about this point, but there is a good chance that we'll be hearing these ideas again. The kernel, for all practical purposes, already implements GFP_NOFAIL behavior for allocations of eight pages or less. Turning the behavior into a guarantee would allow for significant simplification and the removal of a lot of untested code. That is an idea with significant appeal; it would be surprising if it did not come up at the Linux Storage, Filesystem, Memory-Management and BPF Summit in May.

Comments (41 posted)

Managing Linux servers with Cockpit

By Joe Brockmeier
March 20, 2024

Cockpit is an interesting project for web-based Linux administration that has received relatively little attention over the years. Part of that may be due to the project's strategy of minor releases roughly every two weeks, rather than larger releases with many new features. While the strategy has done little to garner headlines, it has delivered a useful and extensible tool to observe, manage, and troubleshoot Linux servers.

Overview

The Cockpit project started in 2013, sponsored by Red Hat. It was primarily targeted at system administrators new to Linux, such as Windows administrators who were having to adapt to Linux's increasing presence in the server market. It is designed to be distribution-independent, and is available for and tested on many popular Linux distributions including Arch Linux, Debian, Fedora, openSUSE Tumbleweed, and Red Hat Enterprise Linux (RHEL) and its derivatives. See the full list on the running Cockpit page.

The project's goal is to provide a coherent, discoverable GUI to manage systems and make use of tools already present on the system without getting in the way of or conflicting with other tools. Cockpit doesn't save configuration state independently of the system; that is, if an administrator uses Cockpit to make changes to a system firewall and then makes further changes to the firewall using another tool like Ansible, Cockpit won't try to automatically revert Ansible's changes down the road.

One of the project's ideals is that people should "feel they are interacting with the underlying Server OS", and that its user interface should show the server name or operating system name rather than Cockpit branding. As a matter of fact, users are interacting with the underlying operating system when using Cockpit: it uses standard system tools behind the scenes for its tasks if they are available.

While Cockpit is meant to be distribution-independent, its heritage does show through a bit in its choices of default tools. Cockpit uses NetworkManager to manage network configurations, firewalld to manage a system's firewall, storaged for working with system storage, SOS report for creating diagnostic reports, and so forth if they're present on the system. If a distribution defaults to networkd instead of NetworkManager, then the options are to install NetworkManager, forego using Cockpit for network administration, or do the work and contribute the feature to the Cockpit project. For the most part, though, few users contribute features to the project—the majority of contributions to Cockpit come from a handful of Red Hat employees.

If one looks at the past two years of activity in the main Cockpit repository, the majority of commits come from Martin Pitt, Allison Karlitskaya, Jelle van der Waa, Marius Vollmer, and Katerina Koukiou, who are all Red Hat employees. This is not to say that contributions are unwelcome. The project has made a respectable attempt to encourage contributions, or at least make it possible for others to reuse its work. Cockpit has a detailed contribution guide and a developer guide for those interested in contributing to or reusing Cockpit's components.

Cockpit architecture

Cockpit is licensed under the LGPL v2.1 (or later) and consists of the web frontend, along with a web service for communication between the frontend and cockpit-bridge. The cockpit-bridge program is responsible for relaying commands via D-Bus to systemd, storaged, NetworkManager, and the other system tools used by Cockpit. Its web components are written in JavaScript, with the other components largely written in C or Python. The cockpit-bridge component has implementations in C and Python. The Python bridge was introduced in June 2023, in the Cockpit 294 release. Recent RHEL-based distributions and Fedora have switched to the Python implementation, while Debian 12 and Ubuntu 22.04 are still on the C version of cockpit-bridge.

Once Cockpit is installed and running on a system, its web interface is available on port 9090 by default. The port can be changed, if needed, or the web service can be turned off entirely and Cockpit clients can connect over SSH by using a desktop client, a container that runs the Cockpit web service, or by logging into another server running Cockpit and then connecting to a secondary server.

One might expect problems when mixing and matching versions of Cockpit, but because it is used on so many Linux distributions with varying release schedules, backward compatibility is another of the project's ideals. If a user is running Cockpit on Debian Bookworm, they should still be able to connect to a server running Fedora Rawhide without problems, or vice versa. Having used Cockpit for years on a variety of systems running Alma Linux, CentOS, Debian, Fedora, RHEL, and Ubuntu, I can confirm that the project does a fine job of maintaining compatibility between releases.

The project recommends using Firefox or Chrome, and has automated tests to ensure compatibility with both browsers. The project also states that it "periodically" tests Edge, Safari, and GNOME's Epiphany web browser. Users trying to connect to Cockpit with Safari on iOS, iPadOS, and Arm-based Macs (at least as of this writing) may have a bit of difficulty. Safari on those platforms is missing support for the self-signed certificates that Cockpit uses by default. Users can work around this, or just use a supported browser.

Releases come out roughly every two weeks. The release numbering for the project is somewhat unusual. In 2016 Cockpit adopted a new scheme and started with version 118, with subsequent versions being 119, 120, etc. Note that some releases receive subsequent bugfix releases of their own, such as the 310 release that received a 310.1 release to fix a race condition, and 310.2 with fixes for curl and SELinux. The most recent release as of this writing is Cockpit 313, which comes with a note saying that it contains "assorted bug fixes and improvements" on its release page. The project's blog usually has more descriptive posts that advertise new features such as Btrfs support for 309 or the addition of LVM2 RAID layout support in 304.

The project has a mailing list for development but it is not a lively list. Two messages have been sent in 2024, one in January and one in February. The GitHub discussions page is more lively, and most of the work happens in Cockpit's issue tracker on GitHub.

The fast pace of releases means that the upstream version of Cockpit may be significantly newer than the version packaged for stable or long-term support (LTS) versions of Debian and Ubuntu. The project recommends installing from the Debian backports or Ubuntu backports repositories to use a more up-to-date version if possible.

Taking the controls

Cockpit can be set up to use standard system password or Kerberos authentication when logging in directly, and it can use SSH key-based authentication when connecting over SSH to secondary servers. Unlike tools such as Webmin, Cockpit doesn't maintain a set of users independently of the system; users log in as themselves, and have the same permissions and privileges as if they had connected via SSH or logged in at the console. Users can also gain administrative access if they have sudo privileges on the host.

After logging in, users will see a system overview page with a dashboard displaying system health, CPU and memory usage, basic system information, and navigation to pages for working with user accounts, system logs, system services, networking, and other tools. The exact services available via Cockpit will vary based on the services available on the host OS and which optional components are installed. This makes it simple to log in and see, at a glance, whether a system has any pending software updates, whether it has logged any errors to be concerned about, and much more.

[The Cockpit dashboard]

Drilling down into the metrics and history page, Cockpit offers a more detailed set of metrics about the system's load, which applications are consuming the most memory, the disk usage and throughput, and network interfaces and bandwidth consumption. Note that Cockpit's memory-usage information comes from control groups (cgroups) to provide a quick overview of system usage rather than a full accounting of resident set and virtual memory by process. For historical data, Cockpit uses Performance Co-Pilot (PCP) if it's available. If it isn't available, Cockpit provides an option to install PCP to begin collecting metrics and then display system usage and load over time. Cockpit will even note spikes in load or I/O, with links to logs from that timeframe that might help shed light on the cause.

The services tab allows an administrator to browse the available systemd services, targets, timers, sockets, or paths. From there, administrators can manage the various systemd services, see their relationships with other services, and view their logs. If relevant to a service, Cockpit will also show its memory usage. As an example, browsing the apache2.service on my Debian server running a few WordPress blogs shows 628MB memory in use.

As one might expect, Cockpit's accounts tab will let administrators view users on the system and work with users and groups. In addition to basic operations like adding or removing users, changing passwords, and managing a user's groups, administrators can also browse a user's login history and manage authorized SSH keys.

Extending Cockpit

Cockpit also has a number of additional applications one might want to install, depending on what workloads a system is running. For instance, the Cockpit project has applications for working with virtual machines, managing containers with Podman, managing OSTree updates, working with SELinux, and a number of others. There are also third-party application add-ons like the Navigator file browser for Cockpit, another to manage Samba and NFS file sharing, a ZFS on Linux administration utility, and more. If none of the available applications fit the bill, users can create their own custom applications and add them to Cockpit's navigation.

Even with the array of third-party applications for Cockpit, the odds are that eventually one will need to do some system task that isn't doable via its interface. Even then, all is not lost: when all else fails, Cockpit has a web-based terminal for quick and easy access to the command line.

As noted, the Cockpit team churns out releases on a regular basis. The developers don't tend to make much noise about the releases, so the project goes unnoticed by a lot of the users who would benefit most from it. It's a handy, deceptively useful tool that's well worth a look as part of a well-balanced administrator's diet.

Comments (15 posted)

"Real" anonymous functions for Python

By Jake Edge
March 19, 2024

There are a number of different language-enhancement ideas that crop up with some regularity in the Python community; many of them have been debated and shot down multiple times over the years. When one inevitably arises anew, it can sometimes be difficult to tamp it down, even if it is unlikely that the idea will go any further than the last N times it cropped up. A recent discussion about "real" anonymous functions follows a somewhat predictable path, but there are still reasons to participate in vetting these "new" ideas, despite the tiresome, repetitive nature of the exercise—examples of recurring feature ideas that were eventually adopted definitely exist.

At the end of January, Dan D'Avella asked why Python did not have "real anonymous functions a la JavaScript or Rust". While Python functions are regular objects that can be assigned to a variable or passed to another function, that is not reflected in the syntax for the language, in his opinion. There is, of course, lambda, "but its usefulness is quite limited and its syntax is frankly a bit cumbersome". He wondered if more flexible, full-on anonymous functions had been proposed before, saying that he had not found any PEPs of that nature.

Python has two ways to define functions, def for named functions and lambda for anonymous functions. While named functions can contain multiple statements, lambda is an expression that cannot contain any statements; it returns a function object that evaluates an expression with the parameters given:

    >>> (lambda x: x * 7)(6)
    42

That creates an anonymous function and calls it with a parameter of six, but that is not a particularly idiomatic use of lambda. As described in an LWN article on proposals for alternative syntax for lambda, a common use of the facility is to provide a way for some library functions to extract the actual arguments they should use, as with:

    >>> tuples = [ ('a', 37), ('b', 23), ('c', 73) ]
    >>> sorted(tuples, key=lambda x: x[1])
    [('b', 23), ('a', 37), ('c', 73)]

That uses a lambda expression to extract the second element of each tuple, which is used as the sort key.
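As an aside (not part of the discussion itself), this particular use case does not strictly require a lambda at all; the standard library's operator.itemgetter() builds an equivalent key function:

```python
from operator import itemgetter

tuples = [('a', 37), ('b', 23), ('c', 73)]
# itemgetter(1) returns a callable equivalent to "lambda x: x[1]"
print(sorted(tuples, key=itemgetter(1)))
# [('b', 23), ('a', 37), ('c', 73)]
```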

In a response to D'Avella, David Lord wondered if he was asking for "multi-line lambdas", which is an idea that has come up many times over the years. While a lambda expression can actually span more than one physical line, using the \ line-continuation character, the restriction to a single expression greatly limits what lambdas can do. For more complex computation, named functions, which can be limited to the local scope to avoid any name-collision problems, are the way to go, as a 2006 blog post from Guido van Rossum stated. In his opinion, there is no Pythonic way to embed a multi-statement construct in the middle of an expression.
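Van Rossum's suggested alternative can be sketched as follows; the names here are purely illustrative. A computation too complex for a lambda is simply given a name in the local scope, where it can contain as many statements as needed:

```python
def sort_by_score(tuples):
    # A named helper, local to this function, can hold multiple
    # statements; a lambda expression cannot.
    def second_field(item):
        value = item[1]
        # Room for more statements here: validation, logging, etc.
        return value
    return sorted(tuples, key=second_field)

print(sort_by_score([('a', 37), ('b', 23), ('c', 73)]))
# [('b', 23), ('a', 37), ('c', 73)]
```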

Lord pointed out that "there are years if not decades of previous iterations" of discussions about multi-statement lambdas; he linked to Van Rossum's post to show some of that history and why it is believed that there are unsolvable problems in providing the feature. The problem, in a nutshell, is that Python blocks are delineated using white space, not braces or keywords like begin and end, so any expression used to create a function with multiple statements would need a way to incorporate white space—or it will not look like Python at all. Van Rossum, though, said that even if acceptable syntax could be found, he still objected to the idea of adding lots of complexity to lambda in order to avoid something that is "only a minor flaw".

D'Avella said that Van Rossum's opinion may have been justified 18 years ago, but may no longer be so, "especially in light of developments in other languages". Beyond that, Python itself has changed quite a bit over that time; "Try to imagine using match in Python" back when that post was written. While it is true that there have been plenty of changes in the intervening years, D'Avella did not specify how those changes provided reasons to modify the longstanding decision to maintain the restrictions on lambda, as several in the thread noted.

Terry Jan Reedy said that one of the hallmarks of Python is its readability and that adding more complicated lambda expressions runs counter to that. Both for debugging and testing purposes, named functions are superior, except for the simplest expressions: "Lambda expressions work best when obvious enough to not need testing and when unlikely to be a direct cause of an exception." In practice, the current lambdas are generally expressive enough, Reedy said.

D'Avella dismissed that as a "perfectly valid" opinion, but one that he disagreed with; based on his experience with other languages, he believes that "anonymous function literals are an elegant and useful construct". The problem, though, as Paul Moore pointed out, is that there is a process that needs to be followed so that the core developers will reconsider a longstanding decision of this nature; someone needs to explain why the objections raised before are no longer valid:

Maybe if you picked one or two of those, and explained precisely why they don't apply any more, that might be a better argument than simply "everything changes, so let's propose something that's been rejected before, one more time".

There may be perfectly good reasons why Python should now support anonymous functions. But if no-one ever puts together a proposal that actually addresses the reasons why this idea has failed so many times before, we'll never know.

Part of the problem is that it is difficult to track down the previous discussions of the feature, D'Avella said. He had expected to find a rejected PEP, but did not; his searches for other discussions were not particularly successful either. Reedy pointed to a number of different sources for discussions, which are scattered over several mailing lists and, more recently, the Ideas category on the Python forum. Part of the disconnect may be that D'Avella is wondering about "anonymous functions", but the discussions in Python circles generally revolve around lambda, since that is the existing mechanism in the language. One suspects he would have had more success searching for the term "lambda", since a multi-statement lambda is a real anonymous function, as he agreed.

More links to previous discussions can be found in a post from Mike Miller. A thread he pointed to leads to a 2009 post from Alyssa Coghlan that "sums up the major problem" with any proposal, according to Eric V. Smith. Coghlan said:

However, for Python, the idea of having anonymous blocks runs headlong into the use of indentation for block delineation. You have to either come up with a non-indentation based syntax for the statements inside the anonymous blocks, essentially inventing a whole new language, or you have to embed significant whitespace inside an expression rather than only having it between statements.

Nobody has come up with a solution to that mismatch which is superior to just naming the block and using a normal def statement to create it.

For Smith, there is no point in looking further until that is resolved:

I know people will disagree, but for me, if the syntax problem were resolved, then I'd welcome "expressions that produce functions consisting of multiple statements", commonly called multi-line lambdas.

But without a solution to the syntax issue, the whole argument is not worth having.

D'Avella took a stab at some possible syntax for the feature, which, naturally, relied on delimiters around the anonymous block. He used do and end, but others have tried parentheses or brackets, all of which run aground because they do not look "Pythonic", at least to some. Meanwhile, Clint Hepner argued that some seem to simply want anonymous functions for Python because they are used frequently—and liked, at least by some—in JavaScript. D'Avella, playing the devil's advocate, wondered what the problem was with that line of thinking; multiple different Python features have come from elsewhere after all. But Brendan Barnwell said that those features came to Python, not simply because they existed elsewhere, but because they were useful in the language:
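D'Avella's exact sketch is not reproduced here, but a delimiter-based proposal of that kind might look something like the following. This is a purely hypothetical illustration, and not valid Python; only the do and end keywords come from his suggestion:

```
sorted(tuples, key=lambda item: do
    value = item[1]
    return value
end)
```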

They were added because another language provided some inspiration for a feature to add to Python that might be helpful when writing code in Python. It's totally great to have interchange of ideas between different languages, but each language will choose the features that mesh well with its existing constructs.

In order to try to head off future proposals of this sort, Neil Girdhar suggested adding a page to the Python wiki that would simply describe the proposed change, its pros and cons, and have links to earlier discussions. Despite some skepticism that a page of that nature would actually help, he created a page for commonly suggested features and linked a new page for multi-line lambdas to it. Meanwhile, D'Avella's attempt to attract a core developer to sponsor a PEP that he would write ("with the expectation that it will be rejected") seems to have gone nowhere.

There were some examples of possible use cases for the feature, including one from Chris Angelico and another from "Voidstar", but Cornelius Krupp pointed out that, once again, it is not use cases that are lacking. The use cases need to be compelling enough to overcome the other, syntactic, objections. While D'Avella agreed with that, he noted that there were questions about the justification for the feature in the thread. Krupp said that the syntax was just the first hurdle: "Even if people find a good syntax there is still quite a bit of work to do in [convincing] the python community and the core devs that it is actually a good idea."

Python famously uses white space, rather than delimiters, to structure its code and it is hard to imagine that changing—ever. Python itself will happily agree:

    >>> from __future__ import braces
      File "<stdin>", line 1
    SyntaxError: not a chance

Meanwhile, the cost of a name, quite possibly in a local scope, is pretty low; the readability of the resulting code is likely to be better as well. There would certainly be value in having a rejected PEP to point to when the topic inevitably rears its head again, though it may be hard to motivate people to work on a "doomed" proposal. The topic will undoubtedly be raised again, though; as Krupp pointed out, it has come up at least three other times in the last few years.

It is, however, a persistent problem in the Python world; people show up with a new idea, or one that has been discussed time and time again, without really wanting to put in the work to see it through to completion. Sometimes that work is in finding the earlier discussions and showing why the objections raised then are no longer valid—using examples of where existing code, perhaps from the Python standard library, could be improved. But as we have seen before, more than just once, people are often so enamored with their idea that they are surprised that it does not simply sweep all objections aside because of its clear superiority. In a long-established language or other project, however, ideas take a lot of work before they bear fruit; proponents would be well-advised to keep that in mind.

Comments (48 posted)

Page editor: Daroc Alden

Inside this week's LWN.net Weekly Edition

  • Briefs: Code execution on Pixel 8; Python security releases; Firefox 124; Flox 1.0; Quotes; ...
  • Announcements: Newsletters, conferences, security updates, patches, and more.
Next page: Brief items>>

Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds