Leading items
Welcome to the LWN.net Weekly Edition for April 11, 2024
This edition contains the following feature content:
- The PostgreSQL community debates ALTER SYSTEM: an extensive debate over whether the PostgreSQL server should have an option to disable a powerful administrative command.
- A focus on FOSS funding: a look at how sponsorship organizations are supporting development.
- Continued attacks on HTTP/2: a new vulnerability exploits an old problem in the HTTP/2 protocol.
- Diagnosing workqueues: tips for finding and addressing workqueue-related bottlenecks.
- A look at the 2024 Debian Project Leader election: the two candidates seeking to lead the Debian project in the coming year.
- Book review: Practical Julia: a longtime LWN contributor has written a book on the Julia language.
- The first Linaro Forum for Arm Linux kernel topics: a summary from a gathering of Arm-focused kernel developers.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
The PostgreSQL community debates ALTER SYSTEM
Sometimes the smallest patches create the biggest discussions. A case in point would be the process by which the PostgreSQL community — not a group normally prone to extended, strongly worded megathreads — resolved the question of whether to merge a brief patch adding a new configuration parameter. Sometimes, a proposal that looks like a security patch is not, in fact, intended to be a security patch, but getting that point across can be difficult.

The PostgreSQL server is a complex beast that can be extensively configured and tuned for the environment in which it runs. There are, naturally, many configuration parameters available for administrators to work with. There are also two ways to set those parameters. In most deployments, perhaps, administrators will edit the postgresql.conf file to configure the system as needed. It is, however, also possible to adjust parameters within a running database with the ALTER SYSTEM command. Any changes made that way will be saved to a separate postgresql.auto.conf file, which is also read by the server at startup; these changes are thus persistent.
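To make that mechanism concrete, here is a minimal sketch of the behavior just described, assuming a local superuser connection and the third-party psycopg2 driver (neither of which figures in the discussion below):

```python
# A minimal sketch: ALTER SYSTEM persists a setting to postgresql.auto.conf.
import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres")  # assumed superuser DSN
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
cur = conn.cursor()

# Change a parameter for the whole cluster; the server records the setting in
# postgresql.auto.conf in the data directory rather than in postgresql.conf.
cur.execute("ALTER SYSTEM SET work_mem = '64MB'")
cur.execute("SELECT pg_reload_conf()")   # apply the change without a restart

# Show where the persistent copy lives.
cur.execute("SHOW data_directory")
datadir = cur.fetchone()[0]
print(f"setting recorded in {datadir}/postgresql.auto.conf")

# The patch discussed below adds an allow_alter_system parameter (PostgreSQL 17
# and later); with it set to "off", the ALTER SYSTEM statement above is refused.
```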
Disabling ALTER SYSTEM
In early September 2023, Gabriele Bartolini raised the idea of allowing administrators to disable ALTER SYSTEM, either via a command-line parameter or a configuration option in its own right (which, using PostgreSQL jargon, he calls a "GUC", standing for "grand unified configuration" parameter).
The main reason is that this would help improve the "security by default" posture of Postgres in a Kubernetes/Cloud Native environment - and, in general, in any environment on VMs/bare metal behind a configuration management system in which changes should only be made in a declarative way and versioned like Ansible Tower, to cite one.
PostgreSQL core contributor Tom Lane quickly expressed his disagreement with this proposal: "I don't think we need random kluges added to the permissions system. I especially don't believe in kluges to the effect of 'superuser doesn't have all permissions anymore'." He suggested using event triggers to implement such a restriction locally if it is really wanted.
Alvaro Herrera pointed out that ALTER SYSTEM, as a system-wide command, does not invoke event triggers. He also more thoroughly explained the use case that was driving this request:
I've read that containers people consider services in a different light than how we've historically seen them; they say "cattle, not pets". This affects the way you think about these services. postgresql.conf (all the PG configuration, really) is just a derived file from an overall system description that lives outside the database server. You no longer feed your PG servers one by one, but rather they behave as a herd, and the herder is some container supervisor (whatever it's called).

Ensuring that the configuration state cannot change from within is important to maintain the integrity of the service. If the user wants to change things, the tools to do that are operated from outside.
Lane, though, described the feature as "false security", and the discussion wound down for a while. But the fundamental disconnect had already become apparent: the opposition to restricting ALTER SYSTEM was based on the idea that it was intended as a security feature. As such, it would be a failure, since there are many other ways for a PostgreSQL user with the ability to run ALTER SYSTEM to take control of the server. But, as Bartolini said, this restriction is meant as a usability feature, closing off a configuration mechanism that is not meant to be used in systems where declarative configuration systems are in charge.
The new year
Robert Haas restarted the conversation in January, acknowledging that the proposal is not meant as a security feature, but worrying that it would be seen that way anyway:
I have to admit that I'm a little afraid that people will mistake this for an actual security feature and file bug reports or CVEs about the superuser being able to circumvent these restrictions. If we add this, we had better make sure that the documentation is extremely clear about what we are guaranteeing, or more to the point about what we are not guaranteeing.
Even then, he worried about "security researchers threatening to publish our evilness in the Register". Lane then declared that the project "should reject not only this proposal, but any future ones that intend to prevent superusers from doing things that superusers normally could do". Haas responded, though, that the original proposal might have merit, and should be taken seriously.
The conversation continued inconclusively; two months later, Haas complained that not much progress was being made. That notwithstanding, he said: "As far as I can see from reading the thread, most people agree that it's reasonable to have some way to disable ALTER SYSTEM". There were, though, six competing ways in which that objective could be accomplished. These included the command-line option and configuration parameter originally proposed, along with an event trigger, pushing it into an extension module, recognizing a sentinel file created by the administrator, or just changing the permissions on postgresql.auto.conf. He suggested that the configuration option and the sentinel file were the most viable options.
Lane answered that any such restriction, implemented by any of the above mechanisms, could still be bypassed by a hostile administrator. Haas replied that the proposal was not a security feature, but Lane dismissed it as "a loaded foot-gun painted in kid-friendly colors" that would lead to more bogus CVE numbers being filed against the project.
A new patch
On March 15, Jelte Fennema-Nio posted a patch implementing the restriction as a configuration parameter. It was, in the end, just an updated version of the patch posted by Bartolini in September with some documentation tweaks. Various comments resulted in a number of newer versions, the sixth of which came out a few days later. At this point, the patch consisted mostly of documentation, including this admonition:
Note that this setting cannot be regarded as a security feature. It only disables the ALTER SYSTEM command. It does not prevent a superuser from changing the configuration remotely using other means.
Haas welcomed this version and asked whether there was a consensus on proceeding with it. Lane seemingly had changed his view somewhat at this point, saying: "I never objected to the idea of being able to disable ALTER SYSTEM". Bruce Momjian, instead, worried that an administrator could enter an ALTER SYSTEM command disabling ALTER SYSTEM, at which point recovery could be difficult. In fact, as Fennema-Nio answered, that parameter cannot be set that way, so that particular trap does not exist.
Momjian also took issue with the name of the parameter (which was externally_managed_configuration), saying that it didn't really describe what was being restricted. He suggested sql_alter_system_vars as an alternative. Haas agreed with the complaint, but thought that allow_alter_system made more sense. That is, in the end, the name that was chosen for this option.
The discussion was not over yet, though. Lane wanted the server to ensure that postgresql.conf and postgresql.auto.conf were not writable by the postgres user if allow_alter_system is disabled. Otherwise, the database administrator would still be able to modify the configuration simply by editing one of the files. Fennema-Nio disagreed, saying that disabling ALTER SYSTEM was sufficient for the intended use case, but Lane said that the configuration parameter on its own is "a fig leaf that might fool incompetent auditors but no more". Fennema-Nio reminded Lane that the point of allow_alter_system is not security. Haas complained: "I don't understand why people who hate the feature and hope it dies in a fire get to decide how it has to work." The file-permission check was, in the end, not added.
Even then, the discussion was not quite done; Momjian questioned merging this change so late in the PostgreSQL development cycle. "My point is that we are designing the user API in the last weeks of the commitfest, which usually ends badly for us". Fennema-Nio pointed out that the API was essentially unchanged from its initial, September form, and that months had been spent discussing alternatives. Haas said that such a small patch would not improve by being held up for another release cycle: "I think it has to be right to get this done while we're all thinking about it and the issue is fresh in everybody's mind."
On March 28, Momjian agreed that the patch could be merged. One day later, Haas did exactly that. And, with that, one of the longest-running development discussions in recent PostgreSQL history came to an end. The success of this effort is a testament to the persistence of a small number of developers who saw it through months of opposition and "helpful" implementation suggestions. Having decided that difficult issue, the project can turn its attention to the small list of simple topics to be resolved during the upcoming July commitfest.
A focus on FOSS funding
Among the numerous approaches to funding the development and advancement of open-source software, corporate sponsorship in the form of donations to umbrella organizations is perhaps the most visible. At SCALE21x in Pasadena, California, Duane O'Brien presented a slice of his recent research into the landscape of such sponsorship arrangements, with an overview of the identifiable trends of the past ten years and some initial insights he hopes are valuable for sponsors and community members alike.
O'Brien introduced the session as his personal analysis, noting that it stems from grant-funded research, and that he does not claim to be a data scientist or economist. Rather, he is an advisor who has built open source program offices (OSPOs) and spends a lot of time thinking about funding and sustainability. The research project is a study called Fostering Open Collaboration, which was funded by the Digital Infrastructure Fund Program. (That program has since been rebranded as the Digital Infrastructure Insights Fund.)
The problem
O'Brien proposed the research project for the 2020 grant class, with the expressed aim of providing cross-company visibility into sponsorship trends. This visibility is currently lacking, he said; increasing it can, in the near term, enable sponsors to coordinate their efforts more effectively by identifying under-funded organizations. In the longer term, he added, increased transparency may hold other benefits as well, such as enabling conversations on accountability.
![Duane O'Brien [Duane O'Brien]](https://static.lwn.net/images/2024/scale2024-obrien-sm.png)
Visibility, here, refers to the overall picture across all projects and all sponsors; the majority of FOSS projects publish some account of their own income sources — but that published data is not uniform. Essentially, any not-for-profit venture that hosts or otherwise supports the development of FOSS software can establish its own palette of "sponsorship levels" at whatever price points it decides. But to understand the overall contributions to those ventures by corporate or individual sponsors, someone would need to systematically scrape all of the "our sponsors" pages or other published data sources and collate that information. There are also regulatory filings, O'Brien noted, such as the IRS Form 990 that tax-exempt organizations in the US are required to complete, although the Form 990 numbers include other sources of income (such as merchandise or training) in addition to sponsorships.
Are a handful of sponsors underwriting more than their fair share of the non-profits — for any particular definition of "fair"? Are such sponsorships broadly the same across different industry segments? Are some sponsored organizations struggling more than others? These and a host of related questions are difficult to answer without collecting and collating the data, which is the principal task that O'Brien undertook.
The method
The process began, O'Brien said, with creating a list of organizations that receive funding to analyze. The initial set included umbrella-style hosting foundations, recurring FOSS events, and a few other program types, such as mentoring projects. For each sponsored party, he located all of the sponsorship-prospectus documents and regulatory filings available for the approximately ten-year time period studied (2013–2024; "approximately" here captures variability in fiscal years, events that were canceled due to the COVID-19 pandemic, and other minutiae).
For the hosting foundations, O'Brien sampled the sponsorship web pages of each organization through the Internet Archive Wayback Machine, using the captured page as close to July 1 of each year as possible. This date was chosen to be far enough into the year that most of the annual sponsorship budget decisions had been made. For events, the pages were sampled as close to that year's event date as possible.
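That sampling step is straightforward to automate; the following rough Python sketch (an illustration of the approach, not O'Brien's actual tooling) uses the Internet Archive's availability API to find the capture closest to July 1 of each year for a given sponsors page:

```python
# Rough sketch: ask the Wayback Machine for the capture nearest July 1 of each
# year in the study period. The target URL below is just an example.
import json
import urllib.parse
import urllib.request

def closest_snapshot(url, year):
    """Return (snapshot_url, timestamp) for the capture nearest July 1 of `year`."""
    query = urllib.parse.urlencode({"url": url, "timestamp": f"{year}0701"})
    with urllib.request.urlopen(f"https://archive.org/wayback/available?{query}") as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    return (closest["url"], closest["timestamp"]) if closest else None

for year in range(2013, 2025):
    print(year, closest_snapshot("https://www.apache.org/foundation/thanks", year))
```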
The pages were scraped to record the identity of each sponsor, at each level, and the levels were matched to dollar figures where possible. It was on this point that O'Brien first raised the recurring theme that "every case is weird" in one fashion or another. Between the sponsored organizations analyzed, a wide variety of complicating factors were encountered, everything from sponsorship levels tagged with a dollar range rather than a fixed amount to sponsors who were difficult to identify based only on the logos recorded in the Wayback Machine snapshots.
The actual data collected through this process is available in a public GitHub repository, although O'Brien cautioned interested parties to note that he could not identify an applicable license to use for the collection, given that the data — although accumulated entirely through public records — originated from a wide variety of unrelated sources.
The numbers and the weirdness
O'Brien presented data for four foundations: the Apache Software Foundation (ASF), the Linux Foundation (LF), the Open Source Initiative (OSI), and the Python Software Foundation (PSF), plus two events: Open Source Summit North America (OSSNA) and SCALE. He noted that data had also been collected for a few more foundations (including NumFOCUS and the Software Freedom Conservancy), events (including All Things Open and the Seattle GNU/Linux Conference), and the Outreachy program.
He related the data collected for each of the example organizations, primarily the sponsorship dollar amounts estimated for the ten-year period of the study, the sponsorship dollar amounts for 2023 in isolation, and whatever total revenue numbers were reported from other sources. He also provided a list of the top ten sponsors for each one and noted where data was incomplete or extrapolation had been necessary.
Perhaps most interestingly, he ended each overview with some comments explaining "why it's weird" — and, as it turns out, weird is common in FOSS. The raw numbers themselves are all provided in O'Brien's talk slides, although they might not elicit much of a reaction from those already involved in the FOSS community. The LF, for example, drew in substantially higher corporate sponsorship totals for 2023 ($23.9 million) than the ASF ($2.1 million), PSF ($1.6 million), and OSI ($366,000).
The weirdness begins when those numbers are examined more closely. O'Brien noted that the ten-year total for the LF was difficult to estimate for several reasons. The LF is host to a variety of sub-project organizations with their own sponsorship levels, and it is not always clear when those sub-project totals are or are not included in the overall numbers in each data source. The LF's list of sponsors is also considerably longer than the others, with more than 1,000 Silver sponsors, which made for such a densely packed sponsors page that it consistently triggered the Wayback Machine's rate-limiting threshold and forced a break in the analysis.
The numbers are more difficult to quantify for the LF because it is a 501(c)(6) organization, which is permitted to conduct "more transactional" sponsorship arrangements than 501(c)(3) charitable organizations. This would include arrangements in which (for example) training discounts or event tickets could make up part of the sponsorship package. The PSF also utilized several "in kind" sponsorship arrangements that are difficult to quantify in equivalent dollar amounts, he added, such as content delivery network (CDN) infrastructure from Fastly. The PSF had also permitted several joint-sponsorship arrangements over the years, in which a company was listed both as a PSF sponsor and a sponsor for the PyCon event. How those joint sponsorships are divided between the two revenue totals is unknown.
The LF and the OSI also differed from standard practice by offering a range of prices for particular sponsorship levels. The LF's Silver-level sponsorship is determined by the number of employees the member has, starting at $5,000 for organizations with fewer than 100 employees, and up to $20,000 for organizations with more than 5,000 employees. In some cases it was possible to determine the size of a sponsor from other data sources, such as Crunchbase, O'Brien said, but in many other cases that was impossible. More difficult to assess still was the OSI's approach, which (in some years) had offered sponsors a range of self-selected suggested prices intended to scale up with the sponsor's annual revenue. Though the annual revenue data could be estimated for some sponsors, he observed, what each sponsor chose to donate could not.
Between the two events explored in the talk, O'Brien noted that OSSNA was estimated to operate on a ten-year budget of $8.2 million (i.e., an annual average of $820,000), while SCALE operated on a ten-year budget of $1.7 million. SCALE, he noted, shared many of its top ten sponsors with OSSNA, but received about one-tenth as much in sponsorship dollars from those sponsors.
He also observed that what the Linux Foundation takes in for OSSNA sponsorship is a small portion of what the foundation takes in overall. This was noteworthy, he said, because one of the most common criticisms he hears about the LF in FOSS-funding discussion is the size of OSSNA and other large-scale LF events.
The trends
O'Brien then offered some analysis he hoped could serve as "actionable recommendations" for organizations seeking sponsors and OSPOs (or others involved in sponsoring FOSS organizations).
Head-to-head comparisons of the sponsored organizations are a tricky endeavor, he reminded the audience, due to the differences between organizations and the weirdness discussed earlier. However, he suggested that all companies that are interested in sponsoring periodically sit down together and assess their sponsorship arrangements with regard to whether they are covering the work that they believe to be important. For example, he cited the current popularity of Artificial Intelligence (AI) topics in FOSS and noted that OSI is one of the only FOSS organizations directly investigating practical questions about AI's interaction with FOSS licensing and governance. If sponsoring companies consider that topic important, they should discuss how their various sponsorships could affect it. Such cross-corporation collaboration is not the norm, perhaps, but O'Brien advocates for it nonetheless.
Where the question of who funds FOSS organizations is concerned, he noted that there is considerable overlap to be observed atop most of the sponsor lists each year, with a familiar palette of companies (such as well-known software vendors and service providers) in somewhat-similar permutations for most foundations — but not all. LF's top-sponsors list has the least in common with the other sponsored organizations. In the top ten, half of the companies — Fujitsu, Qualcomm, NEC, Samsung, and Hitachi — did not appear at all on the other top-sponsor lists.
One potentially troubling trend that O'Brien saw in the data is that, while overall sponsorship price levels have increased over the decade studied, the sponsorship levels used by events have remained relatively flat. That puts direct pressure on the event organizers; he shared a series of graphs illustrating that overall inflation, wages, housing, food prices, and other expenses have steadily increased in the past decade. If FOSS event teams cannot keep up with the rising costs, their events may suffer.
Here, again, O'Brien suggested that OSPOs and sponsoring companies take a deliberate look at their sponsorship choices' impact, periodically "spreading all of the sponsorships out on the table" to check that they remain effective and aligned with the goals. Similarly, he noted that event sponsorships may come through a different office inside the company than do foundation sponsorships; comparing all of a company's sponsorship commitments on an equal footing might enable the company to more effectively balance how its funding affects the sponsored parties of interest.
He also offered some advice to organizations seeking sponsorship. There is no secret answer to the question "how do I ask for money?", but there are things that organizations can do to be more effective. He recommended being transparent and up-front about rising costs, and communicating that to sponsors when adjusting sponsorship levels. He also advised them to simplify the sponsorship options offered to companies: too many options or permutations can create confusion or slow down the negotiating process. Finally, he advised organizations to resist the temptation to make "reactive changes." Sponsored organizations should be careful who they compare themselves to, and they should avoid making a change in their own policies or arrangements merely because another FOSS organization made a change.
The future
O'Brien ended the session with some prospects for future research and transparency about corporate funding of FOSS organizations. In particular, he highlighted julia ferrioli's ongoing work conducting video interviews of sponsorship decision-makers. He said ferrioli has conducted six interviews thus far; she has targeted corporations with 200 or more employees and has asked them open-ended questions about sponsorship-funding decisions. The early interviews, according to O'Brien, have revealed some arguably non-obvious motivations at work: risk reduction, for example, was the most-cited reason for sponsorship of FOSS foundations. Benefits to the corporate brand came in second, and the ability to influence the direction of the technology, which he expected to be the most important reason, was third.
Further research is forthcoming on the motivations and decision-making process, with the likely release of a white paper and perhaps another formal research proposal. In the meantime, O'Brien encouraged all parties interested in a more transparent discussion about corporate sponsorship to look up Shane Curcuru's FOSS Foundations Directory, an extensive information resource for sponsors and beneficiaries alike. O'Brien also facilitates a working group named FOSS Funders that works to increase support of FOSS development through other means.
Where funding of free software and open-source organizations is concerned, it is perhaps tempting to notice the logos of well-known big tech vendors and presume that their massive financial power keeps the community on a stable footing more-or-less automatically. O'Brien's research illustrates how even a direct assessment of the status quo quickly runs into complications and raises nebulous or even unanswerable questions. Transparency in such matters is not trivial, but the open discussions that O'Brien advocates are doubtless a step in the right direction.
Continued attacks on HTTP/2
On April 3 security researcher Bartek Nowotarski published the details of a new denial-of-service (DoS) attack, called a "continuation flood", against many HTTP/2-capable web servers. While the attack is not terribly complex, it affects many independent implementations of the HTTP/2 protocol, even though multiple similar vulnerabilities over the years have given implementers plenty of warning.
The attack itself involves sending an unending stream of HTTP headers to the target server. This is nothing new — the Slowloris attack against web servers using HTTP/1.1 from 2009 worked in the same way. In Slowloris, the attacker makes many simultaneous requests to a web server. Each request has an unending stream of headers, so that the request never completes and continues tying up the server's resources. The trick is to make these requests extremely slowly, so that the attacker has to send relatively little traffic to keep all the requests alive.
In the wake of the Slowloris attack, most web servers were updated to place limits on the number of simultaneous connections from a single IP address, the overall size of headers, and on how long the software would wait for request headers to complete before dropping the connection. In some web servers, however, these limits were not carried forward to HTTP/2.
In 2019, there were eight CVEs reported for vulnerabilities exploitable in similar ways. These vulnerabilities share two characteristics — they involve the attacker doing unusual things that are not explicitly forbidden by the HTTP/2 specification, and they affect a wide variety of different servers. Web servers frequently have to tolerate clients that misbehave in a variety of ways, but the fact that these vulnerabilities went unreported for so long is perhaps an indication that there are few truly broken clients in use.
Two of these vulnerabilities in particular, CVE-2019-9516 and CVE-2019-9518, can involve sending streams of empty headers, which take a disproportionate amount of CPU and memory for the receiving server to process compared to the effort required to generate them. Nowotarski's attack seems like an obvious variation — sending a stream of headers with actual content in them. The attack is perhaps less obvious than it seems, given that it took five years for anyone to notice the possibility.
Continuation flooding
HTTP/2 is a binary protocol that divides communications between the client and the server into frames. Headers are carried in two kinds of frame: an initial HEADERS frame, followed by some number of CONTINUATION frames. The continuation flood attack involves sending a never-ending string of continuation frames, with random header values packed inside them. Because they are random, these headers are certainly not meaningful to the receiving server. Despite this, some servers still allocate space for them, slowly filling the server's available memory. Even servers which do place a limit on the size of headers they will accept usually choose a large limit, making it relatively straightforward to consume their memory using multiple connections.
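For readers unfamiliar with the wire format, the following simplified Python sketch (an illustration, not code from any of the affected servers) shows how the nine-byte frame header is laid out and how a server might bound the number of CONTINUATION frames it accepts per stream:

```python
# Simplified sketch: every HTTP/2 frame starts with a 9-octet header, and a
# header block is a HEADERS frame followed by CONTINUATION frames until one
# carries the END_HEADERS flag. A receiver can protect itself by capping how
# many CONTINUATION frames it will accept for a single stream.
import struct

HEADERS, CONTINUATION = 0x1, 0x9
END_HEADERS = 0x4
MAX_CONTINUATIONS = 8   # an arbitrary cap; nghttp2's fix (discussed below) uses a similar limit

def parse_frame_header(buf):
    """Split a 9-byte HTTP/2 frame header into (length, type, flags, stream id)."""
    length = int.from_bytes(buf[0:3], "big")
    ftype, flags = buf[3], buf[4]
    stream_id = struct.unpack(">I", buf[5:9])[0] & 0x7FFFFFFF
    return length, ftype, flags, stream_id

def check_header_block(frames):
    """Count CONTINUATION frames per stream and reject an unbounded header block."""
    continuations = {}
    for header, _payload in frames:
        length, ftype, flags, sid = parse_frame_header(header)
        if ftype == CONTINUATION:
            continuations[sid] = continuations.get(sid, 0) + 1
            if continuations[sid] > MAX_CONTINUATIONS:
                raise ValueError(f"stream {sid}: continuation flood suspected")
        if flags & END_HEADERS:
            continuations.pop(sid, None)   # header block complete
```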
Another wrinkle is that HTTP/2 requests are not considered complete until the last continuation frame is received. Several servers that are vulnerable to this attack don't log requests — failed or otherwise — until they are complete, meaning that the server can die before any indication of what happened makes it into the logs. The fact that continuation flooding requires only one request also means that traditional abuse-prevention tools, which rely on noticing a large number of connections or traffic from one source, are unlikely to automatically detect the attack.
Nowotarski listed eleven servers that are confirmed to be vulnerable to the attack, including the Apache HTTP Server, Node.js, and the server from the Go standard library. The same announcement stated that other popular servers, including NGINX and HAProxy, were not affected.
It is tempting to say that all these attacks — the eight from 2019, and now continuation flooding — are possible because HTTP/2 is a complex, binary protocol. It is true that HTTP/2 is substantially more complicated than HTTP/1.1, but every version of HTTP has had its share of vulnerabilities. The unfortunate truth is that implementing any protocol with as many divergent use cases as HTTP is difficult — especially when context is lost between designers and implementers.
The designers of HTTP/2 were well aware of the potential danger of DoS attacks. In July 2014, Roberto Peon sent a message to the ietf-http-wg mailing list talking about the potential for headers to be used in an attack:
There are three modes of DoS attack using headers: 1) Stalling a connection by never finishing the sending of a full set of headers. 2) Resource exhaustion of CPU. 3) Resource exhaustion of memory.
[...]
I think #3 is the interesting attack vector.
The HTTP/2 standard does not set a limit on the size of headers, but it does permit servers to set their own limits: "A server that receives a larger header block than it is willing to handle can send an HTTP 431 (Request Header Fields Too Large) status code." Yet despite this awareness on the part of the protocol designers, many implementers had not chosen to include such a limit.
In this case, fixing the vulnerability is relatively straightforward. For example, nghttp2, the HTTP/2 library used by the Apache HTTP Server and Node.js, imposed a maximum of eight continuation frames on any one request. However, this vulnerability still raises questions about the security and robustness of the web-server software we rely on.
HTTP/2 is a critical piece of the internet. It accounts for somewhere between 35.5% and 64% of web sites, depending on how the measurement is conducted. There are several tools to help implementers produce correct clients and servers. There is a publicly available conformance testing tool — h2spec — to supplement each individual project's unit and integration tests. nghttp2 ships its own load-testing tool, and Google's OSS-fuzz provides fuzz testing for several servers. These tools hardly seem sufficient, however, in light of the ongoing discovery of vulnerabilities based on slight deviations from the protocol.
The continuation flood attack is not particularly dangerous or difficult to fix, but the fact that it affects so many independent implementations nearly nine years after the introduction of HTTP/2 is a stark wakeup call. Hopefully we will see not only fixes for continuation flooding, but also increased attention on web server reliability, and the tests to ensure the next issue of this kind does not catch us by surprise.
Diagnosing workqueues
There are many mechanisms for deferred work in the Linux kernel. One of them, workqueues, has seen increasing use as part of the move away from software interrupts. Alison Chaiken gave a talk at SCALE about how they compare to software interrupts, the new challenges they pose for system administrators, and what tools are available to kernel developers wishing to diagnose problems with workqueues as they become increasingly prevalent.
Background on software interrupts
Software interrupts are a mechanism that allows Linux to split the work done by interrupt handlers into two parts. The interrupt handler invoked by the hardware does the minimum amount of work, and then raises a software interrupt for the kernel to run later that does the actual work. This can reduce the amount of time spent in the registered interrupt handler, which ensures that interrupts get serviced efficiently.
![Alison Chaiken [Alison Chaiken]](https://static.lwn.net/images/2024/Chaiken-fuzz.png)
Chaiken explained that when a hardware interrupt raises a software interrupt, there are two possible cases. When no software interrupt is already running on the CPU, the new software interrupt can start running immediately. When a software interrupt is already running on the CPU, the new interrupt is enqueued to be handled later — even if the new interrupt would actually be higher priority than the currently running one. There are ten different kinds of software interrupt, and each kind has a specific priority. Chaiken showed a list of these priorities, and remarked that even without knowing anything else about the design of software interrupts, seeing network interrupts listed above timer interrupts might make people "feel some foreboding".
These priority inversions are a problem on their own, because they contribute to latency and jitter for high-priority tasks, but the priority system also introduces other problems. The lowest priority interrupts are part of the kernel's read-copy-update (RCU) system. Chaiken called the RCU system "basically the kernel's garbage collector". This means that not servicing interrupts fast enough can actually cause the kernel to run out of memory.
On the other hand, servicing software interrupts too much can disrupt latency-sensitive operations such as audio processing — a common issue for kernel maintainers is a software interrupt that runs too long and refuses to yield, effectively tying up a core.
To balance these two problems, the kernel uses two heuristic limits that trade latency against fairness. MAX_SOFTIRQ_TIME is the maximum time that a software interrupt is allowed to run; it is set to 2ms. MAX_SOFTIRQ_RESTART is the maximum number of times that a software interrupt that is itself interrupted by something else will be restarted; it is set to ten attempts. Unfortunately, these parameters are hard-coded and built into the kernel. They were supposedly set to good values via experimentation, but Linux runs on so many different kinds of device that no setting could be optimal for all of them. "No one has the nerve to change them", she said, which is "not a great situation". She summed up the problems with software interrupts by saying that they "are not the most beloved feature of the kernel" and that there have already been several attempts to get rid of them across many versions of the kernel.
But progress removing software interrupts is slow. Despite those efforts, there are still 250 call sites of local_bh_disable() — a function which Chaiken called "the villain of this part of the talk". local_bh_disable() prevents software interrupts from being run on a particular CPU. In practice, however, it functions as a lock to protect data structures from being concurrently accessed by software-interrupt handlers. One audience member asked which resources were guarded by the bottom half lock. Chaiken responded that "no one actually knows" because the calls are spread throughout the kernel.
Even worse, software interrupts are largely opaque, because they run in an interrupt context — just like hardware interrupts do. They don't have access to many kernel facilities — such as debug logging. "You can't be printing from interrupt handlers". There are a few ways to get visibility, but they're cumbersome compared to the functionality available to the rest of the kernel.
Even though software interrupts are difficult to work with, there are some observability tools. Chaiken did a demo on her laptop — "On which I am running a kernel which no sane person would use on a computer used for a presentation" — showing how to use the stackcount program to get stack traces for all the software interrupts currently running.
Increasingly, there has been a push to move some of the work done by software interrupts to the workqueue mechanism, which Chaiken called "just an all-around better design".
Workqueues
Workqueues have existed in the kernel for a long time, but they have recently seen a lot of new functionality added. "The hardest part of this presentation has been that workqueues have changed so much in the last 18 months I've had trouble keeping up".
Workqueues are a generic way for drivers and other kernel components to schedule delayed work. Each workqueue is — theoretically — associated with a single component, which can add whatever work to the queue it likes. In actuality, a lot of the kernel uses shared workqueues that are not specific to a component. Each workqueue is also associated with a pool of kernel threads that will service tasks from that queue.
By default, Linux creates two worker pools per CPU, one normal priority and one high priority. These pools contain dedicated workers, which the kernel will spawn more of or remove as required. The fact that these pools are adjusted automatically also means that an administrator who runs into a problem with a misbehaving workqueue item cannot solve the problem by changing the priority of the worker, or pinning it to a separate core. As more functionality gets moved over to workqueues, problems and bug reports will undoubtedly start becoming more common.
The proper way to change what happens with items in workqueues is to use the "workqueue API that manages work" as opposed to managing the workers directly. Chaiken showed a demonstration of how this could be done. She picked out a workqueue and showed that it was running on a particular pool that was also servicing many other workqueues. Then she changed the priority of the workqueue itself, and showed that this had caused the workqueue to change to a different worker pool — one that matched its new attributes. In response to an audience question, she clarified that "the kernel will just create new work pools, if there is no work pool that matches a work queue."
"Treatment of affinity of workqueues has really improved in recent kernels", she remarked. Since pinning individual workers to CPU cores is not possible, recent kernels allow the user to change the CPU affinity of the workqueues themselves. The addition of features like this mean that workqueues in general have gotten substantially more useful over the last 18 months, which Chaiken called a "march of progress".
She also showed a demonstration of the much more flexible tracing and debugging capabilities available with workqueues. She used the LGPL-licensed drgn debugger with a set of workqueue-specific debugging scripts from the kernel. wq_dump.py shows the current workqueue configuration, including which worker pools exist and how they are arranged between cores. wq_monitor.py shows the behavior of workqueues in real time, which can be helpful for diagnosing problems with how work is scheduled.
Workqueues also show up under the sysfs filesystem in /sys/devices/virtual/workqueue, which can be a quick way to get information on a workqueue without breaking out a debugger. Only workqueues configured with the WQ_SYSFS flag appear there, so Chaiken noted that "if a workqueue is giving you heartburn, one of the things you can do is make a tiny kernel patch" to enable the flag.
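As a quick illustration (a sketch of my own, not something shown in the talk), the sysfs view can be explored with a few lines of Python; the exact set of attributes varies with the kernel version and the workqueue's flags:

```python
# Walk /sys/devices/virtual/workqueue and print the tunables exposed for
# workqueues created with WQ_SYSFS. Unbound workqueues typically expose "nice"
# and "cpumask", which is how their pool placement and affinity can be adjusted
# from user space; per-CPU workqueues expose fewer attributes.
import os

SYSFS_WQ = "/sys/devices/virtual/workqueue"

def read_attr(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "n/a"

for wq in sorted(os.listdir(SYSFS_WQ)):
    wq_dir = os.path.join(SYSFS_WQ, wq)
    if not os.path.isdir(wq_dir):
        continue
    attrs = {name: read_attr(os.path.join(wq_dir, name))
             for name in ("per_cpu", "max_active", "nice", "cpumask")}
    print(wq, attrs)

# Writing to these files (as root) adjusts the attributes; the cpumask file,
# for example, restricts where an unbound workqueue's work may run.
```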
Finally, workqueue workers run in process context instead of interrupt context — meaning that many of the kernel's normal debugging facilities, such as trace points, debug logs, etc., are available when an item from a workqueue is being processed.
In the Q&A after the talk, one audience member asked what resources they could use to learn more about workqueues. Chaiken responded that "the documentation for workqueues is excellent". "You can learn a lot by just reading the kernel's entry documentation, and using these tools." She also provided a link to her slides, which themselves contain many links to the resources she referenced while putting together the talk.
Another audience member asked whether there were existing tools that could migrate work between pools based on observed latency. Chaiken responded that "a lot of this stuff is so new that people haven't really grokked it yet", but also warned that anyone creating a tool like that would "really need [...] tests which characterize your workload and its performance".
Readers who wish to dive into more of the details can find a recording of Chaiken's talk here. Her talk left me with the impression that workqueues promise to be easier to manage and debug than software interrupts. Despite these benefits, there are downsides to workqueues — such as increased latency — that are hard to mitigate. It will be a long time before software interrupts can be completely eliminated, and switching — when so many different parts of the kernel use software interrupts — will certainly be painful. Kernel developers and system administrators alike will require a good working knowledge of workqueues, but that knowledge is readily available in the form of documentation and new tools.
A look at the 2024 Debian Project Leader election
The nominations have closed and campaigning is underway to see who will be the next Debian Project Leader (DPL). This year, two candidates are campaigning for the position Jonathan Carter has held for four eventful years: Sruthi Chandran and Andreas Tille. Topics that have emerged so far include how the prospective DPLs would spend project money, their opinions on handling controversial topics, and project diversity.
The DPL role
The project leader position is defined by Debian's Constitution, and has a one-year term. The DPL is elected by members of the Debian Project, the Debian Developers. The DPL's duties fall into two broad categories: external and internal. The external duties can include attending events and giving talks about Debian, as well as managing relationships with other projects. The DPL's internal duties include coordinating and communicating within the project, and appointing delegates to the various committees, including the Debian Technical Committee, Debian Publicity Team, Debian System Administration (DSA) team, and the Treasurer team, among others.
The DPL is empowered to make decisions that require "urgent action" and those decisions "for whom [no-one] else has responsibility". The DPL is also charged with making decisions about "property held in trust" for the project (such as hardware, or money), and can decide to authorize new "trusted organizations" to hold Debian assets, or to remove organizations from the list of trusted organizations.
The project lead co-appoints a new Project Secretary with the current secretary. If they cannot agree on a delegate for this position, then it is put to a vote by the Debian Developers. The Project Secretary is responsible for, among other things, managing project elections. The current secretary, Kurt Roeckx, has held the position since 2009 and was re-appointed to another term in February.
The Carter years
Carter, the current DPL, has held the position since April 2020 and is the first to hold the position for four consecutive terms. Last year, Carter ran unopposed. This year, Carter did not stand for election, but he posted a lengthy overview of his terms in his final "Bits from the DPL". He covered topics like the things that were accomplished during those terms, and things he felt could have gone better or still need to be done. It provides a great deal of insight for those who would hold the DPL role and for those who need to evaluate candidates.
Communication is at the top of Carter's list of things that could have gone better:
With every task, crisis or deadline that appears, I think that once this is over, I'll have some more breathing space to get back to non-urgent, but important tasks. "Bits from the DPL" was something I really wanted to get right this last term, and clearly failed spectacularly. I have two long Bits from the DPL drafts that I never finished, I tend to have prioritised problems of the day over communication.
His tenure as DPL had plenty of crises and deadlines. Carter's first term began in April 2020, just as COVID-19 began to spread globally and forced the project to hold DebConf20 as a virtual event. The project released Debian 11 and Debian 12 on his watch. He led the project during an episode of attacks on the Debian community by a former Debian Developer that began during Sam Hartman's term and continued into Carter's. But the most difficult period, said Carter, was the loss of Abraham Raji, who passed away during a kayaking trip at DebConf23. "There's really not anything anyone could've done to predict or stop it, but it was devastating to many of us, especially the people closest to him."
Carter said his number-one goal for his last term, which carried over from previous terms but failed to materialize, was for Debian to become a "standalone entity". Currently Debian is affiliated with Software in the Public Interest (SPI), a 501(c)(3) non-profit incorporated in the United States. In addition to SPI, Debian takes donations via Debian France (a French non-profit organization) and debian.ch (a Swiss non-profit). Carter included this in his 2022 campaign platform, citing "difficulties in setting up agreements with external entities, and creating problems in terms of personal legal liability within the project" as reasons Debian needs to have its own legal entity. Carter said it was "something that we need to seriously address together as a project and make a decision based on its merits", but it remains unaddressed.
The DPL winds up having a hand in many project initiatives by encouraging others to do the work, and delegating the authority to do so. In his final "Bits from the DPL," Carter recounted several initiatives that he helped along in this way, including founding the DebianNet Team to provide hosting services to Debian developers, nudging Steve McIntyre to propose the successful non-free firmware general-resolution, and encouraging the creation of the Debian Reimbursements system.
The candidates
Roeckx's call for DPL nominations went out on March 8. Candidates self-nominate for DPL and provide a platform with a biography and goals for voters to consider ahead of the campaign period. In addition to the two candidates who have chosen to run this year, Debian Developers always have a third option as mandated by the Debian constitution: none of the above. The project uses a variation of the Condorcet method for its general resolutions and elections, where voters rank the options instead of simply choosing one. If Debian voters rank "none of the above" over the two candidates, then the election process is started again and run until a winning candidate is selected.
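As a toy illustration of ranked-ballot counting (this is not Debian's actual vote-counting software, and the constitution's full procedure adds quorum, supermajority, and Schwartz-set handling on top), a pairwise tally looks roughly like this:

```python
# Toy pairwise tally for Condorcet-style counting with hypothetical ballots.
# Each ballot ranks the options from most to least preferred.
from itertools import combinations

ballots = [
    ["Chandran", "Tille", "None of the above"],
    ["Tille", "Chandran", "None of the above"],
    ["Tille", "None of the above", "Chandran"],
]

options = {opt for ballot in ballots for opt in ballot}
wins = {pair: 0 for pair in combinations(sorted(options), 2)}

for ballot in ballots:
    rank = {opt: i for i, opt in enumerate(ballot)}
    for a, b in wins:
        if rank[a] < rank[b]:       # a ranked above b on this ballot
            wins[(a, b)] += 1

# Report each head-to-head result; an option that beats every other option
# pairwise is the Condorcet winner.
for (a, b), a_over_b in wins.items():
    b_over_a = len(ballots) - a_over_b
    print(f"{a} vs {b}: {a_over_b}-{b_over_a}")
```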
Chandran had run for DPL previously in 2021, and her 2024 platform was updated from that year's platform. She described herself as "a librarian turned Free Software enthusiast and Debian Developer from India". She has worked on Ruby, JavaScript, Go, and font packages for Debian since 2016, though she mentioned that she is not very active at packaging these days. Chandran highlighted that she is a member of several teams and was chief organizer of DebConf23 in India.
Why is Chandran running? She wrote that she is concerned about "skewed gender ratios within the Free Software community (and Debian)" and is doing "whatever I can to better the situation". It may be worth noting that, if elected, Chandran would be the first woman to be DPL — a position that has existed since 1993:
I am aware that Debian is doing things to increase diversity within Debian, but as we can see, it is not sufficient. I am sad that there are only two women Debian Developer[s] from a large country like India. I believe diversity is not something to be discussed only within Debian-women or Debian-diversity. It should come up for discussion in each and every aspect of the project.
Diversity is a cornerstone of Chandran's platform. She stated that Debian spends "a good amount of money on diversity" but without achieving results. Therefore, her first task as DPL would be "to revisit the existing spending pattern to analyse why and where we are going wrong". She would "streamline" the Diversity Team's activities and appoint a delegated team to coordinate all diversity activities within Debian and help make decisions about related spending.
Chandran would also like to focus on outreach as DPL. Debian participates in Google Summer of Code (GSoC) and Outreachy, but she would like to see additional activities, such as a "Debian camp" similar to Free Software Camp, and review the efficacy of participating in GSoC and Outreachy.
She agreed that it may be time for Debian to become its own registered organization or foundation:
While organising DebConf23, I had to face some issues because Debian is not a registered organisation, That is when I started thinking about this concept seriously.
So, as a DPL, I would be definitely interested in exploring the possibilities, advantages and disadvantages of having Debian registered. I am not saying that this is my main agenda, but it will definitely be brought up if I am elected.
Tille wrote in his platform that he has been involved with Debian for more than 25 years, but this is his first run for DPL. Tille has a background as a physicist, which has given him "a keen interest in practical applications of IT solutions in science". He wrote that his primary involvement with Debian has been as a packager, and he is running because he feels "compelled to give back more to my friends and the community".
Keeping Debian "relevant in a changing OS ecosystem
" is at
the top of Tille's agenda. He wrote that Debian is a "victim of its
own success
" as "the most frequently derived
distribution
". If elected, he would like to work on making Debian
more widely known by users who "do not consider themselves Linux
experts
" and try to learn from other Linux distributions to
improve Debian. "Maybe we will be able to draw some conclusions,
for instance, why ArchWiki is famous for good documentation but
wiki.debian.org is not.
" He would like to encourage better
packaging practices and to help address Debian's
"smelly packages
" (packages that need to be updated to meet newer
Debian standards). The Debian Trends page has
information about packages in need of refresh and lists of packages
with the issues that need to be addressed.
Tille also emphasized outreach, diversity, inclusivity, and a need to foster "a friendly environment inside Debian" in his platform. He cites success in attracting contributors to Debian Med, a project to create a Debian Pure Blend tailored for "all tasks in medical care and research" with software for "medical imaging, bioinformatics, clinic IT infrastructure" and more. His platform includes ideas about lowering the barriers to contribution by "introducing tasks such as bug squashing, autopkgtest writing, and other short-term assignments that require minimal time commitments".
His platform includes an emphasis on shared work on packaging and improving the process of integrating new packages. In particular, Tille wants to see a Debian where "every crucial task" is handled by at least two people to "ensure comprehensive backup and support". Those who prefer a single-maintainer model, he wrote, "should probably rank me below 'None of the above'".
On controversial topics
It is customary for the prospective DPL candidates to take questions on the debian-vote mailing list during the campaign period. Thomas Koch jumped in on March 10, before the official start of the campaign period, and led with an observation that "more and more areas of our lives become political and controversies on such topics [become] more aggressive." He then asked how the candidates would "try to lead a community that focuses on producing a great distribution without getting divided on controversial topics?"
Tille held his answer until the official start of the campaign period and provided a two-part response. He wrote that if Koch meant political controversies, "I have a clear statement: Make sure off-topic messages will be reduced to a bare minimum on Debian channels". He suggested a maximum of one, clearly marked, off-topic message that invites discussion elsewhere. Controversial technical topics, he wrote, are "no problem as long as participants of the discussion are following our Code of Conduct".
Politics, wrote Chandran, are in "every aspect of our life", including Debian; "using or contributing to Debian itself is a political statement. I do not consider Debian to be 'just' a technical project, it has its social and political aspects too". Like Tille, she said there will be technical disagreements, which is fine as long as the discussions are constructive and do not violate the code of conduct.
Debian as an organization
Nilesh Patra wanted to know what the candidates' plans were for managing Debian's finances and accounting. Patra wrote that "the finances in the project do not have a lot of transparency", though there are occasional updates on debian-private and via DPL talks. The candidates' platforms, complained Patra, "have only a (very) vague idea about it and I'd like to know more specifics about it"; he also asked whether the candidates had ideas about where the money would be best spent.
Indeed, tracking Debian's spending is not as easy as one might hope. One might expect a project like Debian to have an annual budget, with projected spending and estimated donations, all tracked publicly. However, this is not the case. The bulk of Debian's finances are held by SPI, and Debian's spending via SPI is found in the SPI treasurer reports rather than on Debian's site. The most recent SPI report is from November 2023, and can be found here. According to that report, Debian held more than $649,000 in reserve.
Chandran said that "deciding in advance where to spend and where not to
spend money in advance is not a great idea in our context
",
because Debian does not have a fixed budget. Her only plan right now,
she wrote, is "to revisit the diversity budget and how to
increase the efficiency
" of spending on diversity. If elected, she
wrote that she would spend time to evaluate whether a better system
could replace the "delayed, manual and tedious accounting
process
" the project has today.
To this, Carter replied that "accounting processes have definitely been one of the stumbling blocks", but pointed to the new reimbursement system as a major improvement:
It's still under development, but it's shaping up nicely, so I think in the future, the financial administration will be far less of a burden to the DPL than it has been for years already.
Tille admitted that his understanding is "currently low and incomplete". However, he wrote that he would "love to be transparent about money" and is open to help on that front should he become DPL. He said he did not know how to "measure 'best' objectively" but listed events like DebConf, bug-squashing parties, and team sprints as important, as well as infrastructure hardware for the project. People were not, he said, donating money to Debian for the money to sit unused in a bank account, and the project should consider new ways to use its money, including paying people to do Debian work:
Personally, I'm open to discussing whether to compensate contributors for important tasks that either nobody wants to do or lacks people with sufficient time capacity to undertake those tasks. I recall the various pros and cons raised during past discussions on this matter, but if people believe it's time to initiate a fresh discussion, I'm very receptive to that.
Joost van Baal-Ilić asked a related question: what do the candidates think about having a single legal entity to represent Debian worldwide? Pierre-Elliott Bécue, the treasurer of Debian France, followed that question with more detailed inquiries about statements in Chandran's platform.
In her platform, Chandran wrote that she would like to revisit the relationship with Debian's trusted organizations (TOs) that hold its funds and to explore having more TOs instead of a "dependency on one or two", which she identified as a problem while organizing DebConf23. Bécue wrote that he had a "certain memory" of a TO that "disappeared with Debian assets", and observed that it is already difficult managing three TOs. He wondered how more would be an improvement.
Chandran said
that she was aware of the TO's disappearance and that "having TOs
with just 1-2 people responsible is a warning sign
". She wrote
that if more TOs were appointed, "it would be ensured that there is a team
of people and a good governing structure before committing
," and
agreements with TOs that "show signs of collapse
" would be
revoked. Regular reporting, she said, would be a requirement for any
new TOs.
Bécue had complained that SPI held 90% of Debian's assets and
that he spends "more than 30 to 50%
" of his time as treasurer
dealing with SPI because it is "very slow to process
things
". What, he wanted to know, "do we inten[d] to do about
it?
"
She responded
that having the bulk of Debian assets in a single TO "is like
putting all the eggs in one basket
" and that her platform
suggestion of more TOs would balance that. "I know this would be a herculean task, but I
would like to at least get it started.
"
Tille did not respond to Bécue's questions, but did address the idea of Debian becoming a legal entity. He suggested that if a person or persons felt strongly that Debian should have its own legal entity, they should take the lead:
We are a Do-o-cracy. The person who does the job can decide what gets done. Those who really strongly believe that a legal entity is the answer to major problems in Debian might run this effort, find consensus to run a GR changing the [constitution] - whatever seems to be necessary. If we do not find competent volunteers this will not happen.
Personally I decided to become a [physicist] and not a lawyer since I consider the laws of physics simple, easy to describe and perfectly able to verify in practice. This is all very distinct to the laws we have given [ourselves] in society and I'm no expert in the latter. Thus I simply feel not comfortable in giving statements about things I do not full understand.
Instead, Tille wrote, he would like to focus on technical problems
he sees and that he understands. His time to devote to being DPL is
limited, he noted, so he would focus on areas where he feels competent and can be more efficient. "I will not stop others solving additional problems and
if those people manage to convince me that it is important for Debian I
might support this.
"
Hardware and cloud
Thomas Goirand wanted
to know if the candidates would consider, for example, spending
$100,000 on "a new Debian cloud
". In addition, he asked about spending
a similar amount to provide more build servers and systems for
reproducible builds.
Carter replied
that he didn't want to take attention away from the candidates, but
noted that the DSA team had recently filed a request for up to
$160,000 for upgrades. He said that "every single hardware request
over the last 4 years (whether from DSA or from a DD) has been
approved
".
Chandran expressed
concern about having enough volunteers to take on maintenance of
services like Goirand's cloud. "If we do not have a enough volunteers to handle them, it will result in
burnout and eventually the services die.
" She suggested taking up
the topic after the elections to evaluate pros and cons before making
a commitment. She was more favorable toward spending money on
hardware for existing teams and services, but suggested deferring to a discussion with DSA before
making decisions.
For his part, Tille said that he had no use case for a Debian cloud, but is "perfectly open" to a discussion; any such service would need a "real team", not a one-person team, to care for it. He
said that he would be happy to spend money on hardware infrastructure,
as long as there are people to "do the actual
grunt work of buying, installing and maintaining the hardware
".
Paying a cloud provider directly for some services instead of trying
to build a Debian cloud might be an answer
to Tille's personnel requirements, although he said it would still need to be discussed which services could be delegated to cloud providers and which needed to be hosted by Debian.
Bandwidth challenges
Patra had a second question for the candidates, this time about
addressing bandwidth
challenges. Patra observed that teams in Debian "struggle with
limited developer time
" and that many teams have as few as three
or four people sharing the burden, in some cases only one
person. This "can lead to exhaustion, burnouts
" and can lead to
stale packages and other work stagnating when people become busy with
real life. Did the candidates have a strategy for addressing this?
Tille responded
that he considered this "a crucial problem
" and one of the
tasks of the DPL to identify areas where work is not sustainable. Step
zero, he said, was for one-person teams to admit there is a
problem. He pointed out that this does not always work. He cited an
example of asking for help with R packaging, but the only response was
"two further confirmations of time constraints
". But the first
step had to be admitting there is a problem, and advertising it:
In general I believe that a DPL is limited in effectiveness if people don't [do] that step zero. It seems that within Debian, there are individuals with exceptional technical skills who may also experience a syndrome where they feel they are the sole individuals capable to do certain tasks. This might make step one even harder: Document what you are doing, seeking actively for more team members and teach them kindly.
This step is time-consuming, especially for individuals with significant time constraints. Investing time without a clear vision of success poses a challenge - ensuring that the new team member can effectively handle the pending tasks while also committing to the role for a long time to make it really sustainable.
He added that he had "no good idea
" how to fully solve this
problem within a volunteer organization like Debian. Tille did raise
the idea of paying people from Debian funds to help take on important
work, but said it would be better "if we could convince
companies to pay Debian developers and permit them to use their [payed]
time to spent on Debian tasks than paying single persons from Debian
funds
". Chandran has not yet responded to the bandwidth question.
Decision time
The voting period begins on April 6 and ends on April 19. The term for the new DPL begins on April 21 and runs for one year. No matter who wins, the incoming DPL will have no shortage of work to be done.
Book review: Practical Julia
A recent book by LWN guest author Lee Phillips provides a nice introduction to the Julia programming language. Practical Julia does more than that, however. As its subtitle ("A Hands-On Introduction for Scientific Minds") implies, the book focuses on bringing Julia to scientists, rather than programmers, which gives it something of a different feel from most other books of this sort.
The book begins with the preliminaries, as one might guess. It gives information on how and where to get Julia. There is also a description of Julia's read-eval-print loop (REPL) that can be used for interacting with the language, along with other options of that sort (e.g. computational notebooks, such as the Julia-specific Pluto or the multi-lingual Jupyter). The book also shows some of the more "modern" features adopted by Julia, like its use of colors in the REPL and its embrace of Unicode for identifiers, including allowing (some) emojis as variable names.
Language intro
Part one of the book ("Learning Julia") provides an introduction to Julia, which starts with numbers—Julia has a whole slew of numeric types, including integer and floating-point types of different widths (e.g. Int32, Float64). Complex values can be created using im for i, so "3 + 4im" is a Complex{Int64} type (at least on a 64-bit system), while 3.14 + 7.2im is a Complex{Float64}. Julia has three different types of division, spelled "÷", "/", and "//", which result in integer, floating-point, or rational-number (i.e. fractional) values.
![Book cover](https://static.lwn.net/images/2024/pracjulia-cover.webp)
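To give a flavor of that, here is a quick REPL session of my own (not an example from the book); the results are what a stock 64-bit Julia should print:

julia> 7 ÷ 2                 # integer (truncating) division
3

julia> 7 / 2                 # floating-point division
3.5

julia> 7 // 2                # rational division
7//2

julia> (3 + 4im) isa Complex{Int64}
true

julia> (3.14 + 7.2im) isa Complex{Float64}
true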
Entering "÷" is a bit cumbersome, perhaps, though the REPL has a form of completion that can be used instead of compose-key sequences (or other Unicode-input mechanisms). Typing "\div" followed by the TAB key will insert the division sign—whether that is easier or not probably depends on the user. That also works for various other Unicode symbols that might be used as variables to make the calculations in a program look more like the underlying mathematical formulas. To me, it all seems a bit clunky to work with, however.
Numeric types are followed up with a whirlwind tour of some types of expressions, including expression blocks, leading into simple while loops and if statements. After that are aggregate types. Julia has arrays, naturally, including one-dimensional vectors and multi-dimensional matrices. Since array elements can contain any other objects, you can have vectors of vectors (or matrices, etc.). All of that is pretty standard stuff for a programming language these days.
But Julia has a number of quirks with using arrays that the book describes nicely—and at some length. Arrays can be accessed, constructed, and operated on in an almost dizzying number of ways, some of which make more sense (to me) than others, but Phillips goes through them in some detail. Given that Julia is often used in scientific computing—which the book focuses on—having a wide variety of array-manipulation tools is to be expected. The Fortran roots of scientific computing also make a somewhat surprising appearance in Julia: arrays are indexed starting at one and matrices are stored in column-major order.
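As a small illustration of those Fortran-flavored conventions (my snippet, not the book's):

julia> v = [10, 20, 30];

julia> v[1]                    # indexing starts at one, not zero
10

julia> m = [1 2; 3 4];         # a 2x2 matrix

julia> vec(m) == [1, 3, 2, 4]  # flattened in column-major order
true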
Next up are characters and strings, with strings being "similar in some
ways to a Vector, but with some complications
"—arising from
the differing widths of UTF-8 characters. Those
difficulties lead smoothly into the presentation of for loops,
which can be used to
step through strings without falling into string-indexing holes, among other
uses, of course. Unlike vectors, though, strings in Julia are immutable.
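For example (again my snippet, not one from the book), a short string containing a two-byte character shows the mismatch between characters and UTF-8 code units, and how iterating by character sidesteps it:

julia> s = "naïve";

julia> length(s)             # five characters ...
5

julia> ncodeunits(s)         # ... but six UTF-8 code units
6

julia> for c in s            # iterating by character avoids invalid indices
           print(c, ' ')
       end
n a ï v e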
The initial "Language Basics" chapter wraps up with functions, scoping, and mutability. The book carefully describes how to define simple functions and defers one of Julia's signature features, multiple dispatch, to later. As with various other small features and conveniences, Phillips often mentions them in passing, such as using "2x" where other languages require "2*x" or that truly simple functions can be defined on a single line:
function double(x) 2x end # or: double(x) = 2x
It is kind of a casual, almost meandering at times, introduction to the core of the language. That is a bit surprising, because, in less than 40 pages, it gives enough information to start writing simple programs. The introduction is not aimed at complete novices, however; a technical, math-oriented background will be needed to come up to speed. The scientists targeted by the book should generally do just fine.
In the first part, there are additional chapters on different facets of the language and its ecosystem. "Modules and Packages" heads that list. The chapter covers both sides of the coin: developing and using modules and packages in projects, as well as picking up new packages and adding them to the local installation. Julia's package system uses the Pkg package manager, which downloads from the official Julia package registry by default.
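Installing and trying out a package takes only a few lines at the REPL; this is a sketch of mine (with the package-manager output omitted), not an example from the book:

julia> using Pkg

julia> Pkg.add("Plots")      # fetch Plots from the general registry

julia> using Plots

julia> plot(sin, 0, 2π)      # a quick check that the installation works

The same thing can be done interactively by pressing "]" at the REPL prompt to enter the package-manager mode and typing "add Plots".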
One of Julia's signature features, graphical output, is the subject of two separate chapters in the first part of the book. The plotting chapter looks at multiple options for both two- and three-dimensional plots. Some alternative plotting backends beyond GR, the default for the Plots package that is Julia's go-to data-visualization tool, are mentioned as well. Another chapter, perhaps a little oddly separated by two intervening chapters, looks at creating diagrams and animations using Julia.
The two chapters in between add more language features, including collections (such as dictionaries, sets, and named tuples), more operations on arrays, more information on functions, metaprogramming, macros, and so on. The final chapter of the language-introduction is on the type system and how it can be used for multiple dispatch. Those reading the book this far—generally with the REPL open for trying things out—should come out of it having learned the fundamentals of Julia. Up next is the second half, which puts the language to work on a variety of real-world scientific-computing problems.
Application
It seems clear that Phillips sees the first part mostly as an entree to his examples of how to apply Julia to different problem domains. There are seven separate chapters covering problems from areas such as physics, biology, and machine learning. A chapter on parallel processing has some echoes of his 2021 article on Julia concurrency, which is no surprise. Each of the chapters looks at a few problems from the topic area and presents solutions that use packages from the wide array of choices that the Julia ecosystem provides.
While I could pretty easily follow the code in these chapters, some of the underlying math and science went sailing smoothly overhead—my calculus has sadly bitrotted over the years. One that (mostly) did not was the chapter on statistics, which looked at random numbers, complete with a plot of their (lack of) distribution. It also had an interesting treatment of the Monty Hall problem, which is surprisingly still non-intuitive even after having seen it multiple times. "Playing" the game thousands of times shows which door choice makes the most sense—in graphic form, as is true throughout part two.
The pandemic-modeling example in the statistics chapter was also interesting. From there, Phillips used real-world data to demonstrate reading CSV files, handling missing data, and working with the DataFrames package, all in the context of statistical analysis of COVID-19 data. DataFrames provides a tabular data structure similar to that of Python's pandas module. Using those tools, he created several different types of graphs comparing infection rates across many countries and over various time scales. It is abundantly clear how versatile Julia is at manipulating and displaying data, which goes hand-in-hand with its visualization focus.
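The basic pattern looks something like the following sketch of mine; the file and column names here are hypothetical, and this is not code from the book:

julia> using CSV, DataFrames

julia> df = CSV.read("covid_cases.csv", DataFrame);        # hypothetical input file

julia> first(df, 3)                                        # peek at the first rows

julia> combine(groupby(df, :country), :new_cases => sum)   # total cases per country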
Examples from other chapters range from evolutionary modeling to signal analysis of the call of the endangered Cactus Ferruginous Pygmy Owl. The subject matter of the examples is diverse enough that there are likely at least a few problems in a subject area of interest to each reader. But, even if evolution or bird calls are not particularly compelling, the techniques used can be applied more widely—and there are dozens of other examples.
As in the first part, Phillips sprinkles in relevant tidbits of Julia programming techniques in the examples, so there are multiple things to be learned from them. Each chapter has a "Further Reading" section at the end with references to places to go for more information as well. Overall, the book has well-chosen examples that give plenty of opportunity to demonstrate—show off—all that can be done with the language.
Of course, no book is perfect, so I have some minor complaints. The example chosen to explain the continue statement seemed contrived, for example. And I found the description of the interaction between column-major order and the fill() and repeat() matrix operations confusing, which may be a personal failing. There are others, but the only one of note is that the book may make the reader lament that their math skills have slowly dribbled away—at least it did for me.
One thing that seems clear, though, is that the sprawling nature of the Julia language—a kitchen sink is in there somewhere, I'm sure—makes it hard to contain in a book. Phillips generally does an admirable job; it seems likely that his intended audience will find much to like—and use—in the book. Interested non-scientists will too.
The first Linaro Forum for Arm Linux kernel topics
On February 20, Linaro held the initial get-together for what is intended to be a regular Linux Kernel Forum for the Arm-focused kernel community. The gathering is meant to convene a few weeks before the merge window opens — that is, shortly before the release of the kernel version currently under development. Topics covered in the first meeting included preparing 64-bit Arm kernels for low-end embedded systems, memory errors and Compute Express Link (CXL), devlink objectives, and scheduler integration. The forum generally follows a "show and tell" format for people to share their plans, bring up issues of the day, and advance ideas for discussion for upcoming versions of the kernel. Ideally, this will help advance coordination, find developers with common interests, and encourage participation. The meetings are public and they are recorded; these notes are meant to give a sense of what occurred.
The meeting agenda, the link to the Zoom meeting, and links to the recording can be found on this page.
The meeting is focused on helping the Arm kernel community solve Linux upstreaming issues for the ecosystem. It's understood that this needs to be done with strong community buy-in. Curation and having the right people in the room are important — specifically to avoid any hint of "Ask Maintainers Anything" sessions. The goal is not to ask for a patch set to be tagged or queued, but to share with others and the community what people want to focus on for the next cycle.
Getting Arm64 kernels ready for low-end embedded
Led by Arnd Bergmann
Bergmann asked which kernel changes would be needed now that Arm64 is starting to displace 32-bit Armv7 CPUs in low-end, embedded systems. These are typically mass-market devices, such as camera SoCs, which generally have less than 128MB of RAM; they have previously been the realm of Armv7. With 32-bit systems, we didn't need to optimize the kernel at all; now the immediate need is to shrink the size of the kernel and eliminate overhead. A separate need for footprint optimization is also being driven by embedded systems, which are now running multiple virtual machines. In this case, individual system memory consumption matters a lot more. How small can we make the kernel and system?
Bergmann covered a few options, along with initial results, starting with turning off features while still being able to boot a Debian image and have a normal filesystem and network connectivity; that got down to 7MB for a 64-bit kernel. A 40MB complete system seems achievable. The focus then shifts to shrinking user space. A 32-bit user space saves much more memory than anything that could be done in the kernel, but the factor-of-two overhead for kernel code, data, and heap remains compared to a 32-bit kernel.
Potential overhead reductions that were explored included:
- Disabling CONFIG_SMP, which saves 1-3MB of RAM when only a single core is needed.
- Execute-In-Place (XIP) can save RAM by putting all the text and read-only data into flash (saving a substantial 5MB in an example case).
- XIP has an issue with run-time patching. Modern CPU features are detected (and the loaded kernel patched accordingly) at run time, but we've reached a point where we should reconsider this policy in order to build a smaller kernel that only runs on newer CPUs.
- Dead-code elimination (CONFIG_LD_DEAD_CODE_DATA_ELIMINATION): a patch for 32-bit CPUs didn't make a huge difference, but there could be more potential there, and there are very few downsides.
Memory Errors and Compute Express Link (CXL)
Led by Jonathan Cameron
ACPI/APEI: Sending the correct SIGBUS si_code: Cameron raised an issue with memory-error reporting. A memory error generates an ACPI event; if the reported error is in a user-space process, the expectation is that the process containing the error will be killed. Unfortunately, that doesn't happen, because the ACPI event for memory errors reports the wrong type of error; as it stands, a detected memory failure in a user-space process does not result in that process being killed. It is something that needs to get cleaned up.
Memory scrub control for differentiated reliability: Also related to memory-error reporting, memory scrubbing involves periodically checking ECC memory for errors; its purpose is to keep ECC-correctable errors from turning into uncorrectable ones. Generally this is performed autonomously by the memory controller, and may even be performed by the memory DIMM itself (Error Check and Scrub (ECS) is implemented in DDR5). Errors — correctable or not — are reported.
Cameron raised the need for user-space control of scrubbing in CXL's case, since there is no way for firmware to configure it for hot-plugged devices. There is an RFC currently at v6. Who else cares? How general can it be made; which market segments beyond servers might care? There is a proposed scrub-control system with an ABI; is that the way to go?
NUMA domains post boot: There is an assumption on Arm64 today that all memory hotplugging happens in known NUMA nodes. In contrast, x86 does not assume this, but it has an architecture-specific solution to associate memory hotplug with NUMA nodes. CXL memory is hot-pluggable, and there is an underlying assumption that a NUMA node is declared for each CXL Fixed Memory Window Structure (CFMWS) memory window. We need to know which of those NUMA nodes is associated with a hotplug event. Currently, no one wants to expend the (significant) effort that would be required to support dynamic NUMA-node creation. As a workaround, the CXL Early Discovery Table (CEDT) could be re-parsed to do a lookup. Is this too much of a hack?
Config options with a big blast radius: CXL has kernel features with user-space interfaces that, if misused, could potentially take out the rack or even a data center by, for example, removing everyone's access to memory. There will be security measures in place to mitigate this but a security bug could still risk exposing a way of taking down a rack. There has therefore been pushback from the community on accepting these features.
Where these interfaces are required, they should only be wired up to a baseboard management controller (BMC) or fabric manager. The issue also applies to the Management Component Transport Protocol (MCTP) stack. How do we prevent people from shooting themselves in the foot with the option to configure these features on general-purpose systems? Options include gating all of the features behind a CONFIG_(I_AM_A_)BMC option, or kernel tainting, which is already used elsewhere in CXL. Does anyone else have any ideas?
Current plans for fw_devlink
Led by Saravana Kannan
There are a bunch of TODO items that Kannan is planning to work on, including adding post-init-suppliers to the devicetree schema. A device link (devlink) should guarantee correct suspend/resume and shutdown ordering between a "supplier" device and its "consumer" devices. The consumer devices are not probed before the supplier is bound to a driver, and they are unbound before the supplier is unbound.
There are, unfortunately, cyclic dependencies between suppliers and consumers in the devicetree; adding the post-init-suppliers property provides a way to give additional information to the firmware devlink interface, or to any kernel, so that it can break the cycle. This is meant to achieve deterministic probe and suspend/resume operation, with an end goal of better stability and reliability, as well as improved run-time power management. Also, if a supplier is forcefully unbound, consumers don't necessarily get cleaned up correctly unless it's a bus device with a driver. There are some corner cases that will also get fixed during this work.
Another objective is adding support to devlink for "class" devices. Devlink is a driver-core feature where devices can say "don't probe me until my supplier finishes probing". Currently, class devices don't probe; they are just added. The framework currently allows adding class devices as suppliers, which leaves scope for potentially weird probing behavior. Some of the nuances relate to, for example, what it means for a class device to be "ready".
Finally, there is clock-framework sync-state support. This was talked about at the 2023 Linux Plumbers Conference. There was some agreement to clean up Kannan's patch series from more than a year ago and address the gaps that were pointed out. He expects to send it out as an RFC or updated patch set.
System pressure on the scheduler
Led by Vincent Guittot
This work has been presented at OSPM and LPC. It aims to consolidate the scheduler's view of a CPU's compute capacity and how that capacity maps to actual CPU frequencies.
arch_scale_*() are per-architecture functions that enable architecture-specific code to report some specific behavior to the scheduler. As an example, arch_scale_cpu_capacity() reports the maximum compute capacity of a CPU. The default function returns the same value — 1024 — for all CPUs, which is fine for SMP systems, but an architecture can provide its own version when CPUs have different compute capacities, as in big.LITTLE or other heterogeneous systems. Similarly, arch_scale_freq_capacity() reports the current compute capacity of a CPU according to its current frequency. With arch_scale_cpu_capacity() and arch_scale_freq_capacity(), the scheduler knows the compute capacity of a CPU and can compare it with others when selecting one for a task. Nevertheless, some inconsistency can appear in some configurations when the maximum frequency of a core changes during boot or at run time.
The work of consolidating all the arch_scale_*() functions has been split into three parts. The first part, with arch_scale_cpu_capacity() and arch_scale_freq_ref(), was merged for 6.8. It caused a few regressions, but those have now been fixed. The second part introduces arch_scale_hw_pressure() and cpufreq_get_pressure(); it is under discussion on the mailing list, with a new version (v6) posted.
The next task will be the new arch_scale_cpu_capped() for 6.10. When user space has capped the maximum frequency of a CPU, the scheduler should take that as the new maximum capacity of the CPU instead of making some kind of estimation or best-effort decision. Ultimately, the aim is for the scheduler to better take user-space capping into account. Guittot would be interested in any feedback on this.
Participating in Future Sessions
The next forum will be on April 30th, just prior to the 6.10 kernel merge window opening. If you'd like an email notification, contact tom.gall@linaro.org. A calendar invite is available, along with a reminder a week prior. If you have something for the agenda, please add it to the shared document.