
LWN.net Weekly Edition for September 2, 2021

Welcome to the LWN.net Weekly Edition for September 2, 2021

This edition contains the following feature content:

  • Emacs discusses web-based development workflows
  • Not-a-GPU accelerator drivers cross the line
  • Nftables reaches 1.0
  • Some 5.14 development statistics
  • Cooperative package management for Python

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Emacs discusses web-based development workflows

By Jake Edge
September 1, 2021

Discussions on ways to "modernize" the Emacs editor have come up in various guises over the past few years. Changes of that nature tend to be somewhat contentious in the Emacs community, pitting the "old guard" that values the existing features (and keybindings) against those who argue for changes to make Emacs more approachable (and aesthetically pleasing) to newcomers. Those discussions tend toward mega-thread status, so it should be no surprise that a query about possibly moving Emacs development to a "forge" (e.g. GitHub or GitLab) got similar treatment. As always in Emacs-land, there are multiple facets to the discussion, including the desirability of moving away from an email-based workflow, accommodating younger, forge-centric developers without forcing existing developers into that model, and—naturally—licensing.

As a newcomer to the emacs-devel mailing list, Daniel Fleischer may not have expected the voluminous response he got to an August 26 post asking about the status of a "move to a new VC [version control] system, e.g. Gitlab". The somewhat provocative subject of the email, "Gitlab Migration", probably helped draw eyes (and responses) as well. There are no current plans to make a migration of that sort, of course, and a two-year-old feature request at GitLab shows a "pretty daunting" level of work needed, Dmitry Gutov said. Richard Stallman had a different concern:

We used to recommend GitLab as better than GitHub (though only barely acceptable). However, GitLab has got worse, and we should stop recommending it at all.

He suggested that sourcehut or NotABug.org might be a better match from a licensing and philosophy perspective. Other than a request from Emacs co-maintainer Eli Zaretskii for an evaluation of NotABug.org, not much was heard about that development-hosting site in the thread; sourcehut, on the other hand, came up multiple times. Stallman said that any potential solution would need to run on GNU infrastructure, rather than as a hosted service; he was also worried about the free-software status of any JavaScript needed.

In the original post, Fleischer noted two benefits that he saw with a switch to some kind of forge. Because younger developers are more familiar with the forge-style workflow, providing that will lower the barrier to entry for new developers, he said. In addition, not scattering the different portions of the project among multiple systems makes it easier to work with the project:

Having the code + issues + discussions in the same place as opposed to now, where the code and discussions (lists) are in 3 different places (Savannah, Gnu mailing lists and Gnu bug tracker). With a modern VC system, one can jump easily between issues, discussions, code commits back and forth easily as opposed to now, where if it's a bug you can use its number to search lists and commits messages but if it's a discussion, it's not "connected" to anything.

He also noted that an email-based workflow should still be supported, so that developers can use Emacs for all of their project interaction, as they do now. Emacs co-maintainer Lars Ingebrigtsen called that "the biggest hurdle", noting that it is important not to risk losing the current contributors; "Can you interact with Gitlab via a mail-only system?" While that question was not directly answered there, the GitLab feature request and other discussion make it clear that the answer is "no", at least for now; nor is it clear that any real work is going on to change that.

Web advantages?

Philip Kaludercic objected to the idea that using the web for development tasks was actually any easier, a common response from those used to the email style. He also said that while the various parts of the development process live in separate places, they share an important characteristic: all of them are email messages. He suggested that the "biases against mailing list development might be illegitimate", but Ingebrigtsen said that there is a different dynamic at play:

It seems like it should be easier to just send a patch, but feedback we're getting shows that it's not for a number of developers. Many don't use mail at all for development, and all they're used to is the GitLab/Hub way of doing it.

So it's easier for them -- it feels safe and familiar for them to do development by clicking around in a web browser.

On the other hand, Jim Porter explained that, as a relatively new contributor, the feeling of intimidation around an email-based workflow is real, but it turned out not to be that hard to adapt to. He said that maintaining a project using the mailing list might be different, "but that's not my job, so it's not a problem for me". He did suggest some places where the documentation might be improved for those who are used to the pull-request-style workflow, and he plans to propose some documentation patches. But Tim Cross thinks that email is a dead end for younger developers:

I think virtually all developers are forced to suffer email, but a [growing] number don't use it. Often, all the discussions, notifications, comments etc are actually consumed via a mobile 'app'. For these users, logging into their inbox is frustrating and inconvenient because their inbox is full of pointless and old messages/notifications/alerts they have already seen/received via other channels. For these users, the primary reason they have an email address is to have something to put into the 'login' box for web services they use. Telling these users to use email to submit a patch is very similar to me being told when I started using email that I had to send in a hard copy via snail mail.

Fleischer concurred, noting that he is in his 30s (half of Cross's self-reported 60s) but only uses email for identity purposes and for receiving official documents:

I don't talk to family, friends or coworkers via mail. Personally, I think it's old, not secure or private by default, very inconsistent (HTML rendering is arbitrary vs. text, multiple MUA [mail user agent]) and just can't imagine using it as a software engineering tool.

The security of email, at least versus the app-centric alternatives, was hotly contested by some. Zaretskii said that the "security issues with email have been solved long time ago" and that the email-based workflow used where he works allows his company to "run circles around those of our partners and competitors" who do not use it. Cross pointedly disagreed about email security, however:

Despite what others have claimed, the security problems with email have NOT been addressed. It is still one of the major vectors for compromising access via social engineering, major vector for infection from [ransomware], frequent source of privacy [breaches] and a common means used to get sensitive data out of an organisation. Even having encrypted messages is overly complex for most users and a nightmare to administer for large organisations.

Stefan Monnier thought that the problem was less about email per se and more that mailing lists are intimidating:

I think the issue is not use of email as such, but use of mailing-lists. In my experience the reluctance to use email is that they feel uncomfortable sending email to a potentially large number of persons. In contrast posting on a forum doesn't impose anything on anyone (in their mind) because those who read it have to actively go and look for it.
[ Of course, it doesn't make much sense, really, but this is about people's perceptions, not about anything rational. ]

There are a number of problems with the email-based workflow that forge-style development alleviates, Clément Pit-Claudel said. One of those is that mistakes can be edited on a web site, which is not the case for mailing lists; some state tracking and maintainer tasks may be simplified as well. But Zaretskii was quick to point out various deficiencies he sees with the web interfaces. He does not think that adding GitLab (or similar) will provide much to maintainers; it would simply "be more welcoming to casual contributors", which is worth doing, but the advantages for existing Emacs developers are far less clear.

Whatever choices are made, Stallman is adamant that no "online dis-services" be recommended for doing Emacs development. It is fine to have adjunct mechanisms, but email must remain viable:

It's fine if users have the _option_ of communicating with the maintainers and developers via some interface other than email. (Or ten different interfaces other than email.) But when some of them use it. that must not compel the maintainers and developers to use it too.

Multiple workflows

There is no way to objectively determine which development style is easier, Fleischer said, so it is fruitless to discuss it. "I can say objectively that one form of workflow is more popular than another so people would be more familiar with it." The conversation soon turned to supporting multiple workflow styles; since there had been little progress with GitLab, sourcehut was discussed.

Two threads from 2020 were noted: one where sourcehut creator Drew DeVault self-evaluated the hosted service under the GNU ethical repository criteria, and another where Emacs development using sourcehut (either as a service or by hosting the code on GNU systems) was discussed. At the time of the latter thread, email-based workflows were supported well, "but what about the GitLab/Github-like features?", Zaretskii asked. DeVault pointed to a video showing "the sourcehut equivalent of the github/lab pull/merge-request flow".

That led to further discussion of sourcehut's capabilities with respect to the needs of Emacs development. Gutov noted a few different things he saw as "frictions/downsides" compared to "the Git**b workflow". That discussion proceeded to hash out some of those areas, with DeVault participating and being quite interested in trying to bridge the gap between sourcehut's current features and the needs of the Emacs project. In another sub-thread, he summed up the current status:

First, we would be very pleased to host emacs on our service, or with our software on your own infrastructure. It should more-or-less work with everyone's existing workflows, given that emacs is an email-oriented project, and over time we are developing an improved web-based experience which should allow more and more users access to emacs development over time without necessarily requiring the maintainers or those who prefer their email-oriented workflow to have to change their workflow to [accommodate] a platform like GitLab. We should also rate quite highly in terms of free-as-in-freedom, since the entire service is free software, mostly AGPL.

He did note that the sourcehut bug tracker works differently than the existing bug tracker, which is based on an older version of the Debian Bug Tracking System (BTS). But there are plans to make the sourcehut bug tracker "more email-oriented". Though the philosophies of sourcehut and GNU are similar, Stallman would prefer not to host Emacs on the service:

I think Drew DeVault is right in saying that his philosophy is close to ours. If we had to use some outside hosting service, I might well say there is no better choice than his. (I don't want to criticize the other options without knowing what they are; they may be equally good.) But we'd rather have this on a GNU machine with GNU Project people managing it.

The sourcehut service is not the only option, as DeVault said in an earlier message:

You can run the sourcehut software on GNU servers, it is 100% free software, mostly AGPL. I would encourage you to do so.

Gutov said that adopting sourcehut would be a net win, but that there are still things lacking in it that will be difficult to add in the email realm:

For example, those quality-of-life features that Gitlab has in the browser which I previously figured would be difficult to translate to email (the code review workflow, with inline comments and updates from the branch; automatically updated CI indicators and links to builds; editing of messages) are predictably absent.

Efforts are being made to describe the kinds of web-based features that Emacs would want in sourcehut (or any other forge it might adopt) so that users of the web-based workflow would feel at home. Fleischer listed some of those types of features. But some commenters are still not entirely convinced that adding support for a web-based workflow will lead to more and better contributions. João Távora posted some thoughts on what he has observed:

In recent $DAYJOBs I worked with these two GL/GH platforms fully, using them liberally and without restrictions. In these recent experiences the undeniable contemporarity and newcomer friendliness of these platforms does NOT seem to translate into quality of code, quality of discussion or any kind of benefic developer agility in any way. Again, just anecdotal evidence which you may take for what it's worth, but in fact I believe that the "slow", unfamiliar, peculiar, old-school whatever-you-want-to-call-them methods used in Emacs development may in fact be "aces up our sleeve", not just a means to appease those that have been using them for a number of years.

On the flip side, Pit-Claudel saw the opposite effect in a project he works on that switched to GitHub, though he also cautioned that it is easy to give these kinds of anecdotes too much weight:

What this doesn't say is whether any "modern" workflow would have helped, or whether it was specifically Github, because of network effects (the barrier for contribution is lower if you already have an account and you are already familiar with the UI).

In fact, it doesn't even say whether it was the move itself that helped, or whether the move was an irrelevant manifestation of a more welcoming approach and a general effort to attract contributions.

For Monnier, the choice of tools is obvious at this point. Rather than continuing to look at other options, he would just like to get on with an evaluation of sourcehut:

Reading this discussion made me realize why I'm rooting for SourceHut: it's the only such tool I've seen whose goals align exactly with ours. Both philosophically (most other tools just happen to be Free Software and but are more philosophically aligned with Open Source) and technically (it aims for first class email support while providing as much as possible of the shiny web features of other systems).

So at this stage, I personally don't see much point looking for other tools. Instead I'm just wondering how to get us to use SourceHut (i.e. how to resolve the problems that might stand in the way, such as getting an instance up and running on GNU machines, starting to use it "on the side" to see in practice what issues we'll need fixed/improved/added before we can really switch, etc...).

It is clear that any change—if one is made—will need to support the email-based workflow unchanged (or nearly so) from today's methods, but that adding other workflow styles might be a "nice to have". Stallman noted that there are both practical and moral considerations at play for keeping email as a first-class option:

You also need to have an internet connection at the moment that you try to use the web forum. Not so for sending email.

[...] To make an account on a web forum, you have to use that platform in the way it is set up to require. You have to accept its terms and conditions, which may be morally unacceptable.

When you send email, you use your own choice of tools and platforms -- the same ones you used to send mail to a different forum the day before.

Network effects

Though some are enthusiastic about sourcehut getting to a point where it can satisfy both old- and new-school developers, it may still be too far afield for many of today's younger developers. The network effects of GitHub (and, to a lesser extent, GitLab) far outstrip those of sourcehut, and it may well be that users of those other forges are put off by whatever differences there are at sourcehut. As Ingebrigtsen put it:

Emacs will change to a new system when we can find a system that's good enough, and offers both an email work flow (for those that prefer that) as well as a non-email work flow (for those that prefer that).

We haven't found such a system yet. sr.ht [sourcehut] is perhaps an option, but my first impression is that it's too different from GitHub/Lab, really.

In the end, it will probably come down to whether or not the Emacs developer community finds the energy to start working with sourcehut (or some alternative, though that seems far less likely at this point). There is clearly work needed even to get it to a point where the email-based workflow is fully supported (e.g. bug tracking); adding enough web features to bring in users of Git**b will take even more. It is not at all clear that doing that work will change the situation all that much, either; drive-by contributors, or even those who are more serious about doing Emacs development, may not be all that interested in signing up for yet another web site and learning its interface and quirks.

Emacs is not alone in feeling that workflow modernization is needed for the future, of course. Python has largely already made the switch to GitHub, for example, and the Linux kernel is dipping its toes into that world as well. Other projects are surely wrestling with the same questions. To the extent that the future can be predicted with any accuracy, one would have to guess that 20—or 30—years from now email-based workflows will be dead or nearly so. Navigating from now until then is going to be somewhat messy for many, generally older, projects.

Comments (144 posted)

Not-a-GPU accelerator drivers cross the line

By Jonathan Corbet
August 26, 2021
As a general rule, the kernel community is happy to merge working device drivers without much concern for the availability of any associated user-space code. What happens in user space is beyond the kernel's concern and unaffected by the kernel's license. There is an exception, though, in the form of drivers for graphics processing units (GPUs), which cannot be merged in the absence of a working, freely licensed user-space component. The question of which drivers are subject to that rule has come up a few times in recent years; that discussion has now come to a decision point with an effort to block some Habana Labs driver updates from entry into the 5.15 kernel.

The GPU-driver rule is the result of a "line in the sand" drawn by direct-rendering (DRM) maintainer Dave Airlie in 2010. The kernel side of most GPU drivers is a simple conduit between user space and the device; it implements something similar to a network connection. The real complexity of these drivers is in the user-space component, which uses the kernel-provided channel to control the GPU via a (usually) proprietary protocol. The DRM maintainers have long taken the position that, without a working user-space implementation, they are unable to judge, maintain, or test the kernel portion of the driver. They have held firm for over a decade now, and feel that this policy is an important part of the progress that this subsystem has made over that time.

At its core, a GPU is an accelerator that is optimized to perform certain types of processing much more quickly than even the fastest CPU can. Graphics was the first domain in which these accelerators found widespread use, but it is certainly not the last. More recently, there has been a developing market in accelerators intended to perform machine-learning tasks; one of those, the Habana Gaudi, is supported by the Linux kernel.

The merging of the Gaudi driver has raised a number of questions about how non-GPU accelerators should be handled. This driver did not go through the DRM tree and was not held to that subsystem's rules; it went into the mainline kernel while lacking the accompanying user-space piece. That was later rectified (mostly — see below), but the DRM developers were unhappy about a process that, they felt, bypassed the rules they had spent years defending. Just over one year ago, the arrival of a couple of other accelerator drivers spurred a discussion on whether those drivers should be treated like GPUs or not; no clear conclusions resulted.

The Habana driver has been the source of a few similar discussions over the last few months, with bursts in late June and early July. The problem now is an expansion of that driver's capabilities that requires using the kernel's DMA-BUF and P2PDMA subsystems to move data between devices. These subsystems were developed to work with GPU drivers and are clearly seen by some DRM developers as being part of the kernel's GPU API; drivers using them should, by this reasoning, be subject to the GPU subsystem's merging rules. Or, as Airlie phrased it in his objection to merging the Gaudi changes:

NAK for adding dma-buf or p2p support to this driver in the upstream kernel. There needs to be a hard line between "I-can't-believe-its-not-a-drm-driver" drivers which bypass our userspace requirements, and I consider this the line.

This driver was merged into misc on the grounds it wasn't really a drm/gpu driver and so didn't have to accept our userspace rules.

Adding dma-buf/p2p support to this driver is showing it really fits the gpu driver model and should be under the drivers/gpu rules since what are most GPUs except accelerators.

The interesting twist here, as acknowledged by DRM developer Daniel Vetter, is that there is, indeed, a free user-space implementation of the Gaudi driver. What is still not available is the compiler used to generate the instruction streams that actually drive this device. Without the compiler, Vetter said, the available code is "still useless if you want to actually hack on the driver stack". He elaborated further:

Can I use the hw how it's intended to be used without it?

If the answer is no, then essentially what you're doing with your upstream driver is getting all the benefits of an upstream driver, while upstream gets nothing. We can't use your stack, not as-is. Sure we can use the queue, but we can't actually submit anything interesting.

Over the course of the discussions, the DRM developers have tried to make it clear that they want a working, free implementation of the user-space side. It does not have to be the code that is shipped to customers, as long as it is sufficient to understand how the driver as a whole works. To some, though, the compiler requirement stretches things a bit far. Habana developer Oded Gabbay has described the DRM subsystem's requirements this way:

I do think the dri-devel merge criteria is very extreme, and effectively drives-out many AI accelerator companies that want to contribute to the kernel but can't/won't open their software IP and patents.

I think the expectation from AI startups (who are 90% of the deep learning field) to cooperate outside of company boundaries is not realistic, especially on the user-side, where the real IP of the company resides.

Cooperating outside of company boundaries is, of course, at the core of the Linux kernel development process. The DRM subsystem is not alone in making such requirements; Vetter responded by pointing out, among other things, that the kernel community will not accept a new CPU architecture without a working, free compiler.

Over the years, there has been no shortage of problems with vendors that want their hardware to work with Linux while keeping their "intellectual property" to themselves. This barrier has been overcome many times, resulting in wider and better hardware support in the kernel we all use. Getting there has required at least two things: demand from customers for free drivers and a strong position in the development community against proprietary drivers. The demand side must develop on its own (and often does), but the kernel community has worked hard to maintain and communicate a unified position on driver code; consider, for example, the position statement published in 2008. As a result, there is a consensus in the community covering a number of areas relevant to proprietary drivers; one would have to work hard to find voices in favor of exporting symbols to benefit such drivers, for example.

This ongoing series of discussions makes it clear that the kernel community has not yet reached a consensus when it comes to the requirements for drivers for accelerator devices. That creates a situation where code that is subject to one set of rules if merged via the DRM subsystem can avoid those rules by taking another path into the kernel. That, of course, will make it hard for the rules to stand at all. Concern about this prospect extends beyond the DRM community; media developer Laurent Pinchart wrote:

I can't emphasize strongly enough how much effort it took to start getting vendors on board, and the situation is still fragile at best. If we now send a message that all of this can be bypassed by merging code that ignores all rules in drivers/misc/, it would be ten years of completely wasted work.

Avoiding that outcome will require getting kernel developers and (especially) subsystem maintainers to come to some sort of agreement — always a challenging task.

In the case of the Gaudi driver, Greg Kroah-Hartman replied that he had pulled the controversial code into his tree. In response to the subsequent objections, he dropped that work and promised to "write more" once time allows. Dropping the patches for now helps to calm the situation, but it has not resolved the underlying disagreement. At some point, the kernel community will have to reach some sort of conclusion regarding its rules for accelerator drivers. Failing that, we are likely to see a steady stream of not-a-GPU drivers finding their way into the kernel — and a lot of unhappiness in their wake.

Comments (68 posted)

Nftables reaches 1.0

By Jonathan Corbet
August 27, 2021
The Linux kernel is a fast-moving project, but change can still be surprisingly slow to come at times. The nftables project to replace the kernel's packet-filtering subsystem has its origins in 2008, but is still not being used by most (or perhaps even many) production firewalls. The transition may be getting closer, though, as highlighted by the release of nftables 1.0.0 on August 19.

The first public nftables release was made by Patrick McHardy in early 2009. At that time, the kernel had a capable packet-filtering subsystem in the form of iptables, of course, that was in widespread use, but there were a number of problems driving a change. These include the fact that the kernel had (and still has) more than one packet-filtering mechanism: there is one for IPv4, another for IPv6, yet another for ARP, and so on. Each of those subsystems is mostly independent, with a lot of duplicated code. Beyond that, iptables contains an excessive amount of built-in protocol knowledge and suffers from a difficult API that, among other things, makes it impossible to update a single rule without replacing the entire set.

The core idea behind nftables was to throw away all of that protocol-aware machinery and replace it with a simple virtual machine that could be programmed from user space. Administrators would still write rules referring to specific packet-header fields and such, but user-space tooling would translate those rules into low-level fetch and compare operations, then load the result into the kernel. That resulted in a smaller packet-filtering engine that was also far more flexible; it also had the potential to perform better. It looked like a win, overall, once the minor problem of transitioning a vast number of users had been overcome.
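
The sketch below is a toy model of that division of labor, not the actual nft bytecode or expression set: a high-level rule is "compiled" into generic fetch-and-compare operations, and a tiny virtual machine evaluates them against a packet's header bytes. The rule text, names, and offsets are purely illustrative.

    # A toy model of the nftables split: user-space tooling compiles a rule
    # into generic fetch/compare operations; a small virtual machine applies
    # them to each packet.  The real nft expression set is far richer.

    IPV4_PROTO_OFFSET = 9    # protocol field is byte 9 of the IPv4 header
    IPV4_DADDR_OFFSET = 16   # destination address occupies bytes 16-19

    def compile_rule():
        """'drop TCP packets to 192.0.2.1' expressed as fetch/compare ops."""
        return [
            ("load", IPV4_PROTO_OFFSET, 1),      # fetch the protocol byte
            ("cmp", bytes([6])),                 # 6 == TCP
            ("load", IPV4_DADDR_OFFSET, 4),      # fetch the destination address
            ("cmp", bytes([192, 0, 2, 1])),
            ("verdict", "drop"),
        ]

    def evaluate(ops, packet):
        """Run compiled ops against one packet; non-matching rules fall through."""
        reg = b""
        for op in ops:
            if op[0] == "load":
                _, offset, length = op
                reg = packet[offset:offset + length]
            elif op[0] == "cmp":
                if reg != op[1]:
                    return "accept"              # rule did not match
            else:                                # ("verdict", ...)
                return op[1]
        return "accept"

Because the kernel-side machine only understands loads, comparisons, and verdicts, teaching the system about a new protocol is largely a user-space exercise; that is where much of the flexibility described above comes from.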

Nftables made a bit of a splash when it was launched, but then bogged down and disappeared from view, perhaps because McHardy decided he had more interesting opportunities to pursue in courtrooms. In 2013, though, Pablo Neira Ayuso restarted the project with the idea of getting the code merged into the mainline as soon as possible. That part succeeded; nftables found its way into the 3.13 kernel release at the beginning of 2014.

The work since then has been a hard slog of filling in the gaps and making nftables sufficiently appealing that users would want to make the transition. The language used to write filtering rules has gained a long list of features for stateful tracking, address mapping, efficient handling of address intervals and large rule chains, and support for numerous protocols. There was also documentation to write, of course; the nftables wiki has a lot of information about how it all works.

There is, of course, one other significant impediment to transitioning away from iptables: the vast number of deployed, working firewalls using it. In many cases, rewriting the firewall rules may be the best course of action, since complex filtering setups can often be expressed much more efficiently in the new scheme. But, for administrators who just want their painfully developed firewall to keep working, the benefits of nftables may be less appealing than one might expect. The nftables developers have created a set of scripts to translate iptables firewalls into the nftables equivalent, which should help, but it is still a big jump.

In some cases, users may eventually make that jump without even noticing, though. Linux distributions have carried support for nftables for some time now, and work is being done to port tools like Red Hat's firewalld to nftables. In cases like this, users may have never seen the iptables rules in the first place and, with luck, will not notice that the underlying mechanism has changed.

When will that change happen? It is still somewhat hard to say. The 2018 Netfilter Workshop decreed that iptables is "a legacy tool" whose days are numbered. Debian switched to nftables by default in the 2019 Debian 10 "buster" release, though Ubuntu didn't follow until the 21.04 release. While almost all distributions ship nftables, many of them have yet to make the switch to use it by default.

The release of nftables 1.0.0 can be seen as a signal that it is time for the laggards to get more serious about making the switch. While it is hard to imagine iptables support being removed anytime soon, it's rather easier to foresee that enthusiasm for maintaining it will continue to wane. New features will show up in nftables instead, and users will eventually need to migrate over to take advantage of them. It only took 13 years, but this transition finally appears to be heading into its final stage.

There is, however, one other interesting question. In 2018, the BPF developers announced bpfilter, a packet-filtering mechanism that runs on the BPF virtual machine. The announcement drew some attention at the time; BPF had (and has) a lot of momentum, and a lot of work has been done to optimize the virtual machine and make it safe to use. Arguably, it makes sense to use that rather than maintain yet another virtual machine just for packet filtering. That would allow the removal of a bunch of code and the focusing of maintenance effort on BPF.

The bpfilter code was merged for the 4.18 kernel release; it also brought in a "user-mode blobs" mechanism that was intended to facilitate the translation of firewall rules to the new machine. Since then, however, development on this code has come to a halt; there have been exactly two (trivial) commits to the code in net/bpfilter in 2021. The removal of this code was discussed in June 2020 but it survived at that time. Since then, the cobwebs have only gotten thicker; it seems fair to say that bpfilter is not an active area of development at this point, and that it seems unlikely to displace nftables anytime soon.

Whether that is the "right" outcome is hard to say. Perhaps the special-purpose virtual machine used by nftables is a better solution to this particular problem than the more general BPF. Or possibly nftables came out on top simply because the developers behind it continued to show up and push the project forward. One of the keys to success in kernel development is simple persistence; that is doubly true for a critical subsystem like packet filtering, where it is more than reassuring to know that the developers are in it for the long haul.

Comments (26 posted)

Some 5.14 development statistics

By Jonathan Corbet
August 30, 2021
The 5.14 kernel was released on August 29 after a nine-week development period. This cycle was not as active as its predecessor, which set a record for the number of developers involved, but there was still a lot going on and a number of long-awaited features were merged. Now that the release is out, the time has come for our traditional look at where the code in 5.14 came from and how it got there.

To create 5.14, the kernel community applied 14,735 non-merge changesets from 1,912 developers; 261 of those developers made their first kernel contribution during this cycle. There were 861,000 lines of code added to the kernel and 321,000 lines removed, for a net growth of 540,000 lines.

The most active 5.14 developers were:

Most active 5.14 developers

By changesets
  Lee Jones                      215   1.5%
  Andy Shevchenko                196   1.3%
  Mauro Carvalho Chehab          191   1.3%
  Peng Li                        167   1.1%
  Yang Yingliang                 153   1.0%
  Zhen Lei                       145   1.0%
  Christoph Hellwig              136   0.9%
  Colin Ian King                 136   0.9%
  Vladimir Oltean                134   0.9%
  Fabio Aiuto                    132   0.9%
  Takashi Iwai                   131   0.9%
  Sean Christopherson            122   0.8%
  Jiri Slaby                     113   0.8%
  Jonathan Cameron               108   0.7%
  Christophe Leroy               107   0.7%
  Geert Uytterhoeven             102   0.7%
  Takashi Sakamoto                96   0.7%
  Krzysztof Kozlowski             94   0.6%
  Gustavo A. R. Silva             93   0.6%
  Thomas Gleixner                 83   0.6%

By changed lines
  Aaron Liu                   193379  18.9%
  Aurabindo Jayamohanan Pillai 48184   4.7%
  Christoph Hellwig            46667   4.6%
  Mustafa Ismail               32014   3.1%
  James Smart                  30907   3.0%
  Shiraz Saleem                29185   2.8%
  Nicholas Kazlauskas          19620   1.9%
  Kashyap Desai                12891   1.3%
  Steen Hegelund               12584   1.2%
  Masahiro Yamada              10517   1.0%
  Jin Yao                      10133   1.0%
  M Chetan Kumar                8947   0.9%
  Konrad Dybcio                 8853   0.9%
  Srinivas Kandagatla           8266   0.8%
  Fabio Aiuto                   6976   0.7%
  Vladimir Oltean               6444   0.6%
  Thierry Reding                6314   0.6%
  Takashi Iwai                  5858   0.6%
  Mark Rutland                  5612   0.5%
  Greg Kroah-Hartman            5485   0.5%

Lee Jones seems to have staked out a permanent position as the lead contributor of changesets; he continues to focus on cleanups and warning fixes all over the kernel tree. Andy Shevchenko made a lot of fixes throughout the driver subsystem. Mauro Carvalho Chehab worked mostly in the media subsystem with a bunch of documentation fixes on the side, Peng Li contributed a set of style fixes to various network drivers, and Yang Yingliang fixed a lot of warnings in various drivers.

In the "changed lines" column we see Aaron Liu and Aurabindo Jayamohanan Pillai on top with the inevitable set of amdgpu header files. Christoph Hellwig continues to do extensive refactoring work, mostly in the block subsystem. Mustafa Ismail contributed one patch series adding the Intel Ethernet protocol driver for RDMA, and James Smart added a new SCSI driver.

Work in 5.14 was supported by at least 231 employers, the most active of which were:

Most active 5.14 employers

By changesets
  Huawei Technologies           1731  11.7%
  Intel                         1331   9.0%
  (Unknown)                     1003   6.8%
  AMD                            879   6.0%
  Red Hat                        854   5.8%
  Google                         756   5.1%
  (None)                         744   5.0%
  Linaro                         654   4.4%
  SUSE                           503   3.4%
  IBM                            445   3.0%
  NVIDIA                         319   2.2%
  Oracle                         290   2.0%
  Canonical                      278   1.9%
  NXP Semiconductors             276   1.9%
  Facebook                       274   1.9%
  Arm                            255   1.7%
  (Consultant)                   229   1.6%
  Renesas Electronics            203   1.4%
  Linux Foundation               170   1.2%
  Pengutronix                    151   1.0%

By lines changed
  AMD                         293439  28.6%
  Intel                       135564  13.2%
  (Consultant)                 50998   5.0%
  Broadcom                     47742   4.7%
  Linaro                       33652   3.3%
  Red Hat                      30978   3.0%
  Huawei Technologies          29704   2.9%
  (Unknown)                    29631   2.9%
  Google                       29387   2.9%
  NVIDIA                       28415   2.8%
  (None)                       23154   2.3%
  IBM                          22541   2.2%
  SUSE                         19887   1.9%
  Marvell                      17294   1.7%
  Microchip Technology         14852   1.4%
  NXP Semiconductors           12200   1.2%
  Arm                          11831   1.2%
  SoMainline                   10599   1.0%
  Socionext Inc.               10526   1.0%
  Code Aurora Forum            10050   1.0%

Huawei has found its way to the top of the list of changeset contributors again. Otherwise there is little new or surprising in this list.

Since the beginning, these reports have looked specifically at non-merge changesets, following the reasoning that those are the changes that contain the "real work". Merges, instead, are just the movement of patches from one Git branch or repository to another. That said, merges are a big part of a maintainer's work; each merge requires a look at the code involved and a judgment that the time has come to move that code along to the next stop on its path to the mainline kernel. So perhaps a look at merges, too, makes sense. The most active creators of merge commits in the 5.14 development cycle were:

Merge contributors in 5.14
  Linus Torvalds                 384  33.8%
  David S. Miller                230  20.2%
  Olof Johansson                  82   7.2%
  Mark Brown                      54   4.8%
  Dave Airlie                     52   4.6%
  Greg Kroah-Hartman              35   3.1%
  Rafael J. Wysocki               28   2.5%
  Jakub Kicinski                  28   2.5%
  Bjorn Helgaas                   25   2.2%
  Will Deacon                     24   2.1%
  Arnd Bergmann                   16   1.4%
  Marc Zyngier                    14   1.2%
  Stephen Boyd                    13   1.1%
  Takashi Iwai                    10   0.9%
  Paolo Bonzini                    8   0.7%
  Jens Axboe                       8   0.7%
  Darrick J. Wong                  8   0.7%
  Thomas Gleixner                  6   0.5%
  Ingo Molnar                      6   0.5%
  Jiri Kosina                      6   0.5%

Linus Torvalds tends to be notably absent from the statistics in these reports; after all, by his own admission, he does not write much code these days. The merge numbers show where part of his activity is, though; he handles hundreds of pull requests from subsystem maintainers, looks at each one (often more closely than one might expect), and does the merge if it seems like the right thing to do. In the process, he generates one-third of the merges in the kernel history.

There are, however, two ways that any given patch moves through the chain of subsystem maintainers. One is via pull requests, each of which will generate one of the merges seen in the above table. But, before that can happen, a maintainer somewhere must apply the patch to their Git repository to start the process. When that happens, the maintainer will apply a Signed-off-by tag to the patch. To see that aspect of maintainer activity, one needs to look at those tags when applied to patches written by somebody else; the result for 5.14 is:

Non-author signoffs in 5.14
  David S. Miller               1625  11.0%
  Greg Kroah-Hartman            1118   7.5%
  Alex Deucher                   867   5.8%
  Mark Brown                     541   3.6%
  Andrew Morton                  489   3.3%
  Martin K. Petersen             332   2.2%
  Paolo Bonzini                  324   2.2%
  Jens Axboe                     324   2.2%
  Mauro Carvalho Chehab          284   1.9%
  Michael Ellerman               273   1.8%
  Takashi Iwai                   216   1.5%
  Jason Gunthorpe                213   1.4%
  Hans Verkuil                   209   1.4%
  Guangbin Huang                 180   1.2%
  Will Deacon                    176   1.2%
  Bjorn Andersson                170   1.1%
  Arnaldo Carvalho de Melo       169   1.1%
  Jakub Kicinski                 154   1.0%
  Jonathan Cameron               148   1.0%
  Herbert Xu                     144   1.0%

Here, we see the maintainers who tend to apply patches directly rather than acting on pull requests; some names appear on both lists but, in the end, this is a different list. The fact that David Miller appears at the top of both just confirms that he gets more done than just about anybody else — it is more than good to see him apparently running at full capacity again. It also reflects a style of work that involves applying patches to topic branches, then merging them into the local trunk to send upstream; each patch series applied generates a set of non-author signoffs and a merge commit. Other maintainers apply patches directly to upstream-bound branches and do not generate these merges.

Either way, the maintainers who shepherd patches through the system are performing a crucial function within the kernel development process. Without this work, all of those developers cranking out patches would have no place to send them. Maintainership is hard and often unrewarding work; we all owe them some gratitude for keeping this whole development process going.

All told, the kernel development community appears to be continuing to operate smoothly at its usual fast pace. As of this writing, the 5.15 development cycle is already underway with large amounts of work queued to be merged. We will see the above story repeat, with variations of course, over the next nine or ten weeks.

Comments (10 posted)

Cooperative package management for Python

By Jake Edge
August 31, 2021

A longstanding tug-of-war between system package managers and Python's own installation mechanisms (primarily pip, but there are others) looks on its way to being resolved—or at least regularized. PEP 668 ("Graceful cooperation between external and Python package managers") has been created to provide ways for the two types of package installation to work together, rather than at cross-purposes at times. Since many operating systems depend on Python tools, with package versions that may differ from those of users' Python applications, making them play together nicely should result in more stable systems.

The root cause of the problem is that distribution package managers and Python package managers ("pip" serves as shorthand for the latter throughout the rest of the article) often share the same "site‑packages" directory for storing installed packages. Updating a package or, worse yet, removing one may make perfect sense in the context of one package manager but completely foul up the other. As the PEP notes, that can cause real havoc:

This may pose a critical problem for the integrity of distros, which often have package-management tools that are themselves written in Python. For example, it's possible to unintentionally break Fedora's dnf command with a pip install command, making it hard to recover.

The sys.path system parameter governs where Python looks for modules when it encounters an import statement; it gets initialized from the PYTHONPATH environment variable, with some installation- and invocation-specific directories added. sys.path is a Python list of directories that get consulted in order, much like the shell PATH environment variable that it is modeled on. Python programs can manipulate sys.path to redirect the search, which is part of what makes virtual environments work.
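
As a quick illustration, the search order can be inspected and manipulated directly; the project path in the sketch below is made up:

    import sys

    # sys.path is an ordered list of directories; 'import foo' takes the
    # first match found, so an earlier entry shadows a later one.
    for directory in sys.path:
        print(directory)

    # A virtual environment works largely by arranging for its own
    # site-packages directory to appear ahead of the system-wide one.
    # Prepending a directory by hand has a similar, if cruder, effect;
    # the path here is purely illustrative.
    sys.path.insert(0, "/home/user/myproject/lib")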

Using virtual environments with pip, instead of installing packages system-wide, has been the recommended practice to avoid conflicts with OS-installed packages for quite some time. But it is not generally mandatory, so users sometimes still run into problems. One goal of PEP 668 is to allow distributions to indicate that they provide another mechanism for managing Python packages, which will then change the default behavior of pip. Users will still be able to override that default, but that will hopefully alert them to the problems that could arise.

A distribution that wants to opt into the new behavior will tell pip that it manages Python packages with its own tooling by placing a configuration file called EXTERNALLY‑MANAGED in the directory where the Python standard library lives. If pip finds the EXTERNALLY‑MANAGED file there and is not running within a virtual environment, it should exit with an error message unless the user has explicitly overridden the default with a command-line flag; the PEP recommends ‑‑break‑system‑packages for the flag name. The EXTERNALLY‑MANAGED file can contain an error message that pip should return when it exits due to those conditions being met; the message can be localized in the file as well. The intent is for the message to give distribution-specific information guiding the user to the proper way to create a virtual environment.
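
A minimal sketch of that check, assuming the INI-style file layout described in the PEP (an [externally-managed] section with an Error key); this is not pip's actual implementation:

    import configparser
    import os
    import sys
    import sysconfig

    def externally_managed_error():
        """Return the distro's error message if this interpreter is marked as
        externally managed, or None otherwise.  A sketch of the PEP 668 logic,
        not pip's real code."""
        # The marker is ignored inside a virtual environment.
        if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
            return None
        marker = os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")
        if not os.path.exists(marker):
            return None
        # The PEP describes an INI-style file whose message lives under
        # [externally-managed] / Error; fall back to a generic message.
        parser = configparser.ConfigParser()
        try:
            parser.read(marker, encoding="utf-8")
            return parser.get("externally-managed", "Error")
        except (configparser.Error, UnicodeDecodeError):
            return "This Python installation is externally managed."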

Another problem that can occur is when packages are removed from system-wide installs by pip. If, for example, the user installs a package system-wide and runs into a problem, the "obvious" solution to that may cause bigger problems:

There is a worse problem with system-wide installs: if you attempt to recover from this situation with sudo pip uninstall, you may end up removing packages that are shipped by the system's package manager. In fact, this can even happen if you simply upgrade a package - pip will try to remove the old version of the package, as shipped by the OS. At this point it may not be possible to recover the system to a consistent state using just the software remaining on the system.

A second change proposed in the PEP would limit pip to only operating on the directories specified for its use. The idea is that distributions can separate the two kinds of packages into their own directories, which is something that several Linux distributions already do:

For example, Fedora and Debian (and their derivatives) both implement this split by using /usr/local for locally-installed packages and /usr for distro-installed packages. Fedora uses /usr/local/lib/python3.x/site‑packages vs. /usr/lib/python3.x/site‑packages. (Debian uses /usr/local/lib/python3/dist‑packages vs. /usr/lib/python3/dist‑packages as an additional layer of separation from a locally-compiled Python interpreter: if you build and install upstream CPython in /usr/local/bin, it will look at /usr/local/lib/python3/site‑packages, and Debian wishes to make sure that packages installed via the locally-built interpreter don't show up on sys.path for the distro interpreter.)

So the proposal would require pip to query the location where it is meant to place its packages and only modify files in that directory. Since the locally installed packages are normally placed ahead of the system-wide packages on sys.path, though, this can lead to pip "shadowing" a distribution package. Shadowing an installed package can, of course, lead to some of the problems mentioned, so it is recommended that pip emit a warning when this happens.
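
To see whether a particular module is being shadowed, one can ask where Python would import it from; requests is used here only as an example, and the paths in the comment are the Fedora/Debian conventions quoted above:

    import importlib.util

    # Where would 'requests' come from right now?  If the answer is under
    # /usr/local (pip's territory on Fedora and Debian) while the distro's
    # copy lives under /usr, the local install shadows the distro package --
    # the case in which PEP 668 suggests that pip emit a warning.
    spec = importlib.util.find_spec("requests")
    if spec is None:
        print("requests is not installed")
    else:
        print(spec.origin)   # e.g. a path ending in site-packages/requests/__init__.py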

The PEP has an extensive analysis of the use cases and the impact these changes will have. "The changed behavior in this PEP is intended to 'do the right thing' for as many use cases as possible." In particular, the changes to allow distributions to have two different locations for packages and for pip not to change the system-wide location are essentially standardizing the current practice of some distributions. The "Recommendations for distros" section of the PEP specifically calls out that separation as a best practice moving forward.

There are situations where distributions would not want to default to this new behavior, however. Containers for single applications may not benefit from the restrictions, so the PEP recommends that distributions change their behavior for those container images:

Distros that produce official images for single-application containers (e.g., Docker container images) should remove the EXTERNALLY‑MANAGED file, preferably in a way that makes it not come back if a user of that image installs package updates inside their image (think RUN apt‑get dist‑upgrade). On dpkg-based systems, using dpkg‑divert ‑‑local to persistently rename the file would work. On other systems, there may need to be some configuration flag available to a post-install script to re-remove the EXTERNALLY‑MANAGED file.

In general, the PEP seems not to be particularly controversial. The PEP discussion thread is positive for the most part, though Paul Moore, who may be the PEP-Delegate deciding on the proposal, is concerned that those affected may not even know about it:

One thing I would be looking for is a bit more discussion - the linux-sig discussion mentioned was only 6 messages since May, and there's only a couple of messages here. I'm not convinced that "silence means approval" is sufficient here, it's difficult to be sure where interested parties hang out, so silence seems far more likely to imply "wasn't aware of the proposal" in this case. In fact, I'd suggest that the PEP gets a section listing distributions that have confirmed their intent to support this proposal, including the distribution, and a link to where the commitment was made.

Assuming said confirmations are forthcoming, or that any objections and suggestions can be accommodated, PEP 668 seems like a nice step forward for Python. Having tools like DNF and apt fight with pip and others is obviously a situation that has caused problems in the past and will do so again. Finding a way to cooperate without causing any major backward-compatibility headaches is important. Ensuring that other distributions are on-board with these changes, all of which are ultimately optional anyway, should lead to more stability and, ultimately, happier users—both for Python and for the distributions.

Comments (28 posted)

Page editor: Jonathan Corbet


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds