
LWN.net Weekly Edition for January 30, 2020

Welcome to the LWN.net Weekly Edition for January 30, 2020

This edition contains the following feature content:

  • Cryptography and elections: a linux.conf.au keynote from Vanessa Teague on verifiable e-voting and risk-limiting audits.
  • Fedora gathering requirements for a Git forge: the distribution weighs Pagure, GitLab, and GitHub.
  • The rapid growth of io_uring: the asynchronous I/O subsystem continues to gain new capabilities.
  • How to contribute to kernel documentation: a guide for those who would like to help improve the kernel's docs.
  • Some 5.5 kernel development statistics: where the code in the 5.5 release came from.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.


Cryptography and elections

By Jake Edge
January 28, 2020

LCA

Transparent and verifiable electronic elections are technically feasible, but for a variety of reasons, the techniques used are not actually viable for running most elections—and definitely not for remote voting. That was one of the main takeaways from a keynote at this year's linux.conf.au given by University of Melbourne Associate Professor Vanessa Teague. She is a cryptographer who, along with her colleagues, has investigated several kinds of e-voting software; as is probably not all that much of a surprise, what they found were buggy implementations. She described some of that work in a talk that was a mix of math with software-company and government missteps; the latter may directly impact many of the Australian locals who were in attendance.

She began by noting that the "cheerful title" of her talk, "Who cares about democracy?", was hopefully only a rhetorical question, which elicited some, perhaps slightly nervous, laughter. The cryptographic algorithms and protocols that can provide step-by-step proof that a voter's intent was correctly gathered and that the vote was counted do exist, but the assumptions that need to be made about user behavior make them too difficult to use for government elections. It is unreasonable to expect that voters will take the fairly onerous actions to actually verify those steps; "it's too easy to trick people". These kinds of systems do not "adequately defend against bugs and fraud for serious elections".

End-to-end verifiable elections

The details of these techniques are complicated, she said, but "the principle is really simple": the system should "provide evidence that it has done the right thing with the data at every stage of the process". (The YouTube video of the talk shows her slides, which have some pictures that give an overview of the scheme.) Voters should be able to check that it has done so by using their own code—or code provided by organizations they trust. When voters use a machine that they may not trust to vote, they should get some kind of receipt that can be used to verify that their vote has been recorded accurately. That receipt can then be checked on some other device to ensure that the vote stored in the encrypted receipt is, in fact, the choices they wanted to make.

[Vanessa Teague]

The next stage is a public bulletin board that lists the encrypted votes that have been recorded for the election. It would need to have some "fancy features" to ensure that changes could not be made to the list once it has been posted, but it could effectively be just a web page with a list of encrypted votes recorded. If there was no concern for privacy, those votes could simply be decrypted and tallied. But since there is normally a need for secret ballots, the votes, which have identity information attached to them, cannot just be decrypted.

So a series of "mixing servers" would be used to shuffle the votes (without the identity information) and re-encrypt them, to break the link between who voted and for what. When you mix a bunch of paper ballots in a box, you don't know what shuffle was applied to the votes, but that is not true for electronic shuffling. So there should be multiple mixing servers, each using its own algorithm, and, ideally, each run by a different organization (different political parties, for example); as long as those organizations are unlikely to all collude on the outcome, the mixing will break that link.
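
For the curious, a tiny sketch of the re-encryption idea behind such a mixer appears below. It uses ElGamal encryption with toy parameters and fixed "random" values purely for illustration; it is not drawn from any real voting system, and a real mixer would use a large group, genuine randomness, and the honest-mixing proofs described next.

    /* Toy re-encryption mix: an ElGamal ciphertext can be re-randomized by
     * multiplying in an encryption of 1, so a mixing server can shuffle and
     * re-encrypt votes without knowing the secret key or changing the
     * plaintexts.  Tiny parameters and fixed "randomness"; illustration only. */
    #include <stdio.h>
    #include <stdint.h>

    static uint64_t modpow(uint64_t b, uint64_t e, uint64_t m)
    {
        uint64_t r = 1;

        for (b %= m; e; e >>= 1, b = b * b % m)
            if (e & 1)
                r = r * b % m;
        return r;
    }

    struct ct { uint64_t c1, c2; };    /* ElGamal ciphertext (g^r, m*y^r) */

    int main(void)
    {
        uint64_t p = 2579, g = 4, s = 765, y = modpow(g, s, p);  /* y = g^s */
        uint64_t votes[3] = { 9, 16, 25 };    /* "plaintexts" in the group */
        uint64_t r[3] = { 111, 222, 333 }, t[3] = { 444, 555, 666 };
        int perm[3] = { 2, 0, 1 };            /* the mixer's secret shuffle */
        struct ct in[3], out[3];

        /* Voters (or their devices) encrypt their votes. */
        for (int i = 0; i < 3; i++)
            in[i] = (struct ct){ modpow(g, r[i], p),
                                 votes[i] * modpow(y, r[i], p) % p };

        /* One mixing server: permute, then re-encrypt with fresh randomness. */
        for (int i = 0; i < 3; i++) {
            struct ct v = in[perm[i]];
            out[i] = (struct ct){ v.c1 * modpow(g, t[i], p) % p,
                                  v.c2 * modpow(y, t[i], p) % p };
        }

        /* The key holder decrypts the mixed list: m = c2 / c1^s (mod p). */
        for (int i = 0; i < 3; i++)
            printf("%llu ", (unsigned long long)(out[i].c2 *
                   modpow(modpow(out[i].c1, s, p), p - 2, p) % p));
        printf("\n");    /* the same three votes, in an unlinkable order */
        return 0;
    }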

There is still a problem, however: whether those mixing servers also "fiddled the contents" while doing the shuffle. There is some "fancy crypto" for ensuring that the contents have not been altered, which would allow a mathematical proof of honest mixing to be published on the bulletin board.

The final step is for some kind of election authority to use the secret key to decrypt the votes and tally the results. That is, of course, another step where things can go awry. Once again, though, there are cryptographic methods that allow the authority to prove that it has not altered the votes while decrypting and counting them. That proof would also be posted to the board.

She reiterated that the algorithms are complicated, but that the technique provides a kind of openness, though one that is different from what the open-source community normally envisions. It is not necessary and, in fact, not sufficient for the different organizations to use open-source software to perform these steps—for one thing, it is impossible to be sure they have used the software that they claim to have used. But all of the steps are verifiable using open-source (or other) code that does not rely on the software used in the process. It relies on voters actually verifying all of those steps, however.

Analyzing an implementation

She and two colleagues looked at the Swiss Post e-voting system, which was provided by a company called "Scytl", last year. It does not provide end-to-end verifiability, but does provide shuffling and accurate-decryption proofs. As their report notes, it is meant to provide "complete verifiability", rather than "universal verifiability"; "complete" means that there is an assumption that at least one of the server-side systems is behaving correctly.

In an outcome that will probably only surprise politicians that push (and purchase) these kinds of systems, Teague and her colleagues, when looking at the code, found a bug in each part of what the system was meant to be able to prove. "There were only two things that they were supposed to prove and they were both buggy." In this context, "buggy" means that it can provide a proof that everything has been done correctly, "while actually fiddling the votes".

She went into some detail of the problem that they found in the accurate-decryption proof, which uses a Chaum-Pedersen proof of the equality of discrete logs. Scytl had "rolled its own" implementation that did not hash an important element of the calculation during the process, so that value could be calculated after the "proof" had been presented without altering its verifiability.

As it turned out, one of her colleagues had made the same mistake and written a paper [PDF] describing the problem and its implications a few years earlier. That made it easy for them to see the problem when looking at the code; in fact, it took longer for them to get the code to compile and run than it did to find the problem. In practice, changing votes in this fashion would be obvious, because it would result in nonsense votes, though they would cancel out valid votes.
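
To make the structure of such a proof concrete, here is a small, self-contained sketch of a Chaum-Pedersen equality proof made non-interactive with the usual Fiat-Shamir transform. The group parameters are tiny and the "hash" is a toy stand-in, chosen only for illustration; none of this is taken from Scytl's code. The step to watch is the challenge computation: every element of the statement and both commitments must go into the hash.

    /* Toy Chaum-Pedersen proof that log_g(y1) == log_h(y2), i.e. the prover
     * knows x with y1 = g^x and y2 = h^x (mod p).  Real systems use large
     * groups and a real hash function; this only shows the shape of the
     * protocol. */
    #include <stdio.h>
    #include <stdint.h>

    static uint64_t modpow(uint64_t b, uint64_t e, uint64_t m)
    {
        uint64_t r = 1;

        for (b %= m; e; e >>= 1, b = b * b % m)
            if (e & 1)
                r = r * b % m;
        return r;
    }

    /* Fiat-Shamir challenge: every element of the statement and both
     * commitments (a1, a2) must be hashed.  Leaving one of them out lets a
     * cheating prover choose that value after seeing the challenge. */
    static uint64_t challenge(uint64_t g, uint64_t h, uint64_t y1, uint64_t y2,
                              uint64_t a1, uint64_t a2, uint64_t q)
    {
        uint64_t in[] = { g, h, y1, y2, a1, a2 };
        uint64_t acc = 1469598103u;    /* toy stand-in for a real hash */

        for (int i = 0; i < 6; i++)
            acc = (acc * 1099511628211ull) ^ in[i];
        return acc % q;
    }

    int main(void)
    {
        /* p = 2q + 1 with q prime; g and h generate the order-q subgroup. */
        uint64_t p = 2579, q = 1289, g = 4, h = 9, x = 177;
        uint64_t y1 = modpow(g, x, p), y2 = modpow(h, x, p);

        /* Prover: commit with random r, derive the challenge, respond. */
        uint64_t r = 1023;    /* fixed here; must be random in real life */
        uint64_t a1 = modpow(g, r, p), a2 = modpow(h, r, p);
        uint64_t c = challenge(g, h, y1, y2, a1, a2, q);
        uint64_t z = (r + c * x) % q;

        /* Verifier: g^z == a1*y1^c and h^z == a2*y2^c (mod p). */
        int ok = modpow(g, z, p) == a1 * modpow(y1, c, p) % p &&
                 modpow(h, z, p) == a2 * modpow(y2, c, p) % p;
        printf("proof %s\n", ok ? "verifies" : "rejected");
        return 0;
    }

Omit one of those hash inputs and the challenge no longer binds the prover to it; that is the kind of door Teague and her colleagues found left open.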

Fully end-to-end verifiable e-voting is possible, but when you start considering government elections using a system like that, "there are some serious concerns and limitations", Teague said. For one thing, voters need to do a lot of careful and complicated work to verify that their vote has been properly encrypted; if they do not, they could be tricked into recording a vote that does not accurately reflect their intentions. In addition, the system can allow voters to prove how they voted, if it was done on, say, a home computer, which may not be a desirable outcome; e-voting in a polling place largely eliminates that particular concern, however.

Beyond that, verifying the proofs of proper mixing and accurate decryption is quite complicated; it requires significant expertise, which may not be acceptable for a democracy. Subtle bugs can partly or completely undermine the security properties of the system, as she had shown. That is different from traditional paper voting, where a subtle problem does not hand over the ability for "total manipulation of all of the votes to one entity".

In summary, she said, there are reasonable solutions for doing e-voting in a polling place, but remote e-voting in a way that "really safeguards the election against manipulation and software bugs" is an unsolved problem. "That's the good news. That is the most optimistic part of my talk."

NSW iVote

She turned to the iVote system for remote e-voting that was put in place by the state government of New South Wales (NSW) in Australia. She asked, what does end-to-end verifiability have to do with iVote? "Really, not a lot."

For iVote, voters use JavaScript provided by Scytl to cast their vote. In order to verify it, however, they have to use a different application that the NSW Election Commission (NSWEC) and Scytl provided. That application was closed source up through the election, so you had to trust Scytl/NSWEC to tell the truth when you asked them if they had recorded your vote correctly. Four months after the election, some of the code was made available under a "reasonable" non-disclosure agreement (NDA) so that it could be inspected. But at election time, there was no proof that the vote was recorded correctly.

The public bulletin board from her idealized view of an end-to-end verifiable election was replaced with a secret bulletin board that listed which votes were recorded; it was available within NSWEC, but not to ordinary citizens. So there was no proof that a given vote had even been counted. Likewise, the series of independently administered mixing servers was replaced with a single mixer, so paths from voters to votes could potentially be traced. It is a far cry from even the idealized system that she had just argued could not be trusted for a government election.

She and her colleagues were already looking at the code from Scytl, in the context of the Swiss Post analysis, when the first iVote election was taking place. The Swiss had put the code out for review six months ahead of the first election so that any problems could be found and fixed before it was used. At the time the researchers found the first problem with the Swiss Post system (for the shuffle proof), NSWEC was already using iVote for early voting.

When the Swiss Post bug went public, NSWEC released a statement that said, in effect, "yes, we do have that critical Scytl crypto defect, but don't worry, we're going to use it anyway", she said to laughter. Scytl patched the software to fix that bug during the election, however.

The second bug that was found in the Swiss Post system (that she had detailed earlier in the talk) came to light a week or so after the first; given that the software came from the same company, was iVote susceptible to that bug as well? The day before the election using iVote, NSWEC put out a release [PDF] saying that it was confident that the second issue found for the Swiss Post system was not relevant to iVote. "This was highly implausible at the time." She could imagine a situation where neither of the bugs were relevant, but it simply was not reasonable to suggest that a second bug found in the same code base would not be present in the NSWEC version.

At the time of the election, anyone wanting to see the code had to sign an onerous NDA that said you could not release any of your findings for five years. She and her colleagues did not sign that agreement, so they could not see the code. Four months after the election (so, mid-2019), it was changed to have a "45-day quiet period", which was far more reasonable.

NSW is supposed to have laws mandating public access to source code, she said, but that does not apply to voting software. Part of what she wanted to convey in her talk is that "laws about election software really matter". Switzerland has good laws of that sort, she said, "New South Wales has very bad laws". She showed the laws for election software security, noting that they say nothing about actually securing that software; instead, they impose penalties of up to two years in jail for disclosing the source code or flaws in the security of the voting software without approval. It is not just some part of a larger body of law requiring safe election practices, "this is what substitutes for legislation mandating secure and transparent electoral processes".

Unsurprisingly, the bug found in the Swiss Post system was relevant to iVote, as they discovered later. The obvious correct fix for that bug was to simply add the relevant parameter into the hash, but Scytl took a completely different approach. They chose a random value, raised the missing parameter to that power, and hashed that instead. But that does not fix the problem. The parameter needs to be hashed so that it cannot be calculated later; hashing it commits to a particular value. Raising it to a random power, without hashing (and thus committing to) that random value, still allows the important parameter to be post-calculated.

If the code had been available for analysis before the election, rather than four months after, the problem could have been spotted in short order. Instead, NSWEC ran the election with flawed, manipulable software because they "did not make it openly available to scrutiny by people who knew what they were doing". She suggested that attendees look for themselves on the Scytl web site that allows one to apply for access to the iVote code. She only spent a short amount of time verifying the problems they had already found, and discovering that the fix was no fix, but there is "heaps and heaps of code there"; finding bugs in that code and getting them fixed will incrementally help secure the system before it gets used again.

Victoria

She asked the audience how many were from NSW (many) and if they had used iVote, but seemingly no one had (or would admit to it). She then asked how many were from the state of Victoria (as she is). She asked, "feeling smug?" That was met with groans, nos, and scattered laughter as attendees guessed that things there were probably not much better—or perhaps even worse.

At the end of 2019, a law passed the lower chamber of the legislative body for Victoria that would allow the Minister for Elections to choose a voting system that would be universally applied to all local elections (i.e. not for federal offices). It allows the Minister to choose postal voting, which has a body of laws already to govern election security and privacy, attendance voting, which has an even older set of laws surrounding it, or "anything else he feels like with no restrictions whatsoever".

It is clear what the agenda is, Teague said; there is no groundswell for voting with "bronze discs or pottery shards". It is meant to facilitate "large scale, totally unregulated electronic voting compulsorily for everyone" who is voting in local elections in Victoria. The legislation is coming up for a vote in the upper body in February; she suggested that attendees from Victoria figure out who their five representatives are and contact them to request the removal of that clause. "Once we lose the ability to have genuine democratic local elections in Victoria, we are never getting it back."

Risk-limiting audits

She then shifted gears to talk about a pilot project that she participated in to do a risk-limiting audit for a San Francisco district attorney election. Instead of talking about electronic voting, the second half of her keynote would be about electronic counting of paper votes, which is an easier problem in every way, Teague said. Assuming that the paper records have been properly secured along the way, there exists a full record of voters' intents that can be double-checked against the tallies provided by counting devices.

In jurisdictions that use paper ballots in the US, that kind of double-checking is often done; "serious statistical audits of those paper records" are performed to ensure that the electronic counting system is accurate. There are a number of sophisticated statistical analysis techniques that can be used to give a level of confidence that the outcome has not been subject to malware or bugs in the counting systems. In her opinion, the best technique is risk-limiting audits.

At the beginning of an election, the auditors need to decide on the probability of being tricked that they are "willing to live with"; if you are doing random sampling, there is always some possibility that the statistics do not reflect the results. Then there are calculations that can be made to determine how to do the sampling such that those constraints are met; if the sampling indicates that the reported outcome is not as it should be, then a full count by hand should be done.
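
The arithmetic behind that trade-off can be illustrated with a toy ballot-polling audit in the style of the BRAVO method for a simple two-candidate race: sample ballots one at a time, update a likelihood ratio against the hypothesis that the election was really tied (or lost), and stop when the ratio exceeds one over the chosen risk limit. The sample stream below is invented; the San Francisco pilot described later used the more elaborate comparison-audit machinery of RAIRE and SHANGRLA rather than this simple test.

    /* A toy ballot-polling audit in the BRAVO style: confirm a reported
     * two-candidate outcome, or give up and escalate to a hand count. */
    #include <stdio.h>

    int main(void)
    {
        double risk_limit = 0.05;  /* chance of confirming a wrong outcome */
        double p_w = 0.55;         /* reported winner's share of valid votes */
        /* Invented sample: 1 = ballot for the reported winner, 0 = loser. */
        int sample[] = { 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1,
                         1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1 };
        int n = sizeof(sample) / sizeof(sample[0]);
        double T = 1.0;            /* likelihood ratio against a tied race */

        for (int i = 0; i < n; i++) {
            T *= sample[i] ? p_w / 0.5 : (1.0 - p_w) / 0.5;
            if (T >= 1.0 / risk_limit) {
                printf("outcome confirmed after %d ballots (T = %.1f)\n",
                       i + 1, T);
                return 0;
            }
        }
        printf("risk limit not met after %d ballots; escalate or hand count\n", n);
        return 0;
    }

With a 55% reported winner, 30 ballots are nowhere near enough; a margin like that typically needs a sample of a hundred ballots or more before the risk limit is met, and a genuinely wrong outcome should push the audit all the way to a full hand count.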

These audits are used in several states, but the US has a "really weird primitive voting system" where a single candidate is chosen by each voter and the one with the most votes wins. The audit is just checking to ensure that the declared winner is actually the candidate with the most votes. But some US jurisdictions are starting to use instant-runoff voting, "which in Australia is called 'voting'", she said to laughter. It is a great voting system, she said, but it breaks the risk-limiting audit frameworks that are in place.

She got involved in a project to take the Australian understanding of preference voting and to combine it with the ideas of risk-limiting audits for the most recent election in San Francisco, which was held in November 2019. (LWN covered a talk about other aspects of San Francisco election handling back in August.) The district attorney election had four candidates and the results were quite close.

She and her colleagues in the project decided to test their risk-limiting audit techniques on that race. They only got access to the postal votes, which turned out to be about two-thirds of the votes cast. So the audit did not actually provide evidence that the outcome of the election was correctly decided, but it demonstrated how the audit could work in an election with preferential voting. It turns out that the outcome when considering just the postal votes is the opposite of the outcome when looking at all of the votes, because polling-place voters overwhelmingly chose the winning candidate. "That doesn't really affect the math, nor does it affect how this thing, in its ideal form, should work."

The idea behind the project is to provide a set of "completely open, publicly available verification and auditing tools". The actual counting machines are a kind of scanner and digitizer that comes from a proprietary company (Dominion) that she does not trust any more than she does Scytl. There is no opportunity to look at the code, but they did get the report from that system, which detailed the choices counted on each ballot; "the seventh ballot in bag 52 has this list of preferences".

Using that list, a random selection of ballots is chosen to be audited; those will be examined and compared with what the counting system reported. A tally of discrepancies is then maintained. Then there are two different pieces of mathematical analysis that need to be done.

The first is to carefully determine which parts of the preferences are important in terms of the outcome. The order in which the losing candidates are eliminated can sometimes make a big difference in the overall outcome (because the next level of voter preference is then boosted from ballots that preferred the eliminated candidate). Switching the order can cascade into a completely different winner, though usually that is not the case; normally, the low vote-getters can be eliminated in any order and it does not affect the outcome. The tool that they developed looked at the election and determined which of the comparisons were critical and then ignored the rest (for auditing purposes).

The second piece is "to do the careful statistics for eliminating the possibility that the comparison was made wrongly". Both pieces are available as open-source software: Risk-limiting Audits for Instant Runoff vote Elections (RAIRE) and Sets of Half-Average Nulls Generate Risk-Limiting Audits (SHANGRLA). Citizens in San Francisco were able to observe the auditing process, see which ballots were chosen to be sampled, compare them to the reported preferences, and then run the calculations themselves if they wished. The code can be used by others to reproduce the results or to add different types of statistical measures, for example.

Back to Oz

So, like the mechanisms to verify every step of an e-voting election, an electronically counted election can also be verified using an open process, even though the software running on the devices may not be available for scrutiny. But, she asked, what does this have to do with the Australian senate count? "You probably have already guessed—not a lot."

She went over some testimony by the electoral commissioner about the integrity of the senate count. Pointed questions were asked about whether any manual auditing of the ballots had been done and whether the error rate for data entry had been estimated, but the commissioner said that he would need to look into both of those questions. But when asked about whether the commission had followed the recommendations of the last inquiry by the Australian Parliament about the senate elections, which were to appoint expert scrutineers to observe the ballot-counting process, his memory suddenly returned. Parliament had only recommended that action, not required it by law.

There is an important element to his answer, Teague said: actual legislation is needed to ensure that voting is secure. The Swiss e-voting laws are detailed and well thought-out, with an orientation toward transparency of the process, privacy for the voters, and the ability to verify the results; they are not perfect, but they are quite good. Switzerland found out about its problems as soon as it forced the source code to be open, she said to applause.

In contrast, New South Wales found out about its problems 12 days before the election. California has "pretty decent laws about election auditing". But the Australian senate scrutineering rules have not been updated since the days of hand-counted paper ballots, which was a long time ago. In effect, those rules say that scrutineers must be allowed to stand in the room while "they do their thing on their computers". It does not say anything about providing meaningful evidence that those computers are doing the right thing.

What she believes is needed for the senate count is a mandate for a statistical audit of the paper ballots that can be meaningfully observed by scrutineers. There is no need for "the perfect law", just for something that requires auditing and allows public oversight. It seems rather obvious that the electoral commission is not going to do so unless they are forced, so a legislative mandate for that is needed, she said.

The senate count is a place where "there is a risk for undetectable electoral fraud", she said, particularly in the last few seats where "lots and lots of preference shifting" has been needed to decide them. It would be pretty difficult to fiddle with the tallies for the top vote-getters from the major parties, she said, but the preferences further down the list could be more easily manipulated. Unlike some e-voting systems, such as iVote, the senate count is in a much better position to be audited because the paper records exist.

A new project?

She finished "on a slightly upbeat note"; rather than spending 2020 by picking holes in bad software, she thought it might be better to work on a new project. "Democracy is about more than just elections." She spent some time working to defeat—or even just amend—the "shocking train wreck that was our anti-encryption laws" that were passed at the end of 2018. Working to try to convince legislators is not something she ever wants to be involved in again, but it did get her thinking.

The legislative process in Australia is fairly open; the text of the proposed bills is available online under a Creative Commons license. What if there were an online amendment and voting site that would allow cryptographically verified voting on the bills, amendments, and the like? It would allow people to more actively participate in the process and to provide input in ways other than "writing 50,000 submissions" to various committees that "ignore what we write".

She thinks a version could also be rolled out for teenagers so that they could practice debating various issues of the day. The e-voting verification techniques could be used to give users some level of assurance that the votes are being tallied correctly, while familiarizing people with the approach. The lack of an Australian digital ID will make it difficult to trust the aggregation, since the identities of the participants cannot be verified, however. In any case, she would like to pursue something down this path and solicited attendees to tell her that it is a bad idea or to give suggestions for better ones.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Gold Coast for linux.conf.au.]


Fedora gathering requirements for a Git forge

By Jake Edge
January 29, 2020

Fedora currently uses Pagure to host many of its Git repositories and to handle things like documentation and bug tracking. But Pagure is maintained by the Red Hat Community Platform Engineering (CPE) team, which is currently straining under the load of managing the infrastructure and tools for Fedora and CentOS, while also maintaining the tools used by the Red Hat Enterprise Linux (RHEL) team. That has led to a discussion about identifying the requirements for a "Git forge" and possibly moving away from Pagure.

The conversation started with a post on the Fedora devel mailing list from Leigh Griffin, who is the manager of the CPE team. That message was meant to call attention to a blog post that described the problems that Pagure poses for the CPE team and the path toward finding a solution using the Open Decision Framework (ODF). "We will be seeking input and requirements in an open and transparent manner on the future of a git forge solution which will be run by the CPE team on behalf of the Fedora Community."

The problem statement describes a number of areas where Pagure does not fit within the mission statement for CPE, its current priorities, and the workload for the team. Due to a lack of developer time, Pagure is also falling further and further behind the other Git forges in terms of its feature set. While Pagure was originally developed by CPE member Pierre-Yves Chibon ("pingou") and has been maintained by him and others, largely in their "spare" time, it has languished more recently:

The CPE team has been unable to commit a development team to Pagure for several months now. This situation is unlikely to change based on our current known priorities.

The choices available section lists three possibilities: GitHub, GitLab, and Pagure.

There are no other forges that we could find that had both the product maturity and standing in open source communities, therefore no other solutions are under consideration as the three choices here represent the main git forge solutions on the market.

The idea, then, is to gather requirements for a forge from the various users and stakeholders, then to make an "informed decision" on which of the options to pursue.

To be clear, the outcome here may be a decision to invest heavily in Pagure to meet the requirements or it may be to opt for another git forge to meet the requirements. No option is off the table.

Listing GitHub, and to a lesser extent GitLab, did not sit well with quite a few who commented in the thread. GitHub is closed source, while GitLab is open core—neither of which are seen as wholly compatible with Fedora. Fabio Valentini asked about GitHub and said that he did not "want to see fedora use a closed-source product for such a core service". Beyond that, he is quite happy with Pagure:

Can we please keep pagure? It already has the fedora-specific features we need, and I don't mind a "slow" pace of development. In my experience, it works really well, and I actually *like* to use it (which is not true for GitLab ... which is slow and horrible)

Michael Catanzaro also thought that GitHub should not even be under consideration, which would reduce the choice to either Pagure or the GitLab Community Edition (CE):

Well since we have a request for requirements: I propose requirements #1 and #2 are to be self-hosted and open source. I suspect the Fedora community would be outraged if we fail to meet either requirement.

So if we can agree on that much, then we can avoid wasting time by including GitHub in the list of options. That would bring us to a choice between GitLab CE and Pagure. (Are there any other serious options?)

There were suggestions for the lesser-known Sourcehut, Gitea, Gogs, and Phabricator forges as possible other options. Opinions differed on whether those would fit the bill, but, like any non-Pagure solution, they would certainly need to be customized for the Fedora use cases. Michal Konecny did not see that as a way forward:

If we go this way, in a few years we will end up in the same situation as with Pagure today. We will have many custom patches (which we need to take care of) and we will not have manpower to compete with the features of other major git forges.

For GitLab, there is the concern that the company will eventually go in a different direction, which could leave users of the community edition behind. As Bruno Wolff III put it:

Gitlab is open core, which means they have a conflict of interest when accepting improvements from outsiders. They could also stop supporting the open core at any time, leaving users who want infrastructure running on free software in a poor place.

Much of that discussion was not directly addressing what the CPE team is looking for, however. Clement Verna, instead, differentiated the two separate use cases for how Fedora is using Pagure today. There is pagure.io, which is a collection of various repositories for code, documentation, or simply to have a bug tracker, and the package sources repository (src.fedoraproject.org) that is more tightly integrated with Fedora services and practices. The only real requirement for the former is integration with the Fedora Account System (FAS) to allow for single sign-on, he said, while the latter has a whole host of needs:

[...] it needs to be able to integrate with Fedora FAS but also to have the FAS group synced, branch ACLs, a way to integrate with release-monitoring, a way to integrate with bugzilla, a way to integrate with fedora-messaging (RabbitMQ), .... In general I think most of the integration with our infrastructure can be done with any solution either using the solution APIs or plugins system. After we need to compare the cost of developing and maintaining these pieces of glue to integrate everything against the current situation.

Julen Landa Alustiza also split the two use cases; he agreed with Catanzaro that the solution should be open source and self-hosted, though he thought that "self-hosted" could be relaxed if there were a suitable "open source friendly service provider". He added some requirements for both use cases, including privacy and ease of onboarding, along with requirements for each individual use case. Integration with existing Fedora services, such as Bodhi and Koji, is particularly important for the distribution package source repository (which he and others referred to as "distgit").

For pagure.io, he further split the repositories into those used largely for bug tracking (perhaps containing a bit of documentation) and those that host code projects. He had specific suggestions of requirements for each. Standard bug-tracking features and documentation integration are needed for that type of repository, while the code repositories need good searching capabilities as well as a way to see at a glance which repositories are actively changing and which might be good places for newcomers to get started.

Aleksandra Federova suggested splitting off the code-review portion into its own piece. She sees the distribution source repository as a "centrally managed code-review platform". GitHub and GitLab have a workflow that is different from Fedora's; the two styles may clash:

Git Forges play a lot with the idea of users being able to create their own forks of the repository, their own projects, with their own rules. src.fp.org is the integrated platform where Fedora rules are in action. This is different from the use case KDE and Gnome have, as they manage development of projects, while we manage the integration.

Her suggestion of using Gerrit for code review was not met with much in the way of enthusiasm, but her focus was largely on the two big Git forges under consideration. For Pagure, much of the Fedora-specific work has already been done, though there is still plenty more to do. Her point is that a switch away from Pagure will need to address the impedance mismatch between GitHub/GitLab and Fedora's practices.

The discussion was somewhat unfocused; the idea of moving away from Pagure at all seems to have taken many by surprise. In addition, using the Open Decision Framework is new to some, so it was not clear what the next steps might be. Adam Saleh asked Griffin: "[...] how do we actually get a list of requirements?". Griffin said that the discussion was helping to gather that information, but that it would need to be reshaped:

This thread is serving as a source of requirements (although it has meandered dramatically away from that) but I will default to the Fedora Council for how a combined set from the input in this thread and others is collated and presented. When all requirements are gathered from all stakeholders I will share the distilled version out.

It is a bit hard to imagine Fedora moving to GitHub for several reasons: the closed-source nature of the platform and the inability to easily integrate it with the distribution's practices are two obvious ones. At first blush, GitLab CE might make a reasonable fit, though it too would need lots of customization. Given that Pagure is mostly filling the role needed at this point, and avoids the pitfalls of either proprietary or open-core code, it would seem to be the obvious answer.

Since any solution will seemingly require extra effort on the part of the CPE team, making a formal decision in favor of Pagure may make it easier for Red Hat to allocate the people and time needed. Neither of the other two possibilities would appear to provide much in the way of development and maintenance time-saving, but whether that is true will become clearer as the process progresses. It should be interesting to watch.


The rapid growth of io_uring

By Jonathan Corbet
January 24, 2020
One year ago, the io_uring subsystem did not exist in the mainline kernel; it showed up in the 5.1 release in May 2019. At its core, io_uring is a mechanism for performing asynchronous I/O, but it has been steadily growing beyond that use case and adding new capabilities. Herein we catch up with the current state of io_uring, where it is headed, and an interesting question or two that will come up along the way.

Classic Unix I/O is inherently synchronous. As far as an application is concerned, an operation is complete once a system call like read() or write() returns, even if some processing may continue behind its back. There is no way to launch an operation asynchronously and wait for its completion at some future time — a feature that many other operating systems had for many years before Unix was created.

In the Linux world, this gap was eventually filled with the asynchronous I/O (AIO) subsystem, but that solution has never proved to be entirely satisfactory. AIO requires specific support at the lower levels, so it never worked well outside of a couple of core use cases (direct file I/O and networking). Over the years there have been recurring conversations about better ways to solve the asynchronous-I/O problem. Various proposals with names like fibrils, threadlets, syslets, acall, and work-queue-based AIO have been discussed, but none have made it into the mainline.

The latest attempt in that series is io_uring, which did manage to get merged. Unlike its predecessors, io_uring is built around a ring buffer in memory shared between user space and the kernel; that allows the submission of operations (and collecting the results) without the need to call into the kernel in many cases. The interface is somewhat complex, but for many applications that perform massive amounts of I/O, that complexity is paid back in increased performance. See this document [PDF] for a detailed description of the io_uring API. Use of this API can be somewhat simplified with the liburing library.
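
As a minimal illustration of that flow, the following sketch uses liburing to submit a single read and wait for its completion. It is pared down to the submit-and-complete pattern, with only token error handling; build it with -luring.

    /* Read the first 4KB of a file through io_uring, using liburing. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>
    #include <liburing.h>

    int main(int argc, char **argv)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        char buf[4096];
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
        int fd, ret;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Set up a ring with room for eight submissions. */
        ret = io_uring_queue_init(8, &ring, 0);
        if (ret < 0) {
            fprintf(stderr, "queue_init: %s\n", strerror(-ret));
            return 1;
        }

        /* Queue an IORING_OP_READV and hand it to the kernel. */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_readv(sqe, fd, &iov, 1, 0);
        io_uring_submit(&ring);

        /* Wait for the completion and report the result. */
        ret = io_uring_wait_cqe(&ring, &cqe);
        if (ret < 0)
            fprintf(stderr, "wait_cqe: %s\n", strerror(-ret));
        else
            printf("read returned %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }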

What io_uring can do

Every entry placed into the io_uring submission ring carries an opcode telling the kernel what is to be done. When io_uring was added to the 5.1 kernel, the available opcodes were:

IORING_OP_NOP
This operation does nothing at all; the benefits of doing nothing asynchronously are minimal, but sometimes a placeholder is useful.

IORING_OP_READV
IORING_OP_WRITEV
Submit a readv() or writev() operation — the core purpose for io_uring in most settings.

IORING_OP_READ_FIXED
IORING_OP_WRITE_FIXED
These opcodes also submit I/O operations, but they use "registered" buffers that are already mapped into the kernel, reducing the amount of total overhead.

IORING_OP_FSYNC
Issue an fsync() call — asynchronous synchronization, in other words.

IORING_OP_POLL_ADD
IORING_OP_POLL_REMOVE
IORING_OP_POLL_ADD will perform a poll() operation on a set of file descriptors. It's a one-shot operation that must be resubmitted after it completes; it can be explicitly canceled with IORING_OP_POLL_REMOVE. Polling this way can be used to asynchronously keep an eye on a set of file descriptors. The io_uring subsystem also supports a concept of dependencies between operations; a poll could be used to hold off on issuing another operation until the underlying file descriptor is ready for it.

That functionality was enough to drive some significant interest in io_uring; its creator, Jens Axboe, could have stopped there and taken a break for a while. That, however, is not what happened. Since the 5.1 release, the following operations have been added:

IORING_OP_SYNC_FILE_RANGE (5.2)
Perform a sync_file_range() call — essentially an enhancement of the existing fsync() support, though without all of the guarantees of fsync().

IORING_OP_SENDMSG (5.3)
IORING_OP_RECVMSG (5.3)
These operations support the asynchronous sending and receiving of packets over the network with sendmsg() and recvmsg().

IORING_OP_TIMEOUT (5.4)
IORING_OP_TIMEOUT_REMOVE (5.5)
This operation completes after a given period of time, as measured either in seconds or number of completed io_uring operations. It is a way of forcing a waiting application to wake up even if it would otherwise continue sleeping for more completions.

IORING_OP_ACCEPT (5.5)
IORING_OP_CONNECT (5.5)
Accept a connection on a socket, or initiate a connection to a remote peer.

IORING_OP_ASYNC_CANCEL (5.5)
Attempt to cancel an operation that is currently in flight. Whether this attempt will succeed depends on the type of operation and how far along it is.

IORING_OP_LINK_TIMEOUT (5.5)
Create a timeout linked to a specific operation in the ring. Should that operation still be outstanding when the timeout happens, the kernel will attempt to cancel the operation. If, instead, the operation completes first, the timeout will be canceled.

That is where the io_uring interface will stand as of the final 5.5 kernel release.
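
Combining those pieces looks something like the liburing-based sketch below, which links a read to a one-second timeout; the ring, file descriptor, and iovec are assumed to have been set up elsewhere (as in a program like the one shown earlier).

    #include <sys/uio.h>
    #include <liburing.h>

    /* Queue a readv() linked to a one-second timeout: if the read has not
     * completed when the timer fires, the kernel will try to cancel it. */
    static int queue_read_with_timeout(struct io_uring *ring, int fd,
                                       struct iovec *iov)
    {
        /* Must stay valid until the timeout completes, hence static. */
        static struct __kernel_timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
        struct io_uring_sqe *sqe;

        /* The read itself; IOSQE_IO_LINK ties it to the next SQE. */
        sqe = io_uring_get_sqe(ring);
        if (!sqe)
            return -1;
        io_uring_prep_readv(sqe, fd, iov, 1, 0);
        sqe->flags |= IOSQE_IO_LINK;

        /* The linked timeout; it only matters while the read is in flight. */
        sqe = io_uring_get_sqe(ring);
        if (!sqe)
            return -1;
        io_uring_prep_link_timeout(sqe, &ts, 0);

        /* Two completions will eventually appear: the read's (-ECANCELED if
         * the timeout won) and the timeout's (-ETIME if it fired). */
        return io_uring_submit(ring);
    }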

Coming soon

The development of io_uring is far from complete. To see that, one need merely look into linux-next to see what is queued for 5.6:

IORING_OP_FALLOCATE
Manipulate the blocks allocated for a file using fallocate()

IORING_OP_OPENAT
IORING_OP_OPENAT2
IORING_OP_CLOSE
Open and close files

IORING_OP_FILES_UPDATE
Frequently used files can be registered with io_uring for faster access; this command is a way of (asynchronously) adding files to the list (or removing them from the list).

IORING_OP_STATX
Query information about a file using statx().

IORING_OP_READ
IORING_OP_WRITE
These are like IORING_OP_READV and IORING_OP_WRITEV, but they use the simpler interface that can only handle a single buffer.

IORING_OP_FADVISE
IORING_OP_MADVISE
Perform the posix_fadvise() and madvise() system calls asynchronously.

IORING_OP_SEND
IORING_OP_RECV
Send and receive network data.

IORING_OP_EPOLL_CTL
Perform operations on epoll file-descriptor sets with epoll_ctl()

What will happen after 5.6 remains to be seen. There was an attempt to add ioctl() support, but that was shot down due to reliability and security concerns. Axboe has, however, outlined a way in which support for specific ioctl() operations could be added on a case-by-case basis. One can imagine that, for example, the media subsystem, which supports a number of performance-sensitive ioctl() operations, would benefit from this mechanism.

There is also an early patch set adding support for splice().

An asynchronous world

All told, it would appear that io_uring is quickly growing the sort of capabilities that were envisioned many years ago when the developers were talking about thread-based asynchronous mechanisms. The desire to avoid blocking in event loops is strong; it seems likely that this API will continue to grow until a wide range of tasks can be performed with almost no risk of blocking at all. Along the way, though, there may be a couple of interesting issues to deal with.

One of those is that the opcode field for io_uring commands is only eight bits wide, meaning that up to 256 opcodes can be defined. As of 5.6, 30 opcodes will exist, so there is still plenty of room for growth. There are more than 256 system calls implemented in Linux, though. If io_uring were to grow to the point where it supported most of them, that space would run out.

A different issue was raised by Stefan Metzmacher. Dependencies between commands are supported by io_uring now, so it is possible to hold the initiation of an operation until some previous operation has completed. What is rather more difficult is moving information between operations. In Metzmacher's case, he would like to call openat() asynchronously, then submit I/O operations on the resulting file descriptor without waiting for the open to complete.

It turns out that there is a plan for this: inevitably it calls for ... wait for it ... using BPF to make the connection from one operation to the next. The ability to run bits of code in the kernel at appropriate places in a chain of asynchronous operations would clearly open up a number of interesting new possibilities. "There's a lot of potential there", Axboe said. Indeed, one can imagine a point where an entire program is placed into a ring by a small C "driver", then mostly allowed to run on its own.

There is one potential hitch here, though, in that io_uring is an unprivileged interface; any necessary privilege checks are performed on the actual operations performed. But the plans to make BPF safe for unprivileged users have been sidelined, with explicit statements that unprivileged use will not be supported in the future. That could make BPF hard to use with io_uring. There may be plans for how to resolve this issue lurking deep within Facebook, but they have not yet found their way onto the public lists. It appears that the BPF topic in general will be discussed at the 2020 Linux Storage, Filesystem, and Memory-Management Summit.

In summary, though, io_uring appears to be on a roll with only a relatively small set of growing pains. It will be interesting to see how much more functionality finds its way into this subsystem in the coming releases. Recent history suggests that the growth of io_uring will not be slowing down anytime soon.


How to contribute to kernel documentation

By Jonathan Corbet
January 23, 2020
Some years back, I was caught in a weak moment and somehow became the kernel documentation maintainer. More recently, I've given a few talks on the state of kernel documentation and the sort of work that needs to be done to make things better. A key part of getting that work done is communicating to potential contributors the tasks that they might helpfully take on — a list that was, naturally, entirely undocumented. To that end, a version of the following document is currently under review and headed for the mainline. Read on to see how you, too, can help to make the kernel's documentation better.

How to help improve kernel documentation

Documentation is an important part of any software-development project. Good documentation helps to bring new developers in and helps established developers work more effectively. Without top-quality documentation, a lot of time is wasted in reverse-engineering the code and making avoidable mistakes. With good documentation, instead, the result can be a project that is more effective and more inclusive.

Unfortunately, the kernel's documentation currently falls far short of what it needs to be to support a project of this size and importance.

This guide is for contributors who would like to improve that situation. Kernel documentation improvements can be made by developers at a variety of skill levels; they are a relatively easy way to learn the kernel process in general and find a place in the community. There is an endless list of tasks that need to be carried out to get our documentation to where it should be. This list contains a number of important items, but is far from exhaustive; if you see a different way to improve the documentation, please do not hold back.

Addressing warnings

The documentation build currently spews out an unbelievable number of warnings. When you have that many warnings, you might as well have none at all; people ignore them, and they will never notice when their work adds new ones. For this reason, eliminating warnings is one of the highest-priority tasks on the documentation to-do list. The task itself is reasonably straightforward, but it must be approached in the right way to be successful.

Warnings issued by a compiler for C code can often be dismissed as false positives, leading to patches aimed at simply shutting the compiler up. Warnings from the documentation build almost always point at a real problem; making those warnings go away requires understanding the problem and fixing it at its source. For this reason, patches fixing documentation warnings should probably not say "fix a warning" in the changelog title; they should indicate the real problem that has been fixed.

Another important point is that documentation warnings are often created by problems in kerneldoc comments in C code. While the documentation maintainer appreciates being copied on fixes for these warnings, the documentation tree is often not the right one to actually carry those fixes; they should go to the maintainer of the subsystem in question.

For example, I grabbed a pair of warnings nearly at random from a 5.5-rc7 documentation build:

    ./drivers/devfreq/devfreq.c:1818: warning: bad line:
  	  - Resource-managed devfreq_register_notifier()
    ./drivers/devfreq/devfreq.c:1854: warning: bad line:
	  - Resource-managed devfreq_unregister_notifier()

(The lines were split for readability).

A quick look at the source file named above turned up a couple of kerneldoc comments that look like this:

  /**
   * devm_devfreq_register_notifier()
	  - Resource-managed devfreq_register_notifier()
   * @dev:	The devfreq user device. (parent of devfreq)
   * @devfreq:	The devfreq object.
   * @nb:		The notifier block to be unregistered.
   * @list:	DEVFREQ_TRANSITION_NOTIFIER.
   */

The problem is the missing "*", which confuses the build system's simplistic idea of what C comment blocks look like. This problem had been present since that comment was added in 2016 — a full four years. Fixing it was a matter of adding the missing asterisks. A quick look at the history for that file showed what the normal format for subject lines in the target subsystem is, and scripts/get_maintainer.pl told me who should receive the patch. The resulting patch looked like this:

    [PATCH] PM / devfreq: Fix two malformed kerneldoc comments

    Two kerneldoc comments in devfreq.c fail to adhere to the required format,
    resulting in these doc-build warnings:

      ./drivers/devfreq/devfreq.c:1818: warning: bad line:
    	  - Resource-managed devfreq_register_notifier()
      ./drivers/devfreq/devfreq.c:1854: warning: bad line:
	  - Resource-managed devfreq_unregister_notifier()

    Add a couple of missing asterisks and make kerneldoc a little happier.

    Signed-off-by: Jonathan Corbet <corbet@lwn.net>
    ---
     drivers/devfreq/devfreq.c | 4 ++--
     1 file changed, 2 insertions(+), 2 deletions(-)

    diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
    index 57f6944d65a6..00c9b80b3d33 100644
    --- a/drivers/devfreq/devfreq.c
    +++ b/drivers/devfreq/devfreq.c
    @@ -1814,7 +1814,7 @@ static void devm_devfreq_notifier_release(struct device *dev, void *res)

     /**
      * devm_devfreq_register_notifier()
    -	- Resource-managed devfreq_register_notifier()
    + *	- Resource-managed devfreq_register_notifier()
      * @dev:	The devfreq user device. (parent of devfreq)
      * @devfreq:	The devfreq object.
      * @nb:		The notifier block to be unregistered.
    @@ -1850,7 +1850,7 @@ EXPORT_SYMBOL(devm_devfreq_register_notifier);

     /**
      * devm_devfreq_unregister_notifier()
    -	- Resource-managed devfreq_unregister_notifier()
    + *	- Resource-managed devfreq_unregister_notifier()
      * @dev:	The devfreq user device. (parent of devfreq)
      * @devfreq:	The devfreq object.
      * @nb:		The notifier block to be unregistered.
    --
    2.24.1

The entire process only took a few minutes, and the fix was mailed off with pride. Of course, I then found that somebody else had fixed this problem in a separate tree, highlighting another lesson: always check linux-next to see if a problem has been fixed before you dig into it.

Other fixes will take longer, especially those relating to structure members or function parameters that lack documentation. In such cases, it is necessary to work out what the role of those members or parameters is and describe them correctly. Overall, this task gets a little tedious at times, but it's highly important. If we can actually eliminate warnings from the documentation build, then we can start expecting developers to avoid adding new ones.

Languishing kerneldoc comments

Developers are encouraged to write kerneldoc comments for their code, but many of those comments are never pulled into the docs build. That makes this information harder to find and, for example, makes Sphinx unable to generate links to that documentation. Adding kernel-doc directives to the documentation to bring those comments in can help the community derive the full value of the work that has gone into creating them.

The scripts/find-unused-docs.sh tool can be used to find these overlooked comments.

Note that the most value comes from pulling in the documentation for exported functions and data structures. Many subsystems also have kerneldoc comments for internal use; those should not be pulled into the documentation build unless they are placed in a document that is specifically aimed at developers working within the relevant subsystem.
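
For reference, a kerneldoc comment that the build system will happily pull in looks something like this; frob_widget_enable() is, of course, a made-up function shown only to illustrate the format, and a kernel-doc directive in the appropriate RST book is what actually brings such a comment into the rendered documentation.

    /**
     * frob_widget_enable() - Enable a widget and its associated clock.
     * @widget: the widget to operate on.
     * @flags: FROB_* flags controlling how the widget is brought up.
     *
     * Power up @widget's clock domain if necessary, then mark the widget
     * as enabled.  Must be called with the widget's lock held.
     *
     * Return: zero on success or a negative errno value on failure.
     */
    int frob_widget_enable(struct frob_widget *widget, unsigned int flags);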

Typo fixes

Fixing typographical or formatting errors in the documentation is a quick way to figure out how to create and send patches, and it is a useful service. I am always willing to accept such patches. That said, once you have fixed a few, please consider moving on to more advanced tasks, leaving some typos for the next beginner to address.

Please note that some things are not typos and should not be "fixed":

  • Both American and British English spellings are allowed within the kernel documentation. There is no need to fix one by replacing it with the other.
  • The question of whether a period should be followed by one or two spaces is not to be debated in the context of kernel documentation. Other areas of rational disagreement, such as the "Oxford comma", are also off-topic here.

As with any patch to any project, please consider whether your change is really making things better.

Ancient documentation

Some kernel documentation is current, maintained, and useful. Some documentation is ... not. Dusty, old, and inaccurate documentation can mislead readers and casts doubt on our documentation as a whole. Anything that can be done to address such problems is more than welcome.

Whenever you are working with a document, please consider whether it is current, whether it needs updating, or whether it should perhaps be removed altogether. Little value comes from fixing typos in a document full of obsolete information. There are a number of warning signs that you can pay attention to here:

  • References to 2.x kernels
  • Pointers to SourceForge repositories or mailing lists
  • Nothing but typo fixes in the history for several years
  • Discussion of pre-Git workflows
  • Internal changelogs ending in 1997

The best thing to do, of course, would be to bring the documentation current, adding whatever information is needed. Such work usually requires the cooperation of developers familiar with the subsystem in question. Developers are often more than willing to cooperate with people working to improve the documentation when asked nicely, and when their answers are listened to and acted upon.

Some documentation is beyond hope; we occasionally find documents that refer to code that was removed from the kernel long ago, for example. There is surprising resistance to removing obsolete documentation, but we should do that anyway. Extra cruft in our documentation helps nobody.

In cases where there is perhaps some useful information in a badly outdated document, and you are unable to update it, the best thing to do may be to add a warning at the beginning. The following text is recommended:

    .. warning::
        This document is outdated and in need of attention.  Please use
        this information with caution, and please consider sending patches
        to update it.

That way, at least our long-suffering readers have been warned that the document may lead them astray.

Documentation coherency

The old-timers around here will remember the Linux books that showed up on the shelves in the 1990s. They were simply collections of various documentation files scrounged from numerous locations across the net. The books have (mostly) improved since then, but the kernel's documentation is still mostly built on that model. It is thousands of files, almost each of which was written in isolation from all of the others. We don't have a coherent body of kernel documentation; we have thousands of individual documents.

We have been trying to improve the situation through the creation of a set of "books" that group documentation for specific readers. These include the user-space API guide, the development-process manual, the core API manual, the user's and administrator's guide, and, naturally, the documentation manual, among others.

Moving documents into the appropriate books is an important task and needs to continue. There are a couple of challenges associated with this work, though. Moving documentation files creates short-term pain for the people who work with those files; they are understandably unenthusiastic about such changes. There is also often resistance to organizing documentation for the convenience of its readers rather than that of its authors. Usually the case can be made to move a document once; we really don't want to keep shifting them around, though.

Even when all documents are in the right place, though, we have only managed to turn a big pile into a group of smaller piles. The work of trying to knit all of those documents together into a single whole has not yet begun. If you have bright ideas on how we could proceed on that front, we would be more than happy to hear them.

Stylesheet improvements

With the adoption of Sphinx we have much nicer-looking HTML output than we once did. But it could still use a lot of improvement; Donald Knuth and Edward Tufte would be unimpressed. That requires tweaking our stylesheets to create more typographically sound, accessible, and readable output.

Be warned: if you take on this task you are heading into classic bikeshed territory. Expect a lot of opinions and discussion for even relatively obvious changes. That is, alas, the nature of the world we live in.

Non-LaTeX PDF build

This is a decidedly nontrivial task for somebody with a lot of time and Python skills. The Sphinx toolchain is relatively small and well contained; it is easy to add to a development system. But building the PDF output requires installing LaTeX, which is anything but small or well contained. That dependency would be a nice thing to eliminate.

The original hope had been to use the rst2pdf tool for PDF generation, but it turned out not to be up to the task. Development work on rst2pdf seems to have picked up again recently, though, which is a hopeful sign. If a suitably motivated developer were to work with that project to make rst2pdf handle the kernel documentation build, the world would be eternally grateful.
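
For anybody tempted to take this on: rst2pdf ships a Sphinx builder, so a proof-of-concept integration is mostly a conf.py matter; whether it survives contact with the kernel's large, heavily cross-referenced documents is exactly the open question. A rough sketch, assuming a working rst2pdf installation (the document name and title below are placeholders):

    # conf.py fragment (sketch): use rst2pdf's PDF builder instead of LaTeX.
    # In a real conf.py this would be appended to the existing extensions list.
    extensions = ["rst2pdf.pdfbuilder"]

    # (start document, output file name, title, author)
    pdf_documents = [
        ("index", "kernel-docs", "The Linux Kernel documentation",
         "The kernel development community"),
    ]

    # Build with something like:  sphinx-build -b pdf Documentation/ output/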

Write more documentation

Naturally, there are massive parts of the kernel that are severely underdocumented. If you have the knowledge to document a specific kernel subsystem and the desire to do so, please do not hesitate to do some writing and contribute the result to the kernel. Untold numbers of kernel developers and users will thank you.

Postscript: for those who really want to see one of those talks, here is a video from Kernel Recipes 2019 [YouTube].

Comments (14 posted)

Some 5.5 kernel development statistics

By Jonathan Corbet
January 28, 2020
The 5.5 kernel was released on January 26. Over the course of this development cycle, it was occasionally said that the holidays were slowing contributions. At the end, though, 5.5 saw the merging of 14,350 non-merge changesets from 1,885 developers — not exactly a slow-moving cycle. Indeed, 5.5 just barely edged out 5.4 as the kernel with the most developers ever. Read on for our traditional look at where the contributions to 5.5 came from, along with a digression into the stable-update process.
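
These figures come from LWN's own tooling (the gitdm scripts), but the headline numbers can be approximated with plain git. A rough sketch of that arithmetic, assuming a kernel repository with the v5.4 and v5.5 tags; the results will not match exactly, since gitdm also merges author identities and maps them to employers:

    #!/usr/bin/env python3
    # Approximate the headline 5.5 statistics with plain git; run inside
    # a kernel repository that has the v5.4 and v5.5 tags.
    import subprocess

    def git(*args: str) -> str:
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    # Non-merge changesets pulled in for 5.5.
    changesets = int(git("rev-list", "--no-merges", "--count", "v5.4..v5.5"))

    # Distinct author names over the same range.
    authors = set(git("log", "--no-merges", "--format=%an",
                      "v5.4..v5.5").splitlines())

    # Lines added and removed, summed from --numstat output.
    added = removed = 0
    for line in git("log", "--no-merges", "--numstat", "--format=",
                    "v5.4..v5.5").splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            added += int(fields[0])
            removed += int(fields[1])

    print(f"{changesets} changesets from {len(authors)} developers")
    print(f"{added} lines added, {removed} removed, net {added - removed}")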

Just under 590,000 lines of code were added for 5.5, while almost 272,000 were removed, for a net growth of 318,000 lines of code. Of the developers contributing to 5.5, 285 were contributing for the first time. The most active developers working on 5.5 were:

Most active 5.5 developers

By changesets
    Chris Wilson                 264   1.8%
    Christoph Hellwig            221   1.5%
    Yue Haibing                  197   1.4%
    Colin Ian King               144   1.0%
    Thierry Reding               139   1.0%
    Krzysztof Kozlowski          130   0.9%
    Jens Axboe                   124   0.9%
    Arnaldo Carvalho de Melo     121   0.8%
    Arnd Bergmann                120   0.8%
    Geert Uytterhoeven           120   0.8%
    Ville Syrjälä                109   0.8%
    Kuninori Morimoto            106   0.7%
    Alex Deucher                  91   0.6%
    Takashi Iwai                  90   0.6%
    Andy Shevchenko               89   0.6%
    Tony Lindgren                 86   0.6%
    Andrii Nakryiko               85   0.6%
    zhengbin                      83   0.6%
    Ben Dooks                     78   0.5%
    Dmitry Torokhov               76   0.5%

By changed lines
    Ard Biesheuvel             24006   3.6%
    Haiyan Song                20182   3.0%
    Chris Wilson               13598   2.0%
    Dmitry Osipenko            12745   1.9%
    Hao Zheng                  11252   1.7%
    Christoph Hellwig          10652   1.6%
    Jérôme Pouiller            10605   1.6%
    Potnuri Bharat Teja         9746   1.5%
    Jason A. Donenfeld          8656   1.3%
    Jiaxun Yang                 6554   1.0%
    Mauro Carvalho Chehab       6240   0.9%
    Bhawanpreet Lakha           5908   0.9%
    Jens Axboe                  5709   0.8%
    Thierry Reding              5208   0.8%
    Vladimir Oltean             4960   0.7%
    Zaibo Xu                    4849   0.7%
    Adrian Hunter               4668   0.7%
    Andrii Nakryiko             4571   0.7%
    Nuno Sá                     4516   0.7%
    Brendan Higgins             4402   0.7%

One of the most reliable ways to get into the list of top contributors, it seems, is to work on a graphics driver; it is thus not surprising that Chris Wilson contributed the most changesets entirely through work on the Intel i915 driver. Christoph Hellwig did a lot of work in the XFS filesystem, the block layer, and the RISC-V architecture code. Yue Haibing and Colin Ian King both contributed cleanup patches all over the tree, while Thierry Reding worked mostly on the Tegra graphics driver.

In the "lines changed" column, Ard Biesheuvel worked almost entirely in the crypto subsystem; much of that work was aimed at enabling the merging of the WireGuard VPN code into 5.6. Haiyan Song contributed exactly two patches updating perf events data for Intel CPUs. Dmitry Osipenko worked on Tegra hardware support, and Hao Zheng contributed one big patch to the Marvell octeontx2 network driver.

The testing and reviewing numbers this time around look like this:

Test and review credits in 5.5

Tested-by
    Andrew Bowers                 73   8.2%
    Arnaldo Carvalho de Melo      46   5.2%
    Keerthy                       21   2.4%
    Adam Ford                     21   2.4%
    Yoshihiro Shimoda             17   1.9%
    Peter Geis                    14   1.6%
    Hannes Reinecke               12   1.3%
    Stan Johnson                  12   1.3%
    Aaron Brown                   12   1.3%
    Sean Nyekjaer                 11   1.2%
    Randy Dunlap                  11   1.2%

Reviewed-by
    Darrick J. Wong              205   3.5%
    Rob Herring                  176   3.0%
    Chris Wilson                 142   2.4%
    Christoph Hellwig            115   1.9%
    Tvrtko Ursulin               109   1.8%
    Alex Deucher                  94   1.6%
    David Sterba                  87   1.5%
    Andrew Lunn                   84   1.4%
    Daniel Vetter                 73   1.2%
    Christian König               62   1.0%
    Greg Kroah-Hartman            61   1.0%

Only 797 changesets (5.5% of the total) carried Tested-by tags, while 4,939 changesets (34% of the total) had Reviewed-by tags. Two of the top testers, Andrew Bowers and Keerthy, focus on testing Intel and TI-specific driver patches, respectively. The top reviewer, Darrick Wong, applied the Reviewed-by tag to XFS patches that, as the XFS maintainer, he also signed off on; most other subsystem maintainers do not follow that practice.
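
Counts like these can be reproduced, at least approximately, by asking git how many commits in the release carry a given tag; note that a changeset can carry more than one instance of a tag, so commit counts are not exactly the same thing as the credit counts in the tables. A quick sketch in the same spirit as the one above:

    #!/usr/bin/env python3
    # Approximate count of 5.5 changesets carrying a given tag; run in a
    # kernel repository that has the v5.4 and v5.5 tags.
    import subprocess

    def commits_with_tag(tag: str, rev_range: str = "v5.4..v5.5") -> int:
        out = subprocess.run(
            ["git", "rev-list", "--no-merges", "--count",
             f"--grep=^{tag}:", rev_range],
            capture_output=True, text=True, check=True).stdout
        return int(out)

    for tag in ("Tested-by", "Reviewed-by", "Reported-by"):
        print(tag, commits_with_tag(tag))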

The most prolific bug reporters (and those who credited them) in this cycle were:

Reported-by credits in 5.5

Recipients
    Hulk Robot                   164  15.7%
    Syzbot                       125  12.0%
    kbuild test robot            102   9.8%
    Dan Carpenter                 32   3.1%
    Linus Torvalds                14   1.3%
    Stephen Rothwell              12   1.2%
    Geert Uytterhoeven             8   0.8%
    Randy Dunlap                   8   0.8%
    kernel test robot              8   0.8%
    Qian Cai                       8   0.8%
    Yauheni Kaliuta                8   0.8%
    Arnaldo Carvalho de Melo       7   0.7%
    Johan Hovold                   7   0.7%
    Christophe Leroy               7   0.7%
    coverity-bot                   7   0.7%

Creditors
    zhengbin                      83   8.0%
    YueHaibing                    69   6.6%
    Eric Dumazet                  32   3.1%
    Jens Axboe                    26   2.5%
    Chris Wilson                  22   2.1%
    Jérôme Pouiller               18   1.7%
    Paul E. McKenney              16   1.5%
    Takashi Iwai                  12   1.2%
    Florian Westphal              11   1.1%
    Frederic Weisbecker           11   1.1%
    Andrii Nakryiko               10   1.0%
    Linus Torvalds                 9   0.9%

These numbers would indicate that over 1/3 of the bug reports for the kernel (of which 934 were credited in 5.5) are now coming from automated testing systems. Some of the reported bugs are more severe than others, but there is little doubt that having automated systems finding hundreds of bugs (that are subsequently fixed) each development cycle is good for the kernel as a whole.

A total of 231 companies (that we know about) support work on the 5.5 kernel. The most active employers this time around were:

Most active 5.5 employers

By changesets
    Intel                       1655  11.5%
    (Unknown)                    999   7.0%
    Red Hat                      945   6.6%
    (None)                       804   5.6%
    Google                       780   5.4%
    AMD                          709   4.9%
    Huawei Technologies          586   4.1%
    SUSE                         549   3.8%
    Linaro                       500   3.5%
    IBM                          458   3.2%
    (Consultant)                 395   2.8%
    Renesas Electronics          384   2.7%
    Facebook                     359   2.5%
    NXP Semiconductors           336   2.3%
    Mellanox                     304   2.1%
    Samsung                      240   1.7%
    Arm                          234   1.6%
    Texas Instruments            218   1.5%
    Canonical                    197   1.4%
    NVIDIA                       193   1.3%

By lines changed
    Intel                      95531  14.2%
    (Unknown)                  47298   7.0%
    Red Hat                    34064   5.1%
    Arm                        33500   5.0%
    Google                     31617   4.7%
    (None)                     27912   4.2%
    AMD                        26259   3.9%
    Linaro                     25838   3.8%
    (Consultant)               23358   3.5%
    Marvell                    20777   3.1%
    SUSE                       18321   2.7%
    IBM                        17923   2.7%
    Facebook                   17628   2.6%
    Samsung                    14863   2.2%
    NXP Semiconductors         14197   2.1%
    Chelsio                    13322   2.0%
    Renesas Electronics        12943   1.9%
    Huawei Technologies        11292   1.7%
    NVIDIA                     10899   1.6%
    Mellanox                   10704   1.6%

The employer table is generally unsurprising, and this month is no exception.

A walk on the stable side

The 4.9 kernel was released on December 11, 2016, just over three years ago. Some 16,214 non-merge changesets went into the 4.9 release. Since then, as of this writing, there have been 210 stable updates to 4.9, adding another 15,210 changesets — enough to make up another large development cycle. That is a lot of fixes, and they are not all small: 4.9.210 is 80,000 lines larger than 4.9 was.
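
Numbers of this sort are easy to check against the stable tree itself; a quick sketch, assuming a clone of the linux-stable repository with the v4.9 and v4.9.210 tags available:

    #!/usr/bin/env python3
    # Compare v4.9 and v4.9.210 in a linux-stable clone: count the stable
    # changesets and measure the overall growth in lines.
    import subprocess

    def git(*args: str) -> str:
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    changesets = int(git("rev-list", "--no-merges", "--count",
                         "v4.9..v4.9.210"))

    added = removed = 0
    for line in git("diff", "--numstat", "v4.9", "v4.9.210").splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            added += int(fields[0])
            removed += int(fields[1])

    print(f"{changesets} stable changesets; the tree grew by "
          f"{added - removed} lines")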

The contributor picture for this long-term stable kernel is somewhat different from what it was at release time. The following are the most active contributors to the stable updates released after 4.9:

Most active 4.9-stable contributors

Individuals
    Greg Kroah-Hartman           301   2.0%
    Eric Dumazet                 285   1.9%
    Johan Hovold                 224   1.5%
    Arnd Bergmann                216   1.4%
    Takashi Iwai                 184   1.2%
    Dan Carpenter                174   1.1%
    Thomas Gleixner              141   0.9%
    Eric Biggers                 121   0.8%
    Xin Long                      93   0.6%
    Hans de Goede                 81   0.5%
    Geert Uytterhoeven            80   0.5%
    Mark Rutland                  78   0.5%
    Colin Ian King                70   0.5%
    Will Deacon                   69   0.5%
    Cong Wang                     69   0.5%
    Bart Van Assche               68   0.4%
    Dan Williams                  68   0.4%
    Gustavo A. R. Silva           65   0.4%
    Peter Zijlstra                64   0.4%
    Theodore Ts'o                 64   0.4%

Companies
    Red Hat                     1353   8.9%
    Google                      1344   8.8%
    (None)                      1243   8.2%
    (Unknown)                   1135   7.5%
    Intel                       1044   6.9%
    SUSE                         745   4.9%
    IBM                          669   4.4%
    Oracle                       462   3.0%
    Linaro                       452   3.0%
    (Consultant)                 425   2.8%
    Linux Foundation             418   2.7%
    Huawei Technologies          301   2.0%
    Mellanox                     284   1.9%
    Arm                          258   1.7%
    Broadcom                     189   1.2%
    Samsung                      187   1.2%
    Canonical                    181   1.2%
    Linutronix                   165   1.1%
    Renesas Electronics          149   1.0%
    AMD                          147   1.0%

Greg Kroah-Hartman's position at the top of the individual list is a bit deceptive; over 200 of those commits are simply setting the version numbers for each stable release. Of the remaining commits, 50 are reverts of "stable" changes that proved to be not such a good idea. Take those commits out, and he would not have made it into the top 20 changeset contributors.

The picture that emerges in general is one containing many long-term contributors to the core kernel; they may not be the top contributors from one release to the next, but they are creating the important fixes that we all depend on.

Only 2,775 of the fixes merged between 4.9 and 4.9.210 contain a Reported-by tag; that is about 18% of the total. Given that most of the commits in this series are meant to be bug fixes, that suggests that a lot of bug reports are still going without credit. The reporters who were credited for fixes going into 4.9.x were:

Top 4.9.x bug reporters
    Syzbot                       471
    Dmitry Vyukov                 98
    Andrey Konovalov              79
    Dan Carpenter                 52
    kbuild test robot             32
    Hulk Robot                    30
    Fengguang Wu                  26
    Jianlin Shi                   26
    Ben Hutchings                 24
    Jann Horn                     23
    Al Viro                       18
    Guenter Roeck                 18
    Wen Xu                        17
    Arnd Bergmann                 14
    Eric Biggers                  14
    Anatoly Trosinenko            11
    Alexander Potapenko           10
    Li Shuang                     10
    Eric Dumazet                   9
    Tetsuo Handa                   9
    Pali Rohár                     9

Bearing in mind that Syzbot and Dmitry Vyukov are one and the same in this context, it is clear that the Syzbot robot, in particular, has been highly effective in finding bugs that have been in the kernel for a long time. The same is true, to a lesser degree, for Fengguang Wu and his "kbuild test robot".

The process of developing and stabilizing a kernel clearly does not abruptly end the day that Linus Torvalds releases it; one might argue that much of the real work has just begun. Getting developers to focus on fixing bugs rather than developing new ones has been a struggle for almost as long as there has been software; the good news is that, over the years, the development community has gotten better at doing that work. The development process now continues at a high speed, even after a new kernel is released and the focus moves on to the next development cycle.

Comments (5 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: OpenSMTPD vuln; Linux 5.5; LibreOffice 6.4; Qt changes; Librem 5; Thunderbird spun out; Quotes; ...
  • Announcements: Newsletters; conferences; security updates; kernel patches; ...

Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds