Leading items
Welcome to the LWN.net Weekly Edition for October 25, 2018
This edition contains the following feature content, including the first set of articles from the 2018 Kernel Maintainers Summit:
- Picking a governance model for Python: the Python community has a wealth of options to choose from as it decides how to govern itself in the post-Guido era.
- Making the GPL more scary: the new MongoDB license.
- The code of conduct at the Maintainers Summit: Linus Torvalds returned to the community to participate in a discussion on the new code of conduct and how it will be interpreted.
- Making stable kernels more stable: regressions are never welcome, but that is especially true for stable kernel releases.
- Replacement of deprecated kernel APIs: making tree-wide API changes in the kernel remains a challenging task.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Picking a governance model for Python
The Python language project has been officially "leaderless" since the mid-July announcement that Guido van Rossum was stepping down. He is, of course, the founder of the language and had served for more than two decades as its Benevolent Dictator for Life (BDFL). But he did not appoint a successor and left it up to the project's core developers to come up with a new governance structure. In the three months since, a great deal of work has gone into that effort, which has to bootstrap itself since there was not even any mechanism to choose how to select a new governance model.
As with nearly any sizable change for Python, the governance question was broken up into a series of Python Enhancement Proposals (PEPs). In this case, PEP 8000 is an overview (or index) of the different PEPs that are being considered. The starting point, though, is to determine how those competing proposals (there are six currently, though there is a fair amount of overlap between them in various ways) will be chosen. That is the role of PEP 8001 ("Python Governance Voting Process").
How to decide
There is a bit of a chicken-and-egg problem for PEP 8001, though. Normally, Van Rossum would decide on PEPs (or delegate them to someone else); he has not completely left the fold, but is definitely not handling that role any more. There was a core developer sprint in September where much of the contents of PEP 8001 were hashed out; it was then discussed with the core developer community. PEP 8001 was made "active" on October 22 by Łukasz Langa based more or less on consensus among the core developers.
As with almost all of the public discussion on the governance PEPs, the PEP 8001 discussion was held using the Python Discourse instance. As we reported in mid-October, Python is experimenting with Discourse for conversations that normally would have taken place in the core-developer-only python-committers mailing list. For the most part, the conversation has moved to Discourse, as was requested in order to truly experiment with that communication medium.
Based on PEP 8001, the vote will be done using instant-runoff voting (IRV); voters must rank all of the options from one to six (assuming no change in the number of entrants before the vote) in preference order (with one being the highest). If there is not a majority for one of the options, the lowest vote-getting option is eliminated and the second choice is used from those ballots; that process is repeated until one option achieves a majority.
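The counting procedure is simple enough to sketch in code. What follows is a minimal illustration of the loop described above, not anything specified by PEP 8001; the ballot layout and the absence of a tie-breaking rule are simplifications of ours:

    #include <stdbool.h>

    #define NOPTIONS 6    /* the current number of competing proposals */

    /* A ballot ranks every option; ranks[] holds option indexes in
       preference order, most-preferred first. */
    struct ballot {
        int ranks[NOPTIONS];
    };

    /* Count each ballot for its highest-ranked surviving option,
       eliminating the weakest option until one holds a majority. */
    static int irv_winner(const struct ballot *ballots, int nballots)
    {
        bool eliminated[NOPTIONS] = { false };

        for (;;) {
            int counts[NOPTIONS] = { 0 };
            int lowest = -1;

            for (int b = 0; b < nballots; b++) {
                for (int r = 0; r < NOPTIONS; r++) {
                    int opt = ballots[b].ranks[r];

                    if (!eliminated[opt]) {
                        counts[opt]++;
                        break;
                    }
                }
            }

            for (int o = 0; o < NOPTIONS; o++) {
                if (eliminated[o])
                    continue;
                if (2 * counts[o] > nballots)
                    return o;    /* majority reached */
                if (lowest < 0 || counts[o] < counts[lowest])
                    lowest = o;
            }

            /* No majority: drop the lowest vote-getter and recount.
               A real election needs a rule for ties here, which this
               sketch does not provide. */
            eliminated[lowest] = true;
        }
    }

Because every voter must rank all of the options, a winner is guaranteed to emerge eventually: once all but one option has been eliminated, the survivor necessarily holds all of the votes.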
All core committers are eligible to vote, though there is a request that inactive core developers who intend to remain inactive abstain. It is completely voluntary, but is meant to allow inactive developers "to assess their skin in the game" before deciding whether to vote. The vote will be held for the two-week period of November 16-30. Votes will be collected in a Git repository. Obviously the results will be made public but, in addition, the actual rankings made by each voter will also be public after the vote.
Most of the discussion of PEP 8001 centered on some possible deficiencies of IRV and the possibility that voters could "peek" at other votes to try to game the vote with tactical voting. In the end, most seem comfortable that IRV will give the project a governance structure it can live with and IRV is easier to explain and discuss than some of the other options. While it is true that voters could peek, concern about it is not widespread. As Gregory P. Smith put it:
Anyone worried about protecting their vote from prying eyes/bots before the result can aim to push their vote to GitHub near the deadline.
We decided that we're all adults who should have a level of trust and respect for one another. If we believe we are composed significantly of assholes who will vote tactically, why be here at all?
However, Donald Stufft pointed out that "tactical voting", itself, is not necessarily a bad thing. Stufft advocated the STAR voting system as well as ways to keep ballots secret. In the end, though, simplicity seems to have won out; Brett Cannon summed things up by noting that not encrypting the ballots is far easier (and "I personally don't care even if people do peek"). There was no real groundswell of support for STAR either, he said. A few days later, PEP 8001 was "approved".
GUIDO
PEPs 8010 through 8015 make up the proposals that the core developers will need to choose among. Once the new governance structure is picked, it will be enshrined as PEP 13. Given that Python has had a BDFL for its entire history until now, it probably does not come as a surprise that one of the proposals, PEP 8010 ("The BDFL Governance Model"), would continue that role, though it does not name a BDFL. As with the other proposals, another election would be needed to fill the position.
PEP 8010, which was authored by Barry Warsaw, would replace the BDFL with another, rather familiar-looking acronym, GUIDO, which stands for "Gracious Umpire Influencing Decisions Officer"—a bit of a stretch linguistically, perhaps. The PEP describes the role of the GUIDO and how they are elected, for how long, and, even, how they are replaced if the community loses faith in them.
The GUIDO would be nominated and seconded by the core developers and chosen under the same scheme described in PEP 8001. They would serve for roughly four-and-a-half years (three Python release cycles under the current cadence) and could run again as many times as they want. The GUIDO would serve alongside an elected Council of Pythonistas (CoP), which would provide counsel to them as well as serving as a check on their power. A unanimous vote of the three-member CoP can trigger a project-wide confidence vote on the GUIDO.
The main role for the GUIDO is "to provide an overarching, broad, coherent vision for the evolution of the Python language, spanning multiple releases". They would also be the ultimate authority for deciding on PEPs, but could delegate that authority to another, as was done under Van Rossum. They also have broad powers in terms of determining what is "PEP-worthy", shutting down unproductive discussions on the mailing lists or elsewhere, and resolving other kinds of disputes. For day-to-day operations, though, the GUIDO would be uninvolved—for the most part, Python hums along without any need for a dictator—which is also how things were under Van Rossum.
The discussion of the PEP has been somewhat sporadic. There are clearly some who are uncomfortable with the idea of a single person's "vision" (other than, perhaps, Van Rossum's) ruling the language going forward. There were also some comments about wording and the like. As with the other proposals, people are generally not posting positive comments; instead the comments are either seeking clarification or are critical of some aspect (or the whole thing).
Trio of Pythonistas
PEP 8011 ("Python Governance Model Lead by Trio of Pythonistas"), on the other hand, replaces the BDFL with a "Trio of Pythonistas" (ToP or simply Trio), which is tasked with making the final decisions for the language. It does not name any members of the Trio but describes how it would be formed. Instead of electing three separate core developers to the Trio, slates of three developers would be nominated; each core developer could then vote for their favorite slate, with the slate receiving the most votes being elected as the Trio. In the discussion, it was stated that a core developer could be a candidate on multiple slates.
The goal is to get three core developers with a similar vision for Python, but potentially with different skill sets that would complement each other. Much of the PEP is marked "open to discussion", including the term of service, which is open-ended. If a single Trio member needs to step down for some reason, they would be replaced, but it is not yet specified how that would be done. The Trio as a whole is asked to give one year's notice before it disbands or retires; at that time, the community can reflect on how successful the Trio model has been. If desired, an entirely different governance model could be chosen at that point. In some sense, that is inherent in all of the proposals, since new PEPs can always override older ones, but PEP 8011 is the only one that explicitly calls out the idea of "refactoring" the governance.
There is a whole list of things the Trio is meant to do (and a list of things that are not responsibilities for the group), but much of that boils down to handling PEP acceptance. To a large extent, that is where the rubber hits the road in Python governance; no major change is made without a PEP being accepted that describes the change. Whoever controls the PEP process controls Python, its direction, and its future. There are other things that Van Rossum did (and would be part of the Trio's responsibilities), such as setting a good example in terms of behavior and tone, but PEP pronouncement is where the power lies in the Python community.
So far, much of the discussion about the Trio proposal has focused on the requirement that Trio members also be voting members of the Python Software Foundation (PSF). Several felt that the requirement might lead otherwise-qualified developers to not run for the Trio, so it was suggested that PSF membership not be required.
Community governance
PEP 8012 ("The Community Governance Model") is based on the governance models of several large open-source projects, some of which were described in PEP 8002 ("Open Source Governance Survey"). In particular, Rust and Django are both governed by the "community model" and were outlined in PEP 8002; ECMAScript and C++ are similarly governed (according to PEP 8012) but were not part of that governance survey.
The basic idea is to have special interest groups of experts that form around specific areas of the language and, effectively, have veto power over any changes in those areas. A PEP can be placed into a "final comment period" by unanimous consent of the relevant experts, along with a disposition (accept, accept provisionally, reject, or defer). That period lasts 14 days and gives any stakeholders a final chance to raise objections.
If a core developer thinks that a PEP in its final comment period should be rejected instead of accepted, they can call for a vote. If more than one-third of the non-dormant core developer population vote to reject, the PEP is rejected. On the other hand, the only recourse to overturning a rejection by the experts is to disband the team of experts, which requires a two-thirds supermajority of the non-dormant developer population—as does removing a core developer from the project. The intent is that these developer-wide votes would be rare; certainly something has gone awry if they are frequent.
Developers can assign themselves to an existing expert team with the unanimous consent of the existing members. PEP 8012 is one of the few proposals that has a mechanism for adding new core developers; a single negative vote is enough to keep a nominated person from becoming a core developer. This "blackball" mechanism was questioned in the discussion of the PEP, but Langa, who is the author of the PEP, considers it integral to the trust model of the community governance.
One interesting note that came out of the discussion is that Langa suspects that language changes like PEP 572 ("Assignment Expressions"), which led to the "PEP 572 mess" and, ultimately, Van Rossum's resignation, would probably not get accepted in a Python governed by PEP 8012.
External council
PEP 8013 ("The External Council Governance Model") is one of the more distinctive proposals. It would add a "Council of Auditors" (CoA) that is specifically made up of two to four people who are not core developers. As with the other proposals, much of the responsibility that the CoA has is to decide on PEPs. The CoA would be elected for the duration of one Python release cycle, but members could run again as often as they wish.
There is a voting process, in which nominated and seconded non-core-developers are added to the python-committers mailing list to introduce themselves and explain why they are interested in being part of the CoA. As the feature freeze of a release approaches, voting opens and core developers can vote for as many candidates as they like; the first-place candidate becomes the president of the CoA and the next nominees that get more than 50% of the vote (up to three) become members. Any ties are resolved by the release manager of the following release (the one for which the CoA serves).
There is also a no-confidence voting process that allows core developers to nominate CoA members for removal; they can also nominate PEPs that should be reverted as part of the no-confidence vote. Any of those that are seconded are voted on after the CoA members are given seven days to respond to the "charges". After that, voting takes place over the next seven days; more +1 votes than -1 votes is sufficient to effect the change.
As might be guessed, much of the discussion surrounded the requirement that CoA members not be core developers. It was seen as somewhere between counter-intuitive and insulting to suggest that external people who are not core developers could be the ones to decide on language features going forward. PEP 8013 author Steve Dower clearly sees that as a benefit and is confident that well-respected (and known) people can be found for the CoA, but others are not so sure.
Commons
In some ways, PEP 8014 ("The Commons Governance Model") is similar to PEP 8013. In fact, Dower said that he would be happy to adopt parts of it into PEP 8013. The main difference is that the "Council of Elders" (CoE) proposed by PEP 8014 would allow core developers as members, unlike PEP 8013's CoA. PEP 8014 author Jack Jansen is aiming for "as few procedures, defined terms and percentages as possible"; he would have called it "The Anarchist Governance Model" if "anarchist" did not have such negative connotations to some.
Given the aims, it is not surprising that there is a fair amount of hand-waving in the PEP. The overarching idea is that the CoE of 5-10 people, with a diversity of backgrounds within the Python ecosystem, can determine the will of the community on any given PEP. How those people are chosen and what kind of "emergency brake" is available to override a misbehaving council are not specified in the PEP, at least not yet. The discussion has been largely positive; some questions about anonymity for the CoE members have been batted about.
Community organization
That brings us to the final proposal, PEP 8015 ("Organization of the Python community"). Unlike the other proposals, PEP 8015 sets out to formalize the current organization of the Python community, then to make minimal changes to reflect the changed situation (i.e. no BDFL). The goal is to "get a smooth transition from the old to the new organization".
To that end, it "replaces" the BDFL with a Python Core Board that takes over the PEP approval process. It also strengthens the existing "team" idea within Python and gives teams more autonomy. The board can delegate its PEP-approval role to teams (as is currently done with the Packaging Team) for PEPs in their areas of expertise. It can also delegate to individual PEP-Delegates, who are the equivalent of the BDFL-Delegates in the old organization.
For particularly controversial PEPs (e.g. PEP 572), the board can call for a vote among core developers, with a simple majority deciding the outcome. The three board members would serve for three years, with one seat up for election every year; no one could serve more than two terms on the board. The idea is to promote some turnover, while maintaining some continuity as well. Board members must be core developers and must come from different companies (no two board members can come from the same company, including its subsidiaries).
Commonality
There is a certain amount of commonality among the proposals, which should not be a huge surprise. Python has been chugging along with few major governance difficulties over the years. Obviously, losing the BDFL would qualify, but the community seems up to the challenge of finding something new. It is also not impossible that, a few years down the road, adjustments, or even a major overhaul, will be made. The latter would be disruptive, but it would be possible if hiccups in the governance process are found.
One interesting thread looked at the statistics of core development. While the data is not conclusive, it does seem like there has been a downturn in the number of commits over the past six years or so. That might indicate that the language is settling into a stable phase, where large changes are not really on the table. If that is the case, much of the hand-wringing around things like PEP 572 may not be much of a factor moving forward.
It would be difficult to handicap the different options at this point—or perhaps anytime before December 1. Four of the proposals (8010, 8011, 8012, and 8015) do not stray all that far from well known and understood models, either for Python or other projects. Certainly adopting the BDFL model would largely be status quo ante. PEP 8015 does not change things all that much either. The two wild cards may be PEPs 8013 and 8014 (or some combined version of them); both would upend much of the existing structure in ways that are not all that formally specified. It will be interesting to watch and see where it all goes; stay tuned.
Making the GPL more scary
For some years now, one has not had to look far to find articles proclaiming the demise of the GNU General Public License. That license, we are told, is too frightening for many businesses, which prefer to use software under the far weaker permissive class of license. But there is a business model that is based on the allegedly scary nature of the GPL, and there are those who would like to make it more lucrative; the only problem is that the GPL isn't quite scary enough yet.

The business of selling exceptions to the GPL, where one pays the copyright holder for a proprietary license to the code, has been around for a long time; MySQL AB was built on this model, for example. Companies that buy such a license normally do so because they fear that their own code may fall under the requirements of the GPL; vendors tend to take an expansive view of what constitutes a derivative work to feed those fears and encourage sales. It is a model that has been shown to work, and it has generally passed muster even with organizations that are committed to the spread of free software.
MongoDB Inc. is a business built on this model. Its core database product is licensed under the Affero GPL, which tries to close the perceived "software-as-a-service loophole" in the GPL with this language:

    Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.
Like Redis Labs before it, MongoDB has concluded that this license allows a bit too much. In particular, cloud providers are offering access to MongoDB instances without cutting the company in on the resulting revenue stream, and that doesn't feel right. In response, MongoDB has just announced an immediate shift to its brand-new Server Side Public License (SSPL). This license is based on the AGPL, but adds some extra text to section 13 with, it is claimed, the effect of requiring anybody who offers MongoDB as a service to release, under the same license, all of the software used to provide that service.
The license itself is more explicit about what software must be released in this manner:

    "Service Source Code" means the Corresponding Source for the Program or the modified version, and the Corresponding Source for all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.
The affected code must not only be released, it must be made available under the SSPL. This language, thus, extends the reach of the license beyond any modifications that may have been made to MongoDB itself or to anything that could conceivably be considered a derivative work; it now encompasses all of the software that runs around a commercial MongoDB installation. For a cloud provider, this language would appear to compel the release of most of that provider's internal software used to provide its services as a whole. That is an extension of the scope of the license that could indeed prove scary to businesses using this code.
As Matthew Garrett pointed out, expanding a license's requirements beyond derived works is not entirely new; the GPL's requirement that build scripts be released is one example. But this takes that requirement to a rather different level, to the point of, Garrett suggested, even requiring a relicensing of the Linux kernel if a MongoDB service runs on Linux. MongoDB argues that this license will inspire more companies to participate in the development community, but it seems unlikely that this is the real goal. That goal, instead, is simply to drive the sale of more proprietary licenses. The company claims that it is an open-source license, and has submitted it to the Open Source Initiative for approval. Whether that approval will be forthcoming is far from clear at this point.
One could see this change as being just another company trying to go proprietary without actually looking proprietary. But there are a couple of points to take away here. The first of these is that this kind of license change is just one of the types of obnoxiousness that can come with software that is owned by a single company, whether that software is open-source or not. Anybody depending on such software should always be aware that abrupt and unwelcome policy changes are possible.
There is a lesson here for contributors as well. The request for license approval notes that: "As of this writing, the MongoDB GITHUB repository shows over 43,000 commits, 680 releases, and over 350 contributors". To become one of those contributors, a developer must first sign MongoDB's contributor agreement, which assigns copyright ownership to MongoDB. Those contributors all gave MongoDB the right to relicense their code in this manner — a permission that some of them may be reconsidering now. Some of the affected contributions may well have come from the very companies that the new license is meant to target. Developers should always be aware of the possibility of this kind of change before handing ownership of their code to another organization.
MongoDB submitted this license for approval with the optimistic statement that "we expect our license will quickly gain a wide following". That remains to be seen. This license does, however, appear to be part of a trend in some parts of the market aimed at extracting more revenue from users of free software — or of projects that used to be free software. Making money with free software can be challenging, beyond any doubt, just like most other ways of running a business. But if that challenge is solved by making the software non-free, the business may have gained something, but the community around that software can only lose.
The 2018 Kernel Maintainers Summit
The Kernel Maintainers Summit is a new event designed to allow a relatively small number of kernel maintainers to discuss important topics that are not generally amenable to solution via email. The 2018 Maintainers Summit was held on October 22 in Edinburgh, Scotland, with about 30 developers present.
This half-day meeting discussed these topics:
- The code of conduct: shortly after the abrupt adoption of a new code of conduct, the maintainers discussed how the current state of affairs came about and what should happen next.
- Making stable kernels more stable: the recurring topic of how to prevent stable kernel updates from giving users more than they bargained for.
- Replacement of deprecated APIs: how can we better handle large-scale API changes in the kernel?
- Improving the handling of embargoed hardware-security bugs: the next Meltdown/Spectre vulnerability will come sooner or later; how can we handle it better?
- Removing support for old hardware from the kernel: carrying support for hardware that nobody is using has a cost, but identifying and removing that support is not always easy.
- The proper use of EXPORT_SYMBOL_GPL(): the continuation of an ongoing debate over GPL-only exports.
LWN's coverage of the 2018 Maintainers Summit is now complete.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to the Maintainers Summit.]
The code of conduct at the Maintainers Summit
The 2018 Kernel Maintainers Summit convened in Edinburgh, UK on October 22 with a number of things to discuss, but the top subject on most minds was the recently (and hastily) adopted code of conduct. Linus Torvalds made his reentry into the kernel community, and the assembled maintainers had a relatively good-natured discussion on how the current state of affairs came about and where things can be expected to go from here.

Torvalds started by noting that the conduct issue is not a new one; it has been "festering in the community" for years. The immediate cause of his decision to take a break and bring in the code of conduct was knowledge that The New Yorker article was coming; he noted that, contrary to what was written there, the author never tried to contact him. That article led to a number of discussions with friends and others; Torvalds concluded that the best way to "head things off" was to announce some changes with the 4.19-rc4 release. He acknowledged that this was done in private and in a rush; it did not follow the usual open-source model. After the fact, he admitted to not being sure that the article justified all of the heartache that preceded it. But, as James Bottomley noted, the -rc4 announcement and adoption of the code of conduct did cause the article to be rewritten.
The task of writing that announcement was not fun, Torvalds added, but contrary to some speculation on the net, he did write it all himself. He suggested that anybody who needs to write a message of that nature take a few days to think about it.
Whether or not the article justified the trouble, he became convinced that he had taken the right course after about a week of reading the "vile garbage" that came from people who were opposed to it. He even saved a couple of particularly special emails that were sent to him; they dispelled any doubts that he was on the right side. From here, he only had a couple of suggestions. While he agrees with the addition of the interpretation document and the changes to the code itself, now would be a good time to stop making changes and just let things be. There are a lot of people worried about hypothetical situations, but we shouldn't make more changes unless and until something happens.
Steve Rostedt interjected that the code of conduct is not "our code" and that it would be better to move to something that better reflects our community. Torvalds concurred that a lot of people do not necessarily agree with the author of the Contributor Covenant, upon which the kernel's code is based, but that agreement is not necessary; the code itself is good, he said, and we should resist the temptation to mess with it further. There should be no more hidden emails about it; nobody is entirely happy with it, but we can live with it. Greg Kroah-Hartman added that many other projects have used it; adopting it is like picking a well-known license rather than writing a new one.
Torvalds went on to say that, as far as he is concerned, if the code of conduct ever triggers, it will indicate a big problem; he does not want the code to be an ongoing issue in the community. To that end, he asked the assembled group to watch his emails and let him know if things start to get close to the edge. He has, in fact, installed a profanity filter on his outgoing mail, but it is easy to be impolite without cursing. Kroah-Hartman noted that the previous "code of conflict" had been around for several years; it only generated three reports ever, none of which had any real substance. The community has a good history of doing sane things in this area; we also have a professional mediator, funded by the Linux Foundation, to help in that regard. Contact information for the mediator can be found in the interpretation document.
Kees Cook said that conversations between kernel developers can be scary, especially to relatively new members of the community who see them from the outside. Having the code of conduct tells these people that there is somebody they can talk to if the need arises. Bottomley added that adopting the code will help to convince the outside world that the community has gotten better. Cook noted that, "two rants ago", Torvalds apologized to him afterward, and the last one was a joke, so things were already getting better.
Ted Ts'o said that many of these interactions are context-dependent. Ten years ago, an effort was made to encourage Japanese developers in particular; part of that was sensitizing developers in the community that, in some cultures, direct criticism can lead to strong feelings of shame. Once people know about issues like that, he said, they tend to be more careful. On the other hand, Christoph Hellwig said that the "I love your patch" message from the 0day robot (which reports problems with patches) was offensive to many in a different way.
The documents associated with the code of conduct, Ts'o said, should not be seen as an absolute declaration of how things will be; instead, they are a symbol describing what we are aiming for. Grant Likely added that the community has not really changed much, but now we have thought about what to do when things go bad and have a way to deal with such situations. Laura Abbott added that having the code will help new developers feel that the welcome they get at the beginning will continue to be there as they grow into the community.
Peter Zijlstra admonished, though, that if a developer continually ignores his feedback, he'll eventually stop being nice. Others responded that it is OK to say that the code is wrong, but one can't call the developer an idiot. But what is to be done when the person who is ignoring feedback is the real problem? It is still not acceptable to attack the person, Dirk Hohndel said. Instead, in the worst cases, the only real alternative may be to simply ignore the patches.
Ts'o wondered about difficult cases like the current effort to get the WireGuard patches merged; there is no real misbehavior happening there, just friction between developers. One of the reasons he started the Kernel Summit in the first place was to get kernel developers to meet each other; it's much harder to get mad at somebody you've shared a beer with. Arranging such meetings is harder now, since the community is so much bigger. In response, we all have to work harder to assume good faith on the part of other developers.
There was a fair amount of discussion on how it might be possible to get more new developers to conferences. The invitations that developers would receive to the older, larger Kernel Summit turned out to be important at a lot of companies, so it may be worthwhile to find a way to revive them. There is a lot of thought that needs to go into conferences in general, though; there are far too many of them, so it has become difficult for aspiring developers to meet with established members of the community.
Returning to the code of conduct, Ts'o noted that its mandates on maintainers raised a lot of hackles. The purpose wasn't really to create more work for maintainers, though, or to turn them into police officers; instead, it was an expression of the idea that maintainers need to lead by example. We should all do that, he said, and try to talk to people when we see borderline emails on the lists.
Mauro Carvalho Chehab expressed some worries that parts of the code of conduct could be seen as a binding contract in Brazil. Evidently, though, lawyers at the Linux Foundation have reviewed it with that concern in mind and concluded that it is not the case. There were some questions about what the kernel community would do if the upstream Contributor Covenant code were to change; the answer is that the changes will be evaluated when they happen.
As things wound down, I tried to reemphasize the point that the time for private conversations around the code of conduct passed a while back. It was agreed that any future discussions would happen in a public forum, though Kroah-Hartman added that, in a couple of years, it will likely become necessary to handle some other sort of sensitive, policy-related document using a similar process.
When, it was asked, is Torvalds returning to the community? He answered that he is already back; he has gotten some pull requests and intends to return to working normally. It was nice, he said, to have Kroah-Hartman take over for a while and give him a break, though. Kroah-Hartman has write permission to the repository now, so he may be asked to take over again at some point. Torvalds noted dryly, referring to the mixup that got the Maintainers Summit moved to Edinburgh in the first place, that he has a vacation coming up soon where that might be welcome.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the event.]
Making stable kernels more stable
Improving the quality of stable kernel releases is a perennial subject at the Kernel and Maintainers Summit events, and this year was no exception. This session, led by Fedora kernel maintainer Laura Abbott, discussed a range of ideas but found no silver bullets. There is, it seems, not much that can be done to create better stable kernels except to perform more and better testing.

Abbott's objective in running this session was to discuss ideas for reducing regressions in stable kernels. Those kernels are, after all, supposed to be stable; if they break, users will suffer and their trust in the entire process will be reduced. In the discussions prior to the summit, she had suggested that perhaps stable releases should sit in a release-candidate state for one week prior to release as a way of shaking out any bugs; that idea was not particularly well received. But we should do something, she said; if we are going to tell people that they should be running stable kernels, those people should not need to employ "an army of engineers" to debug those kernels. The stable kernels we are releasing now, she said, are not ready for production use.
Peter Zijlstra started the discussion with an assertion that the problem will never be solved. The only way anybody can ever really know that a kernel will work for their particular combination of hardware and production workload is to try it. Rafael Wysocki said that there is a fundamental conflict here: users want fixes to be aggressively applied to stable kernels, but they also want those kernels to be mature. The end result, Jiri Kosina said, is that the distributors are not using the stable kernel releases anymore.
Ted Ts'o told the group that part of the problem is that the long-term support (LTS) kernels are too successful, so the regular stable kernels are not being used anymore. The support period for those kernels is simply too short. Supporting them for a longer period would help, but that would, of course, increase the amount of work required. So the non-LTS kernels are unlikely to ever be useful for distributors. Those that have tried to use them (he mentioned CoreOS in particular) have ended up shipping regressions to users, who were naturally displeased with what they got.
Greg Kroah-Hartman, the maintainer for most of the stable kernels out there, noted that CoreOS never told him about the problems it was having, so there was not much that he could have done about them. Other stable-kernel users have a different experience. Google, for example, runs each release candidate through "a zillion tests" and, as a result, is able to push updates out to users quickly. But, it was pointed out, obtaining this kind of result requires operating a large test infrastructure. Linaro is building something like it, Kroah-Hartman said, and Red Hat too. This is the only way the use of stable kernels by a distributor can really work, he said.
Abbott pointed out that big companies have the resources to put together this kind of infrastructure, but that is not true of all would-be stable-kernel users. Sasha Levin said that the KernelCI testing project is evolving to the point where small groups should be able to make use of it. Kroah-Hartman said that KernelCI is a Linux Foundation project now and that it is working to add more tests; Mark Brown cautioned that KernelCI still needs resources to be able to grow, though, and that it is a bit too soon to advertise it as ready for widespread use.
When Ts'o asked Abbott about the bugs reported by Fedora users, she replied that most of them turn up either in the graphics drivers or the KVM virtualization subsystem. Graphics, she noted, has been getting better recently; Kroah-Hartman replied that KVM is "a black hole" in this regard. Linus Torvalds said that Intel graphics, in particular, has improved a lot recently, but there is more to graphics than Intel. Abbott added that AMD graphics seems to be the source of many recent regressions.
Returning to one of her original points, Abbott asked whether companies need to be active in the kernel community to be able to use the stable releases effectively; Kroah-Hartman responded that not all users are active kernel contributors. Zijlstra said that companies don't need experts; they just need to test their workloads on the release candidates and report any bugs they find. Ts'o thought that the core problem might be a documentation issue; if users knew that they needed to test the release candidates, they might do more of it.
Kees Cook, instead, said that if the community is seeing holes through which bugs are slipping, the right response would be to add tests that might catch them — assuming such tests exist. Paul McKenney pointed out that a lot of the existing tests out there are proprietary; in such cases, it's up to the company that owns the tests to run them and report the results. Some companies do indeed do that, Kroah-Hartman said.
Arnd Bergmann observed that more patches seem to be going into the stable releases than was once the case. Kroah-Hartman said that a lot of work has gone into getting maintainers to tag fixes for the stable releases; that work is bearing fruit. But, Bergmann said, many of those "fixes" appear to be bending the rules that had been put in place for the stable kernels. The rules, Kroah-Hartman responded, are there to allow the maintainers to say "no" to specific patches, but he will generally accept a much broader range of patches for stable releases if the maintainers agree. Bergmann asked whether the rules stretch to adding fixes for warnings generated by new compilers; Kroah-Hartman said "no", that the line has to be drawn somewhere. Fixes to disable those warnings in stable-kernel builds might be accepted, though.
Toward the end, Kroah-Hartman was asked if he uses the "Fixes" tag to select patches for backporting to the stable releases; he answered that he does not have the time to do that. Levin's automatic patch-selection code can make use of it, though. Ts'o said that he has started getting CVE numbers for applicable patches for a novel reason: the presence of a CVE number will cause others to do the work backporting the patches to older kernels for him. With regard to the original topic, though, the conclusion reached by the group was clear enough: if we want better stable-kernel releases, there is really no substitute for better testing.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the Maintainers Summit.]
Replacement of deprecated kernel APIs
The kernel community tries to never change the user-space API in ways that will break applications, but it explicitly allows any internal API to be changed at any time if a solid technical reason to do so exists. That doesn't mean such changes are easy to do, though. At the 2018 Kernel Maintainers Summit, Kees Cook led a discussion on the challenges he has encountered when trying to effect large-scale API changes and what might be done to make such changes go more smoothly.

There are, Cook said, two common ways of doing a large API transition: too fast and too slow. As an example of the former, he mentioned the timer initialization change, which took three development cycles to prepare and, he said, threatened to give him repetitive strain injuries. When changes are done quickly, then, as far as the rest of the community is concerned, thousands of patches tend to show up at once. Those patches tend not to see the light of day until they are thought to be ready, and they can result in a lot of merge conflicts once they surface; as a result, these patches often do not get enough testing before going into the mainline.
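For those who did not follow that work: the change converted timer callbacks from receiving an opaque unsigned long (almost always a cast pointer) to receiving a pointer to the timer itself. A simplified before-and-after sketch, with an invented structure name, might look like this:

    struct foo {
        struct timer_list timer;
        /* ... */
    };

    /* Before: the callback received an opaque value, passed in as a
       cast pointer at setup time:
           setup_timer(&foo->timer, foo_timeout_old, (unsigned long)foo); */
    static void foo_timeout_old(unsigned long data)
    {
        struct foo *foo = (struct foo *)data;
        /* handle the timeout */
    }

    /* After: the callback gets the timer_list pointer itself and uses
       from_timer() (a container_of() wrapper) to find its structure;
       setup no longer passes a data value:
           timer_setup(&foo->timer, foo_timeout, 0); */
    static void foo_timeout(struct timer_list *t)
    {
        struct foo *foo = from_timer(foo, t, timer);
        /* handle the timeout */
    }

Multiply that small change by the large number of timer users in the tree and the scale of the job becomes apparent.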
Doing things too quickly is almost always a bad idea, he said, but it's not always clear what the best way to proceed is. There was some talk of adding Coccinelle scripts (which are often used to make such changes in the first place) to linux-next for testing. The 0day testing system already runs the existing scripts, so it would not be too hard to add another for use in linux-next only.
On the other extreme, some changes can take years to complete; this process is too slow and can be painful as well. If an old API hangs around for years, new users will be continually introduced, making overall forward progress hard. He has written a "deprecated APIs" document that is set to enter the mainline in this merge window; he hopes it will help developers to avoid using APIs that are on the verge of removal.
In the past, deprecated APIs have been marked with the __deprecated attribute, which generated warnings when callers of the marked functions were compiled. The warnings were voluminous and tended to overwhelm any useful information, so they were turned off a little while back. Cook does not want to turn them back on, but he does think that the __deprecated marker serves as useful code documentation, so he would like to see developers keep adding these markers, a practice that has stopped.
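Using the marker is a one-line affair; a minimal sketch with a hypothetical function pair:

    /* __deprecated historically expanded to __attribute__((deprecated)),
       drawing a compile-time complaint for every caller; the warnings
       are currently disabled, so this now serves as documentation. */
    void foo_setup_legacy(struct foo *foo) __deprecated;

    /* New code should call the replacement instead. */
    void foo_setup(struct foo *foo);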
There was some talk about putting checks into the checkpatch.pl script to warn about adding new calls to deprecated APIs. That would be useful in some settings, but it cannot catch everything; the use of variable-length arrays (VLAs) is an example where checkpatch.pl is not able to help. Herbert Xu said that the 0day tester only sends notifications for new warnings added by a patch, so the __deprecated warnings could be usefully turned on there, even if they are off for everybody else.
Linus Torvalds asked about which API changes are pending now. Cook responded that the VLA removal work is just about done, but Arnd Bergmann said that he has just discovered a few more that need to be dealt with. There is the marking of implicit fall-through code in switch statements; that is an example of a slow change that has a while to go yet. It may be at the point where warnings for unannotated fall-throughs could be turned on in testing settings, at least.
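That annotation takes the form of a comment that compilers (via options like GCC's -Wimplicit-fallthrough) and static checkers can recognize; a small sketch with invented names:

    switch (cmd) {
    case CMD_RESET:
        reset_device(dev);
        /* fall through */
    case CMD_INIT:
        init_device(dev);
        break;
    default:
        return -EINVAL;
    }

The comment asserts that the missing break is intentional; any case that ends without either a break or an annotation then becomes a warning worth looking at.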
The conversation shifted to the elimination of BUG() and BUG_ON() calls. These functions will kill the entire machine, making any sort of recovery impossible and debugging difficult; adding new calls risks provoking Torvalds to violate his recent conduct-related promises. In almost all cases, one of the variants of WARN_ON() should be used instead, he said; the most paranoid users can set the "panic on warn" command-line option if they really do not want the system to continue after a warning is issued. Perhaps, he suggested, current BUG() calls could all be turned into warnings now.
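The conversion is usually mechanical, since WARN_ON() evaluates to the value of its condition; a sketch with an invented function:

    /* Before: an "impossible" condition takes down the whole machine. */
    static int frob_one_old(struct frob *f)
    {
        BUG_ON(!f->ops);
        /* do the work */
        return 0;
    }

    /* After: complain loudly, then fail in a way the system can
       survive. */
    static int frob_one(struct frob *f)
    {
        if (WARN_ON(!f->ops))
            return -EINVAL;
        /* do the work */
        return 0;
    }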
Cook responded that callers of BUG() do not expect that call to return, so surprising things could happen if its behavior is changed. Torvalds said that such code is buggy, but that this also shouldn't be a problem, since that code is not being hit in current kernels anyway (or users would be complaining). Ted Ts'o suggested explicitly deprecating BUG() and replacing it with a call whose behavior is configurable; Cook said that he already has such a thing in the form of a function called check_data_corruption(), from which callers have to be prepared to see a return.
Christoph Hellwig said that the current BUG() behavior can be useful in a number of settings; it produces useful information in crash dumps or on a serial console, for example. He suggested that there is space for three types of behavior: emit a warning, kill the machine outright, or warn by default but be configurable to crash the system instead. The third behavior is the one that is missing now. Torvalds was adamant, though, that he does not want kernel developers to have the option of killing the machine; it's not something that most users have any way of coping with. Even he doesn't use a serial console anymore, so when the machine stops, he gets upset.
Bergmann, having done a quick search, noted that there are around 10,000 BUG() and BUG_ON() calls in the kernel now. Downgrading those to warnings would be likely to add a lot of new security holes. Torvalds countered that those calls are already a security hole, but others pointed out that they are currently a denial-of-service hole rather than something worse.
Greg Kroah-Hartman raised the problem of core kernel API code that will use WARN_ON_ONCE() to complain about bad usage; that will not generate the desired result if WARN_ON_ONCE() is configured to crash the machine. He was told that the code should just call pr_warn() instead, and that the called function should return an error in such situations. It was generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be triggered from user space need to be fixed.
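In other words, for conditions that user space can trigger, the preferred pattern is a plain log message and an error return rather than any of the WARN_ON() family; a brief sketch (the function and message are invented):

    int frob_register(struct frob *f)
    {
        /* Not WARN_ON_ONCE(): user space can get us here, and a WARN
           becomes a crash on a panic-on-warn system.  Log the problem
           and fail the call instead. */
        if (!f->ops) {
            pr_warn("frob: rejecting registration without ops\n");
            return -EINVAL;
        }
        /* ... */
        return 0;
    }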
At that point, the group ran low on things to talk about, as evidenced by the fact that the conversation turned to the 80-character line limit. Some developers would like to see it increased; others disagree. Everybody seemed to agree, though, that fixing line-length problems to keep checkpatch.pl happy leads to worse code most of the time.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the Maintainers Summit.]
Page editor: Jonathan Corbet