Defragmenting the kernel development process
Ted Ts'o introduced the topic by noting that his employer, Google, has a group dedicated to the creation of development tools, and that a lot of good things have come from that. The kernel community also has a lot of tools aimed at making developers more productive, but rather than having a single group creating those tools, we have many competing groups. While competition is good, he said, it also diffuses the available development time and may not be, in the end, the best way to go. He then turned the session over to Vyukov.
Lots of bugs
The kernel community has a lot of bugs, he began; various subsystems are often broken for several releases in a row. The community adds new vulnerabilities to the stable releases far too often. The 4.9 kernel, to take one example, has had many thousands of fixes backported to it. There are a lot of kernel forks out there, each of which replicates each bug, so keeping up with these fixes adds up to a great deal of work for the industry as a whole. The security of our kernels is "not perfect"; as we fix five holes, ten more are introduced — on the order of 20,000 bugs per release. We need to reduce the inflow of bugs into the kernel, he said, in order to get on top of this problem.
These bugs hurt developer productivity and reduce satisfaction all around. Many of them can be attributed to the tools and the processes we use. We have many of the necessary pieces to do better, but there is also a lot of fragmentation. Every kernel subsystem does things differently; there is a distinct lack of identity for the kernel as a whole.
More testing is good, but testing is hard, he said; it's not just a matter of running a directory full of tests. There are a lot of issues that come up. People run subsystem-specific tests, but often fail to detect the failures that happen. Many groups only do build testing. About 15 engineer-years of effort are needed to get to a functional testing setup; the kernel community is spending more than that, but that effort is not being directed effectively. There are at least seven major testing systems out there (he listed 0day, kernelci.org, CKI, LKFT, ktest, syzbot, and kerneltests) when we should just have one good system.
When the testing systems work, a single problem can result in seven bug reports, each of which must be understood and answered separately. So developers have to learn how to interact with each testing system. Christoph Hellwig interjected that he has never gotten a duplicate bug report from an automated testing system, but others have had different experiences. And, to the extent that duplicate reports do not happen, it indicates that the testing systems are not functional — they are not detecting the problems.
Laura Abbott pointed out that many of the testing systems are not doing the same thing; their coverage is different, but they still have to reimplement much of the same infrastructure. Thomas Gleixner replied that the problem is at the other end: there is no centralized view of the testing that is happening. There are far more than seven systems, he said; many companies have their own internal testing operations. There is no way to figure out whether the same problem has been observed by multiple systems. Some years ago, the kerneloops site provided a clear view of where the problem hotspots were; now a developer might get a random email and can't correlate reports even when they refer to the same problem.
Ts'o said that these systems will have to learn to talk to each other, which will require a lot of engineering work. But beyond that, even the systems we have now appear to be overwhelmed. Once upon a time, the 0day robot was effective, testing patches and sending results within hours. Now he will get a report five days later for a bug in a patch that has already been superseded by two new versions. Since there is no unique ID attached to patches, there is no way for the testing system to recognize updated versions. Gerrit has solved this problem, but "everybody hates it", and there is little acceptance of change IDs for patches. It all works nicely within Google, he said, but it requires a lot of internal infrastructure. He wondered whether the kernel community could ever have the same thing.
Dave Miller said that many companies and individuals are replicating this kind of testing infrastructure; that is the source of the scalability problem. Now, he said, he has to merge patches in batches before doing a test build; if something is bad, he will lose a bunch of time unwinding that work. He would much rather get pull requests with build reports attached so that he can act on them without running into trivial problems. The 0day robot used to help in that regard, but it has lost its effectiveness. Abbott wondered if all this effort could be centralized somewhere; given that the kernelci.org effort is moving into the Linux Foundation, perhaps efforts could be focused there.
Buy-in needed
Vyukov continued by agreeing that, when a maintainer receives a change for merging, they should know that it has passed the tests. Applying should be a simple matter of saying that it is ready to go in during the next merge window. But when individual subsystems try to improve their processes by switching to centralized hosting sites, they just make things worse for the community as a whole by increasing fragmentation. He doesn't know how to fix all of this, but he does know that it has to be an explicit effort with buy-in across the community. There should be some sort of working group, he said; the proposal from Konstantin Ryabitsev could be a good foundation to build on.
Alexei Starovoitov was quick to say that no sort of community-wide buy-in to a new system is going to happen; people have too many strong opinions for that. But, if there is a better tool out there, he will try it. The discussion so far has been all "doom and gloom", he said, but the truth is that the kernel has been getting better and development is getting easier; we are rolling out kernels quickly and each is better than the one that came before. It quickly became clear that this view was not universally shared across the room. Steve Rostedt did acknowledge, though, that the -rc1 releases have become more stable than they once were; he credited Vyukov's syzbot work for helping in that regard.
Dave Airlie pointed out that, after all these years, we still don't have universal adoption of basic tools like Git. Linus Torvalds said that email is still a wonderful tool for stuff that is in development and in need of discussion; work at that stage can't really be put into Git. Miller agreed that email is "fantastic" for early-stage code, but pointed out that the usability of email as a whole is no longer under our control.
Starting with patchwork
Torvalds said that the discussion made it sound like the sky is falling. Our current automated testing infrastructure generates a lot of "crap", but 1% of it is "gold". He encouraged the room to concentrate on concrete solutions to the problems; he liked Ryabitsev's suggestion, which starts by improving the patchwork system. That, he said, should be something that everybody can agree on. Airlie said that the freedesktop.org community has done this, though, with fully funded improvements to patchwork, but it is still "an unmanageable mess" that loses patches and has a number of other problems. Miller said that he was one of the first users of patchwork in the beginning. Back then, the patchwork developer was enthusiastic about improving the system to make developers' lives better. But the situation has long since changed, and it is hard to get patchwork improvements now.
Torvalds said that, if patchwork were to get smarter, more people might use it. There are ways that developers could help it work better. His life got easier, he said, when Andrew Morton started telling him the base for the patches he was sending for merging; the same could be done for patchwork. But patchwork is focused on email, and Miller argued that "email's days are numbered". A full 90% of linux-kernel subscribers are using Gmail, he said, and Google is turning email into a social network with a web site. That gives Google a lot of control over what the community can do.
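The base information Morton supplied by hand is something git can now record automatically: format-patch's `--base` option appends a `base-commit:` footer to each generated patch, which a system like patchwork could parse. A sketch under invented branch and file names:

```shell
# Record the base tree in outgoing patches so maintainers and
# automated testers know exactly what to apply the series against.
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
echo a > f && git add f
git -c user.email=a@example.com -c user.name=A commit -q -m 'base commit'
main=$(git symbolic-ref --short HEAD)
git checkout -q -b topic
echo b >> f
git -c user.email=a@example.com -c user.name=A commit -q -am 'f: a change'
# --base records the tip of $main in a base-commit: footer
git format-patch --base="$main" -o out "$main"..topic >/dev/null
grep base-commit: out/0001*
```

Developers sending series against linux-next or a maintainer tree can use `--base=auto` to have git work the base out from the upstream tracking branch.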
Ts'o said that, rather than focusing on a specific tool, it would be better to talk about what works. Gerrit tracks the base of patches now and can easily show the latest version of any given patch series. Patchwork requires a lot more manual work, with the result that he has thousands of messages there that he is unlikely to ever get to. Olof Johansson pointed out that Gerrit only understands individual patches and cannot track a series, which is a problem for the kernel community. Peter Zijlstra said that, instead, its biggest problem is that it is web-based. Miller replied that he wants new developers to have a web form they can use to write commit messages; he spends a lot of time now correcting email formatting. Gleixner said that the content of the messages is the real problem, but Miller insisted that developers, especially drive-by contributors, do have trouble with the mechanics.
Git as the transport layer
If patchwork could put multiple versions of a patch series into a Git repository, Ts'o said, it would enable a lot of interesting functionality, such as showing the differences between versions. That is something Gerrit can do now, and it makes life easier. Torvalds said that about half of kernel maintainers are using patchwork; there is no need to enforce its use, but it is a good starting point for future work that people can live with. But, he repeated, there needs to be a concrete goal rather than the vague complaining about the process that has been going on for years. Ryabitsev's proposal might be a good starting point, he said.
Greg Kroah-Hartman agreed that patchwork would be a good foundation to build on. But it's not the whole solution. For continuous integration, he said, the focus should be on kernelci.org; it's the only system out there that is not closed. Johansson, though, said that he does not want to have to go into both patchwork and kernelci.org to see whether something works or not.
One problem with systems like patchwork is the inability to work with them offline. Miller said that, if patchwork stored its data in a Git repository, developers could pull the latest version before getting onto a plane and everybody would be happy. Hellwig said that he has never understood why people like patchwork. It would be better, he continued, to agree on a data format rather than a specific tool.
Ryabitsev worried that centralized tools would make "a tasty target" for attackers and should perhaps be avoided. He also said that, with regard to data formats, the public-inbox system used to implement lore.kernel.org can provide a mailing-list archive as a Git repository. Torvalds said that lore.kernel.org works so well for him that he is considering unsubscribing from linux-kernel entirely. Ts'o said that a number of interesting possibilities open up if Git is used as the transport layer for some future tool. Among other things, Ryabitsev has already done a lot of work at kernel.org to provide control over who can push to a specific repository. Ryabitsev remains leery of creating a centralized site for kernel development, though.
As the discussion wound down, Abbott suggested that what is needed is a kernel DevOps team populated with developers who are good at creating that sort of infrastructure. Hellwig put in a good word for the Debian bug-tracking system, which allows most things to be done using email. Ts'o summarized the requirements he had heard so far: tools must be compatible with email-based patch review and must work offline. If the requirements can be set down, he said, perhaps developers will come along to implement them, and perhaps funding can be found as well.
The session closed with the creation of a new "workflows" mailing list on vger.kernel.org where developers can discuss how they work and share their scripts. That seems likely to be the place where this conversation will continue going forward.
[Your editor thanks the Linux Foundation, LWN's travel sponsor, for supporting travel to this event.]
| Index entries for this article | |
|---|---|
| Kernel | Development model/Patch management |
| Kernel | Development tools/Testing |
| Conference | Kernel Maintainers Summit/2019 |
Posted Sep 14, 2019 9:34 UTC (Sat)
by cyphar (subscriber, #110703)
[Link]
Posted Sep 14, 2019 9:43 UTC (Sat)
by pbonzini (subscriber, #60935)
[Link]
Let me introduce Patchew! Patchew was started when patchwork seemed to be mostly dead, so it is somewhat similar to Patchwork 2.0. But it has some nice functionality such as version comparison, pushing each submitted series to git (complete with Reviewed-by tags and the like), simple integration with testing and a REST API. It should also be quite easy to write new plugins to automatically parse syzbot or 0day emails and turn them into test failures.
Posted Sep 14, 2019 16:04 UTC (Sat)
by spwhitton (subscriber, #71678)
[Link] (2 responses)
A lot of the challenges discussed in this article are being thought about by people working on <https://sourcehut.org/>. In particular, CI and series tracking for e-mail based workflows.
Posted Sep 14, 2019 19:08 UTC (Sat)
by wiktor (guest, #132450)
[Link] (1 responses)
Another tool that provides data format first and uses git itself as a storage for code review is git-appraise: https://github.com/google/git-appraise
Posted Sep 14, 2019 19:20 UTC (Sat)
by spwhitton (subscriber, #71678)
[Link]
Posted Sep 14, 2019 17:17 UTC (Sat)
by paravoid (subscriber, #32869)
[Link] (8 responses)
...what? debbugs is a very old piece of software, and it shows. It's not scalable (ironically, bug pages for packages like Linux are among the ones that suffer the most), lacks even relatively simple features (cross-package bugs), lacks any integration with any other tooling, and is confusing to interact with for even the most experienced developers. The fact that changes happen through a custom command language, cannot be previewed before acted upon, are not realtime (you have to wait for your email to be processed) and often take a while to take full effect (due to caching) adds to the confusion. IIRC there were (heroic) attempts in the past to add a... SOAP interface to it but I don't think it's being used much (I may be wrong).
I've been using the Debian BTS for 15 years and I often still struggle. It's probably one of the biggest demotivators I have while working in Debian. That's not a criticism of its maintainers - it was probably great in the 90s or early 00s and has improved a lot in the past few years, but it still pales in comparison to the tools that exist out there today (Phabricator/GitHub/GitLab etc.). Plus it really is a bug tracking system, not a patch management system, so it feels entirely off-topic to the discussion at hand (and my understanding is that Linux has Bugzilla for a BTS?). Patch management-wise, Debian never used its BTS for anything but drive-by small patch submission attached to emails. Most of us actually switched to a Debian-hosted version of GitLab for repository hosting, merge request management & some (limited) CI recently and it has been an exciting journey.
Beyond that, it has always puzzled me how the Linux kernel community -the community that invented and popularized git!- has remained so far behind in its tooling, and stuck in old ways. Bug tracking, code review, patch management/tracking, and CI all seem intertwined with each other and really require unified tooling to manage adequately, for seasoned and new developers alike. None of the tools out there are perfect (far from it), but the majority of them are /far/ better than tracking patches in emails. It amazes me that all this is controversial and would love to understand the reasons behind it better.
Posted Sep 14, 2019 17:27 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link] (3 responses)
Posted Sep 14, 2019 22:23 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (2 responses)
That's the essence of spam: I wonder how it's prevented in this particular case.
Posted Sep 15, 2019 2:24 UTC (Sun)
by lsl (subscriber, #86508)
[Link] (1 responses)
Posted Sep 15, 2019 12:04 UTC (Sun)
by cjwatson (subscriber, #7322)
[Link]
(Systems that require accounts aren't immune to spam by any means, but having messages more systematically linked to identities does make some things a lot easier.)
Posted Sep 14, 2019 22:04 UTC (Sat)
by marcH (subscriber, #57642)
[Link]
Good news, you came to best place: https://www.google.com/search?q=site%3Alwn.net+email
Posted Sep 15, 2019 2:52 UTC (Sun)
by roc (subscriber, #30627)
[Link]
Posted Sep 15, 2019 4:03 UTC (Sun)
by pabs (subscriber, #43278)
[Link]
Not sure what you mean by that, but with debbugs there are a few things that seem related to what you said:
Bugs can be in package foo but marked as affecting package bar.
You can assign a bug to multiple packages when a change in any single package can fix the issue.
You can mark a bug as being blocked by another bug.
You can clone a bug and then reassign the clone, updating the blocked info at the same time.
The usertags stuff seems relevant too.
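For illustration, those debbugs operations are all driven by a plain-text mail to the control server; a sketch with invented bug numbers and package name:

```
To: control@bugs.debian.org

# clone bug 123456; -1 refers to the newly created clone
clone 123456 -1
# move the clone to the package actually at fault
reassign -1 otherpackage
# record that the original bug is blocked by the clone
block 123456 by -1
# stop processing commands here
thanks
```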
Posted Sep 15, 2019 23:53 UTC (Sun)
by neilbrown (subscriber, #359)
[Link]
Are they? I never really had any trouble tracking patches with email(*).
(*) the only real weakness with email is that I would sometimes miss patches. A gentle reminder from the sender after a few weeks of silence always got things moving again and improved the sense of community - sometimes humans are better than mechanical solutions.
Posted Sep 14, 2019 22:20 UTC (Sat)
by marcH (subscriber, #57642)
[Link]
Seven is too many. One would be too little. Not just to keep some competition alive but simply because the kernel is a really huge project and one size doesn't fit all.
If you think test suites for Wifi, filesystems, graphics and schedulers can re-use validation tools and test code then... it's likely these common things are not even specific to the Linux kernel at all.
Same for bug trackers: re-use is always good but a little diversity and competition doesn't hurt either and again a Linux Wifi bug and a Linux filesystem bug don't really seem much closer to each other than to a bug in openssh or whatever. Granted: they can be tracked on the same git history. Who cares, I'm not even using that filesystem anyway.
Even if all areas of the kernel ever use the exact same tools and processes one day, they should still run different _instances_ to keep things under reasonable and manageable size. Who still reads the lkml? People have been using more focused mailing lists for ages.
Posted Sep 14, 2019 22:36 UTC (Sat)
by marcH (subscriber, #57642)
[Link]
That's the core issue. Tools and design choices will keep people bikesh... fascinated and busy discussing for years on end but without some actual and skilled manpower these discussions will forever keep going around in circles.
Speaking of manpower, so-called "DevOps" and validation roles have since forever been less respected/funded/promoted than nobler "developer" roles and the Linux kernel is not much worse than the rest of the industry. Maybe it's changing? Very slowly as any culture change. I think we're starting to see conferences on these topics, did any mention the kernel?
> The first session at the 2019 Linux Kernel Maintainers Summit was a *last-minute addition to the schedule*.
Emphasis mine :-)
Posted Sep 15, 2019 1:51 UTC (Sun)
by neilbrown (subscriber, #359)
[Link] (12 responses)
Are they? How sad. You would have thought that we would have learned from the bit-keeper fiasco that depending on non-free tools is a bad idea. It seems not.
Posted Sep 15, 2019 2:08 UTC (Sun)
by marcH (subscriber, #57642)
[Link] (3 responses)
Posted Sep 15, 2019 15:37 UTC (Sun)
by k3ninho (subscriber, #50375)
[Link] (2 responses)
I can't speak to _unique_ but the "feature" is not rejecting as spam the stuff you've sent from your own mail server. The arc of this development can be read at ex-Symbolics Lisp/ex-Netscape hacker Jamie Zawinski's blog -- https://www.jwz.org/blog/tag/mail/ -- and shows that running your own email servers and communicating widely is hampered by proprietorial relays including Gmail, Microsoft, GoDaddy, Earthlink, Dreamhost and more. I get that spam has broken e-mail and that the paradox of allowing people to post to the SMTP-based social network means that you have to filter later if the posts made were unwanted or malicious.
Kernel development is a social network with some patch transport and discussion in messages and concrete source trees backing these up. Metadata like 'acked-by' and which systems tested the changes or how to replicate found bugs, that stuff needs to annotate the patches and be searchable, then reputations need to be used to retain a high signal-to-noise ratio. And because it's a social network, where Linus (among others) goes, people will follow.
K3n.
Posted Sep 16, 2019 9:02 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link] (1 responses)
Posted Sep 17, 2019 11:58 UTC (Tue)
by broonie (subscriber, #7078)
[Link]
Posted Sep 16, 2019 11:09 UTC (Mon)
by idrys (subscriber, #4347)
[Link] (7 responses)
> Are they? How sad. You would have thought that we would have learned from the bit-keeper fiasco that depending on non-free tools is a bad idea. It seems not.
I wonder how many of them are using Gmail because of corporate mail setups that suck at dealing with large amounts of mail.
Also, judging from the patches I see sent by Patch Author <author@gmail.com> with the author being Patch Author <author@bigcorp.com>, many corporate mail setups obviously suck at sending stuff as well, and people fall back to Gmail...
Posted Sep 16, 2019 14:08 UTC (Mon)
by dezgeg (subscriber, #92243)
[Link] (6 responses)
Posted Sep 16, 2019 14:47 UTC (Mon)
by pizza (subscriber, #46)
[Link]
Posted Sep 18, 2019 6:12 UTC (Wed)
by neilbrown (subscriber, #359)
[Link] (2 responses)
There is a proposal floating around to have an open git server and allow anyone to submit patches using "git push" instead of "git-send-email".
Posted Sep 19, 2019 1:53 UTC (Thu)
by pabs (subscriber, #43278)
[Link] (1 responses)
https://public-inbox.org/git/1486427537.16949.42.camel@bo...
Posted Sep 19, 2019 4:35 UTC (Thu)
by neilbrown (subscriber, #359)
[Link]
Yes, that's the one.
I think spam is a solvable problem.
If the signing key doesn't have reputation, the pushed information is limited in some way and is not forwarded to any (public) email lists.
You wrote that "support in git-daemon is needed" but I think it provides everything you need. You can certainly enable anonymous push, and there are hooks that allow you to check any change to the repo before it happens, so you should be able to prototype something yourself.
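The hook-based check mentioned above could take the form of a server-side pre-receive hook; this is a sketch of the policy being discussed (only signed tags accepted), not any existing kernel.org setup:

```shell
#!/bin/sh
# pre-receive: runs on the server before any ref update is applied.
# stdin carries one "<old-sha> <new-sha> <refname>" line per ref.
# Policy sketch: reject everything except verifiably signed tags.
status=0
while read old new ref; do
    case "$ref" in
        refs/tags/*)
            # verify-tag exits nonzero for unsigned or bad tags
            git verify-tag "$new" 2>/dev/null || {
                echo "rejected $ref: tag is not validly signed" >&2
                status=1
            }
            ;;
        *)
            echo "rejected $ref: only signed tags may be pushed" >&2
            status=1
            ;;
    esac
done
exit $status
```

A nonzero exit from this hook causes git to refuse the whole push, so nothing lands in the repository until the signature (and, in the scheme above, the key's reputation) checks out.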
Posted Sep 19, 2019 12:43 UTC (Thu)
by jgg (subscriber, #55211)
[Link] (1 responses)
Where Linux really falls down is that all the usual tools we use for email don't support OAUTH2 - so you often can't actually login to the corp email server anyhow (be it gmail or office365 based). Sigh.
And of course Office365 apparently doesn't support OAUTH for IMAP, only SMTP!
I lost all hope when I saw the Linux team at Microsoft had to setup their own email server.
Posted Sep 19, 2019 18:20 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Sep 16, 2019 12:47 UTC (Mon)
by intgr (subscriber, #39733)
[Link] (3 responses)
But what happened to kerneloops anyway? I remember setting up the daemon on every one of my machines and servers. Only to discover later that it was defunct. Was it just that nobody cared? Surely if someone cared, the kernel community would be able to find the resources to keep it running?
Posted Sep 17, 2019 6:35 UTC (Tue)
by pabs (subscriber, #43278)
[Link]
Posted Sep 17, 2019 20:52 UTC (Tue)
by meyert (subscriber, #32097)
[Link] (1 responses)
Posted Mar 13, 2020 23:42 UTC (Fri)
by pabs (subscriber, #43278)
[Link]
Posted Sep 19, 2019 23:20 UTC (Thu)
by dbkm11 (guest, #125598)
[Link]
Check out the tool https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/... , it basically fetches mails via the lore.kernel.org archives and places them into maildir format for email clients (like mutt et al). This would in fact allow one to completely unsubscribe. ;-)
Posted Sep 21, 2019 19:10 UTC (Sat)
by smitty_one_each (subscriber, #28989)
[Link]
> If patchwork could put multiple versions of a patch series into a Git repository, Ts'o said, it would enable a lot of interesting functionality, such as showing the differences between versions.
To be a contender to the established method, a new tool must not only be better (which I agree that some are), it must also be no worse. I find them all to be worse (as well as better).
> ...
> There are at least seven major testing systems out there (he listed 0day, kernelci.org, CKI, LKFT, ktest, syzbot, and kerneltests) when we should just have one good system.
Exactly how that would work I'm not sure, but one of the things that I do like about gerrit is that submitting patches with "git push" is quite easy.
Maybe that initiative could tip the balance away from gmail.
If you required the HEAD pushed to always have a signed tag, then you would have a basis for establishing reputation (and you would encourage tag signing, which is a *good* *thing*).
In that case, the author needs to copy some text that was returned by the server into an email message - which can be sent with any old MUA. It would include a link to find the patch.
I'm not sure how reputation would be gained or revoked, but some combination of automatic flow analysis and crowd-sourcing (if 3 reputable keys vouch for a new key, it gets reputation?)