Pulling GitHub into the kernel process
There is an ongoing effort to "modernize" the kernel-development process; so far, the focus has been on providing better tools that can streamline the usual email-based workflow. But that "email-based" part has proven to be problematic for some potential contributors, especially those who might want to simply submit a small bug fix and are not interested in getting set up with that workflow. The project-hosting "forge" sites, like GitHub and GitLab, provide a nearly frictionless path for these kinds of one-off contributions, but they do not mesh well—at all, really—with most of mainline kernel development. There is some ongoing work that may change all of that, however.
Konstantin Ryabitsev at the Linux Foundation has been spearheading much of this work going back at least as far as his September 2019 draft proposal for better kernel tooling. Those ideas were discussed at the 2019 Kernel Maintainers Summit and at a meeting at Open Source Summit Europe 2019 in October. Throughout, Ryabitsev has been looking at ways to make it easier for non-email patch submitters; along the way, he has also released the b4 tool for collecting up patches and worked on patch attestation.
A recent post to the kernel workflows mailing list shows some progress toward a bot that can turn a GitHub pull request (PR) into a well-formed patch series to send to the proper reviewers and mailing lists. "This would be a one-way operation, effectively turning Github into a fancy 'git-send-email' replacement." He also laid out some of the benefits that this bot could provide both for maintainers and patch submitters:
- submitters would no longer need to navigate their way around git-format-patch, get_maintainer.pl, and git-send-email -- nor would need to have a patch-friendly outgoing mail gateway to properly contribute patches
- subsystem maintainers can configure whatever CI pre-checks they want before the series is sent to them for review (and we can work on a library of Github actions, so nobody needs to reimplement checkpatch.pl multiple times)
- the bot should (eventually) be clever enough to automatically track v1..vX on pull request updates, assuming the API makes it straightforward
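To make the first point concrete, here is a minimal Python sketch of the three manual steps that such a bot would take over from the submitter. It only builds the command lines and runs nothing. The `PullRequest` class, the helper names, and the addresses in the usage note are hypothetical illustrations, not part of Ryabitsev's bot, and real get_maintainer.pl output would still need to be parsed into recipient lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequest:
    """Hypothetical, minimal view of a pull request (not a GitHub API type)."""
    base: str      # commit or branch the series is based on
    head: str      # tip of the submitter's branch
    revision: int  # 1 for the first posting, 2 for v2, and so on


def format_patch_cmd(pr: PullRequest, outdir: str) -> List[str]:
    """Build the git format-patch command for the PR's commit range."""
    cmd = ["git", "format-patch", "--cover-letter", "-o", outdir]
    if pr.revision > 1:
        # Later revisions are marked as v2, v3, ... in the subject prefix.
        cmd.append(f"--reroll-count={pr.revision}")
    cmd.append(f"{pr.base}..{pr.head}")
    return cmd


def get_maintainer_cmd(patch_path: str) -> List[str]:
    """Build the command that asks the kernel tree who should see a patch."""
    return ["scripts/get_maintainer.pl", patch_path]


def send_email_cmd(outdir: str, to: List[str], cc: List[str]) -> List[str]:
    """Build the git send-email command for every patch in outdir."""
    cmd = ["git", "send-email"]
    cmd += [f"--to={addr}" for addr in to]
    cmd += [f"--cc={addr}" for addr in cc]
    cmd.append(outdir)
    return cmd
```

For example, `format_patch_cmd(PullRequest("v5.13", "fix-typo", 2), "out")` yields the command for a v2 series, and `send_email_cmd("out", ["maintainer@example.org"], ["list@example.org"])` addresses it to placeholder recipients. A real bot would also need to handle the outgoing mail gateway that the quote mentions.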
He had some questions about whether the bot should be centralized in a single repository (per forge platform) that would serve as the single submission point, or whether subsystem maintainers would want to configure their own repositories. The latter would give maintainers the opportunity to set their own criteria for checks that would need to pass (e.g. checkpatch.pl) before the PR was considered valid, but would mean that they might have to ride herd on the repository as well.
In addition, Ryabitsev wondered when and how PRs would get closed. The bot could potentially monitor the mainline and auto-close PRs once the patch set gets merged, but that won't be perfect, of course. An easier approach for him would be "to auto-close the pull request right after it's sent to the list with a message like 'thank you, please monitor your email for the rest of the process'", but he was unsure if that would be best.
As might be guessed, reactions from those participating in the thread were all over the map. While there is a lack of many kinds of diversity within the kernel community, opinions on development workflow (opinions in general, in truth) do not have that problem. Some maintainers have zero interest in this kind of effort at all. As Christoph Hellwig put it: "Please opt all subsystems I maintain out of this crap. The last thing I need is patches from people that can't deal with a sane workflow."
Hellwig's complaint, which Jiri Kosina agreed with, may be more about the expectations of those who use GitHub (and the like), and less about the possibility of having a web-based interface to kernel development. Dmitry Vyukov asked why Hellwig and Kosina would be unwilling to accept patches from the system if they cannot really distinguish them from a regular submission. Vyukov said that he is currently experiencing a Git email submission problem that he is uninterested in working around, so he can see why others might be similarly inclined. Meanwhile, though, he sees benefits from this kind of bot:
On the other hand this workflow has the potential to ensure that you never need to remind to run checkpatch.pl, nor spend time on writing code formatting comments and re-reviewing v2 because code formatting will be enforced, etc. So I see how this is actually beneficial for maintainers.
Hellwig is not opposed to a web-based solution, though he wants nothing to do with GitHub. But Ryabitsev seems uninterested in "reimplementing a lot of stuff that we already get 'for free' from Github and other forges". Both Mark Brown and Laurent Pinchart suggested that there are mismatches between GitHub-normal practices and those of the kernel community. Pinchart mentioned the inability to comment on a patch's commit message on GitHub as something that generally leads to poor messages; the platform is training these developers to a certain extent:
Developers who have only been exposed to those platforms are very likely to never have learnt the importance of commit messages, and of proper split of changes across commits. Those are issues that are inherent to those platforms and that we will likely need to handle in an automated way (at least to some extent) or maintainers will become crazy [...]
But Miguel Ojeda thinks that it is really no different from new developers showing up on the mailing list with patches. "The same happens in the LKML -- some people have sent bad messages, but we correct them and they learn." He also noted that automated checking of patches can help both developers and maintainers:
[...] it is particularly useful to teach newcomers and to save time for maintainers having to explain things. Even if a maintainer has a set of email templates for the usual things, it takes time vs. not even having to read the email.
Ojeda is working on the Rust for Linux project, which we looked at back in April; he said that he has also been working on a bot:
For Rust for Linux, I have a GitHub bot that reviews PRs and spots the usual mistakes in commit messages (tags, formatting, lkml vs. lore links, that sort of thing). It has been very effective so far to teach newcomers how to follow the kernel development process. I am also extending it to take Acks, Reviewed-by's, Tested-by's, etc., and then performing the merge only if the CI passes (which includes running tests under QEMU, code formatting, lints, etc.) after applying each patch.
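As an illustration of the kind of commit-message checks described in that quote, here is a minimal Python sketch. It is not Ojeda's bot; the function name, the 75-character subject limit, and the exact rules are assumptions chosen for the example, including the assumption that lore.kernel.org links are preferred over lkml.org ones.

```python
import re
from typing import List

SUBJECT_MAX = 75  # assumed limit, for illustration only

# A Signed-off-by trailer of the form "Signed-off-by: Name <address>".
SOB_RE = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$")


def check_commit_message(message: str) -> List[str]:
    """Return a list of human-readable problems found in a commit message."""
    lines = message.strip().splitlines()
    if not lines:
        return ["empty commit message"]

    problems = []
    if len(lines[0]) > SUBJECT_MAX:
        problems.append(f"subject longer than {SUBJECT_MAX} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line after the subject")
    if not any(SOB_RE.match(line) for line in lines):
        problems.append("missing Signed-off-by tag")
    if "lkml.org" in message:
        problems.append("lkml.org link found; lore.kernel.org is assumed to be preferred")
    return problems
```

A bot could post the returned list as a review comment on the PR, so that the submitter gets feedback before a maintainer ever has to read the message.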
But Ojeda is taking things in a rather different direction than what Ryabitsev is envisioning. Ojeda wants to move the main place for patch review and the like from the mailing lists to GitHub. He is also considering having his bot pick up patches from the mailing list and turn them into GitHub PRs, which is the reverse of what Ryabitsev is doing.
For his part, Ryabitsev said: "That's pretty cool, but I'm opposed to this on theological grounds. :)" In particular, he is concerned about the "single point of failure" problem for the kernel-development infrastructure. If his bot is unavailable for any reason, it may be inconvenient for those who use it, but that will not hobble development. He sees GitHub as simply a "developer frontend tool".
Somewhat similar to Ojeda's intentions, Brendan Higgins has a tool to pick up patches from a mailing list (kselftest in this case) and upload them to a Gerrit instance. He sees some potential synergies between his bot and the one Ryabitsev is working on. Similarly, Drew DeVault has been working on the reverse direction, from a mailing list to a project forge, as well. Patchwork is a longstanding code-review project that also collects up patches from mailing lists to populate a web application. It would seem that much of the focus is on getting patches out of mailing lists, though, which is not where Ryabitsev is headed.
While some maintainers want no part of this "GitHub Future", others are enthusiastic about the possibilities it could bring. Vyukov thinks that having a single GitHub repository with multiple branches will help consolidate the kernel-development landscape, which is currently fragmented on subsystem lines. He sees it as an opportunity to apply consistent coding-style standards; it does not matter which, he said, "as long as it's consistent across the project". It would also allow testing consistency throughout the tree and the same for the development process:
For once: it will be possible to have proper documentation on the process (as compared to current per-subsystem rules, which are usually not documented again because of low RoI [return on investment] for anything related to a single subsystem only).
It is not at all clear that Vyukov's interest in consistency throughout the tree is shared widely, but there have certainly been complaints along the way about the difficulty of navigating between the different subsystem processes and requirements for submissions. There is also interest in making things easier for quick, one-off contributions; as Ryabitsev put it:
Our code review process must also allow for what is effectively a "report a typo" link. Currently, this is extremely onerous for anyone, as a 15-minute affair suddenly becomes a herculean effort. The goal of this work is to make drive-by patches easier without also burying maintainers under a pile of junk submissions.
Clearly keeping "junk submissions" to a bare minimum is going to be important. Linus Torvalds said that he has had to turn off email from GitHub because it is too noisy; people have apparently signed him up as a project member without any kind of opt-in check. Beyond that, any kind of patch submission from PRs would need to have some sanity checks, including size limits, so that PRs like one pointed out by Ryabitsev do not end up on the mailing list.
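A size-limit sanity check of the sort mentioned above could be as simple as the following Python sketch. The `SeriesStats` class and all of the thresholds are hypothetical; nothing in the thread specifies actual limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SeriesStats:
    """Hypothetical summary of a pull request's contents."""
    commits: int
    files_changed: int
    lines_changed: int


# Assumed thresholds, chosen only for illustration.
MAX_COMMITS = 50
MAX_FILES = 200
MAX_LINES = 10_000


def size_problems(stats: SeriesStats) -> List[str]:
    """Return reasons why a series is too large to forward to a list."""
    problems = []
    if stats.commits == 0:
        problems.append("series contains no commits")
    if stats.commits > MAX_COMMITS:
        problems.append(f"{stats.commits} commits exceeds limit of {MAX_COMMITS}")
    if stats.files_changed > MAX_FILES:
        problems.append(f"{stats.files_changed} files exceeds limit of {MAX_FILES}")
    if stats.lines_changed > MAX_LINES:
        problems.append(f"{stats.lines_changed} lines exceeds limit of {MAX_LINES}")
    return problems
```

An empty list means the series passes this particular gate; a non-empty one would keep the PR off the mailing list and tell the submitter why.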
That kind of PR highlights another problem: repository maintenance. Greg Kroah-Hartman said that there will be a need to monitor whatever repositories are being used for this purpose. It is not a small task:
What ever repo you put this on, it's going to take constant maintenance to keep it up to date and prune out the PRs that are going to accumulate there, as well as deal with the obvious spam and abuse issues that popular trees always accumulate.
Torvalds does not want his GitHub tree used for this purpose and Kroah-Hartman said the same. However it plays out, someone will have to be tasked with keeping the repository tidy, which is "a thankless task that will take constant work". But Ryabitsev is hopeful that the Linux Foundation could fund that kind of work if it becomes necessary.
In the end, it will likely come down to how seamlessly the GitHub bot fits in. If maintainers truly cannot tell the difference in any substantive way, it is hard to see many of them rejecting well-formed patches that fix real problems in their subsystems. That ideal may not be reached right away, however, which might lead to a premature end to the experiment. It will be interesting to see it all play out over the coming months and years.
