Patch flow into the mainline for 4.14

By Jonathan Corbet
October 24, 2017

There is a lot of information buried in the kernel's Git repositories that, if one looks closely enough, can yield insights into how the development community works in the real world. It can show how the idealized hierarchical model of the kernel development community matches what actually happens and provide a picture of how the community's web of trust is used to verify contributions. Read on for an analysis of the merge operations that went into the 4.14 development cycle.

The diagram to the right was generated from the commits merged for the 4.14 release, through 4.14-rc5. It is unfortunately dense; click on the image to get a version that has a chance of being legible. In short, it shows all of the subsystem trees that were pulled into the mainline and the number of patches that flowed out of each.

LWN has posted these diagrams a couple of times in the past, for the 2.6.29 and 4.4 development cycles. They have always shown a structure that is far flatter than the hierarchical maintainer model would suggest. In the real world, mid-level maintainers are relatively rare; most maintainers send pull requests directly to Linus Torvalds. Doing so helps to get changes into the mainline more quickly; that is why, for example, some security-module maintainers recently decided to bypass the security maintainer and push their trees directly to Torvalds.

That said, the hierarchy shows more clearly than it has in past years. A number of subsystems are growing to the point where there needs to be some overall higher-level coordination, so there are more two- and three-level trees than there used to be. As the kernel community continues to grow, it will almost certainly need to add more mid-level maintainers.

Signing of pull requests

Diagrams like this one can be interesting to look at just to see how work is flowing through the system. But they can also be used to reveal semi-hidden aspects of how that work is being done. This time around, your editor has decided to put a focus on the security of the process.

Shortly after the 3.0 kernel was released, it was revealed that kernel.org, where many kernel developers (including Torvalds) keep their repositories, had been broken into. This episode brought the merging of patches to a halt for some time and delayed the 3.1 release by some months; it also created a great deal of concern over the possibility that somebody's repository might have been corrupted in an attempt to get malicious code into the mainline kernel. No evidence of that happening ever turned up, but the realization that it maybe could have happened drove a number of changes in the development community.

One of those changes was the establishment of a web of trust among kernel developers; at the 2011 Kernel Summit in Prague, an initial key-signing ritual was held to bootstrap that web. The ability to GPG-sign commits and tags was added to Git. One need merely tag the commit at the head of a series to be pulled with a command like:

    git tag -s fixes-for-linus

and request that the fixes-for-linus tag be pulled. If the receiving maintainer pulls with the --verify-signatures option, Git will ensure that a valid signature exists before doing the merge.
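
On the receiving side, a gate can be sketched as a small shell helper. Git's documentation describes the --verify-signatures option of git merge as checking the signature on the tip commit being merged; this sketch instead checks the tag's own signature with git verify-tag and refuses to merge when that check fails. The function name merge_verified_tag is hypothetical, not part of Git, and the sketch assumes the tag has already been fetched (for example with git fetch <url> tag fixes-for-linus).

```sh
# merge_verified_tag TAG (hypothetical helper): merge TAG into the current
# branch only if the tag's own GPG signature verifies. Assumes TAG has
# already been fetched into the local repository.
merge_verified_tag() {
    tag=$1
    # git verify-tag fails for unsigned tags and for bad or unknown signatures.
    if ! git verify-tag "$tag"; then
        echo "refusing to merge $tag: no valid signature" >&2
        return 1
    fi
    # Merging an annotated tag always creates a merge commit, which keeps
    # the signed tag's information in the repository's history.
    git merge "$tag"
}
```

A maintainer would then run merge_verified_tag fixes-for-linus instead of a plain merge; an unsigned or badly signed tag leaves the branch untouched.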

The idea was that developers would sign their repositories before sending pull requests, allowing upstream maintainers to verify that those pull requests corresponded to legitimate streams of development. Even if an attacker could put up a convincing copy of a developer's repository (or somehow add a malicious commit to a real repository) and send a fake pull request, the attack would not succeed because the attacker would not be able to attach a proper signature to the relevant tag.

This system has been in place for six years now, and many developers routinely sign tags for outgoing commits and verify signatures when pulling from others. But do they all do so? It is possible to find out. When a signed commit or tag is pulled into a repository, the signature is stashed into the merge commit, allowing the provenance of the changes to be verified at a later date. That also makes it possible to examine the merges in the kernel repository and see how many of them carry signature information.
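
That examination can be approximated with plain Git plumbing. When a signed tag is merged, Git copies the tag, signature included, into a mergetag header of the merge commit, so a sketch only needs to look for that header in each merge's raw commit object. The helper names below are hypothetical. Note that a mergetag header shows that a signed tag was merged, not that its signature was checked at the time; git log --show-signature can be used to verify the recorded signatures afterward.

```sh
# has_mergetag (hypothetical): read a raw commit object, as printed by
# `git cat-file commit <sha>`, on stdin. Succeed if the header section
# (everything before the first empty line) contains a "mergetag" field.
has_mergetag() {
    awk '/^$/ { exit } /^mergetag / { found = 1; exit } END { exit !found }'
}

# count_signed_merges RANGE (hypothetical): report how many merge commits
# in RANGE (any rev-list range, e.g. v4.13..v4.14-rc5) carry a mergetag.
count_signed_merges() {
    total=0
    signed=0
    for c in $(git rev-list --merges "$1"); do
        total=$((total + 1))
        if git cat-file commit "$c" | has_mergetag; then
            signed=$((signed + 1))
        fi
    done
    echo "$signed of $total merges carry a mergetag header"
}
```

Run inside a kernel clone, count_signed_merges v4.13..v4.14-rc5 would cover roughly the period examined here.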

Referring back to the tree plot on the right, one will see that some repositories are shown in black boxes, while others use red boxes. The repositories in red are those from which no signed merges happened during the period in question. The results show that, while many developers do sign their tags before sending changes upstream, quite a few do not.

More to the point, the repository that sends more traffic into the mainline than any other — networking — makes almost no use of signatures anywhere in the chain. The "tip" tree (containing x86 and core-kernel work) is another significant tree that does not employ signatures, as is the linux-block tree. Neither the security tree nor the crypto tree employs cryptographic signatures. Pull requests from the graphics tree into the mainline are signed, but many of the trees feeding into graphics do not use signatures. On the other hand, some high-volume trees, such as arm-soc, have almost complete signature coverage from the leaves up to the mainline.

Years of traffic on the kernel mailing lists suggest that maintainers rarely ask for signatures to be added to pull requests that lack them. Torvalds will typically demand it when the tree being pulled is hosted on a public service like GitHub, but is otherwise happy to pull from unsigned tags. He does verify signatures when they do exist, though. Few other maintainers require (or even mention) signatures at all.

Your editor asked around a bit to get a sense for why some maintainers are not using signed tags. The answer was typically along the lines of "I never got around to incorporating them into my workflow". One maintainer admitted that he had probably forgotten the passphrase for his GPG key by now and would have to create a new one to be able to start signing tags. The problem, if there is one, is not any real hostility to the idea of signed commits. It is just that, since signatures are not required, many busy subsystem maintainers have not made the effort to start using them.

The result is that the kernel has a web of trust that, one might fairly conclude, is not really protecting much. It's nice to have the verification on pull requests that do carry signatures but, since those signatures seem to be almost entirely optional at present, they offer little protection against a malicious pull request.

If the intent of signed tags is limited to enabling developers to host repositories on untrusted services, then perhaps signature checking as it is practiced now is sufficient. Perhaps the threat model need not include more sophisticated attackers trying to sneak vulnerabilities into the kernel via some developer's tree on a well-run site. After all, kernel.org itself seems relatively well protected these days, and kernel developers have demonstrated that, like developers of most other projects, they are entirely capable of introducing security bugs at a sufficient rate without external assistance.

But if the intent is to make the kernel development process resilient against attacks on developers' machines or kernel.org, then there is some work yet to be done. It is worth remembering that the web of trust came about as a response to a compromise of kernel.org, after all. If we want to prepare for a recurrence of that sort of incident, the actual threat model needs to be defined, and the use of protective techniques like signed tags should probably not be optional. Partially implemented security mechanisms have a distressing tendency to fail when put to the test.

(The plot in this article was generated with the treeplot tool, which is part of the gitdm collection of hacks hosted at git://git.lwn.net/gitdm.git).

Posted Oct 24, 2017 18:46 UTC (Tue) by seanyoung (subscriber, #28711)

Note that merge requests for the media tree are cherry-picked, not merged. I guess this is why the linux-media patch flow has no children in the graph.

Posted Oct 24, 2017 20:54 UTC (Tue) by nevets (subscriber, #11875)

Ugh, that's even worse with respect to security. Unless you scrutinize each patch that is cherry-picked, it's no different than a workflow that takes only patches from email.

Posted Oct 25, 2017 10:13 UTC (Wed) by seanyoung (subscriber, #28711)

Every commit on every pull request is reviewed and cross-referenced with the corresponding patch on patchwork. Then, on top of that, the original submitter and sub-maintainer will likely check what goes into master.
There are way too many eyeballs for anything to slip through.
This is not a problem.

Posted Oct 25, 2017 16:28 UTC (Wed) by smurf (subscriber, #17840)

> There are way too many eyeballs for anything to slip through.

Not if the threat model includes innocuous-seeming feature patches which include non-features.

Numerous contests have been held on the topic of how to write C code with obfuscated, plausibly-deniable security holes.

Posted Oct 25, 2017 23:56 UTC (Wed) by ajdlinux (subscriber, #82125)

Maybe in some subsystems. Not the case kernel-wide.

Posted Oct 27, 2017 10:00 UTC (Fri) by epa (subscriber, #39769)

Could git grow some secondary checking for cherry-picked commits? Like, the commit would say 'cherry-picked from abcde' and then you could optionally run something which makes sure the diff being applied in this commit is the same as that from abcde. If not, it would be flagged for extra attention.

Posted Oct 27, 2017 15:10 UTC (Fri) by seanyoung (subscriber, #28711)

One of the reasons for cherry-pick is to be able to drop patches, fix commit messages or other cosmetic changes that maintainers do sometimes.

Posted Oct 27, 2017 19:36 UTC (Fri) by epa (subscriber, #39769)

Agreed! Another reason is to pick out a particular change without all the ones that went before it. So, given a particular commit and the other one it was cherry-picked from, it should be possible to check that the diff is the same, while ignoring the commit message, whitespace changes in the file content and other things which cause the SHA to be different but aren't important for this comparison.

Posted Oct 28, 2017 0:10 UTC (Sat) by mathstuf (subscriber, #69389)

There's the -x flag to cherry-pick to do that. Note that conflicts will trip this up.
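
The check epa describes can be approximated with existing tools: git patch-id hashes a commit's diff while ignoring whitespace and line numbers, so commit-message edits do not change the result. The following is a minimal sketch, assuming the commit was made with git cherry-pick -x so that its message contains a "(cherry picked from commit <sha>)" line; the function name check_cherry_pick is hypothetical. As noted above, a cherry-pick whose conflicts were resolved by hand will legitimately produce a different patch ID and be flagged.

```sh
# check_cherry_pick COMMIT (hypothetical): find the origin recorded by
# `git cherry-pick -x`, then compare the patch IDs of COMMIT and its origin.
# Returns 0 if the diffs match, 1 if they differ, 2 if no origin is recorded.
check_cherry_pick() {
    commit=$1
    orig=$(git log -1 --format=%B "$commit" |
        sed -n 's/^(cherry picked from commit \([0-9a-f]*\))$/\1/p' | tail -n 1)
    if [ -z "$orig" ]; then
        echo "$commit: no cherry-pick origin recorded" >&2
        return 2
    fi
    # Only the diff feeds the patch ID; the commit message is ignored.
    new_id=$(git show --no-color "$commit" | git patch-id --stable | cut -d ' ' -f 1)
    old_id=$(git show --no-color "$orig" | git patch-id --stable | cut -d ' ' -f 1)
    if [ -n "$new_id" ] && [ "$new_id" = "$old_id" ]; then
        return 0
    fi
    echo "$commit: diff differs from $orig; needs extra attention" >&2
    return 1
}
```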

Posted Oct 25, 2017 2:59 UTC (Wed) by unixbhaskar (guest, #44758)

Make signing mandatory for every commit from everyone, and refuse to pull or merge unsigned work into the mainline. It sounds harsh, but that is how it should be. I believe many wise heads are already thinking along those lines, and I am surprised it has not yet been imposed or implemented. I would love to know the constraints.

Posted Oct 27, 2017 3:37 UTC (Fri) by flussence (guest, #85566)

Signing in git really isn't as hard or scary as people think it is. Make a key if necessary, configure gpg-agent so it caches key passwords for at least a few seconds (or else rebases will be painful), and set commit.gpgSign.

The only recurring effort is re-entering passwords, but there's nothing to stop you setting gpg-agent's cache time really high if it gets annoying.

Posted Oct 27, 2017 12:45 UTC (Fri) by JFlorian (guest, #49650)

In general use of gpg-agent, I wish the cache time could be dynamic. Say it starts with a default of 10m. I use it immediately for a key and then again 8m into that lifetime. Here it would be nice to get an automatic extension of another 8m, and so on, until it finally times out due to no use. I think that would be much more convenient, and likely more secure simply because it might mean fewer people use really high timeout values. Better convenience might also translate to higher adoption rates.
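
The setup described in the two comments above can be sketched as a small shell helper. The function configure_signing is hypothetical, not part of Git or GnuPG, and the cache lifetimes are example values, not recommendations. According to the gpg-agent documentation, default-cache-ttl already behaves as an idle timeout (an entry's timer is reset each time it is accessed), while max-cache-ttl caps the entry's total lifetime regardless of use.

```sh
# configure_signing GITCONFIG AGENTCONF (hypothetical): enable commit signing
# in the given git config file and append cache lifetimes to the given
# gpg-agent.conf. Running it twice appends the agent lines twice.
configure_signing() {
    gitcfg=$1
    agentcfg=$2
    # Sign every commit by default (the commit.gpgSign setting).
    git config --file "$gitcfg" commit.gpgSign true
    mkdir -p "$(dirname "$agentcfg")"
    # default-cache-ttl: idle timeout in seconds, reset on each use.
    # max-cache-ttl: hard limit in seconds on how long an entry stays cached.
    printf '%s\n' 'default-cache-ttl 1800' 'max-cache-ttl 28800' >> "$agentcfg"
}
```

A typical invocation would be configure_signing ~/.gitconfig ~/.gnupg/gpg-agent.conf, followed by gpg-connect-agent reloadagent /bye so that a running agent rereads its configuration.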

Posted Oct 27, 2017 16:40 UTC (Fri) by Creideiki (subscriber, #38747)

It kind of is, if you want to do it properly. I have some scripts (available at https://github.com/saab-simc-admin/workflow-tools) for maintaining an all-signed workflow, and the amount of corner cases and badly designed interfaces I have to handle is staggering.

Not to mention the fact that since nobody uses signatures, the code isn't tested - libgit2 (which is, among other things, the base for Ruby's Git support) used to corrupt the plaintext of signed commits due to a use-after-free bug.