What's missing from our changelogs
We have a policy of good commit messages in the kernel." Andrew Morton also famously pushes developers to document the reasons explaining why a patch was written, including the user-visible effects of any bugs fixed. Kernel developers do not like having to reverse engineer the intent of a patch years after the fact.
With that context in mind, and having just worked through another merge window's worth of patches, your editor started wondering if our changelogs were always as good as they should be. A bit of scripting later, a picture of sorts has emerged; as one might expect, the results were not always entirely encouraging.
Changelogs
A patch's changelog is divided into three parts: a one-line summary, a detailed change explanation, and a tags section. For the most trivial patches, the one-line summary might suffice; there is not much to add to "add missing include of foo.h", for example. For anything else, one would expect a bit more text describing what is going on. So patches with empty explanation sections should be relatively rare.
As of this writing, just under 70,000 non-merge changesets have been pulled into the mainline repository since the release of the 3.5 kernel on July 21, 2012. Of those, 6,306 had empty explanations — 9% of the total. Many of them were as trivial as one might expect, but others were rather less so.
Some developers are rather more laconic than others. In the period since 3.5, the developers most inclined to omit explanations were:
Developer Count Al Viro 570 Ben Skeggs 224 Mark Brown 213 Hans Verkuil 204 Andreas Gruenbacher 143 Axel Lin 130 Philipp Reisner 126 Antti Palosaari 118 James Smart 107 Alex Deucher 85 Laurent Pinchart 84 Kuninori Morimoto 75 Eric W. Biederman 75 Pavel Shilovsky 72 Rafał Miłecki 72 David S. Miller 65 David Howells 61 Peter Meerwald 61 Maxime Ripard 55 YOSHIFUJI Hideaki 51
For the curious, a page listing the no-explanation patches merged by the above developers is available. A quick look shows that a lot of patches with empty explanations find their way into the core virtual filesystem layer; many of the rest affect graphics drivers, audio drivers, Video4Linux drivers, and the DRBD subsystem. But they can be found anywhere; of the 1,065 changes that touched the mm/ subdirectory, 46 lacked an explanation, for example.
If one believes that there should be fewer patches with empty explanations going into the kernel, one might be inclined to push subsystem maintainers to be a bit more demanding in this regard. But, interestingly, it has become much harder to determine which maintainers have had a hand in directing patches into the kernel.
Signoffs
The Signed-off-by line in the tags section is meant to document the provenance of patches headed into the mainline. When a developer submits a patch, the changelog should contain a signoff certifying that the patch can properly be contributed to the kernel under a GPL-compatible license. Additionally, maintainers who accept patches add their own signoffs documenting that they handled the patch and that they believe it is appropriate for submission to the mainline. In theory, by following the sequence of Signed-off-by lines, it is possible to determine the path that any change followed to get to Linus's tree.
The truth is a little bit more complicated than that. To begin with, of the changes merged since 3.5, 79 had no signoffs at all. Roughly half of those were commits by Linus changing the version number; he does not apply a signoff to such changes, even for those that contain added data beyond the version number update. The rest are all almost certainly mistakes; a handful are the result of obvious formatting errors. See the full list for details. The mistakes are innocent, but they do show a failure of a process which is supposed to disallow patches that have not been signed off by their authors.
Arguably, there is another class of patches that is more interesting: those that contain a single Signed-off-by line. Such patches have, in theory, been managed by a single developer who wrote the patch and got it into the mainline unassisted. One might think that only Linus is in a position to do any such thing; how could anybody else get a change into the mainline on their own?
In fact, of the 70,000 patches pulled into the mainline during the period under discussion, 16,651 had a single signoff line. Of those, 11,527 (16% of the total) had no other tags, like Acked-by, Reviewed-by, or Tested-by, that would indicate attention from at least one other developer. For the purposes of this discussion, only the smaller set of patches has been considered. The most frequent committers of single-signoff patches are:
Developer Count Al Viro 891 Takashi Iwai 525 Mark Brown 492 Johannes Berg 414 Alex Deucher 391 Mauro Carvalho Chehab 389 Ben Skeggs 362 Greg Kroah-Hartman 292 Trond Myklebust 279 David S. Miller 264 Felipe Balbi 259 Tomi Valkeinen 258 Arnaldo Carvalho de Melo 172 Eric W. Biederman 147 Josef Bacik 145 Shawn Guo 142 J. Bruce Fields 141 Ralf Baechle 132 Arnd Bergmann 131 Samuel Ortiz 129
(See this page for a list of the single-signoff patches merged by the above developers). These results are, of course, a result of the use of git combined with the no-rebasing rule. Once a patch has been committed to a public repository, it becomes immutable and can never again acquire tags like Signed-off-by. To pick one example from the list above, wireless developer Johannes Berg maintains his own tree for mac80211 changes; when he commits a patch, it will carry his signoff. Changes flow from that tree to John Linville's wireless tree, then to Dave Miller's networking tree, and finally to the mainline repository. Since each of those moves is done with a Git "pull" operation, no additional signoffs will be attached to any of those patches; they will arrive in the mainline with a single signoff.
One might contend that patches become less subject to review once they enter the Git stream; they can be pulled from one repository to the next sight-unseen. Indeed, early in the BitKeeper era, developers worried that pull requests would be used to slip unreviewed patches into the mainline kernel. Single-signoff patches might be an indication that this is happening. And, indeed, important patches like the addition of the O_TMPFILE option went to the mainline with, as far as your editor can tell, no public posting or review (and no explanation in the changelog, for that matter). It also seems plausible that single-signoff patches merged into the sound subsystem or the Radeon driver (to name a couple of examples) have not been reviewed in detail by anybody other than the author; there just aren't that many people with the interest and skills to review that code.
Without a chain of signoff lines, we lose more than a picture of which maintainers might have reviewed the patches; we also lose track of the path by which a patch finds its way into the mainline. A given changeset may pass through a number of repositories, but those passages leave no mark on the changeset itself. Sometimes that path can be worked out from the mainline repository history, but doing so can be harder than one might imagine, even in the absence of "fast-forward merges" and other actions that obscure that history. Given that the Signed-off-by line was introduced to document how patches get into the kernel, the loss of this information may be a reason for concern.
The kernel community prides itself on its solid foundation of good
procedures, including complete changelogs and a reliable signoff chain.
Most of the time, that pride is entirely justified. But, perhaps, there
might be room for improvement here and there — that is unsurprising when
one considers that no project that merges 70,000 changes in a year can be
expected to do a perfect job with every one of them. Where there is
imperfection, there is room for improvement — though improving the signoff
chain will be difficult as long as the tools do not allow it. But even a
bit more verbiage in commit messages would be appreciated by those of us
who read the patch stream.
Index entries for this article | |
---|---|
Kernel | Changelogs |
Kernel | Development model/Code review |
Posted Jul 24, 2013 21:08 UTC (Wed)
by johill (subscriber, #25196)
[Link]
Most of the time I (used to) post iwlwifi patches to the list for review along with the pull request, but there's no telling who looks at them. If anyone sends an explicit review/ack tag, I will add that (by way of rebasing my submission tree) unless it comes in after my tree was merged, but it's very rare in any case. I've stopped sending the patches for a while but should probably get back to it, I admit that I've been somewhat sloppy here.
Also, for iwlwifi, we have an internal process that typically has at least two people to looking at a patch (usually myself and Emmanuel) but this is only reflected in Reviewed-by tags since there's no "handling" of the patch outside of my tree.
For mac80211 (and cfg80211) this process would not really make sense. I think right now we're also lacking anyone committed to following mac80211/cfg80211 changes and wanting to familiarize themselves with the details; this is also dangerous in other ways, the bus factor is pretty small here. I suspect the same is, unfortunately, true in other subsystems as well.
Posted Jul 24, 2013 21:46 UTC (Wed)
by ohrn (subscriber, #5509)
[Link] (3 responses)
Posted Jul 24, 2013 23:29 UTC (Wed)
by marcH (subscriber, #57642)
[Link]
https://mail.gna.org/public/stgit-users/2013-07/msg00011....
Granted, the security (as implemented by the SHA1 chains) does not make that easy.
Posted Jul 25, 2013 9:40 UTC (Thu)
by anmoch (subscriber, #85760)
[Link]
It seems like the kernel collecting further signoffs and similar tags could provide the latter. I can try to get the ball rolling, but it would help massively if a group of interested kernel devs would volunteer as guinea pigs :-)
(*) you can share them by manually configuring push/fetch specs.
Posted Jul 25, 2013 17:46 UTC (Thu)
by jiiksteri (subscriber, #75247)
[Link]
As
No idea if faked signoff attempts are that big a problem though :)
Posted Jul 24, 2013 22:11 UTC (Wed)
by lrothc (subscriber, #41210)
[Link] (16 responses)
I very often find myself googling for the commit summary string as a way to figure out what has really led to the patch in question.
Now, how that could be incorporated in the commit log beats me.
Posted Jul 24, 2013 22:43 UTC (Wed)
by blackwood (guest, #44174)
[Link]
For similar reasons I also insist that the revision log of individual patches is kept as part of the commit message (and not hidden below the -- line). We also use a bit more elaborate commit message citation layout (essentially git show up to the patch headline) so that author, committer, sha1 and headline is all there at a glance.
It's a lot more work than onleliners but imo enforcing high standars for commit messages is really worth it when digging through history.
Posted Jul 25, 2013 2:01 UTC (Thu)
by nevets (subscriber, #11875)
[Link] (14 responses)
Link: http://lkml.kernel.org/r/20130718184712.GA4786@redhat.com
That maps to marc.info/?i=20130718184712.GA4786@redhat.com which is the email of the patch. Usually from that, you can get the thread, which *is* very useful. I just used it a few minutes ago to figure out why some function was called in the MIPS code.
Posted Jul 25, 2013 8:02 UTC (Thu)
by lacos (guest, #70616)
[Link] (4 responses)
http://news.gmane.org/find-root.php?message_id=2013071818...
A further improvement is storing the original subject (including the "bag of tags" part) of the patch email (if the patch was posted to and applied from a list) as a "pseudo header". This helps identifying a series using just the git commit log (due to the patch numbering being captured).
What I miss is some way to save cover letters as standalone, no-code-change commits.
Posted Jul 25, 2013 8:32 UTC (Thu)
by johill (subscriber, #25196)
[Link] (1 responses)
Posted Jul 25, 2013 8:44 UTC (Thu)
by lacos (guest, #70616)
[Link]
(Still I like to end up immediately in the threaded / frames interface.)
Posted Jul 25, 2013 12:39 UTC (Thu)
by hmh (subscriber, #3838)
[Link] (1 responses)
This works well only when the patch stack is going in through the same tree, obviously.
As for notes, their out-of-band nature will complicate a secure workflow somewhat, but it should be possible to do it.
Posted Jul 26, 2013 0:53 UTC (Fri)
by aliguori (subscriber, #30636)
[Link]
Posted Jul 25, 2013 9:29 UTC (Thu)
by lrothc (subscriber, #41210)
[Link]
But I think it would be better if this became more standard. Maybe if git itself would save the message ID so it could be easily looked up.
What we're missing here is a way to make it possible to find email discussions by looking at *any* commit in the git log. If there is a commit for which the message ID cannot be found in any public mailing lists, that would be a red flag telling us that the commit was not reviewed publicly.
Posted Jul 25, 2013 12:29 UTC (Thu)
by aliguori (subscriber, #30636)
[Link] (7 responses)
Here's an example:
http://git.qemu.org/?p=qemu.git;a=commit;h=fd1d9926e91f42...
Besides being a good way to associate a commit with a mail thread, it is useful for generating automatic "Thank you" notes when a patch is applied.
Posted Jul 25, 2013 13:02 UTC (Thu)
by nevets (subscriber, #11875)
[Link] (6 responses)
https://lkml.org/lkml/2011/3/28/460
Which was changed to the link format of:
Link: http://lkml.kernel.org/r/<message-id>
That way we have a link to the message that you can easily find, and if you are worried about that link disappearing, you have the message-id in the link. The best of both worlds.
But you should know that Linus hates just having the message-id as a tag.
Posted Jul 25, 2013 16:58 UTC (Thu)
by aliguori (subscriber, #30636)
[Link] (5 responses)
A message-id is infinitely more useful than a link. There is no programmatic way of getting a message-id from a link so you can't generate email responses without it.
People just need to learn how to use mid.gmane.org :-)
Posted Jul 25, 2013 19:02 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (3 responses)
Posted Jul 27, 2013 18:41 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Jul 26, 2013 3:02 UTC (Fri)
by nevets (subscriber, #11875)
[Link]
Posted Jul 24, 2013 22:28 UTC (Wed)
by ebiederm (subscriber, #35028)
[Link] (5 responses)
I respectfully suggest your measurement and conclusions are in error.
You did not measure if the what was written actually describes what is going on, and without actually looking to see if the subject is sufficient explanation it is quite insulting to call some of these patches no explanation patches.
You seem to be encouraging people to make larger harder to bisect and harder to read patches simply to get get patches that need more explanation. Something I expect would be much worse.
Posted Jul 24, 2013 22:41 UTC (Wed)
by nix (subscriber, #2304)
[Link]
Posted Jul 24, 2013 22:42 UTC (Wed)
by corbet (editor, #1)
[Link] (3 responses)
The article does say that, for some patches, a one-line summary is sufficient.
I'm confused by the comment on readability; the point of the explanation area is to make the patch easier to understand, after all. I didn't suggest populating it from your spam folder. And I'll confess to being totally thrown by the mention of bisectability - how on earth is that affected by changelog contents?
Posted Jul 25, 2013 1:00 UTC (Thu)
by apoelstra (subscriber, #75205)
[Link] (1 responses)
I think the implication was that people ought to keep piling things into a single commit until that commit is complicated enough to justify a full explanation.
Having said that, I read the article and understood it the way you intended. (And I think most developers would have no problem recalling examples of commits where one line is sufficient, even if you had not pointed it out.)
Posted Jul 26, 2013 11:58 UTC (Fri)
by bfields (subscriber, #19510)
[Link]
Or to put it another way: if someone takes a complicated change, carefully splits it into small independent steps, and leaves the explanation on the first commit--then they get dinged for writing inadequate commit messages?
Often the reason Al's patches lack changelog bodies is because he does the hard work of figuring out how to get from point A to point B in trivial self-explanatory steps--something he deserves credit for. (What *does* get lost, I think, is motivation: for example, motivation for the readdir api change ("dealing with ->f_pos races in ->readdir() instances for good") is only in the cover letter, as far as I can tell, not in any changelog or code comment (but I may have missed it, or maybe it's there in changes to problematic filesystems, I didn't check).)
In any case, I don't think we can draw useful conclusions from simple summaries of the commit statistics--careful examination of representative cases would seem more instructive.
Posted Jul 25, 2013 19:51 UTC (Thu)
by broonie (subscriber, #7078)
[Link]
Posted Jul 24, 2013 23:02 UTC (Wed)
by aliguori (subscriber, #30636)
[Link] (2 responses)
If you do that (and we do in QEMU) then you can walk the full history.
The practical problem with Reviewed-by/Tested-by is it's a pain in the ass to add them as a maintainer. I have some scripts to do this given an mbox and have been planning to generalize but haven't gotten around to it. I suspect a few other people have similar scripts floating around.
The script I use is available at http://git.codemonkey.ws/cgit/mbox-filter.git
Posted Jul 25, 2013 3:07 UTC (Thu)
by stressinduktion (subscriber, #46452)
[Link] (1 responses)
[0] http://jk.ozlabs.org/projects/patchwork/
Posted Jul 25, 2013 3:52 UTC (Thu)
by aliguori (subscriber, #30636)
[Link]
Posted Jul 25, 2013 0:48 UTC (Thu)
by josh (subscriber, #17465)
[Link]
Posted Jul 25, 2013 7:38 UTC (Thu)
by error27 (subscriber, #8346)
[Link]
Hopefully everything went to a list as well. I don't think anything gets reviewed again when it hits git. I know I review everything on the staging mailing list, but I never review git patches.
Maintainers often don't apply patches when the merge window is open so there is sometimes a delay before the patch gets reviewed and applied. But for bystanders, if no one comments on the patch within 3 days then there probably won't be any comments. In other words, if I don't comment on the patch after three days, that means I have Ack-ed it.
Posted Jul 26, 2013 17:10 UTC (Fri)
by aegl (subscriber, #37581)
[Link] (3 responses)
What would be the purpose of such a thing. So that Linus would know that he had claimed the legal authority to post the patch that he just wrote? It all seems a bit self-referential and involves too much of Linus talking to himself.
Posted Jul 26, 2013 20:23 UTC (Fri)
by corbet (editor, #1)
[Link] (2 responses)
What am I missing here?
Posted Jul 26, 2013 21:35 UTC (Fri)
by aegl (subscriber, #37581)
[Link] (1 responses)
You called out specifically changes to version that add other data (pointing to the Linux for Workgroups logo). Would this really have been better if it had a "Signed-off-by:" line?
Posted Jul 27, 2013 13:36 UTC (Sat)
by corbet (editor, #1)
[Link]
As for whether he signs off on a version number change, it doesn't matter that much. I was just describing the data for the curious.
Posted Jul 26, 2013 18:25 UTC (Fri)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (3 responses)
Posted Jul 29, 2013 18:32 UTC (Mon)
by aliguori (subscriber, #30636)
[Link] (2 responses)
A revert is not the same thing as making the commit never exist. Often they do not apply cleanly and it's very hard for a reviewer to know whether the author of the revert had to apply logic to resolve conflicts.
Posted Jul 29, 2013 21:55 UTC (Mon)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link]
Posted Jul 30, 2013 8:25 UTC (Tue)
by jezuch (subscriber, #52988)
[Link]
It would be useful if revert commits looked like (or simply were) merges. I sometimes do it by hand, but that's awkward.
Posted Jul 27, 2013 2:18 UTC (Sat)
by rjw@sisk.pl (subscriber, #39252)
[Link] (3 responses)
Without a chain of signoff lines, we lose more than a picture of which maintainers might have reviewed the patches; we also lose track of the path by which a patch finds its way into the mainline.
Given a commit hash you can always do
$ git log --ancestry-path --merges <commit>..HEAD
go to the bottom of the log and see whose hands the commit has gone through.
Posted Jul 27, 2013 13:31 UTC (Sat)
by corbet (editor, #1)
[Link] (2 responses)
Posted Jul 30, 2013 4:38 UTC (Tue)
by neilbrown (subscriber, #359)
[Link] (1 responses)
fast-forward merges are certain to cause problems, and we should probably dictate that maintainers *must*not*do*that* (--no-ff please!)(they probably already do?). Assuming a lack for fast-forward patches, "git name-rev" can help.
e.g.
Posted Jul 30, 2013 16:07 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Does setting the noff option in gitconfig still force pull-merges without pull.rebase to true? Merges with --no-ff are great, but pull merges are really quite a bit worse, IMO. (I don't care how often you pull from a remote, just don't modify the tree when you do it). Also, it means that if there are conflicts, they happen when merging a remote into your branch which is backwards since your branch is topologically older.
Posted Jul 29, 2013 23:31 UTC (Mon)
by akpm (guest, #4826)
[Link]
What's missing from our changelogs
What's missing from our changelogs
Where there is imperfection, there is room for improvement — though improving the signoff chain will be difficult as long as the tools do not allow it.
git notes are made for this, it's a shame they didn't standardize on adding signoffs and other tags as notes once the feature matured.
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
git notes are made for this, it's a shame they didn't standardize on adding signoffs and other tags as notes once the feature matured.
git notes
are stored in a separate ref in the repo (which is what makes them not alter history in the first place), you also have to tag and sign the note ref separately if you want to trust any notes information.
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
I don't see the problem with the way we do the links. The link includes the message id. You get the best of both worlds.
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
"No-explanation" means no text in the explanation area; I'm sorry if that wasn't clear.
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
[1] http://patchwork.ozlabs.org/project/netdev/list/
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
What's missing from our changelogs
I'm confused by this comment. Linus does sign off patches that he writes, or that he applies directly (not via a pull). And that makes sense; the purpose of a signoff isn't just for Linus to know, it's for the world to know who took responsibility for the patch. What he doesn't assign is version number updates.
What's missing from our changelogs
What's missing from our changelogs
Again, Linus seems to think differently; he follows the same rules as everybody else for anything but version number tweaks. I think this makes sense; there's lots of kernel repositories, only one controlled by Linus. Adding an implicit "no signoff means signed-off-by: Linus" rule for all of them strikes me as inconsistent and weird.
What's missing from our changelogs
What's missing from our changelogs: Reverts
What's missing from our changelogs: Reverts
What's missing from our changelogs: Reverts
What's missing from our changelogs: Reverts
What's missing from our changelogs
Interesting, I just tried that with randomly chosen commit b4419e1a15905191661ffe75ba2f9e649f5d565e (a GPIO fix) and found that it listed the thermal and tracing trees, which are unlikely to have hosted that patch. --ancestry-path prints all intervening commits, so there's a lot of extra noise, while, of course, any fast-forward merges will be missing. A useful tool but not a definitive answer.
What's missing from our changelogs
I'd suggest that was a poor example as it was merged directly by Linus, so there is only one interesting merge.
What's missing from our changelogs
$ git name-rev 453807f3006757a5661c4000262d7d9284b5214c
453807f3006757a5661c4000262d7d9284b5214c tags/v3.10-rc1~14^2~20^2~3^2~29
Then apply the following (which can obviously be scripted):
% git log -n1 --format=short tags/v3.10-rc1~14^2~20^2~3^2~29
commit 453807f3006757a5661c4000262d7d9284b5214c
Author: Lars-Peter Clausen <lars@metafoo.de>
ASoC: ep93xx: Use ep93xx_dma_params instead of ep93xx_pcm_dma_params
% git log -n1 --format=short tags/v3.10-rc1~14^2~20^2~3
commit 9eb8ae727dcb9f2530a895ee6b3496592853709d
Merge: 5561f17 6f1fd93
Author: Mark Brown <broonie@opensource.wolfsonmicro.com>
Merge remote-tracking branch 'asoc/topic/dma' into asoc-next
% git log -n1 --format=short tags/v3.10-rc1~14^2~20
commit 2fc565e4eaf8fc633bfc741b90e1f28dba732ee1
Merge: 7fc7d04 5cc50fc8
Author: Takashi Iwai <tiwai@suse.de>
Merge tag 'asoc-v3.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
% git log -n1 --format=short tags/v3.10-rc1~14
commit 05a88a43604abb816dfbff075bb114224641793b
Merge: daf799c 6c35ae3
Author: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'sound-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
and you have the merge commits that merged the patch into the trunk and the people responsible.
(interesting commits to experiment with can be found via
git log | git name-rev --stdin | grep '~.*~.*~.*~.*~'
with varying numbers of '~' in the 'grep')
What's missing from our changelogs
What's missing from our changelogs