KS2011: Patch review

By Jonathan Corbet
October 24, 2011

2011 Kernel Summit coverage

The "patch review" session at the 2011 Kernel Summit went off in some interesting directions, but it all started with Rafael Wysocki making the claim that we simply do not do enough patch review. How, he said, can we fix that? And how can we get thorough review of patches? Just getting a Reviewed-by tag, he said, is often not really enough to give confidence in a patch. That led to some discussion of the relative meaning and merit of Reviewed-by and Signed-off-by. The latter is a much stronger statement in the end, and, it was agreed, carries more weight.

Ted Ts'o talked about the plight of developers who get a lot of detail-oriented review which they duly address. Only later does a subsystem maintainer come by and say that the basic premise of the patch is wrong and that the previous revisions were for nothing. It would be better, he said, if maintainers could perform high-level review of patches quickly to help developers avoid useless work. Ingo Molnar disagreed, saying that reviewer bandwidth is, in general, the limiting factor in kernel development. So he insists that the code be clean and free of stylistic problems before he will review it at all.

Another issue raised by Ted is ensuring that problems raised in previous reviews have been addressed in a new revision of a patch set. Tools like Gerrit, he said, can be most helpful in this regard. With Gerrit, it is possible to look at the changes in a patch over time and to track the comments that were made. At his work, use of Gerrit is mandated by policy, and it helps a lot.

Johannes Berg said that there is more review happening than people think, it's just that a lot of it is invisible. He looks at almost every wireless patch that goes by but, if he sees nothing to complain about, he'll usually just move on quietly. Responding to every patch, even if he has no objections, would increase the amount of email flying around and slow things down.

Peter Zijlstra complained that he has a whole mailbox full of patches that he needs to review. Alan Cox corrected him: what Peter really has is a mailbox of patches whose developers need to find reviewers for. Trying to do all the review himself is never going to scale; that is a lesson that has been learned many times in the kernel's history. It is necessary to find a way to spread out the work.

Andrew Morton said that the quality of the reviews and Acked-by responses is not always that great. Sometimes a patch will go out, only to be followed by several Acked-by responses within a few minutes; he knows that all of those people are not reviewing that patch in that time. Some of those responses, he said, are clearly bogus. Peter suggested that "corporate interest" can be behind that kind of coordinated response. But Dave Airlie pointed out that quick acks can also happen if the patch has already been through internal review and the developers involved already know that they are happy with it. James Bottomley said that he'll often ignore reviews that come from the same company as the original patch; he will also ignore a Reviewed-by response that did not contain any actual comments.

Alan said that part of the problem comes from development that has been contracted out to other companies - often located in China or India - where the contract requires that the resulting code be merged upstream. Ted noted that this is actually an improvement - upstreaming code was once seen as unimportant. But, he said, it would be good to get any in-house review done in public rather than behind closed doors; he acknowledged that this idea can be a hard sell in some companies. Alan observed that many of those patches, in their early versions, are not something that most people would want to see.

Rafael concluded that if there is a review problem, it is specific to some subsystems. The maintainers of those subsystems will need to figure things out. But, Ted said, a related problem is large patch sets that touch many subsystems. Those patch sets may never get in at all, or they may see review only for some parts of the set.

Andrew complained about the acceptance of entirely new features into the kernel. Those features often land on his doorstep without much justification, forcing him to ask the developers to explain their motivations. The kernel community, he complained, is not supporting him well. Who can tell him if a given patch makes sense? Mistakes have been made in the past; bad features have been merged and good stuff has been lost. How, he asked, can he find people who know better about the desirability of specific patches?

Grant Likely suggested the possible creation of an architectural review group that could consider patches as a whole. Andrew said that might help, but it would have to meet frequently enough to respond to patches quickly. Nobody took responsibility for organizing this group, so it would not be surprising if the idea goes no farther.

When "no" becomes "yes"

Alan said that, once upon a time, Linus used to simply say "no" to a lot of patches. If the distributors shipped it anyway for long enough, he would usually give in and merge it anyway. It is not an infallible method, but it provided a good indication of whether a patch is worthwhile or not. Chris Mason observed that the distributors have changed; they no longer want to ship out-of-tree code in their kernels, so this method has lost some of its efficacy.

Inevitably somebody pointed out an important example of just this process in action, though. Many of the changes in the Android kernel have been blocked from merging into the mainline, but Android (which certainly qualifies as a sort of distribution) shipped it anyway, and it now has many millions of users. What does that say about how we should treat the Android code?

As one might imagine, the discussion became rather unfocused and fragmented for a while. It came back together when Linus took the microphone and stated that, simply, code that actually is used is the code that is actually worth something. The Android code is certainly being used; the in-kernel code aimed at the same problems is just a vague idea that is worthless in comparison. We should, he said, consider merging suspend blockers as a real option. Even if it truly is crap, we've had crap in the kernel before. The code does not get any better out of tree. Alan Cox agreed that it is probably a good idea to merge that code. The interface is important and has a lot of users; getting the code merged is the best way to fix the implementation. Ingo also agreed, saying that when code has millions of users, we have to say "yes" to it.

Tim Bird said that there is a project underway in the Consumer Electronics Linux Forum to fix up the Android patches and resubmit them for merging. That news came as a surprise to the others in the room; it seems that there is an independent effort in Linaro to do the same thing. Hopefully one result of this discussion is that these people will find each other and join forces.

Ted said that, as kernel developers, we often have an inflated idea of our ability to block things by saying "no." We can't stop things that way; they will happen anyway; Jesse Barnes added that we don't have the power to stop product development. Ben Herrenschmidt (and others) noted that, for a lot of hardware vendors, the requirement now is to provide an Android kernel for their products - not a mainline kernel. There is a Linux fork out there, and it is threatening to take over. We are, Ben said, shooting ourselves in the foot by keeping that code out.

Linus said that it comes down to a marketing issue. Once upon a time, the traditional Linux distributors used to ship a lot of out-of-tree code in their quest to have the best kernel. They have gotten past that phase now and can pat themselves on the back, but they did the same things that Android is doing now. We should, he said, calm down about the whole thing.

There were no hard conclusions from the session. Given the way things went, though, it would be seriously surprising not to see the Android code seriously considered for merging in the 3.3 or 3.4 time frame. Android has shown staying power, and much of the opposition appears to have moderated over time. That is a surprising conclusion from a session on patch review, but a worthwhile one nonetheless.

Next: Development process issues

Index entries for this article
Kernel	Android
Kernel	Development model/Code review

KS2011: Patch review

Posted Oct 25, 2011 0:59 UTC (Tue) by neilbrown (subscriber, #359) [Link] (2 responses)

> That led to some discussion of the relative meaning and merit of Reviewed-by and Signed-off-by. The latter is a much stronger statement in the end, and, it was agreed, carries more weight.

This makes no sense to me. As defined in SubmittingPatches, the Signed-off-by: line says nothing about the quality of the patch, only the origin.
It represents a "Developer's Certificate of Origin" and is meant to address questions of copyright, not questions of value.

So it seems to me that if the original statement is true, then people are interpreting SOB wrongly.

SubmittingPatches gives a "Reviewer's statement of oversight" which defines what Reviewed-by: is meant to mean, and it sounds like it should carry a whole lot more weight...

Do we need to revise SubmittingPatches ??

KS2011: Patch review

Posted Oct 26, 2011 3:32 UTC (Wed) by tytso (subscriber, #9993) [Link]

Let's consider when a Signed-off-by: should appear.

(1) when a patch is written by the original author and submitted for consideration. At that point, the author had better be convinced that it is Correct and Sane, since he or she is putting their name and reputation on the line with that patch. In some cases patches are submitted for comments, and in that case sometimes the signed-off-by line is not present (because the author knows some changes might be necessary), or to make the point that the patch really isn't suitable for inclusion, might even have a NOT-Signed-off-by: line to make that point clear.

(2) when the patch is copied from e-mail into a git repository. If the person with the git repository then sends a pull request to Linus, it had better mean that she has put her name and reputation on the line, and is confident that the patch is good. It may not mean that the person has reviewed it; it may be that he is depending on his reviewers or sub-maintainers to have done a good job. But ultimately, it is the maintainer's reputation which is on the line. If the maintainer isn't sure about a sub-maintainer's competence in review and good taste, then she should be doing spot checks or even review every single commit before accepting a pull from the sub-maintainer --- or not stop pulling form the sub-maintainer altogether.

This also implies that in practice, a commit should generally only have two signed-off-by lines. There is an exception, which is if there is a sub-maintainer who isn't using git, so a patch gets forwarded outside of git an additional time. An example of this Andrew Morton's mm tree, which is maintained as a quilt patch series outside of git. But in general, patches are reviewed on e-mail, and once they hit git, they're not going to accumulate any more signed-off-by lines. As those git commits get merged into more senior trees, the maintainer who accepts the pull request is effectively putting their imprimatur on them. So they had better have reviewed the patch, or be willing to put their reputation on the line based on their trust of their sub-maintainers.

KS2011: Patch review

Posted Oct 26, 2011 22:32 UTC (Wed) by BenHutchings (subscriber, #37955) [Link]

According to my notes, Linus said that a second Signed-off-by implies only Acked-by; and the person adding it could add Reviewed-by if they want.

KS2011: Patch review

Posted Oct 25, 2011 3:55 UTC (Tue) by dlang (guest, #313) [Link] (8 responses)

I would love to have the a ability to put a kernel.org kernel on my devices that shipped with android. I see far too much work going on with people writing drivers for (in kernel.org terms) ancient kernels.

in many cases there is a driver for a device in a new kernel, but since that's not an 'android' kernel, people spend a lot of time duplicating or backporting drivers.

I'm not saying that the kernel.org kernel needs to accept all the implementation details of the android kernels, but they do need to accept enough that the interfaces can be made available and the result at least run.

in the case of wakelocks, on my tablet I don't have all the different weird things going on that can happen on a phone, so I would even be happy with a noop version of wakelocks that didn't actually do anything beyond allowing the android userspace to run without complaining.

KS2011: Patch review

Posted Oct 25, 2011 7:33 UTC (Tue) by swetland (guest, #63414) [Link] (7 responses)

I'd love that too!

To be very clear, we (Google/Android) are not opposed to some level of compromise, incremental shifts, and migration of APIs. We just find the "no you should throw it all away and rewrite your userspace to be different" approach to be something we're not interested in doing.

I suspect there's a middle ground somewhere between "put all of the Android changes in the kernel unmodified right now" and "delete them forever because they suck". Here's hoping we find it.

KS2011: Patch review

Posted Oct 25, 2011 8:00 UTC (Tue) by dlang (guest, #313) [Link] (6 responses)

not counting hardware drivers for the moment, is there a list of what interfaces would need to get added to the kernel.org kernel to get Android userspace to run on it?

and then of those interfaces, which ones need to actually be something other than a stub?

for example, wakelocks could probably be a stub, but an inter-process communications would have to actually work (I seem to remember reading that Android had a kernel assisted IPC protocol)

KS2011: Patch review

Posted Oct 25, 2011 8:08 UTC (Tue) by swetland (guest, #63414) [Link] (5 responses)

I'd need to look through things, but I suspect the answer is "not many", though some of them are pretty important (at least to Android).

Some of the notable ones off the top of my head:
- wakelocks
- binder (IPC subsystem) -- /dev/binder -- self-contained, in theory one could replace it with an entirely userspace implementation, in practice probably a lot of work
- logger driver -- /dev/logs/* -- multiple ring buffers for logging state without a lot of context switch overhead -- back when running on virtually tagged ARM9s this was very useful
- ION -- /dev/ion -- a unified approach to buffer management and sharing between display, GPU, camera, codecs, etc, new in Ice Cream Sandwich

Where possible we try to use standard interfaces -- for example, the power supply framework for battery charging and status, a lot of assorted interfaces via /sys, etc.

KS2011: Patch review

Posted Oct 25, 2011 8:17 UTC (Tue) by dlang (guest, #313) [Link] (2 responses)

is the logger driver really needed for the userspace to run? could you point me to documentation on it?

as for ION, since there is no code available yet, it's hard to talk about it.

KS2011: Patch review

Posted Oct 25, 2011 8:23 UTC (Tue) by swetland (guest, #63414) [Link] (1 responses)

Userspace should fail over to dropping all logs on the floor. Not useful but it should work.

We're still getting back online after the kernel.org downtime, so I don't have a handy URL for browseable source. Before kernel.org went down the ION kernel side sources were being developed in the open in the kernel repositories at android.git.kernel.org.

Hopefully it'll all be sorted out in the near future alongside the ICS userspace source release.

KS2011: Patch review

Posted Oct 25, 2011 18:01 UTC (Tue) by dlang (guest, #313) [Link]

this is the sort of thing I was thinking about when I was saying that things just needed to run, not implement everything.

I am thinking that the interfaces needed to just make things 'run' on a kernel.org kernel is much smaller than what people have been trying to get submitted. And then once this smaller set is in place, things can be worked on much more easily.

while I have your attention, did you see Neil Brown's proposal for a userspace wakelock-type daemon? how hard would it be to modify the Android userspace to work with his proposal? As I understand it, there's only one daemon that actually talks to the kernel, everything else in the Android userspace talks to that daemon.

KS2011: Patch review

Posted Oct 25, 2011 9:31 UTC (Tue) by xav (guest, #18536) [Link] (1 responses)

Interesting. How does ION relate to the DMA buffer sharing mechanism [1] currently in discussion ?

[1] https://lwn.net/Articles/462524/

KS2011: Patch review

Posted Oct 26, 2011 15:37 UTC (Wed) by a0273185@ti.com (guest, #74477) [Link]

ION provides memory allocation and buffer sharing using file descriptors. The allocations come from carveouts.

dmabuf just deals with buffer sharing.

I imagine CMA + dmabuf would displace ION.

KS2011: Patch review

Posted Nov 1, 2011 18:55 UTC (Tue) by badcow (guest, #71514) [Link] (1 responses)

There's no concern over Apple's desire to destroy the Android? I'm not sure that threat was just targeted at devices running it.

KS2011: Patch review

Posted Nov 8, 2011 18:58 UTC (Tue) by wookey (guest, #5501) [Link]

If apple want to attack the linux kernel directly then bring it on. The kernel has powerful friends these days like IBM, OIN, and erm, well, pretty much everybody. I'd like to think that any serious attack would be about as good for Apple as it was for SCO and Blackboard. Obviously a patent-based attack would be far less stupid than SCO's copyright-based attack, but ultimately we can remove anything which is found to be infringing, and the exposure of the madness inherent in the system can only do us good.