
Leading items

LinuxCon Japan: Making kernel developers less grumpy

By Jake Edge
June 6, 2012

Greg Kroah-Hartman is on something of a mission: reducing the grumpiness factor among kernel developers, and maintainers in particular. His keynote at LinuxCon Japan was meant to help the audience understand what the maintainers do, and how contributors' actions can sometimes result in grumpy maintainers. But, if contributors can follow the rules and make things easier on him, there are a number of things that he will promise to do on their behalf.

He called the Linux kernel the "largest software development project ever" and noted that its development pace is "unprecedented". From 3.0 to 3.4, some 2833 developers from at least 373 companies contributed. In that year (from May 2011 to May 2012), the kernel had a change rate of 5.79 changes per hour. But the rate keeps increasing and if you look at just the 3.4 cycle, the rate is 7.21 changes per hour. That is, of course, just patches that are accepted into the mainline, so it doesn't count those patches that are rejected.

Developers typically send their changes to the maintainer of the file that is being changed. Those maintainers, who number around 700, feed those changes up to the 130 subsystem maintainers. From there, the patches make their way into linux-next, then to Linus Torvalds, and, eventually, the mainline, provided they are accepted at each step along the way.

So, in order to see why some patches might not get accepted, he looked at those that he received in the last two weeks, which coincided with the 3.5 merge window. The merge window is a time when he really shouldn't be getting many patches. He should have received them all earlier in the cycle so that he could potentially pass them on to Torvalds during the merge window. But, he said, he got 487 patches in that two-week period, many with a wide variety of problems, and some of those from core kernel developers who should know better.

Broken patches

With that, he launched into a description of some of the broken patches he got. One patch was labeled "patch 48/48" (i.e. the last patch in a set of 48) but all of the other pieces were missing. He also got a patch series with no order specified, which means that he would have to guess at the order and undoubtedly get it wrong. The alternative is to ignore the patch entirely. He also got a ten-patch set that was missing patch two in the series.
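The missing-piece problems described above are easy to catch before sending. What follows is only a minimal sketch, assuming the common "[PATCH n/m]" subject convention; the function name `missing_patches` and the regular expression are hypothetical illustrations, not part of any kernel tool.

```python
import re

# Matches subjects such as "[PATCH 3/10] ..." or "[PATCH v2 03/10] ...".
# The pattern is an assumption about the usual subject-line convention.
SERIES_RE = re.compile(r"\[PATCH[^\]]*?\b(\d+)/(\d+)\]")


def missing_patches(subjects):
    """Return the sorted patch numbers absent from a numbered series."""
    seen = set()
    total = None
    for subject in subjects:
        match = SERIES_RE.search(subject)
        if match is None:
            continue  # not part of a numbered series
        index, count = int(match.group(1)), int(match.group(2))
        seen.add(index)
        total = count if total is None else max(total, count)
    if total is None:
        return []
    # Patch 0 (a cover letter) is optional, so only 1..total are required.
    return [n for n in range(1, total + 1) if n not in seen]
```

Fed the subjects of a ten-patch set with patch two left out, the function would report `[2]`; fed a lone "48/48", it would report everything from 1 through 47.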

Another patch came in an email with a signature claiming that it was confidential. He actually sees that one a lot, he said, and there is nothing he can do with those kinds of patches. Linux development is done in the open and you can't send a confidential email to mailing lists or get a confidential patch merged. Obviously, it is boilerplate that gets added somewhere in the email process, but it has to be removed before the patch can be used.

There are also malformed patches that end up in his inbox, including those with tabs converted to spaces. Microsoft Exchange does that, he said, so if that's a problem in your environment, do what IBM, Microsoft, and others do: put a Linux box in the corner for the developers to use to send their mail. Sometimes the leading spaces have been stripped off the diff or the diff is not in unified format. Linux developers have gotten good at raw editing diff format, he said, which is scary in itself, but they shouldn't have to do that.

Patches are also created in the wrong directory, such as down in a driver directory. He got a patch created in /usr/src/linux-2.6.32 and noted that there were multiple things wrong with that, including the age of the source tree and the implication that it was being built by root. The latter is very dangerous as there was a bug in the Linux build process at one point that would delete the entire root filesystem if it was run as root. None of the core developers noticed because they don't build as root. Suggestions that the bug be left in as a deterrent were ignored, but things like that can happen.

In addition, patches came in that were made against a different tree than any he would expect. He got a patch made against the SCSI development tree, for reasons unknown because it had nothing to do with SCSI.

Then there are those that don't have the right coding style. In one case, the coding style was wrong and the developer acknowledged that but wanted him to take the patch anyway. That gives the impression of "we don't care, take our code anyway", he said. There are tools to help find and fix those kinds of problems, so there is no excuse: "send it in the right coding style".

He sees patches that don't even compile far more often than he should. The submitter clearly hasn't even built the patch, he said. Or there are patch sets that break the build in 3/6 but then fix it in 6/6. He even got a patch that broke the build in 5/8 but contained a note that sometime in the future the submitter would send changes to fix it. Another patch had obviously wrong kernel-doc in it that would cause failures building the documentation, so it was clear that the contributor had never even tried to run the kernel-doc extraction tool.

One of the patches he got "had nothing to do with me". It was an x86 core kernel patch, which is not an area of the kernel he has ever dealt with. But the patch was sent only to him. "I get odd patches" a lot, he said.

The last patch he mentioned was 450K in size, with 4500 lines added. Somebody suggested that it be broken up, but in the meantime several maintainers actually reviewed it, so the submitter didn't really learn from that mistake.

All of this occurred during a "calm two weeks", he said. These are examples of what maintainers deal with on a weekly basis, and they explain why maintainers can be grumpy. That said, he did note that this is the "best job I've ever had", but that's not to say it couldn't be improved.

If someone sends him a patch and he accepts it, that means he may have to maintain it and fix bugs in it down the road. So it's in his self interest to ignore the patch, which is an interesting dynamic, he said. The way around that is to "give me no excuse to reject your patch"; it is as simple as that, really.

Rules

Kroah-Hartman then laid out the rules that contributors need to follow in order to avoid the kinds of problems he described. Use checkpatch.pl, he said, because he will run it on your patch and it is a waste of his time to have to forward the results back when it fails. Send the patch to the right people and there is even a script available (get_maintainer.pl) to list the proper people and mailing lists where a patch should be sent.

Send the patch with a proper subject that is "short, sweet, and descriptive" because it is going to be in the kernel changelog. It should not be something like "fix bugs in driver 1/10". In addition, the changelog comment should clearly say what the patch does, but also why it is needed.

Make small changes in patches. You don't replace the scheduler in one patch, he said, you do it over five years. Small patches make it easier for reviewers and easier for maintainers to accept. In a ten-patch series, he might accept the first three, which means that the submitter just needs to continue working on the last seven. The best thing to do is to make the patch "obviously correct", which makes it easy for a maintainer to accept it.

Echoing the problems he listed earlier, he said that patches should say what tree they are based on. In addition, the order of the patches is important, as is not breaking the build. The latter "seems like it would be obvious" but he has seen too many patches that fail that test. To the extent that you can, make sure that the patch works. It is fine to submit patches for hardware that you don't have access to, but you should test on any hardware that you do have.

Review comments should not be ignored, he said. It is simply common courtesy if he takes time to review the code that those comments should be acted upon or responded to. It's fine to disagree with review comments, but submitters need to say why they disagree. If a patch gets resent, it should be accompanied by a reason for doing so. When reviewers' comments are ignored, those reviewers are unlikely to review code the next time.

Maintainer's role

When you follow those rules there are certain things you can expect from him, Kroah-Hartman said, and that you should expect from the other maintainers as well. That statement may make other maintainers mad, he joked, but it is reasonable to expect certain things. For his part, he will review patches within one or two weeks. Other maintainers do an even better job than that, he said, specifically pointing to David Miller as one who often reviews code within 48 hours of its submission. If you don't get a response to a patch within a week, it is fine to ask him what the status is.

He can't promise that he will always give constructive criticism, but he will always give "semi-constructive criticism". Sometimes he is tired or grumpy, so he can't quite get to the full "constructive" level. He will also keep submitters informed of the status of their patch. He has scripts that will help him do so, and let the submitter know when the patch gets merged into his tree or accepted into the mainline. That is unlike some other maintainers, he said, where he has submitted patches that just drop into a "big black hole" before eventually popping up in the mainline three months later.

He ended by putting up a quote from Torvalds ("Publicly making fun of people is half the fun of open source programming. ...") that was made as a comment on one of Kroah-Hartman's Google+ postings. The post was a rant about a driver that had been submitted, which even contained comments suggesting that it should not be submitted upstream. He felt bad about publicly posting that at first, but Torvalds's comment made him rethink that.

Because kernel development is done in the open, we are taking "personal pride in the work we do". As the code comment indicated, the driver developer didn't think it should be submitted because they realized the code was not in the proper shape to do so. It is that pride in the work that "makes Linux the best engineering project ever", he said. Sometimes public mocking is part of the process and can actually help instill that pride more widely.

[ The author would like to thank the Linux Foundation for assistance with his travel to Yokohama. ]


Backing up in trees with Obnam 1.0

By Nathan Willis
June 6, 2012

Lars Wirzenius's new backup tool Obnam was just declared 1.0. There is no shortage of backup options these days, and in some way Wirzenius's decision to scratch his own itch with the project is par for the course. But the program does offer a different feature set than many of its competitors.

For starters, Obnam makes only "snapshot" backups — that is, every backup looks like a complete snapshot of the system: there are not separate "full" and "incremental" backup options. That obviates the need to separately configure full and incremental backups on different schedules, and it similarly simplifies the restoration process. Any snapshot can be restored, without "walking" a chain of deltas from a full backup starting position. In his 1.0 release announcement, Wirzenius argues that full-plus-incremental backups make sense for tape drives, where sequential access favors adding deltas with incremental changes after an initial full backup, but that hard-disk backups make the incremental delta approach pointless.

But the sneaky part is that under the hood, Obnam's snapshots are all incremental, at least in the sense that each snapshot only records changes since the last. The difference is that they are stored in copy-on-write (COW) b-trees like those Btrfs uses for filesystems. Any snapshot can be reconstructed from the b-tree, and individual snapshots can be removed by deleting their node and re-attaching the sub-trees. To make the COW b-tree approach space-efficient, it uses pervasive automatic data de-duplication. The same chunk of data on disk is re-used — both across multiple files and over multiple snapshot generations. In addition to saving space by not duplicating files that have not changed between snapshots, moving or renaming large files does not result in duplicate copies of the bits. By default, Obnam uses one-megabyte chunks, although this setting is adjustable in Obnam's configuration file.
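The de-duplication idea can be illustrated with a toy content-addressed chunk store. This is a minimal sketch only: the `ChunkStore` class, the fixed-size splitting, and the use of SHA-256 as the chunk identifier are all assumptions made for illustration, and none of it reflects Obnam's actual repository format or its B-tree layout.

```python
import hashlib


class ChunkStore:
    """Toy store that keeps each distinct chunk of data only once."""

    def __init__(self, chunk_size=1024 * 1024):
        # One megabyte mirrors the default chunk size mentioned above.
        self.chunk_size = chunk_size
        self.chunks = {}  # chunk digest -> chunk bytes

    def put(self, data):
        """Split data into chunks, store new ones, return the chunk ids."""
        ids = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already present is reused, not stored again.
            self.chunks.setdefault(digest, chunk)
            ids.append(digest)
        return ids

    def get(self, ids):
        """Reassemble file contents from a list of chunk ids."""
        return b"".join(self.chunks[digest] for digest in ids)
```

In this model a file is just a list of chunk ids, so storing the same bytes under a new name, or again in a later snapshot, adds ids to the metadata but no new chunks to the store.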

Obnam sports other features of practical value, such as built-in GnuPG encryption, which Wirzenius cited as a weakness in most rsync-based backup tools. It also works with local disks or over the network, including NFS, SMB, and SFTP. Wirzenius admits that the latter protocol is slow, but says that SCP (which should be faster) lacks support for tracking information like file removals, which Obnam depends on. In network backup setups, Obnam supports both push (client-initiated) and pull (server-initiated) backup sessions.

Storing and retrieving

Installation requires several of Wirzenius's other code projects, including his B-tree library larch and terminal status-update library ttystatus, plus paramiko, a third-party SSH2 library. Most are packaged for Debian (Wirzenius packages his own projects for Debian), but not all of them are available in downstream derivatives like Ubuntu. He provides an Apt repository for the necessary packages; instructions and a link to the repository's signing key are provided on his Obnam tutorial page.

The tutorial goes into further detail about Obnam's data de-duplication with practical examples. You can create a new backup with

    obnam backup ~/projectfoo

and subsequently back up a parent directory with

    obnam backup ~

Rather than re-save the files from projectfoo, the new backup will point to the copy already on disk. Each backup created with Obnam is specific to a directory; you can exclude specific subdirectories with the --exclude= flag, but you cannot back up several directories in a single command.

The tutorial also explains that Obnam automatically saves checkpoints every 100MB while creating a new backup. This is valuable because the initial snapshot is always akin to a full backup in other tools, and can be large enough to introduce failures. Checkpoints are not guaranteed to preserve the entire data set as are regular snapshots; they only allow an interrupted backup to resume without starting over from scratch.

Obnam's basic usage is straightforward; the same obnam backup ~ command that is used to start a new backup in the above example is used verbatim to perform the subsequent snapshots. You store snapshots on a remote repository by appending --repository=URL, specify a filesystem storage location with --output=PATH, and specify a GnuPG encryption key with --encrypt-with=KEYID.

You can restore a directory from a snapshot with

    obnam restore --to=/mnt/recovery-volume ~

(which will restore the most recent snapshot of your home directory to /mnt/recovery-volume). You can optionally restore just a file or a subdirectory from the snapshot with

    obnam restore --to=/mnt/recovery-volume ~/importantfiles

You can also specify a specific intermediate snapshot by appending a --generation=N flag to the restore command; you can get a list of the available snapshots by running obnam generations. The obnam verify command checks snapshot data against the files on disk, and obnam fsck checks the internal consistency of the b-tree.

Forgetfulness

The only really confusing part of working with Obnam is the snapshot retention process. You can tell the program to immediately delete older snapshots by running

    obnam forget --keep=7d

(which will keep the most recent seven days' worth of snapshots), or some variation. The wrinkle is that the 7d attribute will keep only one backup per day for those seven days, even if you run Obnam hourly. To keep seven days' worth of hourly snapshots, you would need to specify --keep=168h.

You can set a snapshot retention policy in your configuration file that uses these rules in combination. You can retain hourly, daily, weekly, monthly, and yearly snapshots by providing a comma-separated list. For example, 12h,7d,3m will keep the last 12 hourly snapshots, the last seven daily snapshots, and the last three monthly snapshots. The potential for miscounting sets in when the numbers start to converge (such as the last 48 hourly snapshots and the last two daily snapshots); Wirzenius recommends that you try your retention policy on the command line with the --pretend option to simulate the results before deploying it in the real world.

In an email, Wirzenius elaborated a bit on those tricky multi-factor retention policies. Each retention rule (e.g., hour, day, or month) is examined separately by Obnam, he said, and a snapshot is kept if it matches any of the rules. So a 48h,2d policy would match 48 hourly snapshots, then match two additional daily snapshots, for 50 total.
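The "kept if it matches any rule" behavior can be modeled in a few lines. The sketch below is an interpretation of the description in this article, not Obnam's code: the function names, the choice to keep the newest snapshot in each period, and the way periods are bucketed with strftime formats are all assumptions, and the sketch makes no attempt to reproduce how the real tool counts snapshots that satisfy more than one rule.

```python
from datetime import datetime, timedelta

# Period letter -> strftime format that identifies one bucket of that size.
PERIOD_FORMATS = {
    "h": "%Y-%m-%d %H",
    "d": "%Y-%m-%d",
    "w": "%Y-%W",
    "m": "%Y-%m",
    "y": "%Y",
}


def parse_policy(policy):
    """Turn a string such as '12h,7d,3m' into {'h': 12, 'd': 7, 'm': 3}."""
    rules = {}
    for item in policy.split(","):
        item = item.strip()
        unit, count = item[-1], int(item[:-1])
        if unit not in PERIOD_FORMATS:
            raise ValueError(f"unknown period in rule: {item!r}")
        rules[unit] = count
    return rules


def snapshots_to_keep(snapshots, policy):
    """Return the set of snapshot timestamps retained by the policy."""
    keep = set()
    ordered = sorted(snapshots)
    for unit, count in parse_policy(policy).items():
        if count <= 0:
            continue
        # Walk oldest to newest so each bucket ends up holding its newest
        # snapshot; buckets are first seen in chronological order.
        newest_per_bucket = {}
        for ts in ordered:
            newest_per_bucket[ts.strftime(PERIOD_FORMATS[unit])] = ts
        # Each rule is applied on its own; the results are then combined.
        keep.update(list(newest_per_bucket.values())[-count:])
    return keep
```

Given ten days of hourly snapshots, this model retains seven snapshots for a `7d` policy (one per day) and 168 for `168h`, which matches the wrinkle described earlier.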

As of the 1.0 release, there are a few areas that need improvement, such as managing multiple clients storing snapshots on one repository; Wirzenius says that further thought is required before implementing a real "server mode." For example, two or more machines can run Obnam and push their backups to the same remote repository, and they will be tagged with the hostname of origin. However, Obnam can also be run from the repository machine and "pull" backups from the two remote sources, but in that case each one needs to specify a client name with the --client-name= flag in order for Obnam to keep their metadata separate.

In practice, my interest in backup utilities stems largely from how rarely I make good backups on a regular basis (i.e., paranoia). I may be atypical in that way, but the primary reasons I have abandoned most of the backup utilities I have test driven in the past are the overhead in keeping track of full and incremental backup schedules and the lack of good tools for rotating old backups out without manual intervention. Obnam scores on both of those metrics. If you have a complicated setup with multiple machines, you may find quirks (such as the client name issue or the speed of SFTP) working against you, but Wirzenius is still at work on the code — and he seems quite happy to take bug reports and questions.


Fedora, secure boot, and an insecure future

By Jonathan Corbet
June 5, 2012

The UEFI secure boot mechanism has been the source of a great deal of concern in the free software community, and for good reason: it could easily be a mechanism by which we lose control over our own systems. Recently, Red Hat's Matthew Garrett described how the Fedora distribution planned to handle secure boot in the Fedora 18 release. That posting has inspired a great deal of concern and criticism, though, arguably, about the wrong things.

On a system with secure boot enabled, the hardware will refuse to run any system that has not been signed by a key it recognizes. Secure boot is meant to be a way to thwart boot-time malware by ensuring that only trusted (and unmodified) software gains control of the system. It is not effective as a digital rights management (DRM) mechanism; if you can gain control of the system, it is relatively easy to fool an operating system into thinking that secure boot is in effect when it is not. Providing the degree of control needed for effective DRM requires a trusted platform module (or similar) and associated software.

Secure boot does offer some hope of preventing a system from booting if its bootloader or kernel have been compromised by malware, though, as the "Flame" malware shows, there are limits to how much one can rely on signatures to keep systems secure. Secure boot could also, unfortunately, be effective in preventing booting if the user has tried to install an operating system of his or her choice.

The Windows 8 logo requirements specify that secure boot must be enabled. After some pushback, the requirements have been amended to also say that it should be possible for the owner of a system to disable secure boot or install new keys. They do not say that these actions need to be easy to carry out, though. Given that changing secure boot is a firmware-level operation, users wanting to make changes will be subjecting themselves to the very best sort of user experience that can be created by BIOS developers. It would be entirely unsurprising, for example, if users were forced to hand-enter new keys as long hex strings, an unpleasant and error-prone process.

Fedora's plan

Developers in the Fedora camp have evidently come to the conclusion that they do not want to force their users to endure such an experience to be able to install Fedora on their systems. So Fedora has chosen to take a different approach. Availing themselves of the Microsoft developer program, they will purchase a Microsoft-signed key for $99, then use that key to sign a minimal bootloader. UEFI-enabled hardware will then consent to boot that bootloader, which will immediately turn around and boot a special version of the GRUB2 bootloader which, in turn, will boot the Fedora kernel. A Fedora system set up in this mode should boot on a system with secure boot enabled with no changes required.

The appeal of this solution is clear: Fedora will "just work" on UEFI systems without forcing (possibly highly non-technical) users to make scary firmware-level changes. But there is a down side as well. The signed bootloader must ensure that it only runs GRUB2 if the GRUB2 binary has been signed by Fedora (using its own key at this point), and GRUB2 will only boot kernels that have been signed by Fedora. GRUB2 will need to be locked down, and the kernel too; the kernel will, for example, only be able to load modules that bear Fedora's signature. Given that, Red Hat's persistent attempts to get signed module enforcement into the kernel despite some interesting resistance make more sense.
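The chain described above, in which each stage refuses to hand over control to anything it has not approved, can be modeled very roughly as follows. This is a toy sketch under loud assumptions: real secure boot verifies public-key signatures, whereas this model substitutes a per-stage list of approved SHA-256 digests, and the names `boot_chain`, `digest`, and the stage labels are hypothetical.

```python
import hashlib


def digest(blob):
    """Identify a binary by its SHA-256 digest (a stand-in for a signature)."""
    return hashlib.sha256(blob).hexdigest()


def boot_chain(stages, trusted):
    """Walk the boot stages in order, refusing any unapproved binary.

    stages:  list of (name, blob) pairs, e.g. shim, GRUB2, kernel.
    trusted: dict mapping a stage name to the set of digests that the
             previous stage is willing to run.
    """
    booted = []
    for name, blob in stages:
        if digest(blob) not in trusted.get(name, set()):
            # One unapproved link stops the boot; nothing after it runs.
            raise PermissionError(f"{name}: not approved by the previous stage")
        booted.append(name)
    return booted
```

In this model, a locally rebuilt kernel fails the check just as a compromised one would, which is the trade-off the paragraph above describes.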

Much of the coverage of this plan in the mainstream media bore headlines like "Red Hat to pay Microsoft for the right to run Linux." Such headlines are not strictly true; the payment ($99 total) evidently goes to Verisign, and what is really being paid for is the ability to boot Linux with a minimum of UEFI-caused user inconvenience. The payment for a Microsoft-signed key raises eyebrows, but it is evidently seen as the best response to a bad situation. And perhaps that is just what it is. But it also raises a number of interesting questions.

A good idea?

For example: what guarantees exist that a Microsoft-signed key will continue to be available in the future for a reasonable price? If secure boot takes over, and the only universally-recognized keys are those signed by Microsoft, then Microsoft will have a monopoly on the right to boot an operating system on future hardware. Corporations are, in general, not known for a principled refusal to exploit that kind of position, and this corporation, in particular, is well known indeed for the opposite sort of behavior. One can only assume that the price of such keys would increase in this situation.

Microsoft will also have the right to revoke keys if they can be said to be a threat to the promises given by the secure boot mechanism. That is why Fedora must be careful to limit anything that enables direct access to the hardware; should somebody be able to get such access, the signed Fedora system could be used to attack Windows systems that have secure boot enabled. In theory, all it would take is a kernel security hole to enable this sort of attack; that could then cause the Fedora key to be revoked. A quick check shows about 20 kernel security updates issued by Fedora since the beginning of this year, with multiple vulnerabilities fixed in most. That could lead to a lot of key churn, especially if, as Alan Cox suggests, every kernel hole will require that its certificate be revoked.

Depending on what software is run on a specific system (if it dual-boots Windows and Linux, for example), a revoked key could find its way into the system's "forbidden signatures" database. That would immediately disable the booting of the signed Fedora image, essentially crippling the machine. The amount of joy resulting from such an outcome can be expected to be small.

Some developers have argued that Fedora's plan is a violation of the GNU General Public License, or, at least, of the Fedora project's own guidelines, despite Fedora's efforts to ensure that users retain as much freedom as possible. GPL enforcement actions in this case seem unlikely; there's no shortage of much more severely locked-down Linux systems out there, and they have not been the target of such actions thus far. But there is a definite risk of damage to the Fedora project's image as users discover that they cannot easily install their own kernels, add third-party modules, or run tools like SystemTap.

Finally, there is the risk that Fedora's plan will legitimize the UEFI secure boot mechanism. For now secure boot can be disabled on x86 systems; what if Microsoft, in the future, points to Fedora 18 as an example of how everybody is able to work within the secure boot system and tries to make secure boot mandatory? Thus, some argue, Fedora is giving aid and comfort to those who would most like to take control of our systems away from us.

Why bother?

Given all of this, one might well wonder why Fedora is pursuing this path. Fedora users are not generally known to clamor for locked-down systems that they cannot easily tweak. Without any inside information whatsoever, your editor suggests that there are two entirely plausible reasons for Fedora's attempt to work with secure boot:

  • The Fedora project, like many free software projects, would like to have a wider base of users. It fears that, in the absence of a "just works" experience on upcoming hardware, it will lose users to other distributions that might be more willing to make that effort. Some of those users may be lost to Linux altogether.

  • The plan starts with a disclaimer that it is not representative in any way of Red Hat's intentions for its enterprise distribution. But it seems clear that there could be actual customer demand for a version of RHEL that runs in the secure boot environment. If one embraces the sort of restrictions that come with enterprise support, the additional rules imposed by secure boot will have a minimal impact, while the apparent benefits are clear. Fedora's role is, among other things, to test out technologies that might go into RHEL; in this case, Fedora's users get to stumble into the secure boot land mines so RHEL users don't have to.

So Fedora's decision to take this approach is not all that surprising. The project has concluded that it is better to restrict user freedom in certain settings to make their life easier in other ways; as Matthew Garrett put it:

[T]here's no way to rationally say that the loss of freedom in terms of users not being able to produce their own signed bootloader or kernel for free is more or less significant than the benefit of having an operating system that users can install without firmware reconfiguration.

For those who do think that the loss of freedom inherent in the Fedora scheme is unacceptable, the time between the present and when Windows 8 hardware starts shipping would be an ideal opportunity to demonstrate better alternatives. But it's not clear what those would be.

Alternatives?

One could simply ignore secure boot, requiring users to disable it before they can install Linux on their machines. That imposes a potentially scary or difficult task on those users; by the specification, secure boot cannot be disabled by the software directly. There may also be resistance from users who see a switch saying "turn off security" and don't want to flip it. This approach will work fine for hard-core Linux users and developers, but seems certain to lose other kinds of users.

An alternative would be to attempt to gain more control of the situation at the hardware level. An example can be seen in Google, which has made a point of ensuring that unlockable Android handsets exist and are available at a reasonable price. Hardware designed to run ChromeOS also, by design, comes with an easily-toggled physical switch that turns off the boot-time checks for users wanting to install their own software. The level of interest in "jailbreaks" for locked-down handsets shows that a lot of users do see value in having full control over the hardware they own. Open (and "open source") hardware has a following; it may be that the only real way to remain in control is to work to ensure that this kind of hardware continues to exist and has a growing market share. There should be a business opportunity here; projects like the Vivaldi tablet show that some people see that opportunity and are trying to pursue it.

In the absence of open hardware, we will continue to be at the mercy of others whose interests are unlikely to be the same as ours (for just about any value of "ours"). That will leave us in a position where coping attempts like Fedora's seem like the best options available. That does not seem like the path to freedom; it is not why we have spent decades developing free operating systems. Fedora's secure boot plan may be an effective workaround, but it leaves the real bug unfixed.


Page editor: Jonathan Corbet

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds