Distributions, like all software (and other) projects, have failures. One of the most important things that can come out of any kind of failure is to learn from it and try to prevent similar failures in the future. That is precisely the goal of Adam Williamson's post mortem on the FedUp bug that affected users trying to upgrade to Fedora 20. In it, he explained how and why things went bad, with an eye toward better testing to catch this kind of bug in the future. He also had some thoughts on how the current release process might be changed to help avoid bugs that arise because of the time crunch at the end of the cycle.
Williamson shepherds Fedora's quality assurance (QA) efforts and is thus well-placed to observe what went wrong and to suggest fixes going forward. QA didn't catch the bug before it got out into the wild and Williamson accepts his share of the blame for that. But blame is not really the purpose of the exercise. Finding the underlying problems and addressing them for the future are the goals.
When Fedora 20 was released, the FedUp version most Fedora 18 and 19 users had (fedup-0.7) would not properly upgrade those systems to F20. FedUp is the approved method for upgrading from one Fedora version to the next (or even for skipping a version and going straight from F18 to F20, for example). The solution was fairly simple, even for those who had tried and failed to upgrade with 0.7: get fedup-0.8 and use it. There was a bit more to it than that, particularly for F18 users, but that was the crux of the fix.
The bug was spotted and fixed quickly, but the upgrade process is one of the most high-profile places for a release-day bug to appear. It would certainly have left a bad taste for any users who were bitten by it. The fact that the bug could easily be overcome helped, but it was something of a black eye for the distribution on a day intended to celebrate a new release.
So, how does Fedora avoid the problem in the future? According to Williamson, the actual underlying cause of the bug has not been identified, but it appears that the versions of FedUp and the fedup-dracut package must be kept the same, so that the initramfs created by fedup-dracut will work with the FedUp installed on the user's machine. Essentially, FedUp 0.7 was fetching an initramfs created by fedup-dracut-0.8, which could not boot the system into the upgrade. Falling back to the F18 or F19 kernel and initramfs would still allow the system to boot, however.
Beyond the bug's proximate cause, Williamson noted several problems that contributed to it, including a lack of widespread knowledge about how FedUp works, inadequate test cases, and two problems that are endemic to Fedora's short stabilization phase: release candidates that are short-lived and large changes to fundamental packages made late in the cycle. The latter two tend to reduce the amount of time that QA has for testing, which can lead to more bugs slipping through the cracks. Large, late changes also mean that not all of the ramifications of a new feature are discovered pre-release, which is another source of surprises.
Adding better test cases is fairly straightforward. The existing tests were set up when FedUp was under rapid development, so the test case grabbed the package from the updates-testing repository (rather than the stable or updates repositories). For Fedora 18 and 19, fedup-0.8 was in updates-testing, so the tests always ran the fixed version and QA never saw the bug that users with fedup-0.7 hit. The tests have been changed to get the package from the other repositories.
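To make the testing gap concrete, here is a minimal Python model of the situation described above. It is a sketch under stated assumptions, not Fedora's tooling: the repository names follow the article, but the per-repository version table, the "newest version wins" resolution rule, and the exact-match compatibility rule (taken from the post mortem's tentative finding) are simplifications, and every function name is hypothetical.

```python
# Hypothetical model of the FedUp testing gap. Repository names follow the
# article; the version table and the resolution rule are assumptions.

def parse_version(v: str) -> tuple:
    """Turn a version string like '0.8' into a comparable tuple (0, 8)."""
    return tuple(int(part) for part in v.split("."))

def resolve_fedup(repos: dict, enabled: list) -> str:
    """Assumed rule: install the newest fedup offered by any enabled repo."""
    candidates = [repos[name] for name in enabled if name in repos]
    return max(candidates, key=parse_version)

def upgrade_boots(fedup_version: str, dracut_version: str) -> bool:
    """Assumed rule: the initramfs only works if the versions match exactly."""
    return fedup_version == dracut_version

# Assumed state of an F18/F19 system on F20 release day.
F19_REPOS = {"stable": "0.7", "updates": "0.7", "updates-testing": "0.8"}
F20_INITRAMFS_BUILT_WITH = "0.8"  # fedup-dracut version used on the F20 side

# The old QA test case pulled fedup from updates-testing...
old_test = resolve_fedup(F19_REPOS, ["stable", "updates", "updates-testing"])
# ...while ordinary users only had the stable and updates repositories.
user_view = resolve_fedup(F19_REPOS, ["stable", "updates"])

print(old_test, upgrade_boots(old_test, F20_INITRAMFS_BUILT_WITH))
print(user_view, upgrade_boots(user_view, F20_INITRAMFS_BUILT_WITH))
```

Run as written, the first line reports version 0.8 with a successful upgrade and the second reports 0.7 with a failed one: the test configuration and the users' configuration resolve to different packages, which is why switching the test case to the stable and updates repositories closes the gap.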
The bug also probably led to a better understanding of at least some of the workings of FedUp within the Fedora development community. In tracking down the bug and fixes for it, some folks got a crash course in FedUp and how it operates. That may help address Williamson's concern about a lack of knowledge of the tool. Given its importance to the distribution, a tool like FedUp should be well understood by more than just a small handful of community members.
The other identified issues will be harder to address, at least in the short term. But, as Williamson noted, squeezing everything into the tail end of the release cycle is a known problem; this bug just helped highlight it again:
It's also another good indicator that we should do whatever we can to try and land major changes much earlier in the release cycle. This is hardly a new observation, of course, nor an issue of which many relevant people were previously unaware, and there are always good reasons why we wind up landing the kitchen sink a week before release, but it's always good to have another reminder.
There are likely lessons for other projects and distributions here. While some of the issues were Fedora-specific, most were not. Williamson has done a nice service here, not only for Fedora but for the wider community. There are some real advantages to doing our work in the open: learning from other projects' successes and failures is just one of them.
Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license