Why people don't test development distributions
Your editor has a habit of running development distributions on real-work machines. There is no better way to stay on top of what the development communities (at both the distributor and upstream levels) are up to; it's also a way to help the community by finding and reporting bugs. Much of June was spent traveling, though, with the result that these machines were generally on the wrong side of an ocean and, thus, fell behind the leading edge. On return, after shoveling out a horrifying inbox, your editor decided to bring his desktop system up to current Rawhide. After all, what could possibly go wrong?
Anybody who has worked with development distributions for any period of time knows that the early part of the distribution development cycle is when things are most likely to go wrong. That's when the distribution-wide, disruptive changes go in. Traffic on the mailing lists suggested that, after the Fedora 11 release, Rawhide did not disappoint anybody looking to add a little adrenaline to their working day. Still, it seemed that things had settled out a bit; one tester responded to a query from your editor by saying:
So your editor upgraded. Sound stopped working. The screen saver started leaving the display in a weird, low-color-resolution state. And, most annoyingly, the keyboard layout went fully into psychedelic country. Selecting the indispensable GNOME "caps lock is another Control" option yielded a keyboard with no Control key at all; turning that option off restored control - to the Alt-left key. The Alt modifier appeared to be entirely unobtainable - a situation which can only serve to cause extreme misery to any serious Emacs user.
All inconvenient, but, then, development distributions can be like that; one should not venture into that world if one is not prepared to encounter occasional bizarre behavior. Often, in cases like this, the best thing to do is to report the problems and follow the leading edge closely in the hope that fixes will be uploaded soon. So that's what your editor did.
Big mistake. Just before the holiday weekend in the US, somebody uploaded a broken prelink which hosed most important executables on the system. The result was a box which wouldn't boot and which couldn't really even be fixed from a rescue disk. It now seems that running prelink -au * from a rescue disk might be a way for other afflicted users to get their boxes back. By the time that was posted, though, your editor, drawing on many years of system administration experience, had come to the reasoned conclusion that it was a good time to run away screaming.
A helpful hint for development distribution users: have at least one other root-suitable partition set aside on the system. All useful files not directly tied to the distribution should be stored elsewhere. If things get really ugly, one can always boot into the emergency backup partition and end up with a usable system. This article is currently being typed using a system kept on such a partition.
Others recommend running development distributions within virtualized guests or on sacrificial boxes. Both of those techniques are useful, but they miss an important point: the best way to find problems in new software is to use it for real work. If people are not trying to actually get things done with a development distribution, they are going to miss a lot of the bugs. Those bugs will then turn up after the (allegedly) stable release, biting users who didn't think they were signing up for alpha-level software. We need people doing more than just convincing themselves that the testing box boots properly.
For this reason, Fedora, like other distributors, would like to see more people testing its development distribution. Your editor would like to see that too; testing of early releases is one of the "prices" that many of us need to pay to help ensure that our free software is as good as we expect it to be. Besides, tracking an evolving system is often fun; it can help to bring users further into our community. But it is hard to tell most users that they should be running a development distribution if it's liable to leave them with a smoking wreck of a system when they really need to get some work done.
And, it should be noted, problems like this are certainly not limited to Rawhide; Ubuntu testers who updated gdm at the wrong time are likely questioning their karma as this is being written.
So, what can be done to make development distributions safer for a wider community of testers? Absolute safety seems unattainable, but there are some things which could be done:
- Create a version of the distribution containing packages which have
shown a relatively low level of combustibility. The alpha releases
done by some distributors are a step in this direction; there is
usually an attempt made to stabilize things a little bit prior to the
release. But these releases tend to leave testers somewhat behind the
current state of the art. Debian's "testing" distribution is probably
the best example of how this can be done on an ongoing basis.
- Provide an indication of the state of the distribution. Many beaches
are equipped with red flags which are posted when dangerous currents
are present. Wouldn't it be nice if an apt-get upgrade
could respond with a message like "the current threat condition is
orange, you may want to reconsider"?
- A built-in rollback system which can undo the effects of an ill-advised upgrade, even if the system as a whole has been reduced to rubble. The Btrfs snapshot mechanism should be well suited to this sort of feature - once Btrfs is stable enough to be used on a root partition.
This is an issue which merits some thought. If we can make testing easier
and safer, we should end up with more testers. That, in turn, should lead
to more stable releases and, just as importantly, users who have more invested
in the software and the process which creates it. It is hard to see how
those could fail to be good things.
