By Jonathan Corbet
July 6, 2009
Development distributions play a crucial role in the free software
ecosystem. They are the proving ground where much new software is first
exposed to a wider user community; they are also the place where this
software demonstrates how well it plays with other packages. Distributors
would like to see wider testing of their development releases, but, as
your editor's recent experience shows, there are limits to how wide this
testing community can be expected to be.
Your editor has a habit of running development distributions on real-work machines.
There is no better way to stay on top of what the development communities
(at both the distributor and upstream levels) are up to; it's also a way to
help the community by finding and reporting bugs. Much of June was spent
traveling, though, with the result that these machines were generally on
the wrong side of an ocean and, thus, fell behind the leading edge. On
return, after shoveling out a horrifying inbox, your editor decided to
bring his desktop system up to current Rawhide. After all, what could
possibly go wrong?
Anybody who has worked with development distributions for any period of
time knows that the early part of the distribution development cycle is
when things are most likely to go wrong. That's when the
distribution-wide, disruptive changes go in. Traffic on the mailing lists
suggested that, after the Fedora 11 release, Rawhide did not
disappoint anybody looking to add a little adrenaline to their working
day. Still, it seemed that things had settled out a bit; one tester responded to a query from your editor by
saying:
You have missed all the fun! :-) Rawhide just got back to usable
state where I can begin reporting bugs again. Firefox has been
completely weird, Evolution won't even start here, the kernel has
done a good job of cooking my system drawing about twice the normal
amount of power...
So your editor upgraded. Sound stopped working. The screen saver started
leaving the display in a weird, low-color-resolution state. And, most
annoyingly, the keyboard layout went fully into psychedelic country.
Selecting the indispensable GNOME "caps lock is another Control" option
yielded a keyboard with no Control key at all; turning that option off
restored control - to the Alt-left key. The Alt modifier appeared to be
entirely unobtainable - a situation which can only serve to cause extreme
misery to any serious Emacs user.
All inconvenient, but, then, development distributions can be like that;
one should not venture into that world if one is not prepared to encounter
occasional bizarre behavior. Often, in cases like this, the best thing to
do is to report the problems and follow the leading edge closely in the hope that
fixes will be uploaded soon. So that's what your editor did.
Big mistake. Just before the holiday weekend in the US, somebody uploaded a broken
prelink which hosed most important executables on the system. The
result was a box which wouldn't boot and which couldn't really even be
fixed from a rescue disk. It now seems that running
prelink -au * from a rescue disk might be a way for
other afflicted users to get their boxes back. By the time that was
posted, though, your editor, drawing on many years of system administration
experience, had come to the reasoned conclusion that it was a good time to
run away screaming.
A helpful hint for development distribution users: have at least one other
root-suitable partition set aside on the system. All useful files not
directly tied to the distribution should be stored elsewhere. If things
get really ugly, one can always boot an emergency backup partition and end
up with a usable system. This article is currently being typed using a
system kept on such a partition.
Others recommend running development distributions within virtualized
guests or on sacrificial boxes. Both of those techniques are useful, but
they miss an important point: the best way to find problems in new software
is to use it for real work. If people are not trying to actually get
things done with a development distribution, they are going to miss a lot
of the bugs. Those bugs will then turn up after the (allegedly) stable
release, biting users who didn't think they were signing up for alpha-level
software. We need people doing more than just convincing themselves that the
testing box boots properly.
For this reason, Fedora, like other distributors, would like to see more
people testing its development distribution. Your editor would like to see
that too; testing of early releases is one of the "prices" that many of us
need to pay to help ensure that our free software is as good as we expect
it to be. Besides, tracking an evolving system is often fun; it can help
to bring users further into our community. But it is hard to tell most
users that they should be running a development distribution if it's liable
to leave them with a smoking wreckage of a system when they really need to
get some work done.
And, it should be noted, problems like this are not limited to
Rawhide; Ubuntu testers who updated gdm at the wrong time will certainly be questioning their karma as this
is being written.
So, what can be done to make development distributions safer
for a wider community of testers? Absolute safety seems unattainable, but
there are some things which could be done:
- Create a version of the distribution containing packages which have
shown a relatively low level of combustibility. The alpha releases
done by some distributors are a step in this direction; there is
usually an attempt made to stabilize things a little bit prior to the
release. But these releases tend to leave testers somewhat behind the
current state of the art. Debian's "testing" distribution is probably
the best example of how this can be done on an ongoing basis.
- Provide an indication of the state of the distribution. Many beaches
are equipped with red flags which are posted when dangerous currents
are present. Wouldn't it be nice if an apt-get upgrade
could respond with a message like "the current threat condition is
orange, you may want to reconsider"?
- Build in a rollback system which can undo the effects of an
ill-advised upgrade, even if the system as a whole has been reduced to
rubble. The Btrfs snapshot mechanism should be well suited to this
sort of feature - once Btrfs is stable enough to be used on a root
partition.
This is an issue which merits some thought. If we can make testing easier
and safer, we should end up with more testers. That, in turn, should lead
to more stable releases and, just as importantly, users who have more invested
in the software and the process which creates it. It is hard to see how
those could fail to be good things.