By Jake Edge
February 9, 2009
Release engineering for a large project is always a tricky task. Balancing
the needs of new features, removing old cruft, and bug fixing while still
producing releases in a timely fashion is difficult. Python is currently
struggling with this as it is trying to determine which things go into a 3.0.1
release versus those that belong in 3.1.0. The discussion gives a glimpse
into the thinking that must go on as projects decide how, what, and
when to release.
It is very common to find bugs shortly after a release that would seem to
necessitate a bug fix release. Ofttimes these are bugs that would have been
considered show-stopping had they been found before the release. But what
about features that were supposed to be dropped, after having been
deprecated for several releases, but were mistakenly left in? That is one
of the current dilemmas facing Python.
One of the changes made in Python 3.0 was to comparisons and, in
particular, the removal of the cmp()
function. That function takes two arguments, returning -1, 0, or 1 based
on whether the first argument is less than, equal to, or greater than the
second. Python 3.0 set out to clean up some of the "warts" of the language;
cmp() could be handled in other, more efficient ways. The only
problem is that cmp() didn't actually get removed from the Python
3.0 release in December.
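For readers unfamiliar with the function, its behavior can be sketched with the replacement expression suggested in the "What's New in Python 3.0" notes. The name cmp_compat is a hypothetical stand-in chosen for illustration; it is not part of the standard library.

```python
def cmp_compat(a, b):
    """Return -1, 0, or 1, mimicking the old cmp() builtin.

    cmp_compat is a hypothetical name used only for this sketch.
    """
    # Booleans subtract as integers: True - False == 1, False - True == -1.
    return (a > b) - (a < b)


print(cmp_compat(1, 2))      # first argument is smaller
print(cmp_compat("b", "b"))  # arguments are equal
print(cmp_compat(3.5, 2))    # first argument is larger
```

The expression relies only on the ordinary < and > operators, so it works for any pair of values that can be ordered against each other.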
The oversight was recognized quite quickly (the bug report shows it being
reported three days after the release), but it wasn't exactly clear what to
do about it. There may now exist "valid" Python 3.0 programs that use
cmp() and function correctly. This led Guido van Rossum to say: "Bah. That means
we'll have to start deprecating cmp() in 3.1, and won't
be able to remove it until 3.2 or 3.3. :-)" He seems to have only
been half-serious, as the smiley might indicate, eventually concluding: "OK, remove it
in 3.0.1, provided that's released this year." Unfortunately, the
"this year" he was referring to was 2008.
Because Python 3 was such a major shift in the language, the 2to3
tool was created to help fix old code to work with the new interpreter.
But 2to3 did not change calls to cmp(), so code created using
that tool will run in Python 3.0. That makes for a bit of a tangle, as van
Rossum explains:
Well, since 2to3 doesn't remove cmp, and it actually works, it's
likely that people will be accidentally depending on it in code
converted from 2.x. In the past, where there was a discrepancy between
docs and code, we've often ruled in favor of the code using arguments
like "it always worked like this so we'll break working code if we
change it now". There's clearly an argument of timeliness there, which
is why we'd like to get this fixed ASAP. The alternative, which nobody
likes, would be to keep it around, deprecate it in 3.1, and remove it
in 3.2 or 3.3.
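The accidental dependency van Rossum describes might look something like the sketch below. The compare_versions helper is hypothetical, standing in for any converted 2.x code that still calls cmp(); the guard at the top is one possible way for such code to keep working once the builtin is gone, not something the Python developers prescribed.

```python
try:
    cmp  # present in Python 2.x and, by accident, in 3.0
except NameError:
    # Assumed fallback for interpreters where cmp() has been removed.
    def cmp(a, b):
        return (a > b) - (a < b)


def compare_versions(a, b):
    """Hypothetical helper of the kind 2to3 would leave untouched."""
    return cmp(a, b)


print(compare_versions((3, 0, 1), (3, 1, 0)))
```

Without the guard, the call inside compare_versions raises a NameError on any interpreter that no longer provides cmp().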
As of this writing, Python 3.0.1 is intended
for release on February 13 with the removal of cmp(). There
seem to be a number of reasons that the release slipped into 2009, not
least of which is the holiday season that tends to eat up a fair chunk of December.
But it was also more complicated to remove cmp() than it first
appeared. There were several standard libraries and tests that were still
using it, as well as Python internals that still referred to it. Inevitably,
as those things were getting worked out, other problems cropped up.
There are some fairly serious performance problems with the new I/O
library, with some experiencing read performance three orders of magnitude
slower on Python 3.0. There are also problems with chunked HTTP responses
when using urllib. Both of these require fairly extensive fixes,
though, which in turn require lots of testing. It all adds up to a lot of
work, so folks start to wonder if much or all of the work shouldn't get
pushed into the 3.1 release, which is targeted at an April or May time frame.
There are others who argue that the 3.0 series should be abandoned entirely
in the near term. Rather than have a 3.0.1 with substantial changes from
3.0—including the incompatible removal of cmp()—3.1
should be released quickly so that it is the release targeted by
developers. As Raymond Hettinger put it:
My preference is to drop 3.0 entirely (no [incompatible] bugfix release)
and in early February release 3.1 as the real 3.x that migrators ought
to aim for and that won't have [incompatible] bugfix releases. Then at
PyCon, we can have a real bug day and fix-up any chips in the paint.
There are some fairly important new features—notably moving the
new I/O to C for performance reasons—that will not make it for a
release in February, though. Since a 3.2 release would be quite a ways
off, those features would languish for too long. 3.1 release manager
Benjamin Peterson would rather see an
immediate 3.0.1 release:
However, it seems to me that there are
two kinds of issues: those like __cmp__ removal and some silly IO bugs
that have been fixed for a while and [are] waiting to be released.
There's also projects like io in c which are important, but would not
make the schedule you and I want for 3.0.1/3.1. It's for those longer
term features that I want 3.0.1 and 3.1. If we [immediately] released 3.1,
when would those longer term projects that are important for migration
make it to stable? 3.2 is probably a while off.
There are also concerns that an immediate release called 3.1 might lead to
confusion and unhappiness for users. Martin Löwis voiced those fears to general agreement:
I would fear that than 3.1 gets the same fate as 3.0. In May, we will
all think "what piece of junk was that 3.1 release, let's put it to
history", and replace it with 3.2. By then, users will wonder if there
is ever a 3.x release that is any good.
Part of the problem is the "no new features" rule for bug fix
releases—those that are typically numbered by bumping the third digit
of the version number. Python established that rule in the 2.x series, to try to protect
the "most conservative users" as van Rossum points out. Those users have not moved to
Python 3 yet, so van Rossum argues that the rule can be suspended:
Frankly, I don't really believe the users for whom those rules were
created are using 3.0 yet. Instead, I expect there to be two types of
users: people in the educational business who don't have a lot of
bridges to burn and are eager to use the new features; and developers
of serious Python software (e.g. Twisted) who are trying to figure out
how to port their code to 3.0. The first group isn't affected by the
changes we're considering here (e.g. removing cmp or some obscure
functions from the operator module). The latter group *may* be
affected, simply because they may have some pre-3.0 code using old
features that (by accident) still works under 3.0.
This argument seemed to help crystallize a consensus of sorts. There were
some other discussions of exactly which "features" should make an
appearance in 3.0.1, but the push for numbering the bug fix release as 3.1
seemed to fade. The 3.0.1 release is currently scheduled for February
13th, while other new features—undoubtedly along with additional
fixes—will come with the 3.1 release in April or May.
Part of what was considered in the deliberations was the impact on users
and what they will expect from how the releases are numbered. It is a
difficult problem, as KDE found
out a year ago. Users have certain expectations based on release
numbering, which are largely outside of a project's control. But some
kinds of changes, especially those that are not backward compatible,
necessitate a "large enough" numeric change to indicate that.
It is a fine line, which is why Python has struggled with it. One hopes
that any development for Python 3 (itself a large, incompatible language
overhaul) avoided using cmp(), and will thus be
unaffected. If not, the relatively small window of time should keep the
number of affected programs to a minimum.