June 17, 2009
This article was contributed by Andrew M. Kuchling
On June 4th, the Python developers removed a recently added module,
ipaddr.py, from the upcoming Python 3.1 release. At the time 3.1 was
still in release candidate status — rc1 had been released on May 30th
— and the usual policy for Python's release candidates is that only
bug fixes are permitted. Adding or removing an entire feature is
normally forbidden. Why did the Python developers change their minds?
The story behind the decision shows a possible communication problem
as projects grow and the number of available channels increases.
The story begins in September 2008 with Issue 3959 being filed in
Python's bug tracker by Guido van Rossum, recording a suggestion that
ipaddr.py be added to the standard library. After the issue was
created, a few messages were exchanged discussing the
authors' willingness to move the master copy into Python's SVN
repository, and then the issue fell silent. (The Python developers
were probably distracted by preparations for the final release of
Python 2.6 that occurred a week later.)
ipaddr.py parses IPv4 numeric addresses such as 172.16.40.35 and
address ranges such as 192.168.0.0/16, as well as the corresponding
IPv6 formats. Users can create IPv4 and IPv6 address ranges, remove a
subset of addresses from a range, check whether a given address is
included in a range, and compare addresses and networks for
equivalence. The module had been used internally at Google for some
time and bears a copyright statement dated 2007; the first open-source
release was in September 2008 as the ipaddr-py project on
Google Code.
After the initial messages, it was some months before discussion
revived. Some commenters pointed out similar libraries such as netaddr and IPy. After
further discussion, in February 2009 Martin von Löwis wrote "Yes,
I think it can be integrated now." There was another delay of a few
months before the code was actually committed, but on May 1st Gregory
P. Smith committed ipaddr.py to the Python 2.7 branch in SVN revision
72173, also merging the change to the Python 3.1 branch.
In June, however, Clay McClure argued on the bug tracker against
including ipaddr.py:
When looking for an IP address library for use
in a network scanning and discovery application I wrote last year, I
decided against ipaddr because of its conflation of address and
network -- it seems strange to me to have one class modeling both
concepts.
After reading the technical discussion on the ipaddr and netaddr
lists, it is clear to me that ipaddr's designers have a somewhat
limited understanding of IP addressing, and have not designed an API
that articulates IP concepts particularly well.
McClure's objection stems from ipaddr.py using the same class to
represent both individual IP addresses and network address ranges.
Network ranges are usually represented as a starting address and the
number of bits that are the fixed portion of addresses within the
range. 192.168.0.0/16 is an IPv4 range, for example; the first 16
bits of the address (the 192.168. portion) are fixed and the remaining
bits can vary. ipaddr.py represents
an IPv4 host's address, which is a specific 32-bit value, as the address with a
/32 suffix. Treating addresses in this way lets a single class
represent either a single address or a range.
(IPv6 uses 128-bit addresses and hex notation, but the principles are the same.)
The debate is about whether this representation is reasonable.
>>> from ipaddr import *
>>> a1 = IPv4('127.0.0.1')
>>> a1
IPv4('127.0.0.1/32')
>>> a2 = IPv4('127.0.0.1/32')
>>> a1 == a2
True
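The prefix notation described above can be sketched with plain integer
arithmetic. This is only an illustration of what /16 and /32 mean; the
helper names ipv4_to_int, prefix_mask, and in_range are hypothetical and
are not part of ipaddr.py's API.

```python
def ipv4_to_int(text):
    """Convert dotted-quad text such as '192.168.0.0' to a 32-bit integer."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def prefix_mask(prefix_len):
    """Return the 32-bit mask whose top prefix_len bits are set."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def in_range(address, cidr):
    """Check whether an address falls inside a range like '192.168.0.0/16'."""
    base, _, prefix_len = cidr.partition("/")
    mask = prefix_mask(int(prefix_len))
    # Two addresses are in the same range when their fixed bits agree.
    return (ipv4_to_int(address) & mask) == (ipv4_to_int(base) & mask)


print(hex(prefix_mask(16)))                          # 0xffff0000
print(in_range("192.168.40.35", "192.168.0.0/16"))   # True
print(in_range("172.16.40.35", "192.168.0.0/16"))    # False
# A /32 "range" fixes all 32 bits, so it contains exactly one address.
print(in_range("127.0.0.1", "127.0.0.1/32"))         # True
print(in_range("127.0.0.2", "127.0.0.1/32"))         # False
```

Seen this way, a /32 range and a single host address carry the same
bits, which is why one class can stand in for both; whether it should is
the point under dispute.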
Core contributor Raymond Hettinger
brought the discussion to the attention of the python-dev mailing list:
Clay McClure is raising some objections to the new ipaddr.py module.
JP Calderone shares his concerns. I think they were the only
commenters not directly affiliated with one of the competing projects.
The issues they've raised seem serious, but I don't know enough about
the subject to make a meaningful comment.
[...]
Does anyone here know if Clay's concern about subnets vs netmasks is accurate and whether it affects the usability of the module?
van Rossum initially thought
the reported flaws were not so significant:
I haven't read the bug, but given the extensive real-life use that
ipaddr.py has seen at Google before inclusion into the stdlib, I think
"deep conceptual flaws" must be an overstatement. There may be real
differences of opinion about the politically correct way to view
subnets and netmasks, but I don't doubt that the module as it stands
is usable enough to keep it in the stdlib. Nothing's perfect.
The resulting discussion thread received dozens of messages, and a
consensus formed that the module's interface needed some reworking.
Eventually van Rossum agreed it
should just be removed from 3.1:
I'm disappointed in the process -- it's as if nobody really reviewed
the API until it was released with rc1, and this despite there being a
significant discussion about its inclusion and alternatives months
ago. (Don't look at me -- I wouldn't recognize a netmask if it bit me
in the behind, and I can honestly say that I don't know whether /8
means to look only at the first 8 bits or whether it means to mask off
the last 8 bits.)
To a first approximation, this is just business as usual for any
project: a feature was proposed, the code got committed, but later the
developers collectively changed their minds and backed out the change
from the 3.1 branch. The impact is minimal, because the code was only
in one release tarball, a short-lived 3.1rc1 release candidate that
was replaced two weeks later by 3.1rc2. The module remains in the 2.7
branch for now, but 2.7's release is months away, leaving plenty of
time to remove the module or to rework the module's API and include it
in Python 3.2. A Python Enhancement Proposal may be written
defining an IP-address module.
However, the incident drew attention to some process issues,
and the discussion divided into three major strands.
Mailing list vs. bug tracker
Part of the reason for the late reversal was that objections were
being raised as comments on a single issue in the bug
tracker. A January
thread on python-dev didn't cause much discussion, and the
disagreement on the issue wasn't noticed by 3.1 release manager
Benjamin Peterson.
Glyph Lefkowitz detailed
the procedures of the Twisted project, where all discussions take
place in the bug tracker:
The way Twisted dealt with this particular issue was to move *all*
discussions relevant to a particular feature into the ticket for that
feature. If discussion starts up on the mailing list, within a few
messages of it starting, someone on the dev team will pipe up and say
"Hey, here's the ticket for this, could you add a link to this
discussion and a summary".
Bug trackers encourage focused conversations between people with an
interest in a bug or patch, preventing an unmanageably busy mailing
list. But there's a cost; this increased focus makes it harder to
obtain a general view, or to notice that one issue is particularly
serious or important. One suggested improvement was adding the
python-dev mailing list to the Roundup bug tracker as a user. Roundup
creates a 'nosy list' for each issue; users on the nosy list receive
e-mail notifications of new comments or modifications to the issue.
If python-dev existed as a Roundup user, adding python-dev to the nosy
list for an issue would send a clear signal to other core developers.
Unclear decision-making
The developers of Apache have a formal voting
process that's specified in careful detail. Python follows some
of Apache's conventions, such as using +1 and -1 to indicate support
or opposition, but these aren't votes, merely polls showing general
opinions. For example, under Apache's rules a -1
vote is a veto and must be accompanied by a technical justification,
but many changes to Python have received -1 votes and still been
accepted.
In the posting cited earlier, Lefkowitz also noted that
Twisted has a settled vocabulary for describing the next action
to take on a patch:
Once on a ticket, the phraseology and typesetting used by our core team
has reached a very precise consensus. Like the feature? "Merge this
patch" or "Land this branch". Don't like it? "Thanks for your patch,
but:", followed by a list of enumerated feedback. The required
reactions to such feedback are equally well understood. Even if a
comment isn't a full, formal code review, it still follows a similar
style.
This system is possibly too simplistic for the more wide-ranging
development of Python; although Twisted has its share of enthusiastic
discussions, there is rarely the level of controversy I see here on
python-dev, so I can't recommend it as such. I can say that keeping
ticket discussions on tickets is generally a good idea, though,
especially in a system like roundup or trac where it's easy to say "see
message 7" rather than repeating yourself.
Selecting additions to the standard library
A number of different modules for representing IP addresses have
been written, including ipaddr.py. David P. D. Moss, author of the
netaddr module, drew up a
list of IP-address modules that contains 12 projects, the oldest
dating back to 2000. ipaddr.py is comparatively new, having been
publicly released in September 2008; so why was it chosen?
To some degree it's just a question of luck and timing: the
suggestion to add ipaddr.py was just the first one to be recorded in
the bug tracker, but a policy of "whoever asks first" does not ensure
that the best module is chosen for addition.
What selection process would work better? Barry Warsaw thought
usage information from the package index could be helpful, suggesting
"It would be really nice if say the Cheeseshop had a voting
feature". The Cheeseshop is another name for Python's package index.
Neil Schemenauer further pointed to
Debian's popularity-contest script, which reports the
packages installed on a system to a central server.
van Rossum didn't like the idea at all, saying "Whoa. Are
you all suddenly trying to turn Python into a democracy? I'm outta
here if that ever happens (even if I were voted BDFL by a majority
:-)."
But van Rossum had previously noted that he wasn't a likely user of the
module and could not really assess the suitability of the module's
interface. In the absence of a strong opinion based on technical
grounds, how should such decisions be made? Going by popularity seems
to be as reasonable a way to choose as any other. Of course, the
sheer number of modules may imply that this is an area where people
are finicky about the approach taken, and perhaps the standard library
needn't take a position on it.
Conclusion
In order to make progress, open-source projects, like people, must
continuously decide which action to take next. Much has been
written about personal time management, including books such as
Stephen R. Covey's The 7 Habits of Highly Effective People and David Allen's
Getting Things Done. Often the key to improvement is taking a
careful inventory of tasks and organizing this inventory to minimize
effort spent figuring out what to do and maximize effort spent
actually implementing or fixing.
As free software projects spread out across multiple communication
channels — a set of mailing lists, the
bug tracker, SourceForge forums, personal weblogs, IRC conversations,
Twitter, and soon enough new technologies such as Google Wave —
their developers will face many of the same issues at a collective level.
Taking inventory is relatively easy because there are many tools
for doing so:
version-control systems, bug trackers, and mailing list archives. But how
do you ensure that the right people are paying attention?