Python and ipaddr.py
On June 4th, the Python developers removed a recently added module, ipaddr.py, from the upcoming Python 3.1 release. At the time 3.1 was still in release candidate status (rc1 had been released on May 30th), and the usual policy for Python's release candidates is that only bug fixes are permitted. Adding or removing an entire feature is normally forbidden. Why did the Python developers change their minds? The story behind the decision illustrates a communication problem that can arise as projects grow and the number of available channels increases.
The story begins in September 2008 with Issue 3959 being filed in Python's bug tracker by Guido van Rossum, recording a suggestion that ipaddr.py be added to the standard library. After the issue was created a few messages were exchanged discussing the authors' willingness to move the master copy into Python's SVN repository, and then the issue fell silent. (The Python developers were probably distracted by preparations for the final release of Python 2.6 that occurred a week later.)
ipaddr.py parses IPv4 numeric addresses such as 172.16.40.35 and address ranges such as 192.168.0.0/16, as well as the corresponding IPv6 formats. Users can create IPv4 and IPv6 address ranges, remove a subset of addresses from a range, check whether a given address is included in a range, and compare addresses and networks for equivalence. The module had been used internally at Google for some time and bears a copyright statement dated 2007; the first open-source release was in September 2008 as the ipaddr-py project on Google Code.
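The operations listed above can be sketched in a few lines. This sketch does not use the ipaddr.py API under discussion; as an assumption for illustration, it uses the `ipaddress` module found in current Python releases as a stand-in, so the function and method names belong to that module and may differ from those in ipaddr.py.

```python
import ipaddress

# Parse a single IPv4 address and an IPv4 range in prefix notation.
host = ipaddress.ip_address("172.16.40.35")
private = ipaddress.ip_network("192.168.0.0/16")

# Membership test: is a given address included in the range?
inside = ipaddress.ip_address("192.168.40.35") in private
outside = host in private

# Remove a subset of addresses from a range; what remains is
# expressed as a list of smaller networks.
lower_half = ipaddress.ip_network("192.168.0.0/17")
remainder = list(private.address_exclude(lower_half))

# Two spellings of the same network compare as equal.
same = private == ipaddress.ip_network("192.168.0.0/255.255.0.0")

# The IPv6 formats go through the same entry points.
v6_net = ipaddress.ip_network("2001:db8::/32")
v6_inside = ipaddress.ip_address("2001:db8::1") in v6_net
```

Note that this stand-in module keeps addresses and networks in separate classes, which is the opposite of the single-class design debated later in this article.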
After the initial messages, it was some months before discussion revived. Some commenters noted similar libraries such as netaddr and IPy. After some further discussion, in February 2009 Martin von Löwis wrote "Yes, I think it can be integrated now." There was another delay of a few months before the code was actually committed, but on May 1st Gregory P. Smith committed ipaddr.py to the Python 2.7 branch in SVN revision 72173, also merging the change to the Python 3.1 branch.
But in June, Clay McClure argued on the bug tracker against the inclusion of ipaddr.py:
After reading the technical discussion on the ipaddr and netaddr lists, it is clear to me that ipaddr's designers have a somewhat limited understanding of IP addressing, and have not designed an API that articulates IP concepts particularly well.
McClure's objection stems from ipaddr.py using the same class to represent both individual IP addresses and network address ranges. Network ranges are usually represented as a starting address and the number of bits that are the fixed portion of addresses within the range. 192.168.0.0/16 is an IPv4 range, for example; the first 16 bits of the address (the 192.168. portion) are fixed and the remaining bits can vary. ipaddr.py represents an IPv4 host's address, which is a specific 32-bit value, as the address plus a /32 value. Treating addresses in this way lets a single class be used to represent either a single address or a range. (IPv6 uses 128-bit addresses and hex notation, but the principles are the same.) The debate is about whether this representation is reasonable.
>>> from ipaddr import *
>>> a1 = IPv4('127.0.0.1')
>>> a1
IPv4('127.0.0.1/32')
>>> a2 = IPv4('127.0.0.1/32')
>>> a1 == a2
True
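The prefix arithmetic behind this representation can be worked through with plain integers. The following is a minimal, independent sketch and not ipaddr.py's code; the helper names (`parse_cidr`, `mask_for`, `range_size`, `contains`) are hypothetical and chosen only for illustration.

```python
def parse_cidr(text):
    """Split 'a.b.c.d' or 'a.b.c.d/n' into (32-bit integer, prefix length)."""
    addr_text, _, prefix_text = text.partition("/")
    octets = [int(part) for part in addr_text.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % text)
    addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    # A bare address is treated as a range holding exactly one host: /32.
    prefix = int(prefix_text) if prefix_text else 32
    if not 0 <= prefix <= 32:
        raise ValueError("prefix length out of range: %r" % text)
    return addr, prefix


def mask_for(prefix):
    """Return a 32-bit mask with the top `prefix` bits set."""
    return (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF


def range_size(text):
    """Number of addresses covered: the free bits can take any value."""
    _, prefix = parse_cidr(text)
    return 2 ** (32 - prefix)


def contains(range_text, addr_text):
    """True if the address shares the range's fixed leading bits."""
    net, prefix = parse_cidr(range_text)
    addr, _ = parse_cidr(addr_text)
    mask = mask_for(prefix)
    return (addr & mask) == (net & mask)
```

Under these assumptions `parse_cidr("127.0.0.1")` and `parse_cidr("127.0.0.1/32")` return the same pair, and `range_size("127.0.0.1")` is 1. In this model a host address is simply a range with no free bits, which is the equivalence the session above displays.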
Core contributor Raymond Hettinger brought the discussion to the attention of the python-dev mailing list:
[...]
Does anyone here know if Clay's concern about subnets vs netmasks is accurate and whether it affects the usability of the module?
van Rossum initially thought the reported flaws were not so significant.
The resulting discussion thread received dozens of messages, and a consensus formed that the module's interface needed some reworking. Eventually van Rossum agreed it should just be removed from 3.1.
To a first approximation, this is just business as usual for any project: a feature was proposed, the code got committed, but later the developers collectively changed their minds and backed out the change from the 3.1 branch. The impact is minimal, because the code was only in one release tarball, a short-lived 3.1rc1 release candidate that was replaced two weeks later by 3.1rc2. The module remains in the 2.7 branch for now, but 2.7's release is months away, leaving plenty of time to remove the module or to rework the module's API and include it in Python 3.2. A Python Enhancement Proposal may be written defining an IP-address module.
However, the incident brought some process issues to attention, and the discussion divided into three major strands.
Mailing list vs. bug tracker
Part of the reason for the late reversal was that the objections were raised as comments on a single issue in the bug tracker. A January thread on python-dev drew little discussion, and 3.1 release manager Benjamin Peterson did not notice the disagreement on the issue.
Glyph Lefkowitz detailed the procedures of the Twisted project, where all discussions take place in the bug tracker.
Bug trackers encourage focused conversations between people with an interest in a bug or patch, preventing an unmanageably busy mailing list. But there's a cost; this increased focus makes it harder to obtain a general view, or to notice that one issue is particularly serious or important. One suggested improvement was adding the python-dev mailing list to the Roundup bug tracker as a user. Roundup creates a 'nosy list' for each issue; users on the nosy list receive e-mail notifications of new comments or modifications to the issue. If python-dev existed as a Roundup user, adding python-dev to the nosy list for an issue would send a clear signal to other core developers.
Unclear decision-making
The developers of Apache have a formal voting process that's specified in careful detail. Python follows some of Apache's conventions, such as using +1 and -1 to indicate support or opposition, but these aren't votes, merely polls showing general opinions. For example, under Apache's rules a -1 vote is a veto and must be accompanied by a technical justification, but many changes to Python have received -1 votes and still been accepted.
In Lefkowitz's posting cited earlier, he also noted that there is a vocabulary for describing the next action to take on a patch:
This system is possibly too simplistic for the more wide-ranging development of Python; although Twisted has its share of enthusiastic discussions, there is rarely the level of controversy I see here on python-dev, so I can't recommend it as such. I can say that keeping ticket discussions on tickets is generally a good idea, though, especially in a system like roundup or trac where it's easy to say "see message 7" rather than repeating yourself.
Selecting additions to the standard library
A number of different modules for representing IP addresses have been written, including ipaddr.py. David P. D. Moss, author of the netaddr module, drew up a list of IP-address modules that contains 12 projects, the oldest dating back to 2000. ipaddr.py is comparatively new, having been publicly released in September 2008; so why was it chosen?
To some degree it's just a question of luck and timing: the suggestion to add ipaddr.py was just the first one to be recorded in the bug tracker, but a policy of "whoever asks first" does not ensure that the best module is chosen for addition.
What selection process would work better? Barry Warsaw thought usage information from the package index could be helpful, suggesting "It would be really nice if say the Cheeseshop had a voting feature." The Cheeseshop is another name for Python's package index. Neil Schemenauer further noted Debian's popularity-contest script, which reports the packages installed on a system to a central server.
van Rossum didn't like the idea at all, saying "Whoa. Are you all suddenly trying to turn Python into a democracy? I'm outta here if that ever happens (even if I were voted BDFL by a majority :-)."
But van Rossum had previously noted that he wasn't a likely user of the module and could not really assess the suitability of the module's interface. In the absence of a strong opinion based on technical grounds, how should such decisions be made? Going by popularity seems to be as reasonable a way to choose as any other. Of course, the sheer number of modules may imply that this is an area where people are finicky about the approach taken, and perhaps the standard library needn't take a position on it.
Conclusion
In order to make progress, open-source projects, like people, must continuously decide which action to take next. Much has been written about personal time management, including books such as Stephen R. Covey's The 7 Habits of Highly Effective People and David Allen's Getting Things Done. Often the key to improvement is taking a careful inventory of tasks and organizing this inventory to minimize effort spent figuring out what to do and maximize effort spent actually implementing or fixing.
As free software projects spread out across multiple communication channels (a set of mailing lists, the bug tracker, SourceForge forums, personal weblogs, IRC conversations, Twitter, and soon enough new technologies such as Google Wave), their developers will face many of the same issues at a collective level. Taking inventory is relatively easy because there are many tools for doing so: version-control systems, bug trackers, and mailing list archives. But how do you ensure that the right people are paying attention?
| Index entries for this article | |
|---|---|
| GuestArticles | Kuchling, A.M. |
