
Python and ipaddr.py

June 17, 2009

This article was contributed by Andrew M. Kuchling

On June 4th, the Python developers removed a recently-added module, ipaddr.py, from the upcoming Python 3.1 release. At the time 3.1 was still in release candidate status — rc1 had been released on May 30th — and the usual policy for Python's release candidates is that only bug fixes are permitted. Adding or removing an entire feature is normally forbidden. Why did the Python developers change their minds? The story behind the decision shows a possible communication problem as projects grow and the number of available channels increases.

The story begins in September 2008, when Guido van Rossum filed Issue 3959 in Python's bug tracker, recording a suggestion that ipaddr.py be added to the standard library. After the issue was created, a few messages were exchanged about the authors' willingness to move the master copy into Python's SVN repository, and then the issue fell silent. (The Python developers were probably distracted by preparations for the final release of Python 2.6, which came out a week later.)

ipaddr.py parses IPv4 numeric addresses such as 172.16.40.35 and address ranges such as 192.168.0.0/16, as well as the corresponding IPv6 formats. Users can create IPv4 and IPv6 address ranges, remove a subset of addresses from a range, check whether a given address is included in a range, and compare addresses and networks for equivalence. The module had been used internally at Google for some time and bears a copyright statement dated 2007; the first open-source release was in September 2008 as the ipaddr-py project on Google Code.
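ipaddr.py's own 2009 API is not reproduced here. The following rough sketch of the operations listed above instead assumes the standard library's ipaddress module, which was added in Python 3.3 and grew out of ipaddr. Its names and behaviour are not necessarily those of the module discussed in this article.

```python
import ipaddress

# Parse a single IPv4 address and an IPv4 range (a network).
host = ipaddress.ip_address('172.16.40.35')
net = ipaddress.ip_network('192.168.0.0/16')

# The same factory functions accept IPv6 notation.
host6 = ipaddress.ip_address('2001:db8::1')
net6 = ipaddress.ip_network('2001:db8::/32')

# Membership test: is an address inside a range?
print(ipaddress.ip_address('192.168.3.4') in net)  # True
print(host in net)                                 # False
print(host6 in net6)                               # True

# Remove a subset from a range: address_exclude() yields the
# smaller networks that together cover what is left over.
hole = ipaddress.ip_network('192.168.0.0/24')
remaining = sorted(net.address_exclude(hole))
print(len(remaining))  # 8: one block each for /17 through /24

# Equivalence: two networks are equal when address and prefix match.
print(net == ipaddress.ip_network('192.168.0.0/16'))  # True
```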

After the initial messages, it was some months before discussion revived. Some commenters pointed to similar libraries such as netaddr and IPy. After further discussion, in February 2009 Martin von Löwis wrote "Yes, I think it can be integrated now." There was another delay of a few months before the code was actually committed, but on May 1st Gregory P. Smith committed ipaddr.py to the Python 2.7 branch in SVN revision 72173, also merging the change to the Python 3.1 branch.

But in June, Clay McClure argued on the bug tracker against the inclusion of ipaddr.py:

When looking for an IP address library for use in a network scanning and discovery application I wrote last year, I decided against ipaddr because of its conflation of address and network -- it seems strange to me to have one class modeling both concepts.

After reading the technical discussion on the ipaddr and netaddr lists, it is clear to me that ipaddr's designers have a somewhat limited understanding of IP addressing, and have not designed an API that articulates IP concepts particularly well.

McClure's objection stems from ipaddr.py using the same class to represent both individual IP addresses and network address ranges. Network ranges are usually represented as a starting address plus the number of leading bits that are fixed for every address within the range. 192.168.0.0/16 is an IPv4 range, for example; the first 16 bits of the address (the 192.168. portion) are fixed and the remaining 16 bits can vary. ipaddr.py represents an IPv4 host's address, which is a specific 32-bit value, as the address plus a /32 prefix length, i.e. a range in which all 32 bits are fixed. Treating addresses in this way lets a single class represent either a single address or a range. (IPv6 uses 128-bit addresses and hex notation, but the principles are the same.) The debate is about whether this representation is reasonable.

    >>> from ipaddr import *
    >>> a1 = IPv4('127.0.0.1')
    >>> a1  
    IPv4('127.0.0.1/32')
    >>> a2 = IPv4('127.0.0.1/32')
    >>> a1 == a2  
    True
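The prefix arithmetic behind this notation is easy to check by hand. The helpers below are hypothetical, written for illustration rather than taken from ipaddr.py. The last two lines use the standard library's ipaddress module (Python 3.3 and later), which postdates this article.

```python
import ipaddress

def prefix_mask(prefix_len: int) -> int:
    """Return the 32-bit netmask whose leading prefix_len bits are set."""
    if not 0 <= prefix_len <= 32:
        raise ValueError('IPv4 prefix length must be between 0 and 32')
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF

def in_range(addr: str, start: str, prefix_len: int) -> bool:
    """True if addr matches start on the first prefix_len (fixed) bits."""
    mask = prefix_mask(prefix_len)
    a = int(ipaddress.IPv4Address(addr))
    s = int(ipaddress.IPv4Address(start))
    return (a & mask) == (s & mask)

# /16 fixes the leading "192.168." and leaves the last 16 bits free.
print(ipaddress.IPv4Address(prefix_mask(16)))        # 255.255.0.0
print(in_range('192.168.200.7', '192.168.0.0', 16))  # True
print(in_range('192.169.0.1', '192.168.0.0', 16))    # False

# /32 fixes all 32 bits, so the "range" contains exactly one host.
print(in_range('127.0.0.1', '127.0.0.1', 32))        # True
print(in_range('127.0.0.2', '127.0.0.1', 32))        # False

# The later ipaddress module keeps the two ideas in separate classes.
print(type(ipaddress.ip_address('127.0.0.1')).__name__)     # IPv4Address
print(type(ipaddress.ip_network('127.0.0.1/32')).__name__)  # IPv4Network
```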

Core contributor Raymond Hettinger brought the discussion to the attention of the python-dev mailing list:

Clay McClure is raising some objections to the new ipaddr.py module. JP Calderone shares his concerns. I think they were the only commenters not directly affiliated with one of the competing projects. The issues they've raised seem serious, but I don't know enough about the subject to make a meaningful comment.

[...]

Does anyone here know if Clay's concern about subnets vs netmasks is accurate and whether it affects the usability of the module?

Van Rossum initially thought the reported flaws were not very significant:

I haven't read the bug, but given the extensive real-life use that ipaddr.py has seen at Google before inclusion into the stdlib, I think "deep conceptual flaws" must be an overstatement. There may be real differences of opinion about the politically correct way to view subnets and netmasks, but I don't doubt that the module as it stands is usable enough to keep it in the stdlib. Nothing's perfect.

The resulting discussion thread received dozens of messages, and a consensus formed that the module's interface needed some reworking. Eventually van Rossum agreed it should just be removed from 3.1:

I'm disappointed in the process -- it's as if nobody really reviewed the API until it was released with rc1, and this despite there being a significant discussion about its inclusion and alternatives months ago. (Don't look at me -- I wouldn't recognize a netmask if it bit me in the behind, and I can honestly say that I don't know whether /8 means to look only at the first 8 bits or whether it means to mask off the last 8 bits.)

To a first approximation, this is just business as usual for any project: a feature was proposed, the code got committed, but later the developers collectively changed their minds and backed out the change from the 3.1 branch. The impact is minimal, because the code was only in one release tarball, a short-lived 3.1rc1 release candidate that was replaced two weeks later by 3.1rc2. The module remains in the 2.7 branch for now, but 2.7's release is months away, leaving plenty of time to remove the module or to rework the module's API and include it in Python 3.2. A Python Enhancement Proposal may be written defining an IP-address module.

However, the incident brought some process issues to light, and the discussion divided into three major strands.

Mailing list vs. bug tracker

Part of the reason for the late reversal was that the objections were made as comments on a single issue in the bug tracker. A January thread on python-dev attracted little discussion, and 3.1 release manager Benjamin Peterson didn't notice the disagreement on the issue.

Glyph Lefkowitz described the procedures used by the Twisted project, where all discussion happens in the bug tracker:

The way Twisted dealt with this particular issue was to move *all* discussions relevant to a particular feature into the ticket for that feature. If discussion starts up on the mailing list, within a few messages of it starting, someone on the dev team will pipe up and say "Hey, here's the ticket for this, could you add a link to this discussion and a summary".

Bug trackers encourage focused conversations among the people interested in a particular bug or patch, which keeps the mailing list from becoming unmanageably busy. But there's a cost: this focus makes it harder to get a general view, or to notice that one issue is particularly serious or important. One suggested improvement was adding the python-dev mailing list to the Roundup bug tracker as a user. Roundup keeps a 'nosy list' for each issue; users on the nosy list receive e-mail notifications of new comments or modifications to the issue. If python-dev existed as a Roundup user, adding it to an issue's nosy list would send every subsequent update on that issue to the whole list, a clear signal to other core developers that the issue deserved their attention.
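The nosy-list mechanism can be modelled in a few lines. This is a hypothetical toy, not Roundup's actual code or API. The e-mail addresses are invented placeholders apart from the list's own, and the rule that commenting adds the author to the nosy list is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    """Hypothetical toy model of a tracker issue with a nosy list.

    Not Roundup's real code or API; it only mimics the behaviour described
    above: everyone on the nosy list is e-mailed about each new comment.
    """
    title: str
    nosy: set[str] = field(default_factory=set)

    def add_comment(self, author: str, text: str) -> list[str]:
        # Assumption: commenting also puts the author on the nosy list.
        self.nosy.add(author)
        # Everyone else on the nosy list would be e-mailed `text`;
        # here we just return who those recipients are.
        return sorted(addr for addr in self.nosy if addr != author)

issue = Issue('Add ipaddr.py to the standard library')
issue.add_comment('critic@example.org', 'The API conflates address and network.')

# Registering the mailing list as an ordinary "user" and adding it to the
# nosy list sends every later comment on this issue to the whole list.
issue.nosy.add('python-dev@python.org')
recipients = issue.add_comment('reviewer@example.org', 'Is this accurate?')
print(recipients)  # ['critic@example.org', 'python-dev@python.org']
```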

Unclear decision-making

The Apache developers have a formal voting process that's specified in careful detail. Python follows some of Apache's conventions, such as using +1 and -1 to indicate support or opposition, but these aren't binding votes, merely informal polls of general opinion. For example, under Apache's rules a -1 vote is a veto and must be accompanied by a technical justification, but many changes to Python have received -1 votes and still been accepted.
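The difference between a binding veto and an advisory poll can be sketched as two functions. This is a hypothetical model of only the one Apache rule mentioned above. It assumes that a -1 without a justification is not a valid veto, and it ignores Apache's other requirements, such as how many +1 votes a change needs.

```python
from dataclasses import dataclass

@dataclass
class Vote:
    value: int               # +1, 0 or -1
    justification: str = ''  # technical reason given with the vote

def apache_style_veto(votes: list[Vote]) -> bool:
    """Return True if the change is blocked by a valid veto.

    Assumption: a -1 counts as a veto only when it carries a technical
    justification; other Apache voting rules are not modelled.
    """
    return any(v.value == -1 and v.justification.strip() for v in votes)

def python_style_poll(votes: list[Vote]) -> dict[int, int]:
    """Advisory tally: it summarizes opinion but blocks nothing by itself."""
    tally = {+1: 0, 0: 0, -1: 0}
    for v in votes:
        tally[v.value] += 1
    return tally

votes = [Vote(+1), Vote(+1), Vote(-1, 'API conflates addresses and networks')]
print(apache_style_veto(votes))  # True: one justified -1 blocks the change
print(python_style_poll(votes))  # {1: 2, 0: 0, -1: 1}
```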

In the posting cited earlier, Lefkowitz also noted that Twisted has a vocabulary for describing the next action to take on a patch:

Once on a ticket, the phraseology and typesetting used by our core team has reached a very precise consensus. Like the feature? "Merge this patch" or "Land this branch". Don't like it? "Thanks for your patch, but:", followed by a list of enumerated feedback. The required reactions to such feedback are equally well understood. Even if a comment isn't a full, formal code review, it still follows a similar style.

This system is possibly too simplistic for the more wide-ranging development of Python; although Twisted has its share of enthusiastic discussions, there is rarely the level of controversy I see here on python-dev, so I can't recommend it as such. I can say that keeping ticket discussions on tickets is generally a good idea, though, especially in a system like roundup or trac where it's easy to say "see message 7" rather than repeating yourself.

Selecting additions to the standard library

A number of different modules for representing IP addresses have been written, including ipaddr.py. David P. D. Moss, author of the netaddr module, drew up a list of IP-address modules that contains 12 projects, the oldest dating back to 2000. ipaddr.py is comparatively new, having been publicly released in September 2008; so why was it chosen?

To some degree it's just a question of luck and timing: the suggestion to add ipaddr.py was just the first one to be recorded in the bug tracker, but a policy of "whoever asks first" does not ensure that the best module is chosen for addition.

What selection process would work better? Barry Warsaw thought usage information from the package index could be helpful, suggesting "It would be really nice if say the Cheeseshop had a voting feature". The Cheeseshop is another name for Python's package index. Neil Schemenauer further noted Debian's popularity-contest script, which reports the packages installed on a system to a central server.

Van Rossum didn't like the idea at all, saying "Whoa. Are you all suddenly trying to turn Python into a democracy? I'm outta here if that ever happens (even if I were voted BDFL by a majority :-)."

But van Rossum had previously noted that he wasn't a likely user of the module and could not really assess the suitability of the module's interface. In the absence of a strong opinion based on technical grounds, how should such decisions be made? Going by popularity seems to be as reasonable a way to choose as any other. Of course, the sheer number of modules may imply that this is an area where people are finicky about the approach taken, and perhaps the standard library needn't take a position on it.

Conclusion

In order to make progress, open-source projects, like people, must continuously decide which action to take next. Much has been written about personal time management, including books such as Stephen R. Covey's The 7 Habits of Highly Effective People and David Allen's Getting Things Done. Often the key to improvement is taking a careful inventory of tasks and organizing it so as to minimize the effort spent figuring out what to do and maximize the effort spent actually implementing or fixing things.

As free software projects spread out across multiple communication channels — a set of mailing lists, the bug tracker, SourceForge forums, personal weblogs, IRC conversations, Twitter, and soon enough new technologies such as Google Wave — their developers will face many of the same issues at a collective level. Taking inventory is relatively easy because there are many tools for doing so: version-control systems, bug trackers, and mailing list archives. But how do you ensure that the right people are paying attention?



Python and ipaddr.py

Posted Jun 18, 2009 2:26 UTC (Thu) by gdt (guest, #6284)

I am a network engineer. The conflation of IP addresses, IP routing prefixes, IP access-list masks, inverse masks, and 32-bit numbers which are convenient to print in dotted-quad format (e.g. OSPF areas) really annoys me. Tools which do this encourage sloppy thinking, leading to routing prefixes without lengths, or firewall rules which can't ignore just part of an address (such as the subnet component of an EUI-64 address).

To see what I mean, consider a function is_multicast(). Handing that a bitmask or a 32-bit number (as opposed to an address or a routing prefix) should result in an error. But that's exactly the API bug which prevents OSPF areas starting with 224. on a certain make of router.

Similarly, a function ipv4_to_ipv4_mapped_ipv6() should only accept IPv4 addresses; not routing prefixes, OSPF areas, or bitmasks.
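The argument here is essentially about types: each function should accept only the kind of value that makes sense for it. The following minimal sketch assumes the standard library's ipaddress module (Python 3.3 and later), not anything that existed in 2009. The two function names come from the comment, but their bodies are hypothetical.

```python
import ipaddress
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

def is_multicast(addr: IPAddress) -> bool:
    """Accept only IP addresses; reject bare ints, masks and prefixes."""
    # A 32-bit number that merely prints as a dotted quad (such as an
    # OSPF area ID) is not an address, so it is a type error here.
    if not isinstance(addr, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        raise TypeError(f'expected an IP address, got {type(addr).__name__}')
    return addr.is_multicast

def ipv4_to_ipv4_mapped_ipv6(addr: ipaddress.IPv4Address) -> ipaddress.IPv6Address:
    """Map an IPv4 address into ::ffff:0:0/96; nothing else is accepted."""
    if not isinstance(addr, ipaddress.IPv4Address):
        raise TypeError(f'expected an IPv4 address, got {type(addr).__name__}')
    return ipaddress.IPv6Address((0xFFFF << 32) | int(addr))

print(is_multicast(ipaddress.IPv4Address('224.0.0.5')))  # True
try:
    is_multicast(0xE0000005)  # "224.0.0.5" given as a plain 32-bit int
except TypeError as err:
    print('rejected:', err)

mapped = ipv4_to_ipv4_mapped_ipv6(ipaddress.IPv4Address('192.0.2.1'))
print(mapped.ipv4_mapped)  # 192.0.2.1
```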

Another good example is Wireshark. It conflates all of these things, which makes it difficult to add code like is_iana_allocated() to check all routing prefixes and flag those which should not be seen in the wild. Every place that prints an IP address needs to be examined, rather than updating a single generic route-prefix printing function.

Python and ipaddr.py

Posted Jun 18, 2009 12:48 UTC (Thu) by busterb (subscriber, #560)

I've been using libdnet, which is in C, for years, and it also uses the same structure to identify an address and a subnet (http://libdnet.sourceforge.net/dnet.3.txt). An individual host IP is just a /32.

I'm not sure I understand how you can deal with an IP address without also knowing its netmask, whether you're calling it a subnet or a host IP address. In fact, Linux's tools complain if you try to manipulate an IP address without the mask:

bcook@target4:~$ sudo ip addr add 2.0.0.1/16 dev test2
bcook@target4:~$ sudo ip addr del 2.0.0.1 dev test2
Warning: Executing wildcard deletion to stay compatible with old scripts.
Explicitly specify the prefix length (2.0.0.1/32) to avoid this warning.
This special behaviour is likely to disappear in further releases,
fix your scripts!
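The later standard-library ipaddress module (Python 3.3 and later, so this is a sketch rather than anything from the 2009 discussion) handles this case with a third type, an interface. An interface pairs a host address with the prefix of its network instead of folding one concept into the other.

```python
import ipaddress

# An interface address carries both the host address and the prefix of
# the network it sits on, much like "ip addr add 2.0.0.1/16".
iface = ipaddress.ip_interface('2.0.0.1/16')
print(iface.ip)        # 2.0.0.1
print(iface.network)   # 2.0.0.0/16

# Given no prefix, the host is treated as a /32, the same prefix
# length the ip warning above suggests specifying explicitly.
bare = ipaddress.ip_interface('2.0.0.1')
print(bare.with_prefixlen)  # 2.0.0.1/32
```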

Python and ipaddr.py

Posted Jun 18, 2009 15:38 UTC (Thu) by johill (subscriber, #25196)

The 'ip' command behaves this way because it, too, conflates the IP address and the network: it deletes not only the address but also the corresponding route.

Python and ipaddr.py

Posted Jun 18, 2009 20:32 UTC (Thu) by jengelh (subscriber, #33263)

ip does not delete routes here; the kernel does (AFAICS). You see the same behavior when doing `ip link set eth0 down`.

Python and ipaddr.py

Posted Jun 18, 2009 20:42 UTC (Thu) by johill (subscriber, #25196)

Ok, so the kernel needs to know the address and netmask and conflates it, not ip. Doesn't change much :)

Python popularity-contest

Posted Jun 22, 2009 17:59 UTC (Mon) by ballombe (subscriber, #9523)

Debian provides the ranking of Python-related Debian packages: http://popcon.debian.org/main/python/by_vote

However, Debian does not rely on popularity-contest rankings for technical decisions, only to know whether a package is still used.

Python and ipaddr.py

Posted Jun 23, 2009 12:59 UTC (Tue) by Baylink (subscriber, #755)

I'm going to have to go scan over the conversations on this topic, I guess. Treating a host IP as a /32 is pretty traditional in a lot of environments, though there's just as much justification for writing host addresses as the netmask of the network on which they reside, like so: 172.48.119.32/16.

And, on reflection, perhaps the interface to the library making it difficult to treat that as a host address is why it's annoying people.
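The difficulty shows up concretely in the standard library's ipaddress module (Python 3.3 and later, so this is an illustration rather than the API debated in 2009). Read strictly as a network, a host address written with its network's prefix is an error.

```python
import ipaddress

# A host address written with its network's prefix length.
text = '172.48.119.32/16'

# Read strictly as a network, host bits set beyond /16 are an error...
try:
    ipaddress.ip_network(text)
except ValueError as err:
    print('rejected:', err)

# ...or they can be masked off, which silently discards the host part.
print(ipaddress.ip_network(text, strict=False))  # 172.48.0.0/16

# Read as an interface, both the host and its network survive.
iface = ipaddress.ip_interface(text)
print(iface.ip, iface.network)  # 172.48.119.32 172.48.0.0/16
```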

Ah, well; Copious Free Time.

Google Wave ( Python and ipaddr.py )

Posted Jun 25, 2009 10:29 UTC (Thu) by kragil (guest, #34373)

I think in the long run something as flexible and open as Wave will consolidate a lot of these communication-channel problems, to the point where devs use it predominantly. (And I know we need mutt integration for that to happen :)

Just one example: maybe it would be good for Wave "mailing lists" to implement a message-scoring system where valuable messages can be dugg up and flames dugg down.
That would certainly improve signal to noise. IMHO Wave makes crowd sourcing much easier.

Python and ipaddr.py

Posted Jun 25, 2009 14:02 UTC (Thu) by engla (guest, #47454)

Is Python's organisation too federated? The BDFL acting too much on good faith?

For Python 3.0, a release blocker to remove __cmp__ was forgotten and had to be fixed in Python 3.0.1.

There is no quality control of the standard library; lots of modules come from nowhere. Does "batteries included" also mean that random cruft is included as well? Certainly, an even, high standard of the Python library is more important than to include everything.

Python and ipaddr.py

Posted Jun 25, 2009 17:00 UTC (Thu) by larryr (guest, #4030)

Certainly, an even, high standard of the Python library is more important than to include everything.

I do not know which is more important, but I agree there is stuff included with Python which seems relatively mediocre in quality and interface design, for example the "multiprocessing" module. I also think the "library" section is getting crowded considering how little structure it has, like "curses.textpad" grouped with "os", which is separated from "os.path".

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds