With its first alpha just released, Python 3.0 (aka Python 3000 or Py3k) is making progress, though a final release is still a year off. Py3k overhauls the language core, removing inconsistencies and other "warts", without maintaining compatibility with the 2.x version. Various standard Python idioms go by the wayside and it will take some getting used to.
One of the driving forces for Py3k is to handle Unicode strings in a uniform way. In the 2.x series, Unicode handling is a source of bugs, especially when encoded and unencoded text are mixed. The Py3k solution is to split strings, which contain decoded text, and byte-strings, which hold binary data, into two distinct types: str and bytes. Those types cannot be combined without converting one to the other via the encode() and decode() methods. The drawback to this change is explained in the What's New in Python 3.0 document.
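As a rough illustration of the str/bytes split described above (a sketch written for this article, not taken from the What's New document), the following shows that the two types refuse to mix until one is converted explicitly. The sample string and variable names are arbitrary.

```python
# Text (str) and binary data (bytes) are distinct types in Python 3.
text = "café"                    # str: decoded text
data = text.encode("utf-8")      # bytes: one encoded representation of it

# Mixing the two without an explicit conversion raises TypeError.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Converting explicitly makes the combination well defined.
round_trip = data.decode("utf-8")
combined = text + data.decode("utf-8")
```

Note that the encoded form is longer than the text: the accented character occupies two bytes in UTF-8, so `len(text)` is 4 while `len(data)` is 5.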
This also leads to a distinction that needs to be made when handling files. Files are either binary or text files, with text files requiring an encoding to be specified when they are opened. If the wrong type or encoding is given, I/O to the file may fail.
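A minimal sketch of the text/binary distinction for files, assuming a scratch file in a temporary directory (the path and contents are made up for illustration). The encoding is passed explicitly here; released versions of Python 3 fall back to a platform default when it is omitted.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode: str goes in, encoded with the encoding given to open().
with open(path, "w", encoding="utf-8") as f:
    f.write("naïve")

# Binary mode: the same file yields raw bytes, with no decoding applied.
with open(path, "rb") as f:
    raw = f.read()

# Text mode with the right encoding recovers the original str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Text mode with the wrong encoding can fail outright.
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
    wrong_encoding_failed = False
except UnicodeDecodeError:
    wrong_encoding_failed = True
```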
One very visible change – perhaps the most controversial – is eliminating the print statement and replacing it with a function. The change is being made mostly for consistency, as there is no other language statement like print, but it also adds features. One can now specify a separator, line ending, and file directly; there is no need for the print >>sys.stderr, "error" syntax, which instead becomes print("error", file=sys.stderr). The "What's new" document also comments on this change.
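The keyword arguments mentioned above can be sketched as follows (an illustration written for this article, not an excerpt from the "What's new" document). An in-memory io.StringIO object stands in for an arbitrary file so that the result can be inspected.

```python
import io
import sys

# Python 2 spelling:  print >>sys.stderr, "error"
print("error", file=sys.stderr)

# sep and end replace the statement's fixed single space and trailing newline.
buffer = io.StringIO()           # stands in for any writable text file
print("a", "b", "c", sep=", ", end="!\n", file=buffer)
output = buffer.getvalue()       # "a, b, c!\n"
```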
Another area that has changed significantly is the dict methods. The keys(), items(), and values() methods no longer return lists, so code that treats their results as lists will fail. They now return something called a "view" that references the dict directly, producing values as they are needed, much like an iterator. In addition, the has_key() boolean method has been removed; the in operator should be used instead.
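A short sketch of how views behave, using a made-up dictionary: the view tracks later changes to the dict, cannot be indexed like a list, and has to be converted when list behavior is needed.

```python
inventory = {"apples": 3, "pears": 5}

keys = inventory.keys()          # a view onto the dict, not a list
inventory["plums"] = 7           # the view reflects later changes
view_sees_plums = "plums" in keys

# List-style indexing on a view fails.
try:
    keys[0]
    indexable = True
except TypeError:
    indexable = False

# Code that needs a real list must build one explicitly.
key_list = sorted(inventory.keys())

# has_key() is gone; the in operator replaces it.
has_apples = "apples" in inventory
still_has_has_key = hasattr(inventory, "has_key")
```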
There are lots of smaller changes that will catch the unwary. Many of the removed features have been deprecated for some time, but they may still surprise programmers who don't follow Python language development closely. The raise statement has different syntax; dividing two integers with / no longer truncates, returning a float instead (with // used to get the old behavior); xrange() has been removed; and so on. It adds up to a substantial pile of things to deal with when moving existing code to Python 3.
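A few of these smaller changes, sketched with arbitrary values. The 2.x spellings appear only in comments, since they would not compile under 3.0.

```python
# Division: / always produces a float; // floors, as 2.x integer division did.
true_quotient = 7 / 2            # 3.5
floor_quotient = 7 // 2          # 3

# range() is lazy in 3.0, filling the role xrange() had in 2.x.
squares = [n * n for n in range(4)]

# raise takes an exception instance; the 2.x form
#     raise ValueError, "bad value"
# is a syntax error in 3.0.
try:
    raise ValueError("bad value")
except ValueError as err:
    message = str(err)
```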
The migration from 2.x is being assisted by the development of Python 2.6, which is slated for release in April 2008. It will provide a Py3k warnings mode that complains at runtime when a feature is used in a way that is incompatible with 3.0. It will also have many of the new features enabled, either as __future__ imports or simply added to the language where they don't conflict with 2.x syntax. The 2to3 tool is also being developed to translate 2.6 constructs into their 3.0 equivalents. The Python Enhancement Proposal (PEP) governing the Py3k plan (PEP 3000) gives an overview of how code can be maintained to run on both 2.6 and 3.0. It sounds somewhat painful, but incompatible language changes are never easy.
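As a small sketch of the __future__ mechanism, the standard __future__ module records, for each feature, the release in which it became optional and the release in which it becomes the default. The example inspects the print function feature; the commented import line shows how a 2.6 module would opt in (it must be the first statement in the file).

```python
# In a 2.6 module, the opt-in would be the first statement:
#     from __future__ import print_function

import __future__

feature = __future__.print_function
optional = feature.getOptionalRelease()[:2]     # (major, minor) where it can be enabled
mandatory = feature.getMandatoryRelease()[:2]   # (major, minor) where it is always on
```

Here `optional` is `(2, 6)` and `mandatory` is `(3, 0)`, matching the migration path the paragraph describes.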
There is still plenty of work to be done; the final release of 3.0 is currently scheduled for August 2008. One of the bigger remaining chunks is a reorganization of the standard library namespace. PEP 3108 lays out the changes to be made, including removing older, unsupported, or rarely used modules, renaming modules to conform to the naming standard, and merging the C and Python implementations of modules (e.g. cPickle goes away and is replaced with pickle). It cleans up what had become a bit of a mess over time.
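To illustrate the cPickle merge, here is a common 2.x idiom that keeps working after the change: try the C module first and fall back to pickle. Under 3.0 the first import fails and the single pickle module is used. The payload is made up for the example.

```python
# 2.x idiom: prefer the C implementation, fall back to the Python one.
try:
    import cPickle as pickle     # exists only on 2.x
except ImportError:
    import pickle                # 3.0: the one remaining module

payload = {"name": "py3k", "alpha": 1}
restored = pickle.loads(pickle.dumps(payload))
```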
These changes have not come about without objections, both from those who think another incompatible "upgrade" is not warranted and from those who think Py3k doesn't go far enough. One area that is not being changed, but is a source of frustration for some, is the "global interpreter lock" (GIL), which allows only one thread at a time to operate on Python objects or call out to C language extensions. Especially with the advent of multi-core and multi-CPU systems, the lock is very restrictive, serializing most of the core language processing.
Guido van Rossum, Benevolent Dictator for Life (BDFL) of the Python language, has been very open about addressing these concerns on his All Things Pythonic weblog. That doesn't mean he plans to change things, especially with regard to the GIL, but he puts together a well-reasoned defense, mostly concerning the performance of the language with finer-grained locks. He is clearly not much of a fan of multi-threaded programming, with its attendant race conditions, deadlocks, and other issues, but he is not opposed to efforts to remove the GIL either. As he points out, the GIL is not inherent in the Python language but is an attribute of the current language implementation; other implementations (Jython, IronPython) do not have it.
There are fundamental changes in Python 3, so it will be interesting to see how quickly it is adopted after being released. People learning Python won't need to learn Py3k for another two years or so, according to van Rossum, and should instead concentrate on 2.x (which means 2.5 until April). The Unicode handling rework will probably be enough to get the increasing number of localized programs updated, but the rest of the changes are not terribly compelling. It is likely that there will be Python 2.x programs around for a long time to come.
Copyright © 2007, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license