By Jake Edge
September 12, 2007
With its first
alpha just released, Python 3.0 (aka Python 3000 or Py3k) is
making progress, though a final release is still a year off. Py3k overhauls
the language core, removing inconsistencies and other "warts", without
maintaining compatibility with the 2.x version. Various standard Python
idioms go by the wayside and it will take some getting used to.
One of the driving forces for Py3k is to handle Unicode strings in a uniform
way. In the 2.x series, Unicode handling is a common source of bugs,
especially when encoded and unencoded text are mixed. The Py3k solution is to
split strings into two distinct types: str, which holds decoded text, and
bytes, which holds binary data. Those types cannot be
combined without converting one via the encode() and decode()
methods. The drawback to this change is explained in the
What's New in
Python 3.0 document:
This means that pretty much all code that
uses Unicode, encodings or binary data in any way has to change.
This also leads to a distinction that needs to be made when handling
files. Files are opened as either binary or text, and text files are decoded
using an encoding that can be specified when they are opened. If the wrong
mode or encoding is given, I/O to the file may fail.
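As a minimal sketch of the str/bytes split and the text/binary file distinction described above, written against Python 3 as eventually released (the sample string and the scratch file name are assumptions made for illustration):

```python
import os
import tempfile

text = "caf\u00e9"                 # str: decoded Unicode text, 4 characters
data = text.encode("utf-8")        # bytes: the UTF-8 encoding, 5 bytes

# Mixing the two types raises TypeError instead of guessing an encoding.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Converting explicitly gets back the original text.
round_trip = data.decode("utf-8")

# Text mode decodes with the given encoding; binary mode returns raw bytes.
path = os.path.join(tempfile.mkdtemp(), "example.txt")  # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "rb") as f:
    raw = f.read()
with open(path, "r", encoding="utf-8") as f:
    decoded = f.read()
```

Reading the same file back in text mode with a mismatched encoding is where the I/O failures mentioned above would typically show up.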
One very visible change – perhaps the most controversial –
is the elimination of
the print statement in favor of a print() function.
The change is being made
mostly for consistency, as there is no other language statement like
print, but it also adds features. One can now specify
a separator, line ending, and file directly; there is no need for the
print >>sys.stderr, "error" syntax, which becomes
print("error", file=sys.stderr).
As the "What's new" document points out:
Initially, you'll be finding yourself typing the old print x a lot in
interactive mode. Time to retrain your fingers to type print(x) instead!
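A short sketch of the print() keyword arguments mentioned above; the StringIO buffer is only an assumption used to capture the output for inspection:

```python
import io
import sys

# Old 2.x form: print >>sys.stderr, "error"
print("error", file=sys.stderr)

# sep and end are keyword arguments rather than special syntax.
buffer = io.StringIO()             # in-memory text file, used only to capture output
print("a", "b", "c", sep="-", end="!\n", file=buffer)
captured = buffer.getvalue()
```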
Another area that has changed significantly is the dict methods.
The keys(), items(), and values() methods no longer
return lists, so code that treats them that way will fail. They now return
something called a "view" that references the dict directly,
producing values as they are needed, much like an iterator. In addition, the
has_key() boolean method has been removed; the in operator
should be used instead.
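The behavior of dict views can be sketched roughly as follows; the sample dictionary is hypothetical:

```python
ages = {"alice": 30, "bob": 25}    # hypothetical sample data

keys = ages.keys()                 # a view, not a list
ages["carol"] = 41                 # the view reflects later changes to the dict
view_sees_update = "carol" in keys

# Views do not support indexing; build a list when one is needed.
try:
    keys[0]
    indexable = True
except TypeError:
    indexable = False
sorted_names = sorted(ages)        # or list(ages.keys()) for an unsorted list

# has_key() is gone; use the in operator.
has_bob = "bob" in ages
```

Code that did something like d.keys().sort() in 2.x is the typical casualty; sorted(d) covers that case.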
There are lots of smaller changes that will catch the unwary. Many of the
features removed have been deprecated for some time, but they may come as a
surprise to programmers who don't follow Python language development closely. The
raise statement has different syntax, integer division no longer
truncates but returns a float instead (with // used to get the old
behavior), xrange() has been removed, and so on. It adds up to a
substantial pile of things to deal with when moving existing code to Python 3.
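A few of these smaller changes, sketched under the assumption of Python 3 as eventually released (the 2.x forms appear only in comments):

```python
true_quotient = 7 / 2              # / no longer truncates for integers
floor_quotient = 7 // 2            # // gives the old 2.x integer behavior

# range() now produces values lazily, as xrange() did in 2.x.
squares = [n * n for n in range(4)]

# The 2.x form "raise ValueError, 'bad value'" is a syntax error in 3.0;
# the exception is created with a call instead.
try:
    raise ValueError("bad value")
except ValueError as exc:          # "except ValueError, exc" is also gone
    message = str(exc)
```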
The migration from 2.x is being assisted by the development of Python
2.6, which is slated for release in April 2008. It will provide a Py3k
warnings mode that complains at runtime when a feature is being used in a
way that is incompatible. It will also have many of the new features enabled,
either as __future__ imports or just added into the language if they
do not conflict with 2.x syntax. The 2to3 tool is also being
developed to translate 2.6 constructs into their 3.0 equivalents. The
Python Enhancement Proposal (PEP) governing the Py3k plan (PEP 3000) gives an overview of how code
can be maintained to run on both 2.6 and 3.0. It sounds somewhat painful,
but incompatible language changes are never easy.
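As a rough illustration of the __future__ approach, the following module is meant to behave the same on 2.6 and 3.0. It assumes the division and print_function future imports, and the report() helper is hypothetical:

```python
# These imports enable 3.0 behavior on 2.6 and have no effect on 3.0.
from __future__ import division, print_function

import sys


def report(total, count):
    # hypothetical helper, named here only for illustration
    average = total / count        # true division on both versions
    print("average:", average, file=sys.stderr)
    return average


result = report(7, 2)
```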
There is still plenty of work to be done; the final release of 3.0 is
currently scheduled for August 2008. One of the bigger remaining chunks is
a reorganization of the standard library namespace.
PEP 3108 lays out the
changes to be made, including removing older, unsupported, or rarely used
modules, renaming modules to conform to the naming standard, and merging the C
and Python implementations of modules (e.g. cPickle goes away and is
replaced with pickle). It cleans up what had become a bit of a mess
over time.
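For code that has to import from either layout, one common idiom (a sketch, not something PEP 3108 prescribes) is to try the 2.x name first and fall back to the merged module; the sample object is hypothetical:

```python
try:
    import cPickle as pickle       # 2.x: the faster C implementation
except ImportError:
    import pickle                  # 3.0: one module, accelerated internally when possible

settings = {"name": "example", "values": [1, 2, 3]}   # hypothetical sample object
restored = pickle.loads(pickle.dumps(settings))
```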
These changes have not come about without objections, both
from those who think another incompatible "upgrade" is not warranted and
from those who think Py3k
doesn't go far enough. One area that is not being changed, but is a source of frustration for some,
is the "global interpreter lock" (GIL), which only allows one thread at a
time to operate on any Python objects or call out to C language extensions.
Especially with the advent of multi-core and multi-CPU systems, the lock is
very restrictive, serializing most of the core language processing.
Guido van Rossum, Benevolent Dictator for Life (BDFL) of the Python
language, has been very open about addressing these concerns on his All Things
Pythonic weblog. That doesn't mean he plans to change things,
especially with regard to the GIL, but he puts together a
well-reasoned defense, mostly concerning the performance of the language
with finer-grained locks. He is clearly not much of a fan of
multi-threaded programming with its attendant race conditions, deadlocks,
and other issues, but he is not opposed to efforts to remove the GIL
either. As he points out, it is not inherent in the Python language, but
is an attribute of the current language implementation; other
implementations (Jython, IronPython) do not have the GIL.
Python 3 makes fundamental changes, so it will be interesting to see
how quickly it is adopted after being released. People learning Python
won't need to learn Py3k for another two years or so, according to van
Rossum, and should, instead, concentrate on 2.x (which means 2.5 until April).
The Unicode handling rework will probably be enough to get the increasing
number of localized programs updated, but the rest of the changes are not
terribly compelling. It is likely that there will be Python 2.x programs
around for a long time to come.