Moving to Python 3
Python 3.0 was released at the end of 2008, but so far only a relatively small number of packages have been updated to support the latest release; the majority of Python software still only supports Python 2. Python 3 introduced changes to Unicode and string handling, module importing, integer representation and division, print statements, and a number of other differences. This article will cover some of the changes that cause the most problems when porting code from Python 2 to Python 3, and will present some strategies for managing a single code base that supports both major versions.
The changes that made it into Python 3 were originally part of a plan called "Python 3000" as sort of a joke about language changes that could only be done in the distant future. The changes made up a laundry list of inconsistencies and inconvenient designs in the Python language that would have been really nice to fix, but had to wait because fixing them meant breaking all existing Python code. Eventually the weight of all the changes led the Python developers to decide to just fix the problems with a real stable release, and accept the fact that it will take a few years for most packages and users to make the switch.
So what's the big deal?
The biggest change is to how strings are handled in Python 3. Python 2 has 8-bit strings and Unicode text, whereas Python 3 has Unicode text and binary data. In Python 2 you can play fast and loose with strings and Unicode text, using either type for parameters; conversion is automatic when necessary. That's great until you get some 8-bit data in a string and some function (anywhere — in your code or deep in some library you're using) needs Unicode text. Then it all falls apart. Python 2 tries to decode strings as 7-bit ASCII to get Unicode text, leaving the developer, or worse yet the end user, with one of these:
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 3: \
ordinal not in range(128)
In Python 3 there are no more automatic conversions, and the default is Unicode text almost everywhere. While Python 2 treats 'all\xf4' as an 8-bit string with four bytes, Python 3 treats the same literal as Unicode text with U+00F4 as the fourth character.
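As a short illustrative sketch (the session below is hypothetical, but the Python 2 error is the same one shown above):
>>> s = 'all\xf4'      # Python 2: an 8-bit string of four bytes
>>> len(s)
4
>>> u'text: ' + s      # implicit ASCII decode fails on byte 0xf4
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 3: ordinal not in range(128)
>>> s = 'all\xf4'      # Python 3: Unicode text, four characters
>>> 'text: ' + s       # no conversion needed, it is already text
'text: allô'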
Files opened in text mode (the default, including for sys.stdin, sys.stdout, and sys.stderr) in Python 3 return Unicode text from read() and expect Unicode text to be passed to write(). Files opened in binary mode operate on binary data only. This change affects Python users in Linux and other Unix-like operating systems more than Windows and Mac users — files in Python 2 on Linux that are opened in binary mode are almost indistinguishable from files opened in text mode, while Windows and Mac users have been used to Python at least munging their line breaks when in text mode.
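For example (a minimal sketch, assuming a UTF-8 locale so that Python 3 encodes text files as UTF-8):
>>> open('demo.txt', 'w').write('caf\xe9\n')   # text mode takes Unicode text
5
>>> open('demo.txt').read()                    # text mode returns Unicode text
'café\n'
>>> open('demo.txt', 'rb').read()              # binary mode returns bytes
b'caf\xc3\xa9\n'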
This means that much code that used to "work" (where work is defined for uses with ASCII text only) is now broken. But once that code is updated to properly account for which inputs and outputs are encoded text and which are binary, it can then be used comfortably by people whose native languages or names don't fit in ASCII. That's a pretty nice result.
Python 3's bytes type for binary data is quite different from Python 2's 8-bit strings. Python 2.6 and later define bytes to be the same as the str type, which is a little strange because the interface changes significantly in Python 3:
>>> bytes([2,3,4]) # Python 2
'[2, 3, 4]'
>>> [x for x in 'abc']
['a', 'b', 'c']
In Python 3 b'' is used for byte literals:
>>> bytes([2,3,4]) # Python 3
b'\x02\x03\x04'
>>> [x for x in b'abc']
[97, 98, 99]
Python 3's bytes type can be treated like an immutable list with values between 0 and 255. That's convenient for doing bit arithmetic and other numeric operations common to dealing with binary data, but it's quite different from the strings-of-length-1 Python 2 programmers expect.
Integers have changed as well. There is no distinction between long integers and normal integers and sys.maxint is gone. Integer division has changed too. Anyone with a background in Python (or C) will tell you that:
>>> 1/2
0
>>> 1.0/2
0.5
But no longer. Python 3 returns 0.5 for both expressions. Fortunately Python 2.2 and later have an operator for floor division (//); use it and you will get the same floored result under both versions.
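The __future__ import below, available since Python 2.2, is a way to get the new behaviour for / in Python 2 as well (a minimal sketch):
>>> 1//2          # floor division: the same in Python 2 and Python 3
0
>>> 7//2
3
>>> from __future__ import division
>>> 1/2           # Python 2, now with the Python 3 behaviour
0.5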
The last big change I'll point out is to comparisons. In Python 2 comparisons (<, <=, >=, >) are always defined between all objects. When no explicit ordering is defined, all the objects of one type will be arbitrarily considered either greater than or less than all the objects of another type. So you could take a list with a mix of types, sort it, and all the different types would be grouped together. Most of the time, though, you really don't want to order different types of objects, and this feature just hides some nasty bugs.
Python 3 now raises a TypeError any time you compare objects with incompatible types, as it should. Note that equality (==, !=) is still defined for all types.
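A small sketch of the difference (the exact wording of the Python 3 error message varies between releases):
>>> sorted([3, 'two', 1.5])   # Python 2: arbitrary but consistent ordering
[1.5, 3, 'two']
>>> sorted([3, 'two', 1.5])   # Python 3
Traceback (most recent call last):
...
TypeError: unorderable types: str() < int()
>>> 3 == 'two'                # equality still works, it is simply False
False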
Module importing has changed. In Python 2 the directory containing the source file is searched first when importing (called a "relative import"), then the directories in the system path are tried in order. In Python 3 relative imports must be made explicit:
from . import my_utils
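For example, importing a sibling module my_utils from inside a package (a sketch; mypkg is a hypothetical package name):
import my_utils              # Python 2 only: implicit relative import
from . import my_utils       # Python 2.5 and later, and Python 3: explicit relative import
from mypkg import my_utils   # works everywhere: absolute import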
The print statement has become a function in Python 3. This Python 2 code that prints a string to sys.stderr with a space instead of a newline at the end:
import sys
print >>sys.stderr, 'something bad happened:',
becomes:
import sys
print('something bad happened:', end=' ', file=sys.stderr)
These are just some of the biggest changes; the complete list is in the "What's New In Python 3.0" document.
That list is huge. How do I deal with all that?
Fortunately a large number of the little incompatibilities are taken care of by the 2to3 tool that ships with Python. 2to3 takes Python 2 source code and performs some automated replacements to prepare the code to run in Python 3. Print statements become functions, Unicode text literals drop their "u" prefix, relative imports are made explicit, and so on.
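If you use distribute (mentioned below) it can run 2to3 automatically when your package is built or installed under Python 3; a minimal setup.py sketch (the package name is illustrative):
from setuptools import setup   # provided by distribute

setup(
    name='mypackage',          # hypothetical package name
    version='1.0',
    packages=['mypackage'],
    use_2to3=True,             # run 2to3 over the sources when installing on Python 3
)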
Unfortunately the rest of the changes need to be made by hand.
It is reasonable to maintain a single code base that works across Python 2 and Python 3 with the help of 2to3. In the case of my library "Urwid" I am targeting Python 2.4 and up, and this is part of the compatibility code I use. When you really have to write code that takes different paths for Python 2 and Python 3 it's nice to be clear with an "if PYTHON3:" statement:
import sys
PYTHON3 = sys.version_info >= (3, 0)
try: # define bytes for Python 2.4, 2.5
    bytes = bytes
except NameError:
    bytes = str

if PYTHON3: # for creating byte strings
    B = lambda x: x.encode('latin1')
else:
    B = lambda x: x
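With these helpers byte string literals can be written the same way for every supported version; a small usage sketch:
data = B('\xff\xd8\xff')        # three bytes, written via their Latin-1 code points
assert isinstance(data, bytes)  # str on Python 2, bytes on Python 3
magic = B('GIF89a')             # ASCII-only byte strings work the same way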
String handling and literal strings are the most common areas that need to be updated. Some guidelines:
Use Unicode literals (u'') for all literal text in your source. That way your intention is clear and behaviour will be the same in Python 3 (2to3 will turn these into normal text strings).
Use byte literals (b'') for all literal byte strings or the B() function above if you are supporting versions of Python earlier than 2.6. B() uses the fact that the first 256 code points in Unicode map to Latin-1 to create a binary string from Unicode text.
Use normal strings ('') only in cases where 8-bit strings are expected in Python 2 but Unicode text is expected in Python 3. These cases include attribute names, identifiers, docstrings, and __repr__ return values.
Document whether your functions accept bytes or Unicode text and guard against the wrong type being passed in (eg. assert isinstance(var, unicode)), or convert to Unicode text immediately if you must accept both types.
Clearly labeling text as text and binary as binary in your source serves as documentation and may prevent you from writing code that will fail when run under Python 3.
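A short sketch applying these guidelines (the function and names are illustrative; 2to3 drops the u prefix and renames unicode to str):
def greeting(name):
    """Return a greeting for name, which must be Unicode text."""
    assert isinstance(name, unicode)  # guard against 8-bit strings sneaking in
    return u'Hello, ' + name          # u'' makes it clear this is text

PNG_MAGIC = B('\x89PNG\r\n\x1a\n')    # clearly binary data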
Handling binary data across Python versions can be done a few ways. If you replace all individual byte accesses such as data[i] with data[i:i+1] then you will get a byte-string-of-length-1 in both Python 2 and Python 3. However, I prefer to follow the Python 3 convention of treating byte strings as lists of integers with some more compatibility code:
if PYTHON3: # for operating on bytes
    ord2 = lambda x: x
    chr2 = lambda x: bytes([x])
else:
    ord2 = ord
    chr2 = chr
ord2 returns the ordinal value of a byte in Python 2 or Python 3 (where it's a no-op) and chr2 converts back to a byte string. Depending on how you are processing your binary data, it might be noticeably faster to operate on the integer ordinal values instead of byte-strings-of-length-1.
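For example, a simple checksum written against these helpers runs unchanged on both versions (a minimal sketch):
def checksum(data):
    # data is a byte string; ord2() gives each byte's integer value on both versions
    total = 0
    for i in range(len(data)):
        total = (total + ord2(data[i])) & 0xff
    return chr2(total)   # the result as a byte-string-of-length-1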
Python "doctests" are snippets of test code that appear in function, class and module documentation text. The test code resembles an interactive Python session and includes the code run and its output. For simple functions this sort of testing is often enough, and it's good documentation. Doctests create a challenge for supporting Python 2 and Python 3 from the same code base, however.
2to3 can convert doctest code in the same way as the rest of the source, but it doesn't touch the expected output. Python 2 will put an "L" at the end of a long integer output and a "u" in front of Unicode strings that won't be present in Python 3, but print-ing the value will always work the same. Make sure that other code run from doctests outputs the same text all the time, and if you can't you might be able to use the ELLIPSIS flag and ... in your output to paper over small differences.
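For example, a doctest that prints its result, rather than relying on the repr, produces identical output under Python 2 and Python 3 even for long integers (a minimal sketch):
def total(values):
    """Add up values.

    >>> print(total([2 ** 40, 2 ** 40]))
    2199023255552
    """
    result = 0
    for v in values:
        result += v
    return result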
There are a number of other easy changes you need to make as well (a short sketch follows this list), including:
Use // everywhere you want floor division (mentioned above).
Derive exception classes from BaseException.
Use k in my_dict instead of my_dict.has_key(k).
Use my_list.sort(key=custom_key_fn) instead of my_list.sort(custom_sort).
Use distribute instead of Setuptools.
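A before-and-after sketch of two of these (the names are illustrative):
# Python 2 only:
if my_dict.has_key(k):
    my_list.sort(lambda a, b: cmp(a.name, b.name))

# works in Python 2 and Python 3:
if k in my_dict:
    my_list.sort(key=lambda a: a.name)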
There are two additional resources that may be helpful: Porting Python Code to 3.0 and Writing Forwards Compatible Python Code.
So if I do all that, what's in it for me?
Python 3 is unarguably a better language than Python 2. Many people new to the language are starting with Python 3, particularly users of proprietary operating systems. Many more current Python 2 users are interested in Python 3 but are held back by the code or a library they are using.
By adding Python 3 support to an application or library you help:
make it available to the new users just starting with Python 3
encourage existing users to adopt it, knowing it won't stop them from switching to Python 3 later
clean up ambiguous use of text and binary data and find related bugs
And as a little bonus that software can then be listed among the packages with Python 3 support in the Python Packaging Index, one click from the front page.
Many popular Python packages haven't yet made the switch, but it's certainly on everyone's radar. In my case I was lucky: members of the community already did most of the hard work porting my library to Python 3, and I only had to update my tests and find ways to make the changes work with old versions of Python as well.
There is currently a divide in the Python community because of the significant differences between Python 2 and Python 3. But with some work, that divide can be bridged. It's worth the effort.
| Index entries for this article | |
|---|---|
| GuestArticles | Ward, Ian |
Posted Feb 9, 2011 20:26 UTC (Wed)
by midg3t (guest, #30998)
[Link] (3 responses)
If there was a way to support both versions in the same codebase cleanly, I would have done it by now, but there isn't. (exception handling is one sticky point, there are others)
Posted Feb 9, 2011 23:22 UTC (Wed)
by euske (guest, #9300)
[Link] (2 responses)
Posted Feb 9, 2011 23:32 UTC (Wed)
by bronson (subscriber, #4806)
[Link]
Posted Feb 10, 2011 6:52 UTC (Thu)
by flewellyn (subscriber, #5047)
[Link]
They did. It's called "Python3". The old one is called "Python2".
It's analogous in some ways to ALGOL 60 vs ALGOL 68. Same heritage, but different languages.
Posted Feb 9, 2011 20:41 UTC (Wed)
by amtota (guest, #4012)
[Link] (4 responses)
And here lies the heart of my complaint: what sort of "upgrade" makes your code harder to read and maintain?
And I suspect that is the main reason why so few have made the jump: there is no real incentive. There are no killer features (for most) and there is a huge cost associated with supporting python3.x
To be honest, I don't see any easy way out of this one. Complaining that authors don't make the porting effort is barking up the wrong tree.
Posted Feb 9, 2011 21:17 UTC (Wed)
by iabervon (subscriber, #722)
[Link] (1 responses)
Posted Feb 9, 2011 22:53 UTC (Wed)
by mmcgrath (guest, #44906)
[Link]
in fairness time is invisible.
Posted Feb 9, 2011 23:24 UTC (Wed)
by Webexcess (guest, #197)
[Link] (1 responses)
It does take a lot of work to do, and you have to restrict your programming style to maintain compatibility. If I didn't have help I'm sure I wouldn't have made much progress at it. More and more projects do support both major versions from the same code base. It certainly can be done, it will just be ugly for a while.
Posted Feb 11, 2011 13:16 UTC (Fri)
by amtota (guest, #4012)
[Link]
Posted Feb 9, 2011 22:30 UTC (Wed)
by zuki (subscriber, #41808)
[Link] (1 responses)
remember to add the right classifier (Programming Language::Python::3). Many don't. E.g. decorator has been py3 compatible for years. The same is true for lots of other packages.
The major packages which kept a lot of software back were numpy and scipy. But numpy officially supports python 3, and scipy, currently in -rc, will soon.
There's no need to have a majority of packages supporting python 3, just a few percent of the most important ones are enough. It seems that we are quite close to this amount.
Posted Feb 10, 2011 3:36 UTC (Thu)
by maney (subscriber, #12630)
[Link]
All statistics in this post were made up or borrowed from half-remembered sources who almost certainly made them up. Nevertheless, MS Word is still the 800 pound gorilla...
Posted Feb 10, 2011 1:24 UTC (Thu)
by tstover (guest, #56283)
[Link]
Maybe a gtk3 python3 wrapper will clean out some legacy baggage.
Posted Feb 10, 2011 2:46 UTC (Thu)
by dwheeler (guest, #1216)
[Link] (4 responses)
The real problem here is that the *spec* is being confused with its *implementation*. If there was a single implementation that accepted both Python2 and Python3, and let Python3 programs call Python2 programs, this would be a non-event. In fact, it'd be easy to do the transition, and we'd be mostly done.
Instead, there's one program that implements Python3, and a completely separate and incompatible engine that runs Python2.
Of course, this suggests a way out: Expand the Python2 system so it can run arbitrary Python3, and make it possible for Python3 programs to seamlessly call Python2. Then the problem would disappear.
Posted Feb 10, 2011 3:47 UTC (Thu)
by maney (subscriber, #12630)
[Link] (3 responses)
Oh, and you gloss over the issue of how the omnivorous compiler knows which language version to treat a given module as. Sure, tagging could be added, but then you're not really doing seamless interoperability after all. It would be close enough to make me smile!
BTW, I just saw a brief mention of what sounds like it might be real, workable versioning for compiled (.pyc and .so) modules in a new or forthcoming release of 3.x. That might not be a killer feature, but it's the first thing in 3 that's more than "that's nice, wish Guido really did have a time machine so the original mistake hadn't been made" to me.
Posted Feb 10, 2011 17:44 UTC (Thu)
by zlynx (guest, #2285)
[Link] (2 responses)
Make a "python" interpreter that reads the script file and either looks for a Python2 marker of some kind or analyzes the file for Python3 syntax. It then calls the correct real interpreter.
Considering the speed (slow) of Python applications, this step would not add too much time.
Posted Feb 10, 2011 18:53 UTC (Thu)
by foom (subscriber, #14868)
[Link] (1 responses)
Posted Feb 10, 2011 19:56 UTC (Thu)
by zlynx (guest, #2285)
[Link]
The solution might be to convert cPython into IronPython everywhere and compile Python to .NET / Mono IL. Then Python3 could call Python2 code in the same way that C#, F#, C++ and Visual Basic can all call each other.
Or to be more acceptable to anti-Microsoft people, use the Parrot virtual machine, if that ever becomes usable. Another option might be LLVM. A problem with LLVM is that it doesn't specify a reflection, object, function and data sharing scheme in the same way that .NET does. Java might be another virtual machine that could be made to work.
Posted Feb 10, 2011 5:19 UTC (Thu)
by nevyn (guest, #33129)
[Link] (10 responses)
Which is like a one line fix to make, to default the system locale to utf-8 in py-2 instead of "ascii" ... almost instantly removing the need for checking every $%#%$# string operation in your app. ... but hey, let's pretend it's 1985 instead and write an incompatible language.
> Python 3 is unarguably a better language than Python 2.
Really? Unarguably? The fact that os.listdir() is utterly broken on Linux isn't any kind of hint that maybe, just maybe, there might be some problems? Or maybe people might find _some_ argument in the fact that in the two years since py-3 (3.0 was released Dec. 2008), _no_ Linux distribution has announced a timeline to move to py3k as the default python implementation.
How many apps. on rawhide or unstable run against the py-3 stack, again? About as many as the perl apps. are running on perl6?
First perl kills itself, and now this ... it's enough to make you go back to C ... or even look at Java again.
Posted Feb 10, 2011 6:10 UTC (Thu)
by mrjoel (subscriber, #60922)
[Link]
Posted Feb 10, 2011 14:39 UTC (Thu)
by Webexcess (guest, #197)
[Link] (6 responses)
You didn't provide a link for the problem, but is this what you're looking for?
os.listdir(b'.') # no decoding for me, thanks
Posted Feb 10, 2011 15:38 UTC (Thu)
by nevyn (guest, #33129)
[Link] (5 responses)
1. When calling listdir() directly, the default is broken (and in a non-obvious way) ... so everybody has to remember "Oh, yeh, you have to call os.listdir() in this special way or it's broken".
2. It assumes people are calling os.listdir() directly ... which is _far_ from the normal case. So now, to do the same hack, every API that eventually calls listdir() will have to implement/debug the bytes vs. unicode input vs. output thing ... and every caller of those APIs will have to remember "Oh, yeh, you have to call foo_API() in this special way or it's broken".
3. It's still not obvious what you _do_ with those bytes, because the reason listdir() doesn't work "normally" is that its model of the Universe doesn't match reality. Basically you can't load a POSIX filename and print "Error: open(%s): %s" ... and this problem is much bigger than POSIX filenames, it's just that that's the most glaringly broken problem that people see. So the whole thing is a huge clue that "Unicode" is not any better in py-3 than it is in py-2 (which is to say, it's completely broken).
Posted Feb 12, 2011 0:13 UTC (Sat)
by cmccabe (guest, #60281)
[Link] (4 responses)
Python has a pretty long history of "forcing" what it believes to be the correct behavior on its users. It even tells you how to use whitespace. I am not surprised at all that they ignore non-UTF filenames. Frankly, it's a good decision.
Posted Feb 12, 2011 1:50 UTC (Sat)
by foom (subscriber, #14868)
[Link]
Posted Feb 15, 2011 1:24 UTC (Tue)
by yuhong (guest, #57183)
[Link] (2 responses)
Posted Feb 15, 2011 14:32 UTC (Tue)
by nevyn (guest, #33129)
[Link] (1 responses)
It is exactly python's fault that it pretends unix is like windows, when it isn't.
Posted Feb 15, 2011 14:52 UTC (Tue)
by foom (subscriber, #14868)
[Link]
Except that python doesn't actually do that, see comment above...
Posted Feb 10, 2011 22:20 UTC (Thu)
by rahulsundaram (subscriber, #21946)
[Link] (1 responses)
$yum search python3
Loaded plugins: presto, refresh-packagekit
========== N/S Matched: python3 ===========
dreampie-python3.noarch : Support for running the python3 interpreter from
: dreampie
python3-cairo-devel.i686 : Libraries and headers for python3-cairo
python3-cairo-devel.x86_64 : Libraries and headers for python3-cairo
python3-decorator.noarch : Module to simplify usage of decorators in python3
python3-smbc.x86_64 : Python3 bindings for libsmbclient API from Samba
python3-stomppy.noarch : Python stomp client for messaging for python3
dpm-python3.x86_64 : Disk Pool Manager (DPM) python bindings
lfc-python3.x86_64 : LCG File Catalog (LFC) python bindings
libselinux-python3.x86_64 : SELinux python 3 bindings for libselinux
libsemanage-python3.x86_64 : semanage python 3 bindings for libsemanage
python3.i686 : Version 3 of the Python programming language aka Python 3000
python3.x86_64 : Version 3 of the Python programming language aka Python 3000
python3-PyQt4.i686 : Python 3 bindings for Qt4
python3-PyQt4.x86_64 : Python 3 bindings for Qt4
python3-PyQt4-devel.i686 : Python 3 bindings for Qt4
python3-PyQt4-devel.x86_64 : Python 3 bindings for Qt4
python3-PyYAML.x86_64 : YAML parser and emitter for Python
python3-babel.noarch : Library for internationalizing Python applications
python3-beaker.noarch : WSGI middleware layer to provide sessions
python3-bpython.noarch : Fancy curses interface to the Python 3 interactive
: interpreter
python3-cairo.x86_64 : Python 3 bindings for the cairo library
python3-chardet.noarch : Character encoding auto-detection in Python
python3-cherrypy.noarch : Pythonic, object-oriented web development framework
python3-coverage.x86_64 : Code coverage testing module for Python 3
python3-debug.i686 : Debug version of the Python 3 runtime
python3-debug.x86_64 : Debug version of the Python 3 runtime
python3-deltarpm.x86_64 : Python bindings for deltarpm
python3-devel.i686 : Libraries and header files needed for Python 3 development
python3-devel.x86_64 : Libraries and header files needed for Python 3
: development
python3-gobject.i686 : Python 3 bindings for GObject and GObject Introspection
python3-gobject.x86_64 : Python 3 bindings for GObject and GObject Introspection
python3-httplib2.noarch : A comprehensive HTTP client library
python3-inotify.noarch : Monitor filesystem events with Python under Linux
python3-jinja2.noarch : General purpose template engine
python3-libs.i686 : Python 3 runtime libraries
python3-libs.x86_64 : Python 3 runtime libraries
python3-lxml.x86_64 : ElementTree-like Python 3 bindings for libxml2 and libxslt
python3-mako.noarch : Mako template library for Python 3
python3-markupsafe.x86_64 : Implements a XML/HTML/XHTML Markup safe string for
: Python
python3-minimock.noarch : The simplest possible mock library
python3-mpi4py-mpich2.x86_64 : Python bindings of MPI, MPICH2 version
python3-mpi4py-openmpi.x86_64 : Python bindings of MPI, Open MPI version
python3-numpy.x86_64 : A fast multidimensional array facility for Python
python3-numpy-f2py.x86_64 : f2py for numpy
python3-paste.noarch : Tools for using a Web Server Gateway Interface stack
python3-ply.noarch : Python Lex-Yacc
python3-postgresql.x86_64 : Connect to PostgreSQL with Python 3
python3-psutil.noarch : A process utilities module for Python 3
python3-pygments.noarch : A syntax highlighting engine written in Python 3
python3-pyke.noarch : Knowledge-based inference engine
python3-pyparsing.noarch : An object-oriented approach to text processing
: (Python 3 version)
python3-setuptools.noarch : Easily build and distribute Python 3 packages
python3-sip.i686 : SIP - Python 3/C++ Bindings Generator
python3-sip.x86_64 : SIP - Python 3/C++ Bindings Generator
python3-sip-devel.i686 : Files needed to generate Python 3 bindings for any C++
: class library
python3-sip-devel.x86_64 : Files needed to generate Python 3 bindings for any
: C++ class library
python3-sleekxmpp.noarch : Flexible XMPP client/component/server library for
: Python
python3-smbpasswd.x86_64 : Python SMB Password Hst Generator Module for Python 3
python3-sqlalchemy.x86_64 : Modular and flexible ORM library for python
python3-tempita.noarch : A very small text templating language
python3-test.i686 : The test modules from the main python 3 package
python3-test.x86_64 : The test modules from the main python 3 package
python3-tkinter.i686 : A GUI toolkit for Python 3
python3-tkinter.x86_64 : A GUI toolkit for Python 3
python3-tools.i686 : A collection of tools included with Python 3
python3-tools.x86_64 : A collection of tools included with Python 3
python3-zmq.x86_64 : Software library for fast, message-based applications
Posted Feb 10, 2011 23:50 UTC (Thu)
by dave_malcolm (subscriber, #15013)
[Link]
https://fedoraproject.org/wiki/Python3#Porting_status
(and this may be of interest to other distributions looking to build out a Python 3 stack)
Posted Feb 10, 2011 6:39 UTC (Thu)
by ras (subscriber, #33059)
[Link] (13 responses)
I think the mistake arose from a common misconception. It seems popular to equate UCS2 with unicode support. This is an abuse of terminology. The old strings represented unicode perfectly well as UTF-8. The new-style strings use UCS2 instead. That may have been a good idea when Java introduced it, because back then unicode only occupied one code plane in UCS2. Now UCS2, just like UTF-8, must use multibyte sequences for some unicode code points. So the one good point is gone.
The major downside remains however: UCS2 is almost never found in the real world. So you spend 1/2 your time converting between whatever the outside world is using and UCS2, and then back again. The lines of code increase, memory requirements almost double, the execution time increases and in my experience the bugs sky rocket.
Posted Feb 10, 2011 7:26 UTC (Thu)
by peregrin (guest, #56601)
[Link] (4 responses)
"UCS2 is almost never found in the real world."
Windows 2000/XP/2003/Vista/2008/7 is almost never found in the real world?
Posted Feb 10, 2011 11:52 UTC (Thu)
by tialaramex (subscriber, #21167)
[Link] (2 responses)
So, no, Windows isn't an example of UCS2, and hasn't been for many years.
Posted Feb 10, 2011 17:10 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (1 responses)
In this sense, UCS-2 is extremely often found in the real world.
Posted Feb 11, 2011 4:01 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link]
What were you imagining they should be using java.lang.String.codePointCount() for ? Text is hard, like I said, and a count of Unicode code points is rarely what you need.
Examples of things which are assigned one or more Unicode code points: A harmless, invisible and ignorable marker; indication that subsequent neutral text is intended to be displayed right-to-left; the cedilla accent on a character; a lowercase x; a vertical tab; indication that a non-fatal error occurred in some previous processing.
Posted Feb 10, 2011 11:57 UTC (Thu)
by tialaramex (subscriber, #21167)
[Link]
As the original poster said (even if their terminology is wrong in a bunch of places) UCS-2 looked like it might be clever in the mid-1990s. Once it became clear that Unicode's hyperspace would be populated, and UCS2 wasn't capable of handling that, the choice was no longer between UCS2 and UTF8 (where UCS2 delivers some intuitive-seeming properties, although not as many as sometimes claimed) but between UTF8 and UTF16, where UTF16 is completely horrible.
Posted Feb 10, 2011 9:33 UTC (Thu)
by rweir (subscriber, #24833)
[Link] (6 responses)
???
all python3 did was switch what the 'str' type refers to, from 'bytes' to 'abstract sequence of unicode codepoints'. as far as I know, python 2 and python 3 both support ucs-2 or ucs-4 as the concrete-you-almost-never-have-to-care representation for unicode strings.
Posted Feb 10, 2011 11:03 UTC (Thu)
by cortana (subscriber, #24596)
[Link] (5 responses)
Posted Feb 10, 2011 12:05 UTC (Thu)
by tialaramex (subscriber, #21167)
[Link] (2 responses)
But in my experience it's surprisingly hard to prevent this abstraction from leaking. Text is really tricky, in fact one of the main lessons from the Unicode project is that text is way trickier than anyone had really thought before.
For example, what happens with canonicalisation in Python?
(You will not be surprised to know that the answer in C is generally "C does not care about canonicalisation, it's all byte strings to us")
Posted Feb 10, 2011 17:11 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (1 responses)
Posted Feb 11, 2011 4:06 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link]
Posted Feb 17, 2011 4:14 UTC (Thu)
by spitzak (guest, #4593)
[Link] (1 responses)
The real result, in Python 3 and 2 and on Windows and virtually everywhere else where the "wchar" madness infects designers, is that programmers working with text where the UTF-8 might contain an error resort to destroying the UTF-8 support by saying the text is actually ASCII or ISO-8859-1 or whatever (sometimes they double-UTF-8 encode it, which is the same as ISO-8859-1). Basically the question is whether to eliminate the ability to see even the ASCII letters in the filenames versus the ability to see some rarely-used foreign letters in the cases where they happen to be encoded correctly. If you don't believe me then you have not looked at any recent applications that read text filenames, even on Windows. Or just look at the idiotic behavior of Python 2, described right here in this article!
Congratulations, your belief in new encodings has set I18N back 20 years. We will never see filenames that work across systems and support Unicode. Never ever ever, because of your stubborn belief that you are "right".
The real answer:
Text is a stream of 8 bit bytes. In about 1% of the cases you will care about any characters other than a tiny number of ASCII ones such as NUL and CR. You will then have to decode it, using an ITERATOR that steps through the string, and is capable of returning Unicode code points, Unicode composed characters, and clear lossless indications of encoding errors.
Strings in source files should assume UTF-8 encoding. If the source file itself is UTF-8 this is trivial. But "\u1234" should produce the 3-byte UTF-8 encoding of U+1234. "\xNN" should produce a byte with that value, despite the fact that this can produce an invalid UTF-8 encoding. Printing UTF-8 should never throw an error, it should produce error boxes for encoding errors, one for each byte. On backwards systems where some idiot thought "wchar" was a hot idea, you may need to convert to it, in which case encoding errors should translate to U+DCxx where xx is the byte's value (these are errors in UTF-16 as well), but conversion back from UTF-16 will be lossy as these will turn back into 3 UTF-8 bytes.
Posted Feb 17, 2011 14:27 UTC (Thu)
by foom (subscriber, #14868)
[Link]
However, as I said in http://lwn.net/Articles/426906/ python3 *does* do non-lossy decoding/encoding for filenames with random bytes in them.
Posted Feb 10, 2011 7:10 UTC (Thu)
by ssmith32 (subscriber, #72404)
[Link] (25 responses)
Yes, I could be misinformed - I haven't kept up because:
I was a huge python fan until a few years ago when I started making bigger and bigger apps in it, and threading them more and more. And I was not getting the performance gain from multi-core machines that I would get with other languages.
Quick googling seems to indicate it's still a known issue with the fix being "use processes". Perhaps viable, but not for me. Threads can be a mess, but there are some apps that fit the thread model pretty well.
I still use python for little scripts, but, at that point, the version rarely matters.
Posted Feb 10, 2011 15:48 UTC (Thu)
by fandingo (guest, #67019)
[Link] (14 responses)
Using multiprocess, it's possible for a Python programmer to use "multi-threaded" techniques, with the underlying implementation being processes instead of threads.
I've been using multiprocess with Python3 for a while, and I don't find that there are any limitations. What specific problems keep you from using it?
The GIL needs to go, but there's a lot of ways to work around it.
Posted Feb 10, 2011 23:58 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (12 responses)
It's not threading. My friend wrote a SIP switch in Python and the major bottleneck is SIP message parsing. Right now they use a thread pool and a C library to parse them in several threads.
Pickling/unpickling results to pass them between processes would defeat the whole idea.
Posted Feb 11, 2011 1:21 UTC (Fri)
by cmccabe (guest, #60281)
[Link] (10 responses)
Then again, I've never worked with SIP, so maybe I'm off base here.
I do think there's a desire for a language that's lower level than Python/Ruby/Perl, but higher level than C. Java and C++ sort of fill that role, but poorly. I'm hoping that I'll get a chance to test out Google Go to see if it works for that.
Posted Feb 11, 2011 1:46 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link] (8 responses)
Like C++, but better. Native performance, safe dynamic arrays/strings, Unicode support, garbage collection (if you wish), namespaces, classes, exception handling, interfaces, value types, metaprogramming, compile-time evaluation, and more. The main limitation at the moment is that it remains a new language under heavy development, but it's far enough along for real programs.
Posted Feb 11, 2011 19:47 UTC (Fri)
by cmccabe (guest, #60281)
[Link] (7 responses)
To me the big question is why I should use D over, say, Java. D has "Java-style single inheritance with interfaces and mixins" according to Wikipedia. If you're going to be Java-style, why not just use Java? I'm sure that D's syntax is probably slightly better than Java's. (That's not a difficult achievement.) But does that slight improvement justify throwing away all the existing libraries and code?
On the other hand, Google Go doesn't have inheritance at all. It has duck typing, enforced at compile-time. Its approach to concurrency is not mutexes and condition variables, but channels. I think these two improvements alone justify the switch to another language.
I am curious how the D and Google Go runtimes compare to the JVM. My experience has been that the JVM runs code relatively fast, but starting the JVM itself takes up a huge amount of resources. In comparison, equivalent C++ or C programs have a lower up-front cost. This is one of the reasons why Java web hosting is still more expensive than PHP web hosting, even today. Java has a high up-front performance cost but then does better than PHP over time.
Posted Feb 11, 2011 21:02 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link] (6 responses)
Posted Feb 12, 2011 1:33 UTC (Sat)
by cmccabe (guest, #60281)
[Link] (5 responses)
Well, technically gcj can compile Java to native binary code. However, Java in general has poor integration with non-Java code-- I can't deny that.
> Its metaprogramming features give you all the flexibility of C++-style
> class and function templates, including compile-time "duck typing"
Saying D has duck typing because it has templates is a little misleading. Nobody would seriously say that C++ has duck typing, and it also has templates. For most of your classes in either C++ or D, you're still going to be worrying about inheritance hierarchies and doing "big design up front" which seems more like the Java way of doing things, not the Ruby way.
Speaking of metaprogramming... one nice thing about Google Go is that because there's no inheritance hierarchies, there's no dynamic_cast. One less ugly piece of clutter.
The message passing stuff in Phobos is interesting. It seems that the integration into the language is just at the library level, though, rather than being an integral part of the syntax as in Go.
Overall, the more I learn about D, the more I see it as a "better C++". Like C++, it tries to include everything *and* the kitchen sink. Thankfully multiple inheritance didn't make the cut this time, but most of the other clutter did. (And just like C++, there are some weird omissions-- like reflection.) Google Go, on the other hand, seems to be a more elegant and minimalist language, kind of like C. I like that. But, these are just first impressions, and I guess I might change my mind later.
Posted Feb 12, 2011 3:44 UTC (Sat)
by nybble41 (subscriber, #55106)
[Link] (4 responses)
Why not? The following is valid D code:
void callMethod(T)(T object)
{
    if (object.property)
        object.method();
}
...
callMethod(new ClassA);
callMethod(new ClassB); // unrelated to ClassA
Any type which has the required property and method can be passed to the template function. I would say that looks exactly like duck typing. You can do the same thing in C++ with template functions, and in fact the STL makes extensive use of such functions.
If you end up using interfaces and/or inheritance instead that is only because the strongly-typed approach (which Go does not support) has advantages over duck typing, including better compile-time error checking and runtime performance.
> Speaking of metaprogramming... one nice thing about Google Go is that because there's no inheritance hierarchies, there's no dynamic_cast. One less ugly piece of clutter.
One could say that this is only because there is an implied dynamic_cast (but with worse performance, since it requires a runtime lookup) at every location where an object is used.
> The message passing stuff in Phobos is interesting. It seems that the integration into the language is just at the library level, though, rather than being an integral part of the syntax as in Go.
While I fail to see any reason to prefer extraneous syntax over a library, there is in fact some syntax in D intended to support multithreaded programming: the 'immutable' and 'sharable' type keywords, for example, plus the fact that global variables occupy thread-local storage by default. Assurance that shared state cannot be mutated behind the scenes (not just through a given reference, like 'const', but *anywhere* in the process) allows large messages to be passed efficiently and safely.
Posted Feb 13, 2011 1:55 UTC (Sun)
by cmccabe (guest, #60281)
[Link] (3 responses)
You are confused. Go has static typing. As in, checked at compile-time. Code that misuses types will not compile.
Think of it this way: if you refer to a type T in a C++ template, and then use T::foo, you're not doing a dynamic_cast. You're just using the normal type system. If T does not have a foo method, the code will not compile. Similarly, in Go, you will get a compile-time, not runtime, error, if you try to use methods on an object that don't exist.
Also, you seem to be confused about C++ as well. dynamic_cast *always* "requires a runtime lookup" on the RTTI (runtime type information) in C++. If you do not compile with support for RTTI, you cannot use dynamic_cast. C++ templates, on the other hand, are a completely compile-time mechanism. I don't know why you would say that templated code is not as "strongly-typed" as other C++ code.
> While I fail to see any reason to prefer extraneous syntax over a library,
There's a lot of syntax in D, but is it the right syntax? I think we are going to have to agree to disagree.
I also would like to note in passing that C/C++ have the const, restrict and __thread keywords, which could be used to provide exactly the same message-passing interface you describe in D. In fact, there have been a lot of them created over the years.
Posted Feb 13, 2011 9:33 UTC (Sun)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
Perhaps I am confused after all. I fail to see how Go can be statically typed, with, in particular, specific types for each function parameter, as well as "duck typed", where any type which provides certain methods will be accepted. If types must be known at compile-time then what would be the point of "duck typing"?
Note, too, that it is entirely possible to misuse types in a "duck typed" language, even with compile-time checking. Since there is no information regarding the connections between types (namely inheritance), any type which provides the right method signatures will be accepted, whether or not they were intended to be used in that fashion. This is where explicit interfaces and inheritance can prove very useful as a way of preventing semantic errors.
Go code and C++/D templates are not as strongly-typed for the reasons I stated above: namely, any type with a matching signature will be accepted, even if that use of the type's members is incorrect or even undefined. Classes with inheritance are more strongly typed because they explicitly state that they implement the expected interface--the semantics, not just the type signatures.
> Also, you seem to be confused about C++ as well. dynamic_cast *always* "requires a runtime lookup" on the RTTI (runtime type information) in C++.
RTTI is only required to test that the object you are casting is derived from the class you are casting to. As such, it is a very simple test. In Go the compiler must either generate separate code for each type (like C++/D template functions) or else look up the member offset or function pointer by name for each object at runtime, which seems to me quite likely to be far more expensive.
> I also would like to note in passing that C/C++ have the const, restrict and __thread keywords, which could be used to provide exactly the same message-passing interface you describe in D.
The 'const' and 'restrict' keywords in C++ are not equivalent to 'immutable' in D. Immutable data is guaranteed to never change, and this is enforced by the compiler. You can't pass mutable data to a function expecting immutable, for example. (You *can* pass mutable data to a function which accepts 'const' data, in addition to immutable data.) The 'restrict' keyword is just a declaration of intent. If you declare a pointer 'const' and 'restrict' then the compiler will optimize the code on the assumption that the data cannot be altered via another pointer, but it is up to the programmer to ensure that this is actually the case. I do not think it insignificant, either, that in D you have to specifically declare data as shareable between threads (again enforced by the compiler, unlike in C/C++), whereas in C/C++ all global data is shared by default and you must use __thread to make it private. In D thread-safety is the default; in C/C++ it is a rarely-used compiler-specific extension. (C++0x will supposedly add TLS as a standard storage class.)
Posted Feb 13, 2011 21:12 UTC (Sun)
by cmccabe (guest, #60281)
[Link]
Golang's philosophy is that inheritance is evil. Not "multiple inheritance is evil" (that is Java's philosophy), or "inheritance is often less useful than composition" (that's Scott Meyers' philosophy in Effective C++). Just "inheritance is evil."
Why is inheritance evil? Well, it forces you to do a lot of work up front before you start writing code. A lot of that work is just writing boilerplate code like Singletons, abstract base classes, Factories, Adaptors, etc. This leads to longer and less readable code. Changing the inheritance hierarchy is difficult after you've written the code. Moreover, unless the code is totally trivial, you will *have* to change the hierarchy in response to changing requirements and new insights into the design that you'll have over time.
The dirty little secret of C++ is that code written in the high-level, object-oriented style often tends to be longer than code written in the old-fashioned C style. It starts to smell like Java.
For a good criticism of Java, and deep inheritance hierarchies in general, see:
http://steve-yegge.blogspot.com/2006/03/execution-in-king...
[snip discussion of TLS, const, and restrict]
You seem to have a good understanding of const and restrict. Your analysis is correct. I'm glad to hear that __thread will be standardized soon. pthread_getspecific is slow on Linux.
Posted Feb 15, 2011 13:05 UTC (Tue)
by marcH (subscriber, #57642)
[Link]
The point is: no need to design and maintain a type *hierarchy*. This point looks orthogonal to the static versus dynamic debate.
See also "Structural Typing".
Posted Feb 11, 2011 10:48 UTC (Fri)
by mgedmin (subscriber, #34497)
[Link]
Anthony Baxter had an entertaining presentation about writing VoIP code in Python back in 2004, titled "Scripting Language", My Shiny Metal Arse.
Apparently, it is fast enough.
Posted Feb 11, 2011 6:36 UTC (Fri)
by njs (subscriber, #40338)
[Link]
http://docs.python.org/library/multiprocessing.html#shari...
I've never used it, but in principle it should be pretty much equivalent to threads.
Posted Feb 13, 2011 20:42 UTC (Sun)
by spaetz (guest, #32870)
[Link]
Posted Feb 11, 2011 1:09 UTC (Fri)
by cmccabe (guest, #60281)
[Link] (9 responses)
Just out of curiosity, what applications did you feel you needed threads for?
If you're sharing a lot of state between threads, then I have to ask why? It really makes everything so much more difficult. My experience has been that once your program starts using locking, you're not object-oriented any more; you're "mutex-oriented." You can't just freely reuse objects and code because you might violate the constraints that the code was written under.
If you're not sharing a lot of state, then processes are just as good as threads.
Posted Feb 13, 2011 13:39 UTC (Sun)
by kleptog (subscriber, #1183)
[Link] (5 responses)
This right away gives you 4 threads. Add a thread to monitor everything (since thread death does not get signalled anywhere) and you're at 5.
There's not so much shared state as that I/O on any port can execute callbacks which could access anything the initiator of the request wanted (go closures!). There's barely any locking; python's atomic instructions are sufficient (though I imagine Queue does it under the hood).
One effect of the fact that I/O falls outside the GIL means that the process running at full speed can take 110% CPU. (There's a lot of I/O).
Back to the issue at hand: Python2's unicode handling bites me daily. Whoever decided that using str() on a unicode string should *except* when you have a unicode character, should be shot. Just error *every* time, then I won't get called at 3 in the morning to fix the bloody thing (usually buried in some library, even some standard python libs have had bugs in the past).
Posted Feb 14, 2011 19:26 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (1 responses)
Twisted is a great way to do multiple I/O operations without using threads. It wraps the ugly select() interface in something much nicer.
http://en.wikipedia.org/wiki/Twisted_(software)
Posted Feb 17, 2011 8:26 UTC (Thu)
by rqosa (subscriber, #24136)
[Link] (2 responses)
> I'm not a huge fan of threads but where they mostly come in is dealing
> with I/O. Say you have an app that has to send/receive data over 3
> different pipes and in between it also has to do actual work. While you
> can write a main loop that does a select() over each descriptor and calls
> the right code when something becomes readable/writable, it's conceptually
> much clearer having a thread whose job it is to read any data and process
> it. Especially when you have to deal with issues like write() blocking,
That way can scale poorly, because there must be at least one thread per FD. Using an epoll-driven main loop and a pool of worker threads (with one work queue per worker thread) makes the amount of threads become independent from the amount of FDs, so you can adjust the amount of threads to whatever gives the best performance. It also has the benefit of avoiding the overhead of thread-start-on-FD-open and thread-quit-on-FD-close, since you can reuse the existing threads. (Make it so that any idle thread will wait on a semaphore until its work queue becomes non-empty. Also, rather than using epoll directly, use libevent, so that it's portable to non-Linux systems.)
Posted Feb 17, 2011 8:41 UTC (Thu)
by rqosa (subscriber, #24136)
[Link]
> at least one thread per FD Forgot to mention this in my previous post: the "one thread/process per FD" pattern is the main design issue that made possible the Slowloris DoS attack, which LWN covered 2 years ago.
Posted Feb 17, 2011 20:50 UTC (Thu)
by kleptog (subscriber, #1183)
[Link]
Of course in the general case you are right, a service like a webserver should try to reduce the number of threads. But also in the special case of CPython it's pointless to use more threads, since the GIL prevents more than one thread running at a time anyway.
Posted Feb 14, 2011 4:33 UTC (Mon)
by ssmith32 (subscriber, #72404)
[Link] (2 responses)
Where I first started running up against problems was in a GUI that had a central append-only (which eased the locking constraints) data structure, and a corresponding tree of threads. Each thread could generate a new node(s) on the data structure, which might need more analysis by more threads. The operations on the data structure that needed to be thread safe had the thread safety encapsulated in the data structure. It just fell naturally into a threaded design.
As far as being mutex-oriented programming - with an appropriately designed shared data structure, I've rarely ran into that problem. There's usually a couple operations that you think hard about, get right, encapsulate, and move on. OOP really complements this well.
As far as the GIL-specific issue I ran into - with the GIL, your python interpreter runs on a single core, and all threads in that instance of the interpreter run on that core. So it makes multi-core kind of useless for heavily threaded programs. If you go the multi-process route, you get multiple interpreters. But like I said, sometimes threads can be nice :|
Posted Feb 14, 2011 19:39 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (1 responses)
In my experience, when you start sharing a lot of data between threads, you can take one of two approaches. You can have a giant lock that covers all threads. This is sort of like the old BKL or the GIL itself. One giant lock is simple, but it limits scalability a lot. Alternately, you can have many different little locks. This is great for performance and scalability, but hard on the poor programmers. Debugging becomes very difficult because runs are not repeatable. Code reuse is impaired because everything is tangled in this web of locks and you can't easily move code around.
When you have many little locks, the usual approach is to impose absolute ordering, so that if you take lock A before B at any point, you must always take A before B. That's what the kernel does. It seems to be the best strategy, but again, why impose this on yourself when you don't have to? Just don't share state unless you have to.
It's sad that programmers have been trained to think that threads are "simple" and "natural" and processes are "hard." That's the exact reverse of reality. I blame Windows and its high per-process overhead.
Posted Feb 15, 2011 11:15 UTC (Tue)
by mpr22 (subscriber, #60784)
[Link]
Data sharing looks easy in threadland, as long as you don't look too closely. Processes make you worry about files and sockets and pipes and god-knows-what, and J. Newbie Programmer thinks that all looks like a lot more work than just having threads that can all see all the data. (See also: "Why do I have to go through this big scary graphics library with lots of things to set up? Why can't I just stuff pixels into the graphics card?")
Posted Feb 10, 2011 10:01 UTC (Thu)
by talex (guest, #19139)
[Link] (2 responses)
The new unicode handling looks good, but of course we have to wait a few years for the older distributions to include it. The main problem with Python 3 is that one distribution has decided that "python" is now "python3", breaking all existing scripts. Meanwhile, other distributions provide only "python" and not "python2". As far as I can see, every Python program that wants to support all distributions must now start with:
#!/usr/bin/env python
import sys
if sys.version_info[0] > 2:
    import os
    os.execvp("python2", ["python2"] + sys.argv)
# ... chain-load Python 2 code without using syntax that Python 3 will choke on
This is pretty horrible.
Posted Feb 10, 2011 10:37 UTC (Thu)
by tetromino (guest, #33846)
[Link]
(a) installs python executables as /usr/bin/pythonX.Y, with /usr/bin/python2 and /usr/bin/python3 as customizeable symlinks, and /usr/bin/python as a symlink to an intelligent wrapper controlled by eselect; and
(b) provides the python_convert_shebangs() function in the python eclass, allowing ebuild writers to easily change all the "#!/usr/bin/env python*" strings in the scripts in the package to call whichever python version is appropriate for that package.
Posted Feb 18, 2011 21:03 UTC (Fri)
by valhalla (guest, #56634)
[Link]
Most PKGBUILD for said distribution just run python2 setup.py and everything works.
Posted Feb 10, 2011 16:50 UTC (Thu)
by southey (guest, #9466)
[Link]
- the use of the -3 flag (Python 2.6)
- usage of 'from __future__ import' ('from __future__ import division' was Python 2.2 - 2002!)
Also Python 2.7 has brought more attention than Python 2.6 to the need to migrate.
However, I think there are two bigger issues involved. One for developers has been the API changes, because many Python 2.x projects often used deprecated functions that were removed for Python 3.
The second, and far more important, is which Linux distributions are providing Python 3.x as the default Python?
While you can build Python 3 yourself or have it as a secondary package in recent distros, you cannot use it as the default without disrupting Python 2.x code, especially system-related code.
Posted Feb 10, 2011 17:57 UTC (Thu)
by ssam (guest, #46587)
[Link] (2 responses)
personally i need the numpy, scipy and matplotlib family, which i think are mostly ported.
Posted Feb 11, 2011 11:45 UTC (Fri)
by shane (subscriber, #3335)
[Link]
Posted Feb 12, 2011 18:43 UTC (Sat)
by jensend (guest, #1385)
[Link]
With NumPy and SciPy finally Python 3-compatible, two of the biggest reasons why people have stuck with Python 2 are finally gone. With the release of GTK+ 3 with PyGObject, distros that have depended on PyGTK+ will start to move as well. Chances seem good that we may see most of the momentum shift to Python 3 this year.
Posted Feb 11, 2011 22:19 UTC (Fri)
by schwitrs (subscriber, #3822)
[Link] (2 responses)
I maintain a scientific software package which is partially written in Python, but more relevantly, has Python user scripts which I do not and cannot maintain. It is very important that old results are reproducible. A requirement of very minor, obvious changes to the scripts might fly, but the Python 2 to Python 3 transition doesn't seem to be this. Has anyone tried to deal with this sort of problem?
Posted Feb 11, 2011 22:48 UTC (Fri)
by foom (subscriber, #14868)
[Link]
Posted Feb 12, 2011 17:23 UTC (Sat)
by ssam (guest, #46587)
[Link]
Posted Feb 15, 2011 1:48 UTC (Tue)
by ldo (guest, #40946)
[Link] (2 responses)
I hate version numbers. Don't check for version numbers; instead, check for something as close as possible to the actual functionality you need. Python even helps you, by providing ways to query it for things. E.g. to check for the bytes function, you don't have to go through the rigmarole of watching for a NameError exception; just write
hasattr(__builtins__, "bytes")
What could be simpler? And similarly the PYTHON3 version check could be replaced with something like
not hasattr(__builtins__, "unichr")
(Negated so it's True for Python 3.x and False for earlier versions.)
Posted Feb 15, 2011 14:20 UTC (Tue)
by Webexcess (guest, #197)
[Link]
I was actually using: But when you really mean "is this python3" then that's not clear to someone reading the code so I changed it.
Posted Mar 8, 2011 22:55 UTC (Tue)
by bluss (guest, #47454)
[Link]
not str is bytes
Posted Feb 17, 2011 19:51 UTC (Thu)
by NikLi (guest, #66938)
[Link] (2 responses)
I mean, "why?". And then there is a certain pressure to make the switch and show to the world that Python3 was a huge success, unlike the "perl6 fiasco", and then show the success of the python development model, PEPs, BDFL, etc. This seems like burning bridges as in "if we go back to python2, we will be like those perl6 people we've been making fun of".
The technical differences are minimal, and many of them are in the domain of "nits".
I find arguments that say that "in the future there will be only Python3" misleading. There are many firms that use python extensively for their services and THEY WON'T SWITCH ANY TIME SOON. Count google as one of them, and i bet you they haven't got the least interest to rewrite their infrastructure to python3 because "dict.keys is an iterator", etc.
And, last but not least, YOU DON'T BREAK "Hello World".
Quite frankly in education i'd rather teach my students Python2, so they'll go work for google or something...
Posted Feb 17, 2011 21:17 UTC (Thu)
by foom (subscriber, #14868)
[Link] (1 responses)
But, at this point, there have been 3 releases of 3.x made now, and 2.x isn't being developed anymore. So, if you want any new features ever, you either have to pick up maintenance of 2.x, or switch to 3.x.
I wouldn't say it's impossible to conceive of the first alternative happening, but I'm certainly not interested in doing that, even though I'd really prefer if 3.x just magically ceased to exist. In another 2-3 years when 3.2 is installed ubiquitously alongside 2.x, maybe I'll even start writing python3 code. Stranger things could happen. :)
Posted Feb 21, 2011 23:26 UTC (Mon)
by pboddie (guest, #50784)
[Link]
However, most of the implementations apart from CPython work with 2.x features, and although there have been noises amongst some of them about 3.x support being a possibility, the priorities of their developers would appear to be the development of other kinds of features than language features. So, for example, PyPy would seem to be sticking with 2.x support and concentrating on performance - it's already faster than CPython for some things and getting faster - so if you value those kinds of features over the language tidying that 3.x represents, then you're not going to switch to 3.x. But then again, neither are the implementation developers, so 2.x is still a very safe bet.
Posted Mar 13, 2011 17:22 UTC (Sun)
by litosteel (guest, #33304)
[Link] (1 responses)
I'm really sorry for the Python, Perl and other languages' people.
If a language changes from version X to version Y, maybe it is not a language, but a joke...
(sorry for the flames, but I can't stop my feelings)
Posted Mar 13, 2011 23:45 UTC (Sun)
by foom (subscriber, #14868)
[Link]