Twisted and Python 3

By Jake Edge
June 2, 2016
Python Language Summit

Amber Brown led a session at the 2016 Python Language Summit on the progress in porting the Twisted event-driven networking framework to Python 3. The lack of a Python 3 version of Twisted has been considered one of the larger barriers to adopting the new version of the language, so progress on that front is of great interest in the Python community.

According to Brown, roughly 50% of the lines of code in Twisted have been ported at this point. But, of things that people are likely to want to use, that number is more like 78%, she said. Users who want to write their own Network News Transfer Protocol (NNTP) server will be out of luck, but for most of the commonly used protocols, Twisted will run on Python 3.

That amounts to some 40,000 lines of code ported, which opens up 100,000 lines of code in third-party libraries to be able to use Python 3. The 40 or so patches merged for the port had roughly 6500 lines inserted and 4400 lines deleted.

In the seven years since Python 3 came out, there has been little progress in porting Twisted until recently; she asked, "why has it taken so long?" The bytes versus Unicode divide was one of the major barriers and early releases in the 3.x series did not have support for byte-handling features that Twisted really needs. The change to how strings are handled was good, and cleaned up a lot of ambiguity, she said. But Twisted deals with protocols on the wire, so it needs to use byte strings.
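As a rough illustration of that divide (a minimal sketch, not code from Twisted; the request line is just a made-up example), Python 3 refuses to mix text and bytes, so code that writes to the wire has to encode and decode explicitly:

```python
# Hypothetical example values; nothing here is taken from Twisted itself.
line = "GET / HTTP/1.1"      # text (str) as a program might build it
terminator = b"\r\n"         # bytes, as they appear on the wire

# Python 3 raises TypeError when str and bytes are concatenated;
# Python 2 would have silently joined two "str" objects.
try:
    line + terminator
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The explicit route: encode text to bytes before sending ...
request = line.encode("ascii") + terminator

# ... and decode bytes back to text after receiving.
decoded = request.rstrip(b"\r\n").decode("ascii")
```

Every place where Python 2 code relied on the implicit mixing needs a decision like this one about which side of the divide the data belongs on.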

Python 3 lacks explicit Unicode strings using the u'' notation, while Python 2 is missing b'' for byte strings. In addition, until relatively recent Python 3 versions, there was no way to use the "%" formatting operator for byte strings. PEP 461 was adopted to add formatting for byte strings at the behest of the Twisted and Mercurial projects. But there is no .format() method for byte strings, so the Python 2.7 Twisted code using that all has to change.
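The formatting situation can be sketched as follows, assuming Python 3.5 or later (the first release with PEP 461); the status line and header are illustrative values only:

```python
# %-formatting on bytes, available since Python 3.5 (PEP 461).
status_line = b"%s %d %s\r\n" % (b"HTTP/1.1", 200, b"OK")

# bytes objects have no .format() method; only str does.
bytes_has_format = hasattr(bytes, "format")

# Code that used "...".format() on Python 2 byte strings has to either
# switch to %-formatting or format as text and encode explicitly.
header = "Content-Length: {}\r\n".format(42).encode("ascii")
```

Either rewrite is mechanical for a single call, but it has to be applied, and reviewed, at every call site.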

That leads to more time spent porting and more code to review, she said. There are effectively three string types in the Python world: bytes, Unicode strings, and strings. And there are inconsistencies among them. For example, the entries in sys.path are bytes on Unix, but strings on other operating systems. In addition, cgi.parse_multipart() returns strings on Python 3, which is just wrong.

There is an "avalanche of changes" that comes from porting to Python 3, she said. New style classes by default broke a lot of things, as did the differences between bound and unbound methods. But, she said, those in the room are all aware of these problems.
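For readers less familiar with those two changes, here is a small sketch of the Python 3 behavior; the Protocol class below is a hypothetical stand-in, not Twisted's own:

```python
import types

class Protocol:  # hypothetical; new-style in Python 3 even without "(object)"
    def dataReceived(self, data):
        return data

# Every Python 3 class is new-style: it is an instance of type
# and derives from object.
is_new_style = isinstance(Protocol, type) and issubclass(Protocol, object)

# Looked up on the class, a method is a plain function in Python 3;
# Python 2 returned an "unbound method" that type-checked self.
via_class = Protocol.dataReceived
is_plain_function = isinstance(via_class, types.FunctionType)

# Looked up on an instance, it is still a bound method.
is_bound_method = isinstance(Protocol().dataReceived, types.MethodType)

# With no unbound-method check, any object is accepted as "self".
result = via_class(object(), b"ping")
```

Code that inspected or relied on unbound methods therefore behaves differently after the port, even though nothing in the class definition itself changed.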

Porting to Python 3 was "the most expensive thing we have ever done for Twisted", she said. On average, she spent two hours a day for a year and a half working on it. That cost upwards of $60,000 just for her time, most of it unpaid. That doesn't include lots of time spent by others, including reviewers, and there are "thousands of hours left to go". She is now down deep into porting the protocols in Twisted, which is the harder part.

The "unfortunate reality" is that if she didn't do that work, it would not have happened. Other Twisted developers had written off porting to Python 3. Earlier versions in the 3.x series would have made the job too large, she said; it is only since Python 3.3 was released that porting has become tractable.

But porting to Python 3 has been a "massive drain" on the development of features in Twisted. Half of the patches in the review queue are for the port. As with most projects, reviewers are a scarce resource, and the port patches require a lot of care and knowledge of the problem domain.

That leads to the question of who is using Python 3. The reality is that Python is falling by the wayside for performance-sensitive applications, she said. People are turning to Go or other options. And Twisted on Python 3 is a less attractive target for developers than Twisted on PyPy 5.1—because of the performance.

So, Twisted has spent an enormous amount of time changing its codebase to end up with slower code. PyPy and Pyston make Python competitive with Go in terms of performance, but only really support Python 2.7 at this point. There are some 3,500 C API functions in Python, which is a huge barrier for projects like PyPy and Pyston. She asked: "How do we stop this from happening again?" Long term, it may well be that asyncio (along with async/await) will provide much of the functionality of the Twisted core.

Guido van Rossum asked about interoperability between Twisted and asyncio. Brown said that it is possible to await a Twisted Deferred so mixing the two is possible. Twisted will be able to share its event loop with that used by asyncio, she said. "The golden age of Twisted and asyncio is 2016", she said to a round of applause. There are still some patches to be merged and some edge cases to be worked out, but there is enough of Twisted working for Python 3 that it can be done.

Brown said she wondered what users with large Python 2.7 codebases would do in 2020 when 2.7 is deprecated and no longer gets updates. She thinks they will simply keep running it. Van Rossum said "that's fine", but that they won't get updates. For Twisted, though, Brown thinks the project will probably end up supporting Twisted on 2.7 for five years after users can realistically port to Python 3, which probably means 2022 or beyond.


Index entries for this article
Conference: Python Language Summit/2016


Twisted and Python 3

Posted Jun 4, 2016 2:39 UTC (Sat) by nas (subscriber, #17)

People will absolutely keep running Python 2.x code after the official deprecation date (e.g. 2020). There is just too much code out there and porting to Python 3 is too difficult. If the Python core team doesn't maintain a 2.x interpreter, some other group will. I'm dismayed that many core Python developers have this idea that people can be forced to move to Python 3. At this point, I'm not even sure if Python 3 is going to succeed. I suspect there is still more new Python 2 code being written than Python 3 code.

We need to make the transition easier. To that end, I'm experimenting with a "pragmatic" Python 3.x fork. The idea is to have an intermediate stepping stone between Python 2.7.x and stock Python 3.x. I want a 2to3 tool that does all of the easy and mechanical changes (print function, import renames, try/except syntax, removal of backticks, etc). For the rest, the interpreter would support as much of the 2.7.x behavior as possible, with warnings where behavior will change in Python 3. For example, "hello " + b"world" would result in "hello world" and a warning generated. None would still compare smaller than all other types.

I think this is a doable project. I don't think we want this kind of behavior in the Python 3 branch. The string / bytes split *is* useful for avoiding bugs. However, given the fact that millions or billions of lines of Python 2.x code exist in the world, it makes sense to make porting it as easy as possible.

Twisted and Python 3

Posted Jun 4, 2016 3:28 UTC (Sat) by nas (subscriber, #17)

Since the advice is to release early and often:

https://github.com/nascheme/ppython

This is only the result of a couple of hours of hacking. My plan is to find a bunch of Python 2 programs, run 2to3 on them, then see what breaks when running them under "Pragmatic Python". Ideally I want code that comes out of 2to3 to run successfully without manual changes. Also, code that runs on "Pragmatic Python" without any warnings should ideally correctly run on Python 3 without changes.

Twisted and Python 3

Posted Jun 11, 2016 13:51 UTC (Sat) by santagada (guest, #109284)

great work, please post about it in python-ideas... I think if you can make all the changes be a switch like --compatible then it could be part of cpython... people porting code just activate the switch and iterate until they have no errors.

Twisted and Python 3

Posted Jun 13, 2016 6:53 UTC (Mon) by mgedmin (subscriber, #34497)

> Python 3 lacks explicit Unicode strings using the u'' notation, while Python 2 is missing b'' for byte strings.

Something must've been lost in the transcription. Python 2.6 introduced b'byte string' notation for forwards-compatibility with Python 3.x. Python 3.3 reintroduced u'unicode string' notation for backwards-compatibility with Python 2.x (specifically, to make it easier to have a single code base that is compatible with both Python 2 and 3 without using conversion tools like 2to3 or 3to2).


Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds