|
|

PyCon: Asynchronous I/O

By Jake Edge
March 27, 2013

Introduced as needing no introduction, Python's creator and benevolent dictator for life (BDFL), Guido van Rossum, took the stage on March 17 for a PyCon 2013 keynote. One might expect a high-level talk of the language's features and/or future from the BDFL, but that was not at all the case here. Unlike many keynote speakers, Van Rossum launched into a technical talk about a rather deep subject, while granting permission to leave to those recovering from the previous night's party. A single feature that is in the future of Python 3, asynchronous I/O, was his topic.

Van Rossum started looking into the problem after a post on the python-ideas mailing list that was "innocently" proposing changes to the asyncore module in the standard library. The subject of the posting, "asyncore: included batteries don't fit", piqued his interest, so he followed the thread, which grew to a "centithread" in two weeks. He "decided to dive in", because he had done a lot of work recently on the asynchronous API for Google App Engine. Unlike previous times that asynchronous I/O had come up, he now understood why people cared and why it was so controversial.

Existing approaches

The basic idea behind asynchronous I/O is "as old as computers". Essentially it is the idea that the program can do something else while it is waiting for I/O to complete. That is unlike the normal operation of Python and other languages, where doing an I/O operation blocks the program. There have been lots of approaches to asynchronous I/O over the years, including interrupts, threads, callbacks, and events.

[Guido van Rossum]

Asynchronous I/O is desirable because I/O is slow and the CPU is not needed to handle most of it, so it would be nice to use the CPU while the I/O is being done. When clicking a button for a URL, for example, asynchronous I/O would allow the user interface to stay responsive, rather than giving the user a "beach ball" until the "other server burps up the last byte" of its response. The user could initiate another I/O operation by clicking on a new URL, so there might be multiple outstanding I/O requests.

A common paradigm is to use threads for asynchronous I/O. Threads are well-understood, and programmers can still write synchronous code because a thread will just block when waiting for I/O, but the other threads will still run. However, threads have their limits, and operating system (OS) threads are somewhat costly. A program with ten threads is fine, but 100 threads may start to cause some worry. Once you get up to 1000 threads, you are "already in trouble".
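For illustration, here is a minimal sketch of that thread-per-connection style; the handler is ordinary blocking code, and the cost is one OS thread per client (the echo server below is a stand-in example, not code from the talk):

    # A stand-in thread-per-connection echo server; handle_client() is
    # written as ordinary blocking code, and each client costs an OS thread.
    import socket
    import threading

    def handle_client(conn):
        with conn:
            while True:
                data = conn.recv(4096)      # blocks only this thread
                if not data:
                    break
                conn.sendall(data)

    def serve(host='127.0.0.1', port=8888):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind((host, port))
        srv.listen(100)
        while True:
            conn, _addr = srv.accept()
            # Fine for tens of clients; increasingly costly for thousands.
            threading.Thread(target=handle_client, args=(conn,),
                             daemon=True).start()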

For example, handling lots of sockets is problematic. The OS kernel imposes limits on the number of sockets, but those limits are typically one or two orders of magnitude larger than the number of threads that can be supported. That means you can't have a thread per connection if you want to be able to support the maximum number of connections on the system.

Beyond that, though, a "big problem" with OS threads is that they are preemptively scheduled, which means that a thread can be interrupted even if it isn't waiting for I/O. That leads to problems with variables and data structures shared between threads. Avoiding race conditions then requires adding locks, but that can lead to lock contention which slows everything down. Threads may be a reasonable solution in some cases, but there are tradeoffs.

The way to do asynchronous I/O without threads is by using select() and poll(), which is the mechanism that asyncore uses. But asyncore is showing its age, it isn't very extensible, and most people ignore it entirely and write their own asynchronous code using select() and poll(). There are various frameworks that can be used, including Twisted, Tornado, and ZeroMQ. Most of the C libraries (e.g. libevent, libev, libuv) that handle asynchronous I/O have Python wrappers available, but that gives them a "C-like API style". Stackless and gevent (along with a few others) provide another set of alternatives.
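A hand-rolled select() loop of the kind people end up writing looks roughly like the sketch below (an illustrative echo server, not taken from any of the frameworks mentioned):

    # A hand-rolled readiness loop built directly on select(); real code
    # would also need to buffer partial writes and handle errors.
    import select
    import socket

    srv = socket.socket()
    srv.bind(('127.0.0.1', 8889))
    srv.listen(100)
    srv.setblocking(False)

    clients = {}                            # fileno -> socket
    while True:
        readable, _, _ = select.select([srv] + list(clients.values()), [], [])
        for sock in readable:
            if sock is srv:                 # new connection
                conn, _ = srv.accept()
                conn.setblocking(False)
                clients[conn.fileno()] = conn
            else:                           # data (or EOF) on a client
                data = sock.recv(4096)
                if data:
                    sock.send(data)         # may send only part of it
                else:
                    del clients[sock.fileno()]
                    sock.close()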

And that is part of the problem: there are too many choices. "Nobody likes callbacks" as an interface, or at least Van Rossum doesn't, and many of the choices rely on that. The APIs tend to be complicated partly because of the callbacks, and the standard library doesn't cooperate, so it is of "very little use" when using one of the solutions.

Advocates of gevent would claim that it solves all those problems, but "somehow it doesn't do it" for him. There are some "scary implementation details", including CPython- and x86-specific code. It does some "monkey patching" of the standard library to make it "sort of work". It also does not avoid the problem of knowing where the scheduler can switch tasks. There is a specific call that gets made to cause task switches to happen, but you never know when that may be called. Some library function could be making that call (or may in the future), for example. It essentially is the same situation as with OS threads. Beyond switching at unexpected times, there is also the problem of not switching enough, which can cause some tasks to be CPU starved. He "can't help" wanting to know if a particular line of code could end up suspending the current task.
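The gevent style looks roughly like the snippet below; monkey.patch_all() and gevent.spawn() are real gevent calls, but the example assumes a gevent build with Python 3 support and is only meant to illustrate the implicit task switching being criticized:

    # The gevent style: monkey-patching makes blocking stdlib calls yield
    # to gevent's scheduler behind the scenes, so any patched call is a
    # potential (and invisible) task-switch point.
    from gevent import monkey
    monkey.patch_all()                      # replaces socket, time.sleep, ...

    import gevent
    import urllib.request                   # assumes a gevent with Python 3 support

    def fetch(url):
        # Looks like ordinary blocking code, but the socket calls inside
        # urllib now switch to other greenlets while waiting.
        return urllib.request.urlopen(url).read()

    jobs = [gevent.spawn(fetch, url)
            for url in ('http://example.com/', 'http://example.org/')]
    gevent.joinall(jobs)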

A new framework

So, Van Rossum is working on "yet another standard framework that is going to replace all the other standard frameworks ... seriously", he said with a chuckle—to applause. The framework will standardize the event loop. There aren't too many choices for how to implement an event loop, he said. The ones that exist are all doing more or less the same things.

The event loop is special because it serializes the execution of the program. It guarantees that while your code is running, nothing else is, and that the shared state cannot be changed until "you say you are done with it", Van Rossum said. All of that implies that there should only be one event loop in a program. You can't really mix a Twisted event loop and a gevent event loop in the same program, which means that the existing frameworks do not interoperate.

Van Rossum looked at the existing frameworks and their event loops to look for commonality. The essential elements of an event loop are fairly straightforward. There needs to be a way to start and stop the loop. Some way to schedule a callback in the future (which might be "now") needs to be available, as well as a way to schedule repeated, periodic callbacks. The last piece is a way to associate callbacks with file descriptors (or other OS objects that represent I/O in some way) of interest. Depending on the OS paradigm, those callbacks can be made when the file descriptor is "ready" (Unix-like) or when it is "done" (Windows). There is also the need for the framework to abstract away choosing the proper I/O multiplexing mechanism (select(), poll(), epoll(), others) for the system in an intelligent way.
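In outline, that common interface looks something like the sketch below; the method names are loosely modeled on the PEP 3156 draft, but this is an illustration of the shape rather than a definitive API:

    # The shape of the event loop being standardized; names are loosely
    # based on the PEP 3156 draft and are illustrative only.
    class EventLoop:
        def run_forever(self): ...                    # start the loop
        def stop(self): ...                           # stop it

        def call_soon(self, callback, *args): ...     # schedule "now"
        def call_later(self, delay, callback, *args): ...          # in the future
        def call_repeatedly(self, interval, callback, *args): ...  # periodic

        # Readiness-style I/O (Unix): call back when the fd is ready.
        def add_reader(self, fd, callback, *args): ...
        def add_writer(self, fd, callback, *args): ...

        # Completion-style I/O (Windows IOCP) hides behind higher-level
        # methods, and the loop picks select()/poll()/epoll()/IOCP itself.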

The existing frameworks do not interoperate today, and each has its strengths and weaknesses. Twisted is good for esoteric internet protocols, while Tornado is well-suited for web serving, but making them work together is difficult. There are various "pairwise" solutions for interoperability, but there are lots of pairs that are not covered.

So, he has come up with Python Enhancement Proposal (PEP) 3156 and a reference implementation called Tulip. Using a slide of the xkcd comic on standards, Van Rossum noted that he was solving the problem of too many standards by adding a new standard. But, he pointed out that PEP 3156 could actually be considered a standard because it will eventually end up in the standard library. That was greeted with some laughter.

"I know this is madness", he said, as everyone has their favorite framework. Suggestions to put Twisted in the standard library or to "go back further in history" and adopt Stackless (along with other ideas) were floated in the original mailing list thread. He did not completely make up his own framework, though, instead he looked at the existing solutions and adopted pieces that he felt made sense. Certain things from Twisted, particularly its higher-level abstraction for I/O multiplexing (which works well for Windows), as well as its Transports and Protocols, were adapted into Tulip.

So PEP 3156 is the interface for a standard event loop, while Tulip is an experimental prototype that will eventually turn into a reference implementation. Tulip will be available to use in Python 3.3, even after it is incorporated "under a better name" into the standard library for Python 3.4. Tulip will also serve as a repository for extra functionality that doesn't belong in the standard library going forward.

PEP 3156 is not just an event loop API proposal, it also proposes an interface to completely swap out the event loop. That means that other frameworks could plug in their event loop using a conforming adaptor and the user code would still work because it makes Tulip/3156 calls. The hope is that eventually the frameworks switch to using the standard event loop.
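In outline, plugging in a different loop would look something like the following; the tulip module name reflects the current prototype, and the adapter class is purely hypothetical:

    # Outline of the pluggable loop: a framework installs a conforming
    # loop object and user code only ever talks to the standard API.
    # The adapter class here is hypothetical.
    import tulip

    class TwistedLoopAdapter:
        """Hypothetical wrapper presenting a Twisted reactor through the
        PEP 3156 event loop interface."""

    tulip.set_event_loop(TwistedLoopAdapter())
    loop = tulip.get_event_loop()       # user code sees the standard interface
    # loop.run_forever() would now drive the Twisted reactor underneath.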

Callbacks without callbacks

There is even more to the PEP, to the point where some have suggested he split it into two pieces, which he may still do. The second major piece is a new way to write callbacks. Futures, a mechanism for running asynchronous computations, were introduced in PEP 3148 and added in Python 3.2. Tulip/3156 has adapted Futures to be used with coroutines as a way to specify callbacks, without actually using callbacks.

A Future is an abstraction for a value that has not yet been computed. The Futures class used in Tulip is not exactly the same as the Python 3.2 version, because instead of blocking when a result is required, as the earlier version does, it must use ("drum roll please") the yield from statement that came from PEP 380, which got added in Python 3.3. It is "an incredibly cool, but also brain-exploding thing", Van Rossum said.

While he wanted to emphasize the importance of yield from, he wasn't going to teach it in the talk, he said. The best way to think about it is that yield from is a magic way to block your coroutine without blocking the application. The coroutine will unblock and unsuspend when the event it is waiting on completes (or is ready). The way to think about Futures is to "try to forget they are there". A yield from and a Future just kind of cancel out and the value is what would be returned from the equivalent blocking function. That is the "best I can say it without bursting into tears", he said.

The fact that Futures have an API, with result() and exception() methods, as well as callbacks, can largely be ignored. One just calls a function that returns a Future and does a yield from on the result. Error handling is simplified compared to using callbacks because a normal try/except block can be used around the yield from.
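Put together, a coroutine in this style reads like ordinary sequential code; in the sketch below, fetch_page() is a hypothetical coroutine and the coroutine decorator stands in for Tulip's, so the names are placeholders rather than the real API:

    # Error handling with an ordinary try/except around yield from; both
    # fetch_page() and the coroutine decorator are placeholders standing
    # in for the real Tulip names.
    @coroutine
    def show_page(url):
        try:
            body = yield from fetch_page(url)   # suspends here, without blocking the loop
        except ConnectionError as exc:
            print("fetch failed:", exc)
        else:
            print(len(body), "bytes from", url)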

Coroutines are basically just generators, and the @coroutine decorator is empty in the current Tulip code. It is purely there for the human reader of the code, though there may be some debug code added eventually. Coroutines by themselves do not give concurrency; it is the yield from that drives the coroutine's execution.
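In other words, the decorator currently amounts to something like this (a paraphrase, not the actual Tulip source):

    def coroutine(func):
        # Currently a no-op: it marks generator functions as coroutines
        # for the reader (debugging hooks could be added here later).
        return func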

Van Rossum was running low on time, and said there was "lots more" he could talk about. If the interoperability story fails, the xkcd comic comes true, he said, but he is hopeful that over time the new framework "will help us move to a world where we can actually all get along". Then, if someone finds some code that uses Twisted and other code that uses gevent, both of which are needed in their application, they will be able to use both.

"When can you have it?", he asked. The code and the PEP are very much in flux right now. He is pushing hard to have something complete by November 23, which is the cutoff for Python 3.4. By then, Tulip should be available in the PyPI repository. Once 3.4 is out the door, the rest of the standard library can be looked at with an eye toward making them work with the asynchronous framwork. Some pieces (e.g. urllib, socketserver) will likely need to be deprecated or will be emulated on top of PEP 3561. Older Pythons (i.e. 2.7) are "out of luck". He has no plans to support them, and hopes that the new framework serves as a carrot to move people to 3.3 (and beyond)—there are so many "silly things in older versions of the language". After a round of acknowledgments, Van Rossum left the stage, heading off to some code sprints scheduled as part of PyCon over the next two days.


Index entries for this article
Conference: PyCon/2013



PyCon: Asynchronous I/O

Posted Mar 28, 2013 12:35 UTC (Thu) by pj (subscriber, #4506) [Link]

I'm glad to hear that Python will finally standardize on an async core, and it sounds like on some standard usage patterns. yay!

PyCon: Asynchronous I/O

Posted Mar 28, 2013 18:36 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

>a reference implementation called Tulip

And thus Guido stands revealed as a Calvinist! ;-)

PyCon: Asynchronous I/O: glib

Posted Apr 1, 2013 19:52 UTC (Mon) by talex (guest, #19139) [Link]

I recently added (experimental) Tulip support to 0install (it uses it if the glib Python bindings aren't available). The conversion from glib to tulip was straight-forward and, although the code was designed for glib, it was actually cleaner after conversion, IMHO:

https://github.com/0install/0install/commit/2b13b69ace187...

(the patch also adds a compatibility wrapper to make glib look like tulip, but it only implements the minimal functionality that I needed)

This should be a big win for the Windows port, which currently has to bundle the whole glib/gobject/gi stack just for the event loop.

