Development
PyCon: Asynchronous I/O
Introduced as needing no introduction, Python's creator and benevolent dictator for life (BDFL), Guido van Rossum, took the stage on March 17 for a PyCon 2013 keynote. One might expect a high-level talk about the language's features and future from the BDFL, but that was not the case here. Unlike many keynote speakers, Van Rossum launched into a technical talk on a rather deep subject, while granting permission to leave to those recovering from the previous night's party. His topic was a single feature in Python 3's future: asynchronous I/O.
Van Rossum started looking into the problem after a post on the python-ideas mailing list that was "innocently" proposing changes to the asyncore module in the standard library. The subject of the posting, "asyncore: included batteries don't fit", piqued his interest, so he followed the thread, which grew to a "centithread" in two weeks. He "decided to dive in", because he had done a lot of work recently on the asynchronous API for Google App Engine. Unlike previous times that asynchronous I/O had come up, he now understood why people cared and why it was so controversial.
Existing approaches
The basic idea behind asynchronous I/O is "as old as computers". Essentially it is the idea that the program can do something else while it is waiting for I/O to complete. That is unlike the normal operation of Python and other languages, where doing an I/O operation blocks the program. There have been lots of approaches to asynchronous I/O over the years, including interrupts, threads, callbacks, and events.
Asynchronous I/O is desirable because I/O is slow and the CPU is not needed to handle most of it, so it would be nice to use the CPU while the I/O is being done. When clicking a button for a URL, for example, asynchronous I/O would allow the user interface to stay responsive, rather than giving the user a "beach ball" until the "other server burps up the last byte" of its response. The user could initiate another I/O operation by clicking on a new URL, so there might be multiple outstanding I/O requests.
A common paradigm is to use threads for asynchronous I/O. Threads are well-understood, and programmers can still write synchronous code because a thread will just block when waiting for I/O, but the other threads will still run. However, threads have their limits, and operating system (OS) threads are somewhat costly. A program with ten threads is fine, but 100 threads may start to cause some worry. Once you get up to 1000 threads, you are "already in trouble".
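As a rough illustration of that style (a sketch written for this article, not code from the talk), a thread-per-connection echo server keeps each handler as plain blocking code and lets the OS scheduler provide the concurrency:

    import socket
    import threading

    def handle(conn):
        # Ordinary synchronous code: recv() simply blocks this thread,
        # while the other threads keep running.
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                conn.sendall(data)

    def serve(host="127.0.0.1", port=8888):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(100)
        while True:
            conn, _addr = srv.accept()
            # One OS thread per connection: fine for tens of clients,
            # increasingly costly for hundreds or thousands.
            threading.Thread(target=handle, args=(conn,), daemon=True).start()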
For example, handling lots of sockets is problematic. The OS kernel imposes limits on the number of sockets, but those limits are typically one or two orders of magnitude larger than the number of threads that can be supported. That means you can't have a thread per connection if you want to be able to support the maximum number of connections on the system.
Beyond that, though, a "big problem" with OS threads is that they are preemptively scheduled, which means that a thread can be interrupted even if it isn't waiting for I/O. That leads to problems with variables and data structures shared between threads. Avoiding race conditions then requires adding locks, but that can lead to lock contention which slows everything down. Threads may be a reasonable solution in some cases, but there are tradeoffs.
The way to do asynchronous I/O without threads is by using select() and poll(), which is the mechanism that asyncore uses. But asyncore is showing its age: it isn't very extensible, and most people ignore it entirely and write their own asynchronous code using select() and poll(). There are various frameworks that can be used, such as Twisted, Tornado, and ZeroMQ. Most of the C libraries (e.g. libevent, libev, libuv) that handle asynchronous I/O have Python wrappers available, but those wrappers come with a "C-like API style". Stackless and gevent (along with a few others) provide another set of alternatives.
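Written directly against the socket and select modules, with no framework at all, that style looks something like this sketch (illustrative only, with error handling and partial writes glossed over):

    import select
    import socket

    def serve(host="127.0.0.1", port=8888):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(100)
        srv.setblocking(False)
        sockets = [srv]
        while True:
            # One thread, one loop: block only in select(), never in recv().
            readable, _, _ = select.select(sockets, [], [])
            for s in readable:
                if s is srv:
                    conn, _addr = srv.accept()
                    conn.setblocking(False)
                    sockets.append(conn)
                else:
                    data = s.recv(4096)
                    if data:
                        s.sendall(data)   # simplification: assumes the buffer takes it all
                    else:
                        sockets.remove(s)
                        s.close()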
And that is part of the problem: there are too many choices. "Nobody likes callbacks" as an interface, or at least Van Rossum doesn't, and many of the choices rely on that. The APIs tend to be complicated partly because of the callbacks, and the standard library doesn't cooperate, so it is of "very little use" when using one of the solutions.
Advocates of gevent would claim that it solves all those problems, but "somehow it doesn't do it" for him. There are some "scary implementation details", including CPython- and x86-specific code. It does some "monkey patching" of the standard library to make it "sort of work". It also does not avoid the problem of knowing where the scheduler can switch tasks. There is a specific call that gets made to cause task switches to happen, but you never know when that may be called. Some library function could be making that call (or may in the future), for example. It essentially is the same situation as with OS threads. Beyond switching at unexpected times, there is also the problem of not switching enough, which can cause some tasks to be CPU starved. He "can't help" wanting to know if a particular line of code could end up suspending the current task.
A new framework
So, Van Rossum is working on "yet another standard framework that is going to replace all the other standard frameworks ... seriously", he said with a chuckle—to applause. The framework will standardize the event loop. There aren't too many choices for how to implement an event loop, he said. The ones that exist are all doing more or less the same things.
The event loop is special because it serializes the execution of the program. It guarantees that while your code is running, nothing else is, and that the shared state cannot be changed until "you say you are done with it", Van Rossum said. All of that implies that there should only be one event loop in a program. You can't really mix a Twisted event loop and a gevent event loop in the same program, which means that the existing frameworks do not interoperate.
Van Rossum looked at the existing frameworks and their event loops to look for commonality. The essential elements of an event loop are fairly straightforward. There needs to be a way to start and stop the loop. Some way to schedule a callback in the future (which might be "now") needs to be available, as well as a way to schedule repeated, periodic callbacks. The last piece is a way to associate callbacks with file descriptors (or other OS objects that represent I/O in some way) of interest. Depending on the OS paradigm, those callbacks can be made when the file descriptor is "ready" (Unix-like) or when it is "done" (Windows). There is also the need for the framework to abstract away choosing the proper I/O multiplexing mechanism (select(), poll(), epoll(), others) for the system in an intelligent way.
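To make those elements concrete, here is a toy loop written for this article (it is not Tulip code), with method names loosely patterned on the PEP 3156 interface:

    import heapq
    import itertools
    import select
    import time

    class ToyEventLoop:
        def __init__(self):
            self._ready = []          # callbacks to run on the next pass
            self._scheduled = []      # heap of (deadline, seq, callback)
            self._readers = {}        # fd -> callback when fd is readable
            self._seq = itertools.count()
            self._running = False

        def call_soon(self, callback):
            self._ready.append(callback)

        def call_later(self, delay, callback):
            heapq.heappush(self._scheduled,
                           (time.monotonic() + delay, next(self._seq), callback))

        def add_reader(self, fd, callback):
            self._readers[fd] = callback

        def stop(self):
            self._running = False

        def run_forever(self):
            self._running = True
            while self._running:
                now = time.monotonic()
                # Timers whose deadline has passed become ready callbacks.
                while self._scheduled and self._scheduled[0][0] <= now:
                    _, _, cb = heapq.heappop(self._scheduled)
                    self._ready.append(cb)
                # Wait for I/O, but never longer than the next timer.
                if self._ready:
                    timeout = 0
                elif self._scheduled:
                    timeout = self._scheduled[0][0] - now
                else:
                    timeout = None
                if self._readers:
                    readable, _, _ = select.select(list(self._readers), [], [], timeout)
                    for fd in readable:
                        self._ready.append(self._readers[fd])
                elif timeout:
                    time.sleep(timeout)   # nothing to poll; just wait for the timer
                elif timeout is None:
                    break                 # no callbacks, timers, or readers left
                # Run what is ready; nothing else runs while these callbacks do.
                ready, self._ready = self._ready, []
                for cb in ready:
                    cb()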
The existing frameworks do not interoperate today, and each has its strengths and weaknesses. Twisted is good for esoteric internet protocols, while Tornado is well-suited for web serving, but making them work together is difficult. There are various "pairwise" solutions for interoperability, but there are lots of pairs that are not covered.
So, he has come up with Python Enhancement Proposal (PEP) 3156 and a reference implementation called Tulip. Using a slide of the xkcd comic on standards, Van Rossum noted that he was solving the problem of too many standards by adding a new standard. But, he pointed out that PEP 3156 could actually be considered a standard because it will eventually end up in the standard library. That was greeted with some laughter.
"I know this is madness", he said, as everyone has their favorite framework. Suggestions to put Twisted in the standard library or to "go back further in history" and adopt Stackless (along with other ideas) were floated in the original mailing list thread. He did not completely make up his own framework, though, instead he looked at the existing solutions and adopted pieces that he felt made sense. Certain things from Twisted, particularly its higher-level abstraction for I/O multiplexing (which works well for Windows), as well as its Transports and Protocols, were adapted into Tulip.
So PEP 3156 is the interface for a standard event loop, while Tulip is an experimental prototype that will eventually turn into a reference implementation. Tulip will be available to use in Python 3.3, even after it is incorporated "under a better name" into the standard library for Python 3.4. Tulip will also serve as a repository for extra functionality that doesn't belong in the standard library going forward.
PEP 3156 is not just an event loop API proposal, it also proposes an interface to completely swap out the event loop. That means that other frameworks could plug in their event loop using a conforming adaptor and the user code would still work because it makes Tulip/3156 calls. The hope is that eventually the frameworks switch to using the standard event loop.
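One way to picture that pluggability, reusing the ToyEventLoop sketch above (the helper names here are illustrative, not the exact PEP 3156 spellings), is a single "current loop" accessor that a framework can point at its own conforming implementation:

    _current_loop = None

    def set_event_loop(loop):
        global _current_loop
        _current_loop = loop

    def get_event_loop():
        if _current_loop is None:
            set_event_loop(ToyEventLoop())      # fall back to the default loop
        return _current_loop

    class HypotheticalTwistedAdapter(ToyEventLoop):
        """Stand-in for an adapter that maps the standard interface onto
        another framework's reactor; user code sees only the interface."""

    # Application code never names the framework, only the interface:
    #   set_event_loop(HypotheticalTwistedAdapter())
    #   get_event_loop().call_soon(lambda: print("runs on whichever loop is installed"))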
Callbacks without callbacks
There is even more to the PEP, to the point where some have suggested he split it into two pieces, which he may still do. The second major piece is a new way to write callbacks. The Future, a mechanism for representing the result of an asynchronous computation, was introduced in PEP 3148 and added in Python 3.2. Tulip/3156 has adapted Futures to be used with coroutines as a way to specify callbacks, without actually using callbacks.
A Future is an abstraction for a value that has not yet been computed. The Future class used in Tulip is not exactly the same as the Python 3.2 version, because instead of blocking when a result is required, as the earlier version does, it must use ("drum roll please") the yield from statement, which came from PEP 380 and was added in Python 3.3. It is "an incredibly cool, but also brain-exploding thing", Van Rossum said.
While he wanted to emphasize the importance of yield from, he wasn't going to teach it in the talk, he said. The best way to think about it is that yield from is a magic way to block your coroutine without blocking the application. The coroutine will resume when the event it is waiting on completes (or is ready). The way to think about Futures is to "try to forget they are there". A yield from and a Future just kind of cancel out, and the value is what would be returned from the equivalent blocking function. That is the "best I can say it without bursting into tears", he said.
The fact that Futures have an API, with result() and exception() methods, as well as callbacks, can largely be ignored. One just calls a function that returns a Future and does a yield from on the result. Error handling is simplified compared to using callbacks because a normal try/except block can be used around the yield from.
Coroutines are basically just generators, and the @coroutine decorator is empty in the current Tulip code. It is purely there for the human reader of the code, though there may be some debug code added eventually. Coroutines by themselves do not give concurrency, it is the yield from that drives the coroutine execution.
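Put together, a toy version of the pattern (again written for illustration, not Tulip's actual code) needs only three pieces: an empty coroutine decorator, a stripped-down Future whose __iter__() is what makes yield from work, and a miniature driver standing in for the event loop:

    class Future:
        """An abstraction for a value that has not been computed yet."""
        _PENDING = object()

        def __init__(self):
            self._result = self._PENDING
            self._exception = None

        def done(self):
            return self._result is not self._PENDING or self._exception is not None

        def set_result(self, value):
            self._result = value

        def set_exception(self, exc):
            self._exception = exc

        def result(self):
            if self._exception is not None:
                raise self._exception
            return self._result

        def __iter__(self):
            if not self.done():
                yield self            # suspend the coroutine here...
            return self.result()      # ...and this becomes the value of 'yield from'

    def coroutine(func):
        # Purely for the human reader, like the current Tulip decorator.
        return func

    @coroutine
    def fetch(url):
        fut = Future()                # stands in for starting an async request
        try:
            data = yield from fut     # reads like a blocking call, but is not
        except OSError as e:          # a set_exception() would surface here
            return "%s failed: %s" % (url, e)
        return "%s returned %d bytes" % (url, len(data))

    def run(coro):
        """Miniature driver standing in for the event loop."""
        try:
            fut = coro.send(None)               # run until the coroutine waits on a Future
            while True:
                fut.set_result(b"hello world")  # pretend the I/O has completed
                fut = coro.send(None)           # resume after the yield
        except StopIteration as stop:
            return stop.value

    print(run(fetch("http://example.com/")))    # -> http://example.com/ returned 11 bytes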
Van Rossum was running low on time, and said there was "lots more" he could talk about. If the interoperability story fails, the xkcd comic comes true, he said, but he is hopeful that over time the new framework "will help us move to a world where we can actually all get along". So that if someone finds some code that uses Twisted and other code that uses gevent, both of which are needed in their application, they will be able to use both.
"When can you have it?", he asked. The code and the PEP are very much in flux right now. He is pushing hard to have something complete by November 23, which is the cutoff for Python 3.4. By then, Tulip should be available in the PyPI repository. Once 3.4 is out the door, the rest of the standard library can be looked at with an eye toward making them work with the asynchronous framwork. Some pieces (e.g. urllib, socketserver) will likely need to be deprecated or will be emulated on top of PEP 3561. Older Pythons (i.e. 2.7) are "out of luck". He has no plans to support them, and hopes that the new framework serves as a carrot to move people to 3.3 (and beyond)—there are so many "silly things in older versions of the language". After a round of acknowledgments, Van Rossum left the stage, heading off to some code sprints scheduled as part of PyCon over the next two days.
Brief items
Quotes of the week
5. After getting out of bed and changing your pants, realize that after your computer restarted, Chrome helpfully re-opened all of your tabs, including Netflix, and so it restarted playing the episode of Supernatural that you watched before bed.
OpenSSH 6.2 released
OpenSSH 6.2 is out. New features include some new encryption modes, the ability to require multiple authentication protocols (requiring both public key and a password, for example), key revocation list support, better seccomp-filter sandbox support, and more.
GCC 4.8.0 released
The GCC 4.8.0 release is out. "Extending the widest support for hardware architectures in the industry, GCC 4.8 has gained support for the upcoming 64-bit ARM instruction set architecture, AArch64. GCC 4.8 also features support for Hardware Transactional Memory on the upcoming Intel Haswell CPU architecture." There's a lot of new stuff in this release; see the changes file and LWN's GCC 4.8.0 coverage for details.
Calligra document viewer available on Android
Sebastian Sauer has announced the availability of the first version of the Calligra office suite for Android systems. For now, the focus is on providing a viewer for ODT documents. "Since bringing a whole Office suite to another platform is a huge task and I am a small team I had to focus. Later on I plan to add doc/docx support, editing, saving and Calligra Sheets (spreadsheets) and Calligra Stage (presentations)." The application can be installed from the Play Store.
GTK+ 3.8.0 released
GTK+ 3.8.0 has been released. This version includes support for Wayland 1.0, and contains many new features and performance improvements.
Terminology 0.3 available
Carsten "Rasterman" Haitzler has released version 0.3 of Terminology, an EFL-based terminal emulator billed as "the fanciest terminal emulator out there.
" The newest additions to Terminology's fanciful lineup include tabs, split mode, and the ability to play multimedia in the background via escape codes. Which does sound pretty fancy after all.
Upstart 1.8 available
James Hunt has released version 1.8 of the Upstart init system. This version adds two new features: upstart-file-bridge, "a new bridge that allows jobs to react to file events", and upstart-monitor, a tool for watching event flows (and which includes both GUI and CLI modes).
GNOME 3.8 released
The GNOME 3.8 release is out. "The exciting new features and improvements in this release include an integrated application search, privacy and sharing settings, notification filtering, a new classic mode, OwnCloud integration, previews of clocks, notes, photos and weather applications, and many more." See the release notes for details.
Newsletters and articles
Development newsletters from the past week
- Caml Weekly News (March 26)
- What's cooking in git.git (March 21)
- What's cooking in git.git (March 26)
- Haskell Weekly News (March 20)
- Openstack Community Weekly Newsletter (March 22)
- Perl Weekly (March 25)
- PostgreSQL Weekly News (March 24)
- Ruby Weekly (March 21)
Russell: GCC and C vs C++ Speed, Measured
Rusty Russell ran an investigation to determine whether code compiled with the GCC C++ compiler is slower than code from the C compiler. "With this in mind, and Ian Taylor’s bold assertion that 'The C subset of C++ is as efficient as C', I wanted to test what had changed with some actual measurements. So I grabbed gcc 4.7.2 (the last release which could do this), and built it with C and C++ compilers". His conclusion is that the speed of the compiler is the same regardless of how it was built; using C++ does not slow things down.
Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks
John Regehr explains how new optimizations in GCC 4.8.0 can break code making use of undefined behavior. "A C compiler, upon seeing d[++k], is permitted to assume that the incremented value of k is within the array bounds, since otherwise undefined behavior occurs. For the code here, GCC can infer that k is in the range 0..15. A bit later, when GCC sees k<16, it says to itself: 'Aha-- that expression is always true, so we have an infinite loop.'"
Replacing Google Reader (The H)
The H has an extensive survey of available RSS reader applications, both open source and proprietary. "ownCloud is a complete self-hosted service platform that provides file sharing and collaboration features including calendaring, to do lists, a document viewer, and integration with Active Directory and LDAP. The software also includes a feed reader application, which started as a Google Summer of Code effort and takes many design cues from Google Reader".
Page editor: Nathan Willis
