September 16, 2009
This article was contributed by Nathan Willis
Two traditionally proprietary companies made open source releases
recently: Facebook released a Python-based web server and application
framework called Tornado, and
Apple released a thread-pool management system called Grand
Central Dispatch. It is not the first open source code release for
either company, but both projects are worth examining. Tornado
is designed to suit specific types of web applications and is reportedly
very fast, while Grand Central Dispatch may cause some developers to re-think
task-parallelism.
This Tornado serves you
Tornado is actually a product of FriendFeed, the
social-networking-aggregator acquired by Facebook in August. It consists
of a web server and surrounding framework (all written in Python), tailored
to handle a very large number of established, open connections. The web
server component (tornado.web) is "non blocking" — meaning that it is
event-driven, designed around the Linux kernel's epoll
facility, and can thus maintain large numbers of open TCP sockets without
tying up excessive memory and without large numbers of threads.
Event-driven web servers like Tornado are single-threaded; that one thread
can manage potentially thousands of open connections as long as the application
does not block while it waits for data from a socket. The thread simply
asks the kernel which sockets are ready and services each of those in turn.
Additional capacity comes from running
multiple server processes on SMP systems. In contrast, traditional web
servers are blocked from
handling additional connections while they wait for I/O, or must spawn
additional threads to handle them at the cost of
context-switching and increased
memory use.
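The single-threaded model described above can be sketched with Python's standard selectors module, which uses epoll on Linux. This is a minimal illustration and not Tornado's own code; the names run_once and demo are hypothetical, and local socket pairs stand in for real client connections.

```python
import selectors
import socket


def run_once(sel, timeout=1.0):
    """Service every socket the kernel reports as readable; return the count."""
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data)  # echo back; adequate for small payloads in a sketch
        else:
            sel.unregister(conn)  # an empty read means the peer closed
            conn.close()
        handled += 1
    return handled


def demo():
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD and OS X
    pairs = [socket.socketpair() for _ in range(3)]
    for server_side, _client in pairs:
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)

    # Only two of the three "clients" send; the idle one ties up no thread.
    pairs[0][1].sendall(b"hello")
    pairs[2][1].sendall(b"world")

    handled = 0
    while handled < 2:  # one thread services every ready connection
        handled += run_once(sel)
    replies = [pairs[0][1].recv(4096), pairs[2][1].recv(4096)]

    sel.close()
    for server_side, client in pairs:
        server_side.close()
        client.close()
    return handled, replies
```

The idle connection stays registered but is never touched, which is why open-but-quiet sockets are cheap in this design.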
In addition to the web server itself, the Tornado release includes a
suite of modules used to build web applications, including XHTML, JSON, and
URL decoding, a MySQL database wrapper, a localization and translation
module, a Python templating engine, an HTTP client, and an authentication
engine. The authentication engine supports third-party schemes such as OAuth and OpenID, plus site-specific schemes used by
Facebook, Yahoo, and Twitter.
The Tornado code is hosted on GitHub and is
available under the Apache 2.0 license. Tornado works with Python 2.5 and
2.6, and requires PycURL and a
working JSON library. Documentation is available on
tornadoweb.org, and a live demo "chat" application is running on http://chan.friendfeed.com:8888/.
FriendFeed's Bret Taylor announced the
release on his blog, comparing Tornado to web.py and Google webapp.
He claims that in Apache Benchmark tests, Tornado was able to handle four
times the number of requests per second (or more) of competing frameworks,
including web.py, Django, and
CherryPy.
Taylor's post, and the subsequent discussion, sparked some controversy
among users and developers of the Twisted framework, who objected to
disparaging comments about Twisted's code maturity and suitability.
Twisted founder Glyph Lefkowitz posted a lengthy response
to the claims made about Twisted that, overall, approved of the
Tornado release itself. Matt Heitzenroder posted his own head-to-head performance
tests that show Tornado beating Twisted.web, but not dramatically.
Aside from performance numbers, many in the open source community seemed
impressed by what Tornado offers — a simple framework for building "long
polling" web applications, including support for everything from
templating to cookie management to localization in a single package. Since
Tornado has proven itself viable as the framework underlying FriendFeed, it
is likely to pick up a significant following as an open source project.
Invisible threads
Apple's Grand Central Dispatch (GCD) is an operating system-level
feature that debuted in the recent release of OS X 10.6 ("Snow Leopard").
GCD is essentially a mechanism to allow application developers to
parallelize their code, but let the OS worry about intelligently managing
the threads. GCD determines the maximum number of concurrent threads for
the system and manages the queues for all running applications. Thus the
application developer only needs to write GCD-capable code, and trusts the
OS to take optimal advantage of multiple cores and multiple processors.
Apple's source code
release consists of the Apache-licensed user space API library libdispatch and changes to the XNU
kernel, Apple's open source Mach-based kernel common to OS X and Darwin.
The XNU changes reportedly improve performance of the event notification
interface Kqueue. GCD also relies on a
non-standard extension to C, C++, and Objective-C known as "blocks,"
so blocks support in the compiler is a prerequisite for
application developers wishing to take advantage of GCD. Blocks are
supported in the LLVM compiler through the
compiler-rt project.
Because GCD abstracts thread creation from the application developer, it
is most similar to OpenMP or Intel's
Threading Building
Blocks (TBB). All three allow the developer to designate portions of
code as "tasks" to be parallelized in some fashion. GCD is different in
that it leverages a language feature (blocks) rather than the preprocessor
directives of OpenMP or templates of TBB. In addition, TBB is limited to
C++, though OpenMP is available for C, C++, and Fortran.
Blocks are essentially inline-defined, anonymous functions. They are
designated by a caret (^) in place of a function name, take arguments like
any function, and can optionally return a value. Blocks are different in
that they have read-only access to variables from their parent scope (a
feature similar to "closures" in languages such as Ruby). Consequently, in
replacing a for loop with GCD's parallel equivalent,
dispatch_apply, the developer can write a block containing the
loop's contents without the hassle of passing extra arguments to it just to
access variables that were available to the loop.
From Apple's Concurrency
Programming Guide, the following example loop iterates count
times:
    for (i = 0; i < count; i++) {
        printf("%u\n", i);
    }
which could be expressed as a block ready for GCD as follows:
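    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    /* Run the block once per index; iterations may execute concurrently. */
    dispatch_apply(count, queue, ^(size_t i) {
        printf("%zu\n", i);  /* %zu matches the size_t index */
    });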
When executed, GCD creates count tasks, one for each iteration
of the block, placing them on a task queue. GCD makes a default queue
available through dispatch_get_global_queue(), but developers
can create private queues if they wish, for example to serialize access to a
shared data structure. In this parallelized for-loop
example, the iterations may run concurrently, but dispatch_apply itself does
not return until all of them have finished. For tasks that are queued
asynchronously, GCD provides several
mechanisms for monitoring completion, such as callbacks and
semaphores.
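GCD itself is a C-level API and needs a blocks-capable compiler, so as a loose analogy only, the same shape can be sketched with Python's standard concurrent.futures module: the caller supplies a per-index task, a pool decides how many worker threads to use, and the call returns once every task is done. This is not GCD and has none of its system-wide scheduling; apply_parallel and demo are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_parallel(count, task, max_workers=None):
    """Run task(i) for each i in range(count) on a pool and wait for all of them.

    With max_workers=None the pool picks its own default size, so the caller
    never creates or counts threads directly.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in index order, regardless of completion order.
        return list(pool.map(task, range(count)))


def demo():
    scale = 10  # read from the enclosing scope, much as a block reads its parent's variables

    def task(i):
        return i * scale  # no extra argument is needed to reach `scale`

    return apply_parallel(5, task)
```

As with the block passed to dispatch_apply, the task reads a variable from the enclosing scope instead of receiving it as an extra parameter.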
Apple provides a basic introduction
to GCD and programming with blocks on its developers' site. In addition,
the OS X scientific research community at MacResearch.org has a detailed tutorial
complete with GCD examples and the equivalent code written for OpenMP.
MacResearch.org has basic performance numbers posted for its tutorial code,
and Apple has posted a benchmarking
sample that compares GCD against serialized code and native POSIX
threads.
So far, GCD is only implemented for Mac OS X, but reaction from the
developer community has been positive. Having the operating system worry
about the details of thread pool management seems like a winning idea; most
of the discussion on Mac forums has revolved around the wisdom of relying
on a language extension such as blocks. Ars Technica commented
on places where Linux could benefit from a native GCD implementation, such
as in higher-level frameworks like QtConcurrent,
but noted that use of the Apache license limits integration to projects
using GPL version 3 and later.
Impact
Apple and Facebook have a history of making periodic releases of code
projects under open source terms, even though both enjoy a reputation for
maintaining "walled gardens" around their core products. As is predictable
when large proprietary companies release open source code, considerable
energy has been expended on the web speculating as to what each company
hoped to "gain" from the release. A leading theory for GCD is that Apple
hopes to further the adoption of blocks into standard C and C++, but no
consensus has yet emerged for why Tornado was released.
So far, neither Tornado nor GCD has made major waves in the open
source community, but if the initial reaction is a good indicator, both are
solid and valuable products. GCD is the likelier of the two to stir up
passionate debate going forward, as fully assimilating it into mainstream
Linux would require touching not one but two of the fundamental pillars of
the community: the kernel and the compiler. Although LLVM has its fans,
the Linux community is still predominantly a GCC ecosystem. Pushing Apple
code into the Linux kernel and into GCC won't happen lightly.