One tends to think of "the NASDAQ" as a single exchange based in the US, but, in fact, NASDAQ OMX operates exchanges all over the world - and they run on Linux. In the US for instance, that includes markets like the NASDAQ Stock Market, The NASDAQ Options Market, and NASDAQ OMX PSX, its newest market that launched on October 8. At a brief presentation at the Linux Foundation's invitation-only End User Summit in Jersey City, NASDAQ OMX vice president Bob Evans talked about the ups and downs of using Linux in a seriously mission-critical environment.
NASDAQ OMX's exchanges run on thousands of Linux-based servers. These servers handle realtime transaction processing, monitoring, and development as well. The big challenge in this environment, of course, is performance; real money depends on whether the exchange can keep up with the order stream. Latency matters as much as throughput, though; orders must be responded to (and executed) within a bounded period of time. Needless to say, reliability is also crucially important; down time is not well received, to say the least.
To meet these requirements, NASDAQ OMX runs large clusters of thousands of machines. These clusters can process hundreds of millions of orders per day - up to one million orders per second - with 250µs latency.
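As a rough back-of-the-envelope reading of those figures (this calculation is not from the talk), Little's law says that the average number of requests in flight equals the arrival rate multiplied by the average latency. The sketch below assumes that the one-million-per-second rate and the 250µs latency describe the same steady-state peak across the whole cluster; the function names are made up for illustration.

```python
def orders_in_flight(orders_per_second: int, latency_us: int) -> int:
    """Little's law, L = rate * latency, kept in integer microseconds."""
    return orders_per_second * latency_us // 1_000_000


def arrival_gap_ns(orders_per_second: int) -> int:
    """Average time between order arrivals, in nanoseconds."""
    return 1_000_000_000 // orders_per_second


# Assumed inputs: the peak rate and latency quoted in the article.
print(orders_in_flight(1_000_000, 250))  # orders being processed at once
print(arrival_gap_ns(1_000_000))         # nanoseconds between arrivals
```

Under those assumptions, roughly 250 orders are somewhere in the system at any instant, and a new one arrives about every microsecond, which is why per-call overheads measured in microseconds or even nanoseconds are worth chasing.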
According to Bob, Linux has incorporated some useful technologies in recent years. The NAPI interrupt mitigation technique for network drivers has, on its own, freed up about 1/3 of the available CPU time for other work. The epoll system call cuts out much of the per-call overhead, taking 33µs off of the latency in one benchmark. Handling clock_gettime() in user space via the VDSO page cuts almost another 60ns. Bob was also quite pleased with how the Linux page cache works; it is effective enough, he says, to eliminate the need to use asynchronous I/O, simplifying the code considerably.
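For readers who have not used these interfaces, here is a minimal sketch of the two calls working together, written in Python for brevity rather than the C a trading system would presumably use. Python's `select.epoll` and `time.clock_gettime_ns()` are thin wrappers around the corresponding Linux calls; whether the clock read actually goes through the VDSO depends on the C library and kernel in use. The function name and the local socket pair are stand-ins for illustration only.

```python
import select
import socket
import time
from typing import Tuple


def epoll_echo_once(payload: bytes) -> Tuple[bytes, int]:
    """Send payload over a local socket pair and receive it via epoll.

    Returns the bytes received and the elapsed time in nanoseconds.
    """
    sender, receiver = socket.socketpair()
    receiver.setblocking(False)
    ep = select.epoll()
    try:
        # The descriptor is registered once; each later wait only asks
        # "what is ready?" instead of passing the whole set again.
        ep.register(receiver.fileno(), select.EPOLLIN)

        start = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
        sender.sendall(payload)

        received = b""
        for fd, events in ep.poll(1.0):
            if fd == receiver.fileno() and events & select.EPOLLIN:
                received = receiver.recv(4096)
        elapsed = time.clock_gettime_ns(time.CLOCK_MONOTONIC) - start
        return received, elapsed
    finally:
        ep.close()
        sender.close()
        receiver.close()


data, elapsed_ns = epoll_echo_once(b"order-1")
print(data, elapsed_ns)
```

The timing printed here includes interpreter overhead and says nothing about the figures quoted in the talk; the point is only the shape of the register-once, wait-many pattern.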
On the other hand, there are some things which have not worked out as well for them. These include I/O signals; they are complex to program with and, if things get busy, the signal queue can overflow. The user-space libaio asynchronous I/O (AIO) implementation is thread-based; it scales poorly, he says, and does not integrate well with epoll. Kernel-based asynchronous I/O, for its part, lacks proper socket support. He also mentioned the recvmsg() system call, which requires a call into the kernel for every incoming packet.
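The per-packet cost of recvmsg() is easy to see in a small sketch. The example below, again in Python as an illustration, queues a few datagrams on a local AF_UNIX socket pair and drains them with `socket.recvmsg()`, counting the calls that return data. The function name is hypothetical, and the count is kept small so that the sends cannot fill the receive queue before anything is read.

```python
import socket


def drain_datagrams(count: int):
    """Queue `count` datagrams, then read them back one recvmsg() at a time.

    Returns the list of payloads and the number of successful recvmsg() calls.
    """
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        for i in range(count):
            tx.send(b"order-%d" % i)

        rx.setblocking(False)
        packets = []
        successful_calls = 0
        while True:
            try:
                # Each call crosses into the kernel and returns one datagram.
                data, _ancdata, _flags, _addr = rx.recvmsg(2048)
            except BlockingIOError:
                # Queue is empty; this final probe is one more kernel entry.
                break
            successful_calls += 1
            packets.append(data)
        return packets, successful_calls
    finally:
        tx.close()
        rx.close()


packets, calls = drain_datagrams(3)
print(len(packets), calls)
```

The number of calls grows one-for-one with the number of packets, which is the overhead being complained about at a million orders per second.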
There is some new stuff coming along which shows some promise. The new recvmmsg() system call can receive multiple packets with a single call. For now, though, it is just a wrapper around the internal recvmsg() implementation and does not hold the socket lock across the entire operation. But, he said, recvmmsg() is a good example of how the ability to add new APIs to Linux is a good thing. He also likes the combination of kernel-based AIO and the eventfd() system call; that makes it possible to integrate file-based AIO into an application's normal event-processing loop. There is also some potential in syslets, which he sees as a way of delivering cheap notifications to user space; it's not clear whether syslets will scale usefully, though.
What NASDAQ OMX would really like to see in Linux now is good socket-based AIO. That would make it possible to replace epoll/recvmsg/sendmsg sequences with fewer system calls. Even better would be if the kernel could provide notifications for multiple events at a time. Best would be if the interface to this functionality were completely based on sockets. He described a vision of an "epoll-like kernel object" which would handle in-kernel network traffic processing. The application could post asynchronous send and receive requests to the queue, and receive notifications when they have been executed. He would like to see multiple sockets attached to a single object, and a file descriptor suitable for passing to poll() for notifications. With a setup like that, it should be possible to push more network traffic through the kernel with lower latencies.
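No such kernel object exists in the article's account, so the following is only a user-space emulation of the interface shape being described, built on epoll. The class and method names (`CompletionQueue`, `post_recv`, `harvest`) are entirely hypothetical. It still issues one recv() per socket underneath, so it shows the programming model (post requests, collect several completions per call, one pollable descriptor) and not the hoped-for reduction in system calls.

```python
import select
import socket


class CompletionQueue:
    """Hypothetical emulation of an 'epoll-like' object for async receives."""

    def __init__(self):
        self._ep = select.epoll()
        self._pending = {}  # fd -> (socket, buffer size)

    def fileno(self) -> int:
        # An epoll descriptor can itself be handed to poll() or select().
        return self._ep.fileno()

    def post_recv(self, sock: socket.socket, bufsize: int = 2048) -> None:
        """Queue a one-shot receive request for this socket."""
        self._pending[sock.fileno()] = (sock, bufsize)
        self._ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLONESHOT)

    def harvest(self, timeout_s: float = 1.0):
        """Return every completed request that is ready, in one call."""
        done = []
        for fd, _events in self._ep.poll(timeout_s):
            sock, bufsize = self._pending.pop(fd)
            self._ep.unregister(fd)
            done.append((sock, sock.recv(bufsize)))
        return done

    def close(self) -> None:
        self._ep.close()


# Usage: two sockets attached to one queue, both completions in one harvest.
a_tx, a_rx = socket.socketpair()
b_tx, b_rx = socket.socketpair()
cq = CompletionQueue()
cq.post_recv(a_rx)
cq.post_recv(b_rx)
a_tx.sendall(b"buy")
b_tx.sendall(b"sell")
results = sorted(data for _sock, data in cq.harvest())
print(results)
cq.close()
for s in (a_tx, a_rx, b_tx, b_rx):
    s.close()
```

A real in-kernel version would presumably perform the receive itself and hand back filled buffers, which is where the latency savings described in the talk would come from.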
In summary, NASDAQ OMX seems to be happy with its use of Linux. They also seem to like to go with current software - the exchange is currently rolling out 18.104.22.168 kernels. "Emerging APIs" are helping operations like NASDAQ OMX realize real-world performance gains in areas that matter. Linux, Bob says, is one of the few systems that are willing to introduce new APIs just for performance reasons. That is an interesting point of view to contrast with Linus Torvalds's often-stated claim that nobody uses Linux-specific APIs; it seems that there are users, they just tend to be relatively well hidden.
Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Linux is a registered trademark of Linus Torvalds