
KS2007: The customer panel

By Jonathan Corbet
September 8, 2007
LWN.net Kernel Summit 2007 coverage

An occasional kernel summit feature is the customer panel, which gives people doing interesting things with Linux the opportunity to share their experiences (and frustrations) with the developers. At the 2007 gathering, this panel was made up of Sean Kamath (Dreamworks), Head Bubba (Credit Suisse), and Marcus Rex, who represented the Linux Foundation vendor and user advisory councils. Together, they presented an interesting picture of the sorts of troubles one runs into when pushing the edge with Linux.

Dreamworks

Sean led off the group. Dreamworks uses about 2700 systems to perform rendering, along with some 1200 desktops - and they all run Linux. They are mostly multi-core SMP 64-bit systems with a lot of shared data, so Dreamworks makes heavy use of NFS and the automounter.

Animation design and rendering involves the use of large amounts of memory. So Dreamworks has a lot of people running very large applications, in the form of interactive animation tools and batch rendering engines. One of their biggest problems appears to be swapping; most machines tend to run into swap most of the time despite being equipped with generous amounts of memory. The batch rendering jobs which run during the night can push things to the point that the out-of-memory killer comes into play, with the usual results: somehow the wrong process always gets killed. It is only recently that they discovered the /proc OOM-control parameters and started making use of them. It was noted that this is a common problem; we provide many useful features in the kernel, but they remain unused because people do not know about them.
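For readers who have not run into the OOM-control parameters either, here is a minimal sketch of how a job launcher might use one of them. It assumes the per-process file /proc/<pid>/oom_adj, which in kernels of this era accepts values from -17 (never kill this process) to 15 (kill it first); later kernels added /proc/<pid>/oom_score_adj, with a range of -1000 to 1000, and deprecated oom_adj. The function name and the proc_root parameter are hypothetical, and nothing here describes what Dreamworks actually does.

```python
from pathlib import Path

# Range accepted by the legacy /proc/<pid>/oom_adj file: -17 exempts the
# process from the OOM killer, 15 makes it the preferred victim.
OOM_ADJ_MIN = -17
OOM_ADJ_MAX = 15


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write an OOM adjustment for `pid` and return the path written.

    `proc_root` (a hypothetical parameter) exists only so the function can
    be exercised against a scratch directory; on a real system leave it
    as /proc.
    """
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(
            f"oom_adj must be between {OOM_ADJ_MIN} and {OOM_ADJ_MAX}"
        )
    path = Path(proc_root) / str(pid) / "oom_adj"
    path.write_text(f"{value}\n")
    return path
```

A batch renderer could call set_oom_adj(os.getpid(), 15) at startup so that it, rather than an artist's interactive session, is chosen first when memory runs out. Lowering a process's value generally requires elevated privileges.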

Even when the OOM killer does not come into play, Dreamworks employees have the morning hangover problem in a big way. The overnight rendering cranker will have shoved everything useful for interactive work (including, possibly, a large rendering application) out to swap, where it languishes until somebody tries to resume work the next morning. It takes a long time for everything to swap back in, to the point that it is often quicker to just restart the application. This is, of course, the classic use case for the swap prefetch patch. There was a brief discussion of swap prefetch, but everybody seemed to realize that there was little point in trying to resolve that question in this setting. So there is still no decision on swap prefetch - a situation which should not be surprising at this point.

Sean would like better ways to quantify memory usage, and to control it as well. There appears to be help on the way in the form of the memory controller patch, once that makes its way into the mainline. With the exception of a small number of other issues (an NFS mounting regression which should be fixed in 2.6.23, for example), Dreamworks appears to be happy with its Linux-based rendering setup.

Credit Suisse

Coming next was the Credit Suisse IT manager known as Head Bubba. Credit Suisse, a bank with some 45,000 employees, uses Linux to manage and execute literally millions of dollars per second in trades. Most interestingly, he said that the in-house developers are working increasingly with current, upstream kernels, often enhanced with the realtime patches. These developers would rather work with the community than with the distributors, who, they feel, are holding things back. So Credit Suisse may set up an internal "center of excellence" for Linux to support the use of stock mainline kernels in its operation. Mr. Bubba did not say this himself, but there would appear to be a message here that the distributors need to hear.

That said, there can be challenges associated with the use of mainline kernels. The applications run at Credit Suisse are very carefully tuned to work with the Linux scheduler. If the behavior of the scheduler changes, things do not work so well anymore. So, among other things, Credit Suisse developers need good information about the changes which are coming. Your editor resisted the temptation to try to sell him an LWN subscription on the spot.

They like the realtime patches; as it happens, running in a realtime mode not only reduces latency (a critical consideration for them), but improves throughput at the same time. What they really want is the combination of realtime and RDMA. It's not necessarily RDMA itself that they like, but, at this time, RDMA seems to best provide the sort of performance characteristics that Credit Suisse needs.

Better diagnostic tools are needed; offerings like Concurrent NightStar and Intel ATOM were mentioned. Mr. Bubba asked for a better SystemTap implementation, setting off a small side-discussion on the quality of that tool. Some developers discuss it in terms which are hard to reproduce in a work-safe publication; others feel that most of the work is reasonable. Tracing tools are a problem area; this subject came up again later in the session.

TCP/IP jitter is a big problem for Credit Suisse. In some situations, TCP connections are subject to pauses of up to 40ms caused by the Nagle algorithm and the congestion avoidance code. TCP slow start, it seems, just does not work well for some applications. This is not a small issue; Credit Suisse has a lot of customers who lurk for some time, then decide to spring into action when the market conditions are to their liking. 40ms can be enough for those conditions to move on, robbing a trader of the intended profit from the deal.
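The Nagle algorithm holds back small writes while earlier data remains unacknowledged, and an application can turn it off per socket with the standard TCP_NODELAY option. The sketch below shows only that one step; the function name is hypothetical, and it is not a description of Credit Suisse's code, which was not shown at the session.

```python
import socket


def make_low_latency_socket():
    """Return an unconnected TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the stack to send small writes immediately instead
    # of coalescing them while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Setting the option addresses only the Nagle part of the picture; it does nothing about congestion avoidance or slow start.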

When brief pauses so visibly cause your customers to lose money, you tend to put significant resources into tracking them down and fixing them. Head Bubba had a set of plots demonstrating the problem with a variety of network interfaces and parameter settings. Part of the appeal of RDMA comes from the fact that RDMA seems to be less subject to this kind of problem. Credit Suisse is also playing with things like user-space TCP/IP stacks. But it would be nicer to get the TCP problems fixed.

Developers at Credit Suisse understand that saying "our top-secret proprietary in-house application isn't working" is a hard way to get results from the development community. A little more information is required before a serious attempt to solve the problem can be made. Part of the problem would appear to be that Credit Suisse would like to get full bandwidth out of high-performance network adapters while sending large numbers of small (64-byte) packets. The network stack tends to be tuned more for transfers of larger amounts of data.

Finding a way to demonstrate the problem for the development community would be a most useful thing to do. So Credit Suisse's developers are working on a set of test cases which will reproduce the sort of behavior exhibited by those applications; the test cases will then be made available to the community. David Miller, the networking maintainer, welcomed this plan; any problem which he can reproduce, he says, will get fixed immediately.
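The promised test cases were not shown, so what follows is purely illustrative: a minimal sketch of the general shape such a reproducer might take, timing round trips of 64-byte messages over a TCP connection. Every name in it is hypothetical, and it runs over the loopback interface, so it exercises the TCP stack but not a real network adapter.

```python
import socket
import threading
import time

MESSAGE_SIZE = 64  # bytes, matching the small-packet size mentioned above


def recv_exact(sock, size):
    """Read exactly `size` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_once(listener, rounds):
    """Accept one connection and echo `rounds` fixed-size messages."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            conn.sendall(recv_exact(conn, MESSAGE_SIZE))


def measure_round_trips(rounds=100, host="127.0.0.1"):
    """Return a list of round-trip times, in seconds, over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_once, args=(listener, rounds))
    server.start()

    payload = b"x" * MESSAGE_SIZE
    samples = []
    with socket.create_connection((host, port)) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            client.sendall(payload)
            recv_exact(client, MESSAGE_SIZE)
            samples.append(time.perf_counter() - start)
    server.join()
    listener.close()
    return samples


if __name__ == "__main__":
    rtts = measure_round_trips()
    mean_ms = sum(rtts) / len(rtts) * 1e3
    print(f"max {max(rtts) * 1e3:.3f} ms, mean {mean_ms:.3f} ms")
```

A real reproducer would need to run between two machines with the adapters in question and to idle between bursts, since the pauses described above show up after a connection has been quiet for a while.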

Linux Foundation

Finally, Marcus Rex presented the wishlists which have been developed for the Linux Foundation by its advisory councils. Many of these have been seen before and won't be covered in great detail here. They would like to see a single kernel which can work with all of the various virtualization options out there - a problem which is pretty well solved at this point. Better power management - "green computing" and all that - remains on the wishlist. Better hardware support is an item which will probably never go away; there are still members asking for better binary module support, unfortunately.

There is interest in better security options - and, in particular, security which is easier to manage. Some users are asking for an integrated kernel debugger, though nobody is entirely sure why. A related wish has to do with tracing tools - this led to another discussion of SystemTap. There are problems with the fact that much of the tracing code remains outside of the mainline kernel. It is hard to blame the tracing hackers, though - they have been trying, in one way or another, to merge tracing for several years. At this point, many of them have despaired of ever getting tracing into the mainline. We may see, in the near future, increased pressure to get some of the tracing code merged even if there are still developers who are opposed to it.

Also on the wishlist was the usual scalability stuff, especially for the sorts of loads (databases, for example) which don't currently scale quite as well on Linux. IPv6 readiness is wanted; there are increasingly stringent governmental requirements for IPv6 coming into force which must be addressed. They would like to see more formal testing happening. And there was a request around ZFS: it seems that it's not the filesystem itself they want, but the relatively easy administration that ZFS offers. RDMA was also on the list. These topics did not generate a whole lot of discussion, though; the belated arrival of coffee outside of the meeting room appeared to be a strong distraction by this stage of the session.



KS2007: The customer panel

Posted Sep 9, 2007 1:40 UTC (Sun) by linuxbox (subscriber, #6928)

Is no one really sure why people ask for an integrated kernel debugger? I value KDB a great deal. I regularly patch it into mainline kernels. I'm obviously not alone.

KS2007: The customer panel

Posted Sep 9, 2007 5:14 UTC (Sun) by NCunningham (guest, #6457)

+1. I just about habitually tell TuxOnIce users to apply it when they're trying to diagnose problems with hibernation. It's pretty useful for finding out which driver is stopping them from successfully hibernating. (It doesn't help in all circumstances, but it does in a good proportion).

Nagle and Credit Suisse

Posted Sep 9, 2007 8:32 UTC (Sun) by gdt (guest, #6284)

Did Credit Suisse say why their application does not disable the Nagle algorithm using the TCP_NODELAY socket option? If the application source is not available then they can write a Netfilter module to set the socket option (I wrote a similar module to select the TCP algorithm for binary programs, I owe the netdev people a similar patch which uses the routing table).

Nagle and Credit Suisse

Posted Sep 9, 2007 13:36 UTC (Sun) by corbet (editor, #1)

They do disable Nagle and do just about everything else they can possibly think of. But they still run into problems.

Dreamworks

Posted Sep 10, 2007 13:20 UTC (Mon) by joedrew (subscriber, #828)

I work in the 3D animation business, so Dreamworks' presentation isn't surprising to me. For those who don't know, it's not just Dreamworks that runs Linux -- it's the entire 3D animation industry. Sony Pictures Imageworks, Digital Domain, Dreamworks, Disney, etc. All large studios run Linux. (And they're the real reason that companies like NVIDIA and AMD have had good, but proprietary, 3D drivers for so long -- there's huge money in the high-end professional market.)

Their apparent lack of technical knowledge isn't surprising, though. Their core competency isn't Linux, it's 3D, and so they spend their R&D budget on that. They don't use Linux because it's Free Software, they use it because it's free software. (Its capabilities help too.) That's a good thing, though -- 3D animation companies are a huge user of Linux, but they're a user of Linux that tends much more towards the "Aunt Mae" than the usual user of Linux.

Dreamworks

Posted Sep 10, 2007 23:52 UTC (Mon) by pr1268 (subscriber, #24648)

> They don't use Linux because it's Free Software, they use it because it's free software.

That's an interesting perspective. It didn't initially occur to me that Dreamworks' server farms of thousands of computers would pose an enormous expense just for the operating system licensing alone, had they decided to use a proprietary OS. Not to mention that certain OSes charge license fees based on the number of physical CPUs per server, or by the number of incoming connections (I won't mention any names).

Dreamworks

Posted Sep 12, 2007 1:13 UTC (Wed) by landley (guest, #6789)

There are historical reasons as well. Don't forget that Silicon Graphics was a Unix shop (VAX through Irix), and SGI discontinued Irix in favor of Linux at the start of the decade.

KS2007: The customer panel

Posted Sep 10, 2007 18:39 UTC (Mon) by garloff (subscriber, #319)

I'm glad someone found /proc/$PID/oom_adj.

KS2007: The customer panel

Posted Sep 20, 2007 11:05 UTC (Thu) by renox (subscriber, #23785)

About the need to quickly send small packets: we have a similar need in my company for redundancy purposes (pinging other computers to ensure that they are working): a 40ms delay is a lot (we're not using TCP to do this but UDP, though, and we're still on 2.4, so I don't know whether we would have the same issue in 2.6 or not).

About 'swap prefetch', it's sad how it was kept out of the kernel because kernel devs saw no need for this feature: here's a customer complaining, so maybe kernel devs will change their minds (though it would need a new maintainer..).

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds