The Native POSIX Thread Library

Readers of the LWN Kernel Page have been aware of the intensive effort to improve threading performance on Linux - at least from the kernel point of view. Now, with the announcement of version 0.1 of the Native POSIX Thread Library (NPTL), the user-space side of this project has come into view. This article will take a look at the technical and performance aspects of NPTL; the next will wander briefly into the political issues.

Threads, of course, are processes (or something that looks like processes) that share an address space and various other resources. Multi-threaded applications can be tricky to write (they end up presenting the same sorts of problems with race conditions that operating system programmers have to deal with), but they can be a good solution to a number of programming challenges. Your web browser, for example, likely keeps one thread around to respond quickly to user events (mouse clicks and such) while another thread downloads a web page and yet another one renders it onto the screen. Java programs also tend to be highly threaded. Some applications can create many thousands of threads; obviously, such applications can only be reasonably run on systems with top-quality thread support.
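As a small illustration of the race-condition problem mentioned above, here is a minimal sketch using the standard POSIX threads API. It is not taken from NPTL or any of the benchmarks discussed here; the names worker, run_counter_demo, NUM_THREADS, and INCREMENTS are invented for this example. Several threads increment one counter that lives in their shared address space, and a mutex keeps the increments from trampling each other.

```c
#include <pthread.h>

#define NUM_THREADS 4        /* illustrative values, not from the article */
#define INCREMENTS  100000

/* One counter, visible to every thread because they share an address space. */
static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        /* Without the lock, concurrent read-modify-write cycles could
         * lose updates: the classic race condition. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs the workers and returns the final count, or -1 if a thread
 * could not be created. */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    counter = 0;
    while (started < NUM_THREADS) {
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return (started == NUM_THREADS) ? counter : -1;
}
```

With the mutex in place, run_counter_demo() should return NUM_THREADS * INCREMENTS; removing the lock calls would typically, though not reliably, produce a smaller number.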

Threading can be implemented entirely in user space, in kernel space, or a combination of both. User-space threads are traditionally lighter weight, since they do not require system calls and do not run in independent processes. They can be tricky to make work in all situations, however, and a pure user-space implementation cannot make good use of multiprocessor systems, since all threads run within a single process. So most operating systems provide some degree of kernel support for threading. Linux has long had this support via the flexible clone() system call, which allows a great deal of control over which resources are to be shared with the new thread, and which are to be private.
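To make the resource-sharing control concrete, here is a minimal sketch of calling clone() through the glibc wrapper. It is an illustration only, not how any threading library actually uses the call: real thread creation passes many more flags. The names run_clone, child_fn, shared_value, and CHILD_STACK_SIZE are invented for this example. The only thing toggled is CLONE_VM, which decides whether the child shares the parent's memory or gets a private copy.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* illustrative size */

/* Written by the child; the parent sees the change only if memory is shared. */
static volatile int shared_value;

static int child_fn(void *arg)
{
    shared_value = *(int *)arg;
    return 0;   /* becomes the child's exit status */
}

/*
 * Create a child with clone(), wait for it, and return the value of
 * shared_value as seen by the parent afterwards (-1 on error).
 * share_memory != 0 adds CLONE_VM, so parent and child share one
 * address space; otherwise the child works on a private copy.
 */
int run_clone(int share_memory, int value)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_value = 0;

    /* SIGCHLD as the exit signal lets the parent use plain waitpid(). */
    int flags = SIGCHLD;
    if (share_memory)
        flags |= CLONE_VM;

    /* Stacks grow downward on most architectures, so pass the top. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, &value);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return shared_value;
}
```

Calling run_clone(1, 42) should return 42, since the child wrote into shared memory, while run_clone(0, 42) should return 0, since the child only modified its own copy.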

Pure kernel-based threads are often perceived as being slow, however, since the kernel scheduler must be invoked to switch between threads. So conventional wisdom has often said that the best way to get good thread performance is with the "M:N model." M:N is a hybrid approach, where M user-space threads are multiplexed onto N kernel threads. The multiple kernel threads allow the application to use all processors on the system, while keeping the performance benefits of doing (most) switching between user-space threads. Many people have said that the key to fixing the (not great) performance of Linux threads is adopting the M:N approach.

So it is interesting to note that NPTL has, instead, stayed with the 1:1 pure kernel thread model. NPTL authors Ulrich Drepper and Ingo Molnar took a close look at the problem, and came to the conclusion that 1:1 was, in the end, the more promising approach. Their reasoning can be found in the NPTL white paper (available in PDF format); the main points are:

  • The kernel problems which slow down thread performance can be eliminated; that has been the focus of Ingo Molnar's work.

  • An M:N threading implementation requires two schedulers: the usual kernel scheduler, and a user-space implementation. Getting the two to work together for best performance is difficult - especially if you do not want to impact performance for the system as a whole. Rather than duplicate the scheduling function, the NPTL implementers felt it was better to use the (highly optimized) kernel scheduler exclusively.

  • Signal handling is the bane of many threading implementations, and M:N implementations have an even harder time of it. The 1:1 model leaves signal handling in the kernel.

  • User-space threading implementations have to go to great lengths to ensure that one thread, when it performs a blocking operation, does not block all threads running under that process; this can be a complex task. Kernel-based threads naturally schedule (and block) independently of each other.

Finally, the 1:1 implementation is generally simpler, since user space need not duplicate functionality already found in the kernel.
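The point about independent blocking can be sketched with a few lines of POSIX threads code. This is a hypothetical illustration, not code from NPTL; the names blocked_reader, demo_independent_blocking, and pipe_fds are invented here. One thread issues a read() on a pipe that cannot complete until a byte is written, while the calling thread carries on with its own work and only then writes the byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <unistd.h>

static int pipe_fds[2];

static void *blocked_reader(void *arg)
{
    char *out = arg;
    /* read() cannot complete until the other thread writes a byte
     * (or closes the write end of the pipe). */
    if (read(pipe_fds[0], out, 1) != 1)
        *out = '\0';
    return NULL;
}

/*
 * Returns the number of loop iterations the calling thread completed
 * before releasing the reader, or -1 on error.
 */
int demo_independent_blocking(void)
{
    pthread_t tid;
    char received = '\0';
    int progress = 0;

    if (pipe(pipe_fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, blocked_reader, &received) != 0) {
        close(pipe_fds[0]);
        close(pipe_fds[1]);
        return -1;
    }

    /* The reader's read() is still pending, yet this thread keeps running. */
    for (int i = 0; i < 1000; i++)
        progress++;

    /* Release the reader. Closing the write end afterwards guarantees
     * that read() returns even if the write itself failed. */
    ssize_t written = write(pipe_fds[1], "x", 1);
    close(pipe_fds[1]);

    pthread_join(tid, NULL);
    close(pipe_fds[0]);

    return (written == 1 && received == 'x') ? progress : -1;
}
```

With kernel-scheduled threads nothing special is needed for this to work; a pure user-space library would have to intercept the read() call, or use non-blocking I/O underneath, to keep the whole process from stalling.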

Of course, all of that means little if the 1:1 model is unable to perform up to expectations. The benchmarking process has just begun, but the initial signs are encouraging. Ingo ran one test where he started up and ran 100,000 concurrent threads - in less than two seconds. This test would have taken about 15 minutes before the threading improvements went into the kernel.

[Figure: Benchmark results]

Ulrich Drepper has posted some other benchmarks which mostly measure thread creation and shutdown time; some of his results can be seen in the chart to the right. Such a test should naturally favor the M:N model, since user-space thread creation and destruction can be performed without any system calls. And, in fact, the M:N Next Generation POSIX Threading (NGPT) implementation beat standard Linux threads by at least a factor of two in these tests. The NPTL library, however, beat NGPT by about a factor of four. So the initial indications are that NPTL can deliver the goods. And this is only the 0.1 release.
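For readers who want to try a measurement of this kind themselves, the following is a rough sketch of a thread creation and shutdown timer. It is not the benchmark Ulrich Drepper ran, whose details are not given here; the names noop and time_create_join are invented, and the loop simply creates and joins one do-nothing thread at a time.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* A thread body that does nothing, so only create/exit cost is measured. */
static void *noop(void *arg)
{
    return arg;
}

/*
 * Create and join `count` threads one after another.
 * Returns the elapsed wall-clock time in seconds, or -1.0 on failure.
 */
double time_create_join(int count)
{
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (int i = 0; i < count; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, noop, NULL) != 0)
            return -1.0;
        if (pthread_join(tid, NULL) != 0)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Dividing the result of, say, time_create_join(10000) by the thread count gives a crude per-thread cost. Numbers obtained this way depend heavily on the machine and library in use and should not be compared directly with the figures quoted above.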



Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds