OLS: On how user space sucks
Dave Jones set out to reduce the time it took his Fedora system to boot. To figure out what was taking so long, he instrumented the kernel to log certain basic file operations. As it turned out, the boot process involved calling stat() 79,000 times, opening 27,000 files, and running 1,382 programs. That struck him as being just a little excessive; getting a system running shouldn't require that much work. So he looked further. Here are a few of the things he found:
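Dave's numbers came from kernel instrumentation which is not reproduced here. A rough user-space approximation, sketched below in Python, is to capture a trace with strace and tally the system calls it records. The function name tally_syscalls and the regular expression are illustrative assumptions; the pattern expects plain strace lines, optionally prefixed with a process ID as happens when child processes are followed, and would need adjusting if timestamps are enabled.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line, allowing for the
# "[pid 1234] " or "1234 " prefixes strace adds when following children.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def tally_syscalls(lines):
    """Count how often each system call appears in strace-style output."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal and exit notices do not match and are skipped
            counts[match.group(1)] += 1
    return counts


# Small inline sample; a real log would be read from a file instead.
sample = [
    'open("/etc/passwd", O_RDONLY) = 3',
    '[pid  1234] stat("/usr/share/fonts", {st_mode=S_IFDIR|0755, ...}) = 0',
    '5678 stat("/etc/cups/ppd", 0x7ffd) = -1 ENOENT (No such file or directory)',
    '+++ exited with 0 +++',
]
for name, count in tally_syscalls(sample).most_common():
    print(f"{count:8d} {name}")
```

Run against a log captured with something like `strace -f -o trace.log` (the file name is hypothetical), the same function gives a quick picture of which calls dominate, though it sees only one process tree rather than the whole boot.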
- HAL was responsible for opening almost 2,000 files. It reads
various XML files, then happily reopens and rereads them multiple
times. The bulk of these files describe hardware which has never been
anywhere near the system in question. Clearly, this is an application
which could be a little smarter about how it does things.
- Similar issues were found with CUPS, which feels the need to open the
PPD files for every known printer. The result: 2,500 stat()
calls and 400 opens. On a system with no attached printer.
- X.org, says Dave, is "awesome." It tries to figure out where a
graphics adapter might be connected by attempting to open almost any
possible PCI device, including many which are clearly not present on
the system. X is also guilty of reopening library files many times.
- Gamin, which was written to get poll() loops out of
applications, spends its time sitting in a high-frequency
poll() loop. Evidently the real offender is in a lower-level
library, but it is the gamin executable which suffers. As Dave points
out, it can occasionally be worthwhile to run a utility like
strace on a program, even if there are no apparent bugs. One
might be surprised by the resulting output.
- Nautilus polls files related to the desktop menus every few seconds,
rather than using the inotify API which was added for just this
purpose.
- Font files are a problem in many applications; several of them
open font files by the hundred. Some of those applications never present
any text on the screen.
- There were also various issues with excessive timer use. The kernel blinks the virtual console cursor, even if X is running and nobody will ever see it. X is a big offender, apparently because the gettimeofday() call is still too slow and maintaining time stamps with interval timers is faster.
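To make the Nautilus item above concrete: rather than waking up every few seconds to check its files, a program can ask the kernel to report changes. The following Python sketch calls inotify through ctypes, since the standard library has no inotify module. It is Linux-only, the helper names watch_directory and read_events are invented for illustration, and it is not how Nautilus or any other application mentioned here actually does it.

```python
import ctypes
import os
import struct

# Event mask bits from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

# Fixed part of struct inotify_event: int wd; uint32_t mask, cookie, len.
EVENT_HEADER = struct.Struct("iIII")

# Load the symbols already linked into the interpreter, which include libc.
_libc = ctypes.CDLL(None, use_errno=True)


def watch_directory(path, mask=IN_MODIFY | IN_CREATE | IN_DELETE):
    """Return an inotify file descriptor watching `path` (Linux only)."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    if _libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def read_events(fd):
    """Sleep until the kernel reports changes, then return (mask, name) pairs."""
    data = os.read(fd, 4096)  # blocks; no timer wakeups while nothing changes
    events, offset = [], 0
    while offset < len(data):
        _wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(data, offset)
        offset += EVENT_HEADER.size
        name = data[offset:offset + name_len].rstrip(b"\0")
        events.append((mask, os.fsdecode(name)))
        offset += name_len
    return events
```

A caller would obtain a descriptor once with watch_directory() and then sit in read_events(), or hand the descriptor to its existing event loop. Either way the process stays asleep until something in the directory actually changes.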
There were more examples, and members of the audience had several more of their own. It was all great fun; Dave says he takes joy in collecting train wrecks.
The point of the session was not (just) to bash on particular applications, however. The real issue is that our systems are slower than they need to be because they are doing vast amounts of pointless work. This situation comes about in a number of ways; as applications become more complex and rely on more levels of libraries, it can be hard for a programmer to know just what is really going on. And, as has been understood for many years, programmers are very bad at guessing where the hot spots will be in their creations. That is why profiling tools so often yield surprising results.
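When the layers of libraries make it hard to see what a program is doing, a little instrumentation can stand in for guesswork. As a minimal sketch, assuming a Python program and caring only about calls which go through the built-in open(), the context manager below tallies every path opened while it is active. The names count_opens and repeated are hypothetical, and opens made through os.open() or from C extensions will not be seen.

```python
import builtins
from collections import Counter
from contextlib import contextmanager


@contextmanager
def count_opens():
    """Temporarily wrap builtins.open and tally every path passed to it."""
    counts = Counter()
    original_open = builtins.open

    def counting_open(file, *args, **kwargs):
        counts[str(file)] += 1
        return original_open(file, *args, **kwargs)

    builtins.open = counting_open
    try:
        yield counts
    finally:
        builtins.open = original_open  # always restore the real open()


def repeated(counts, threshold=2):
    """Return the paths opened at least `threshold` times, busiest first."""
    return [(path, n) for path, n in counts.most_common() if n >= threshold]
```

Wrapping a program's startup code in `with count_opens() as counts:` and then printing `repeated(counts)` is one cheap way to find files being reopened and reread, the pattern seen in several of the examples above.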
Programs (and kernels) which do stupid things will always be with us. We
cannot fix them, however, if we do not go in and actually look for the
problems. Too many programmers, it seems, check in their changes once they
appear to work and do not take the time to watch what their programs
actually do. A bit more time spent watching our applications in operation might
lead to faster, less resource-hungry systems for all of us.
| Index entries for this article | |
|---|---|
| Conference | Linux Symposium/2006 |
