OLS: On how user space sucks
[Posted July 20, 2006 by corbet]
Dave Jones's OLS talk, titled "Why user space sucks," was certain to be
popular in a setting like this. Many of the people in the standing-room-only
crowd might well have wondered why the talk was not scheduled into
the larger room. Perhaps the powers that be feared that a non-kernel talk would not
draw a large audience - even when it is given by a well-known kernel hacker.
Dave set out to reduce the time it took his Fedora system to boot. In an
attempt to figure out what was taking so long, he instrumented the kernel
to log certain basic file operations. As it turned out, the boot process
involved calling stat() 79,000 times, opening 27,000 files, and
running 1382 programs. That struck him as being just a little excessive;
getting a system running shouldn't require that much work. So he looked
further. Here are a few of the things he found:
- HAL was responsible for opening almost 2000 files. It will read
various XML files, then happily reopen and reread them multiple
times. The bulk of these files describe hardware which has never been
anywhere near the system in question. Clearly, this is an application
which could be a little smarter about how it does things.
- Similar issues were found with cups, which feels the need to open the
PPD files for every known printer. The result: 2500 stat()
calls and 400 opens. On a system with no attached printer.
- X.org, says Dave, is "awesome." It tries to figure out where a
graphics adapter might be connected by attempting to open almost every
possible PCI device, including many which are clearly not present on
the system. X is also guilty of reopening library files many times.
- Gamin, which was written to get poll() loops out of
applications, spends its time sitting in a high-frequency
poll() loop. Evidently the real offender is in a lower-level
library, but it is the gamin executable which suffers. As Dave points
out, it can occasionally be worthwhile to run a utility like
strace on a program, even if there are no apparent bugs. One
might be surprised by the resulting output.
- Nautilus polls files related to the desktop menus every few seconds,
rather than using the inotify API which was added for just this
purpose.
- Font files are a problem in many applications - several applications
open them by the hundred. Some of those applications never present
any text on the screen.
- There were also various issues with excessive timer use. The kernel
blinks the virtual console cursor, even if X is running and nobody
will ever see it. X is a big offender, apparently because the
gettimeofday() call is still too slow and maintaining time
stamps with interval timers is faster.
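As a rough illustration of the strace suggestion above, the sketch below tallies system calls and repeatedly opened files from a saved trace. It assumes a log of the kind written by "strace -o trace.log program" (optionally with -f, which prefixes each line with a PID). The helper name summarize_trace and the sample lines are invented for illustration; they are not data from Dave's talk.

```python
import re
from collections import Counter

# Start of a traced call, e.g.  1234 open("/etc/hosts", O_RDONLY) = 3
# The leading PID is optional.
CALL_RE = re.compile(r'^(?:\d+\s+)?(\w+)\(')
# First quoted argument on the line, taken to be the path.
PATH_RE = re.compile(r'"([^"]*)"')


def summarize_trace(lines):
    """Return (calls per syscall name, opens per path) for strace output lines."""
    calls = Counter()
    opened = Counter()
    for line in lines:
        call = CALL_RE.match(line)
        if call is None:
            continue  # signal notices, exit messages, resumed calls, ...
        name = call.group(1)
        calls[name] += 1
        if name in ("open", "openat"):
            path = PATH_RE.search(line)
            if path:
                opened[path.group(1)] += 1
    return calls, opened


# Hypothetical sample lines standing in for a real trace file.
SAMPLE = '''\
open("/usr/share/fonts/a.ttf", O_RDONLY) = 3
stat("/usr/share/fonts/a.ttf", {st_mode=S_IFREG|0644, ...}) = 0
open("/usr/share/fonts/a.ttf", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 4
--- SIGCHLD {si_signo=SIGCHLD, ...} ---
'''.splitlines()

calls, opened = summarize_trace(SAMPLE)
repeats = [path for path, count in opened.items() if count > 1]
print(calls.most_common())
print("opened more than once:", repeats)
```

Run against a real trace (for example, by passing an open file object instead of SAMPLE), a summary like this makes reopened files and unexpectedly frequent calls stand out without reading the whole log.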
There were more examples, and members of the audience had several more of
their own. It was all great fun; Dave says he takes joy in
collecting train wrecks.
The point of the session was not (just) to bash on particular applications,
however. The real issue is that our systems are slower than they need to
be because they are doing vast amounts of pointless work. This situation
comes about in a number of ways; as applications become more complex and
rely on more levels of libraries, it can be hard for a programmer to know
just what is really going on. And, as has been understood for many years,
programmers are very bad at guessing where the hot spots will be in their
creations. That is why profiling tools so often yield surprising results.
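To make the point about guessing concrete, here is a minimal sketch using Python's standard cProfile and pstats modules. The functions load_setting and startup are hypothetical, written so that an innocent-looking helper hides the expensive step; nothing here is taken from the applications discussed in the talk.

```python
import cProfile
import pstats


def load_setting(name, store):
    # Hypothetical helper: looks cheap, but re-sorts the whole store on
    # every single lookup.
    return dict(sorted(store.items()))[name]


def startup(store):
    # Reads every setting once; the cost hides inside load_setting().
    return [load_setting(key, store) for key in store]


store = {f"key{i}": i for i in range(200)}

profiler = cProfile.Profile()
profiler.enable()
values = startup(store)
profiler.disable()

# Show the five entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

The report lists how often each function was called and where the time went, which is usually a better guide than intuition about which line is slow.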
Programs (and kernels) which do stupid things will always be with us. We
cannot fix them, however, if we do not go in and actually look for the
problems. Too many programmers, it seems, check in their changes once they
appear to work and do not take the time to watch how their programs
work. A bit more time spent watching our applications in operation might
lead to faster, less resource-hungry systems for all of us.
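For programs written in Python, one low-effort way to watch file activity from the inside is an audit hook, available since Python 3.8. The sketch below counts "open" audit events per path. The temporary file stands in for a hypothetical configuration file, and the hook stays installed for the life of the process, so this is meant as a debugging aid rather than something to ship.

```python
import os
import sys
import tempfile
from collections import Counter

open_counts = Counter()


def count_opens(event, args):
    # The "open" audit event carries (path, mode, flags).
    if event == "open":
        open_counts[str(args[0])] += 1


sys.addaudithook(count_opens)

# Create a stand-in configuration file (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as handle:
    handle.write("setting = 1\n")
    config_path = handle.name

# Simulate code that rereads the same file instead of caching it.
open_counts.clear()
for _ in range(3):
    with open(config_path) as config:
        config.read()

reopened = open_counts[config_path]
print(config_path, "was opened", reopened, "times")
os.remove(config_path)
```

This only sees opens made through Python itself; a tool like strace is still needed to see what the underlying libraries do.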