Kernel.org is the main repository for
the Linux kernel source, numerous development trees, and a great deal of
associated material. It also offers mirroring for some other Linux-related
projects - distribution CD images, for example. Users of kernel.org have
occasionally noticed that the service is rather slow. Kernel tree releases
take a long time to make it to the front page, and the mirror network
tends to lag behind. This important part of the kernel's development
infrastructure, it seems, is not keeping up with demand.
Discussion on the mailing lists reveals that the kernel.org servers (there
are two of them) often run with load averages in the range of 2-300. So
it's not entirely surprising that they are not always quite as responsive
as one would like. There is talk of adding servers, but there is also a
sense that the current servers should be able to keep up with the load. So
the developers have been looking into what is going on.
The problem seems to originate with git. Kernel.org hosts quite a few git
repositories and a version of the gitweb system as well - though gitweb is
often disabled when the load gets too high. The git-related problems, in
turn, come down to the speed with which Linux can read directories. According to kernel.org
administrator H. Peter Anvin:
During extremely high load, it appears that what slows kernel.org down more
than anything else is the time that each individual getdents() call takes.
When I've looked this I've observed times from 200 ms to almost 2 seconds!
Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
packed tree, you can do the math yourself.
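The numbers above are straightforward to check. Here is a minimal sketch
(not a program posted by the kernel.org administrators) that times each
getdents64() call made while reading a directory; glibc provides no
wrapper for this system call, so it is invoked through syscall():

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define BUF_SIZE 32768

    int main(int argc, char **argv)
    {
        char buf[BUF_SIZE];
        struct timeval before, after;
        long usec;
        int fd, nread;

        /* Open the directory named on the command line (default ".") */
        fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        for (;;) {
            gettimeofday(&before, NULL);
            nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));
            gettimeofday(&after, NULL);
            if (nread <= 0)
                break;
            usec = (after.tv_sec - before.tv_sec) * 1000000L
                   + (after.tv_usec - before.tv_usec);
            printf("getdents64: %d bytes in %ld us\n", nread, usec);
        }
        close(fd);
        return 0;
    }

On a lightly loaded machine each call returns in microseconds; the
latencies described above show up when the directory blocks are not
cached and the disks are already busy.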
Clearly, something is not quite right with the handling of large
filesystems under heavy load. Part of the problem may be that Linux is not
dedicating enough memory to caching directories in this situation, but the
real problems are elsewhere. It turns out that:
- The getdents() system call, used to read a directory, is, according to
Linus, one of the most expensive system calls in Linux. The locking is
such that only one process can be reading a given directory at any given
time. If that process must wait for disk I/O, it sleeps holding the
inode semaphore and blocks all other readers - even if some of the
others could work with parts of the directory which are already in
memory. (A simplified sketch of this code path appears after this list.)
- No readahead is done on directories, so each block must be read, one
by one, with the whole process stopping and waiting for I/O each time.
- To make things worse, while the ext3 filesystem tries hard to lay out
files contiguously on the disk, it does not make the same effort with
directories. So the chances are good that a multi-block directory
will be scattered on the disk, forcing a seek for each read and
defeating any track caching the drive may be doing.
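To see where the first of those problems comes from, it helps to look at
how the VFS dispatches a directory read. What follows is a simplified
rendering of vfs_readdir() (fs/readdir.c), with the permission check and
some error handling trimmed; details vary between kernel versions, but
the serialization on the per-inode mutex is the part that matters here:

    #include <linux/fs.h>
    #include <linux/mutex.h>

    /* Simplified sketch: the per-inode mutex means only one process can
     * be executing the filesystem's readdir method for a given directory
     * at a time. If that process sleeps waiting for disk I/O, every
     * other reader waits too - even for parts of the directory which are
     * already in memory. */
    int vfs_readdir(struct file *file, filldir_t filler, void *buf)
    {
        struct inode *inode = file->f_dentry->d_inode;
        int res = -ENOTDIR;

        if (!file->f_op || !file->f_op->readdir)
            return res;

        mutex_lock(&inode->i_mutex);        /* one reader at a time */
        res = -ENOENT;
        if (!IS_DEADDIR(inode)) {
            res = file->f_op->readdir(file, buf, filler);
            file_accessed(file);
        }
        mutex_unlock(&inode->i_mutex);
        return res;
    }

Regular file reads are not serialized in this way, which is why this
particular cost is specific to directories.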
It has been reported that the third of the above-listed problems can be
addressed by moving to XFS, which
does a better job of keeping directories together. Kernel.org could make such
a switch - at the cost of about a week's downtime for each server. So one
should not expect it to happen overnight.
The first priority for improving the situation is, most likely, the
implementation of some sort of directory readahead. That change would cut
the amount of time spent waiting for directory I/O and, crucially, would
require no change to existing filesystems - not even a backup and restore -
to get better performance. An early readahead patch has been circulated,
but this issue looks complex enough that a few iterations of careful work
will be required to arrive at a real solution. So look for something to
show up in the 2.6.21 time frame.
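The circulated patch is not reproduced here, but the basic idea behind
directory readahead can be sketched in a few lines. The fragment below
is purely illustrative - it is not the posted patch, and a real
implementation would be integrated into each filesystem's readdir path -
but it shows the intent: when one directory block is read, start
asynchronous reads of the blocks that follow, so that later getdents()
calls find them already in memory:

    #include <linux/fs.h>
    #include <linux/buffer_head.h>

    #define DIR_RA_BLOCKS 8   /* arbitrary readahead window for this sketch */

    /* Illustrative only: start asynchronous reads of the blocks that
     * follow logical block 'blk' of a directory, so that subsequent
     * reads of the directory are less likely to sleep waiting for the
     * disk. */
    static void dir_readahead(struct inode *dir, sector_t blk)
    {
        struct super_block *sb = dir->i_sb;
        sector_t last = dir->i_size >> sb->s_blocksize_bits;
        sector_t i;

        for (i = blk + 1; i <= last && i <= blk + DIR_RA_BLOCKS; i++) {
            sector_t phys = bmap(dir, i);   /* logical -> physical block */

            if (phys)
                sb_breadahead(sb, phys);    /* asynchronous, non-blocking */
        }
    }

A real patch would have to interact correctly with the page cache and
with each filesystem's own directory format, which is part of why a few
iterations of careful work are expected.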