Why kernel.org is slow
Discussion on the mailing lists reveals that the kernel.org servers (there are two of them) often run with load averages in the 200-300 range. So it is not entirely surprising that they are not always quite as responsive as one would like. There is talk of adding servers, but there is also a sense that the current servers should be able to keep up with the load. So the developers have been looking into what is going on.
The problem seems to originate with git. Kernel.org hosts quite a few git repositories and a version of the gitweb system as well - though gitweb is often disabled when the load gets too high. The git-related problems, in turn, come down to the speed with which Linux can read directories. According to kernel.org administrator H. Peter Anvin:
Clearly, something is not quite right with the handling of large filesystems under heavy load. Part of the problem may be that Linux is not dedicating enough memory to caching directories in this situation, but the real problems are elsewhere. It turns out that:
- The getdents() system call, used to read a directory, is, according to Linus, one of the most expensive in Linux. The locking is such that only one process can be reading a given directory at any given time. If that process must wait for disk I/O, it sleeps holding the inode semaphore and blocks all other readers - even if some of the others could work with parts of the directory which are already in memory.
- No readahead is done on directories, so each block must be read, one by one, with the whole process stopping and waiting for I/O each time.
- To make things worse, while the ext3 filesystem tries hard to lay out files contiguously on the disk, it does not make the same effort with directories. So the chances are good that a multi-block directory will be scattered on the disk, forcing a seek for each read and defeating any track caching the drive may be doing.
It has been reported that the third of the above-listed problems can be addressed by moving to XFS, which does a better job of keeping directories together. Kernel.org could make such a switch - at the cost of about a week's downtime for each server. So one should not expect it to happen overnight.
The first priority for improving the situation is, most likely, the implementation of some sort of directory readahead. That change would cut the amount of time spent waiting for directory I/O and, crucially, would require no change to existing filesystems - not even a backup and restore - to get better performance. An early readahead patch has been circulated, but this issue looks complex enough that a few iterations of careful work will be required to arrive at a real solution. So look for something to show up in the 2.6.21 time frame.
