Adaptive file readahead
The current kernel readahead implementation uses a window 128KB in length. When readahead seems appropriate, the kernel will speculatively bring in the next 128KB of file data. If the application continues to read sequentially through that data, the next 128KB chunk will be brought in when the application is part-way through the first one. This implementation works, but Wu Fengguang thinks that it can be made better.
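The fixed-window behavior described above can be illustrated with a small toy model. This is only a sketch, not the kernel's code: the class name `FixedReadahead`, the `trigger_fraction` parameter, and the choice of the window's midpoint as the "part-way" trigger are all assumptions made for illustration.

```python
WINDOW = 128 * 1024  # fixed readahead window size mentioned in the article


class FixedReadahead:
    """Toy model of fixed-window readahead for one sequentially read file."""

    def __init__(self, window=WINDOW, trigger_fraction=0.5):
        self.window = window
        # Assumption: "part-way through" is modelled as a fraction of the window.
        self.trigger = int(window * trigger_fraction)
        self.ahead_end = 0    # file offset up to which data has been requested
        self.submitted = []   # (offset, length) of each readahead request

    def _submit(self):
        self.submitted.append((self.ahead_end, self.window))
        self.ahead_end += self.window

    def read(self, offset, length):
        """Record an application read and issue readahead when needed."""
        end = offset + length
        # Request further windows until the read has not yet passed the
        # trigger point of the most recently requested window.
        while end > self.ahead_end - self.window + self.trigger:
            self._submit()
```

With these assumptions, a reader consuming 4KB at a time gets the first 128KB window on its first read and the second window once it crosses the 64KB mark.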
In particular, Wu thinks that the fixed readahead window size should, instead, adapt to both the application's behavior and the global state of the system. His adaptive readahead patch is an implementation of this thought. It is a work of daunting complexity, but the core ideas are reasonably straightforward.
The adaptive readahead patch tries to balance two constraints: readahead should be performed aggressively, but not to the point that the system starts thrashing or readahead pages get recycled before the application uses them. Every time a readahead decision is to be made for a specific file, the adaptive code looks at how much memory is available for readahead and how quickly the application has been working through the file. If memory is tight, or if the disk holding the file is congested, readahead will not be performed at all.
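The shape of that decision might be sketched as follows. The names `SystemState`, `readahead_size`, and `min_free` are hypothetical, and bounding the readahead size by the application's recent consumption is an assumption about how "how quickly the application has been working through the file" could be used; the patch itself is far more involved.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    free_readahead_bytes: int  # memory judged available for readahead
    disk_congested: bool       # whether the backing device is congested


def readahead_size(state, consumed_per_interval, min_free=128 * 1024):
    """Return how many bytes to read ahead, or 0 to skip readahead entirely."""
    # Tight memory or a congested disk: do no readahead at all.
    if state.disk_congested or state.free_readahead_bytes < min_free:
        return 0
    # Assumption: never read ahead more than memory allows, nor more than
    # the application has recently been consuming.
    return min(state.free_readahead_bytes, consumed_per_interval)
```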
The code also looks at the pressure on the inactive page lists and tries to figure out whether any readahead pages are in danger of falling off that list and being reclaimed. In that situation, the readahead pages will be moved back up the list, keeping them in memory for a bit longer. This "rescue" operation helps to keep previous readahead work from being wasted; since it is only performed when the application consumes data from the file, it will not happen if the reading process has stalled entirely. But, when the application is working through the data, it will get another chance to benefit from readahead which has already been performed. No more readahead will be started in that situation, however.
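A rough model of the rescue step, assuming the inactive list can be represented as a simple queue, might look like this. The function `rescue_readahead`, the `danger_zone` parameter, and the `(file_id, page_index)` page representation are illustrative inventions, not structures from the patch.

```python
from collections import deque


def rescue_readahead(inactive, file_id, danger_zone):
    """Move a file's readahead pages near the reclaim end back to the head.

    `inactive` is a deque of (file_id, page_index) tuples. Index 0 is the
    head (most recently added); the right end is reclaimed first.
    Returns the number of pages rescued.
    """
    danger_zone = min(danger_zone, len(inactive))
    # Pull off the pages closest to being reclaimed; tail[0] was the last one.
    tail = [inactive.pop() for _ in range(danger_zone)]
    rescued = [page for page in tail if page[0] == file_id]
    kept = [page for page in tail if page[0] != file_id]
    # Other files' pages return to the tail in their original order.
    for page in reversed(kept):
        inactive.append(page)
    # Rescued pages go back to the head, preserving their relative order.
    for page in rescued:
        inactive.appendleft(page)
    return len(rescued)
```

In this model the function would be called only when the application reads from the file, which matches the article's point that a stalled reader gets no rescue.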
If, instead, the application is making use of its readahead pages and the memory is available, the readahead window can grow up to 1MB. For streaming media or data processing applications which work their way sequentially through large files, this enlarged window can lead to significant performance gains.
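The article does not say how the window grows, so the following sketch simply assumes doubling up to the 1MB ceiling, with a fall back to the old 128KB size when the conditions are not met. `next_window` and both constants' use as floor and ceiling are assumptions for illustration.

```python
MIN_WINDOW = 128 * 1024    # the old fixed window, used here as a floor
MAX_WINDOW = 1024 * 1024   # the 1MB ceiling mentioned in the article


def next_window(current, pages_were_used, memory_available):
    """Pick the next readahead window size in bytes."""
    if pages_were_used and memory_available:
        # Assumption: grow by doubling, capped at the 1MB limit.
        return min(current * 2, MAX_WINDOW)
    # Assumption: otherwise drop back to the minimum window.
    return MIN_WINDOW
```

Under this policy a well-behaved sequential reader would reach the full 1MB window after three successful rounds.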
In fact, Wu claims results which are "pretty optimistic." They include a 20-100% improvement for applications doing parallel reads, and the ability to run 800 simultaneous 1KB/sec streams on a 64MB system without thrashing. The page cache hit rate is claimed to be 91%, which is quite good.
The adaptive readahead patch might, thus, be a worthwhile addition to the Linux memory management subsystem. There has been little discussion (none, actually) of the patch on the list, however. Complicated patches working in an obscure corner of memory management do not receive the same level of review as, say, new filesystems, it would seem. In any case, a patch of this nature will require a good deal of testing before it can be considered for any sort of merge. So, while adaptive readahead may indeed make its way into the mainline, it's not something to expect to see in the very near future.
Index entries for this article | |
---|---|
Kernel | Memory management/Readahead |
Kernel | Readahead |
Posted Oct 13, 2005 23:16 UTC (Thu)
by emj (guest, #14307)
Posted Oct 16, 2005 17:48 UTC (Sun)
by Ross (guest, #4065)
Posted Oct 17, 2005 8:13 UTC (Mon)
by shane (subscriber, #3335)
Posted Oct 14, 2005 18:13 UTC (Fri)
by giraffedata (guest, #1954)
There are two reasons for readahead. The article mentions one: you can speed up an application thread by doing the I/O in a parallel thread, while the application thread is munching on what it previously read.
For this purpose, you want to set the readahead window size exactly to the application burst size -- the amount of reading the thread does in each read/process cycle.
This effect is irrelevant if the application doesn't have anything else to do or never waits for reads (a multi-threaded application).
But just as important is the fact that a block device has greater capacity when given more stuff at a time to read. It can schedule disk and head motion better and combine contiguous reads to eliminate some overhead.
This effect drops off as the sizes go up. And it's irrelevant if you aren't driving the disk drive at capacity.
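The "combine contiguous reads" part of that argument can be shown with a short sketch. It is not how any particular I/O scheduler is written; `merge_contiguous` and the `(offset, length)` request format are hypothetical.

```python
def merge_contiguous(requests):
    """Coalesce (offset, length) read requests that touch or overlap."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This request starts inside or right at the end of the previous
            # one, so extend the previous request instead of adding a new one.
            start, prev_len = merged[-1]
            merged[-1] = (start, max(prev_len, offset + length - start))
        else:
            merged.append((offset, length))
    return merged
```

The more requests the device is handed at once, the more chances a pass like this has to turn several small reads into one larger one.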
An ideal adaptive readahead system would use feedback to see when increasing the window size stops increasing throughput. That handles both readahead effects. You also stop when the memory manager says you're using too much memory (which is the inactive list thing mentioned in the article). With something that automatic, there would be no need for that arbitrary 1MB readahead limit.
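A feedback loop of that kind could be sketched roughly as below. The probe `measure_throughput`, the doubling step, and the `min_gain` threshold are all assumptions chosen for illustration; a real implementation would have to measure throughput on live I/O.

```python
def tune_window(measure_throughput, start=128 * 1024,
                memory_limit=None, min_gain=0.05):
    """Double the window while throughput keeps improving by min_gain.

    `measure_throughput(window)` is a caller-supplied, hypothetical probe
    that returns the throughput observed with the given window size.
    `memory_limit` stands in for the memory manager saying "too much".
    """
    window = start
    best = measure_throughput(window)
    while True:
        candidate = window * 2
        if memory_limit is not None and candidate > memory_limit:
            break  # the memory manager's limit wins over throughput
        rate = measure_throughput(candidate)
        if rate < best * (1 + min_gain):
            break  # a bigger window no longer buys meaningful throughput
        window, best = candidate, rate
    return window
```

Note that no fixed upper bound appears anywhere; the loop stops only on feedback or on the memory limit.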
The only problem I have with readahead is when I'm watching a movie in mplayer and I get small pauses when the disk has to wake up from its power-saving sleep state...
Seems like it would be caused by a lack of readahead. If the pages were being read in before they were needed, the spin-up pause shouldn't cause a stall in playback. Maybe there should be an extra-aggressive readahead mode when disks are set to automatically spin down.
For IDE disks you can set the sleep time with hdparm. Perhaps you should tweak this?