On-demand readahead

"Readahead" is the act of speculatively reading a portion of a file's contents into memory in the expectation that a process working with that file will soon want that data. When readahead works well, a data-consuming process will find that the information it needs is available to it when it asks, and that waiting for disk I/O is not necessary. The Linux kernel has done readahead for a long time, but that does not mean that it cannot be done better. To that end, Fengguang Wu has been working on a set of "adaptive readahead" patches for a couple of years.

Adaptive readahead was covered here in 2005. The patches have been languishing in the -mm tree for one simple reason: their complexity is at such a level that few people are able to review them in any useful way. The new on-demand readahead patch is a response to a request from Andrew Morton for a simpler patch to help get the merge process going. The new code is indeed simpler, having dispensed with much of the logic found in the full adaptive readahead mechanism.

To a great extent, the on-demand patch reimplements what Linux readahead does currently, but in a simpler and more flexible way. Like the current code, the on-demand patch maintains a "readahead window" consisting of a portion of the file starting with the application's last read. Pages inside the readahead window should already be in the page cache - or, at least, under I/O to get there as soon as possible. The window moves forward as the application reads data from the file.

The current code actually implements two windows: the "current window" (a set of in-cache pages which includes the application's current position) and the "ahead window" (the pages most recently read in by the kernel). Once the application's position crosses from the current window into the ahead window, a new I/O operation is started to make a new ahead window. In this way, the kernel tries to always keep sufficiently far ahead of the application that the file data will be available when requested.
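
In rough outline, that logic looks something like the following user-space model (the names, window sizes, and structure here are purely illustrative, not the kernel's actual code):

    /*
     * Illustrative model of the two-window scheme described above.
     * Names and sizes are made up; this is not the kernel's code.
     */
    #include <stdio.h>

    struct ra_windows {
        long cur_start, cur_size;       /* current window: pages already cached */
        long ahead_start, ahead_size;   /* ahead window: pages being read in */
    };

    /* Called for each page index the application reads sequentially. */
    static void page_read(struct ra_windows *ra, long index, long ra_pages)
    {
        if (index >= ra->ahead_start &&
            index < ra->ahead_start + ra->ahead_size) {
            /* The reader crossed into the ahead window: it becomes the
             * new current window, and I/O is started for a fresh ahead
             * window right behind it. */
            ra->cur_start = ra->ahead_start;
            ra->cur_size = ra->ahead_size;
            ra->ahead_start = ra->cur_start + ra->cur_size;
            ra->ahead_size = ra_pages;
            printf("readahead I/O for pages %ld..%ld\n",
                   ra->ahead_start,
                   ra->ahead_start + ra->ahead_size - 1);
        }
    }

    int main(void)
    {
        struct ra_windows ra = { 0, 8, 8, 8 };  /* both windows 8 pages */
        for (long i = 0; i < 40; i++)
            page_read(&ra, i, 8);
        return 0;
    }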

The on-demand patch, instead, has a single readahead window. Rather than maintain a separate "ahead window," the new readahead code marks a page (using a flag in the page structure) as being at the "lookahead index." When an application reads its way into the marked page, the readahead window is extended and a new I/O operation is started. There is some resistance to the idea of using a page flag, since those bits are perennially in short supply. Andrew Morton has suggested using some more approximate heuristics instead. That approach might occasionally make the wrong decision, but the penalty is low and does not affect the correctness of the system's operation as a whole.
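
The single-window idea can be sketched the same way; this is again an illustrative model rather than the patch's real code (the actual patch marks the lookahead page with a flag on the page itself rather than keeping a separate index field, and window sizing policy is omitted here):

    /*
     * Illustrative model of the on-demand scheme: one window plus a
     * "lookahead" mark.  Hitting the marked page triggers the next
     * asynchronous readahead before the reader reaches the window's end.
     */
    #include <stdio.h>

    struct ra_state {
        long start;       /* first page of the readahead window */
        long size;        /* pages in the window */
        long lookahead;   /* page whose access triggers more readahead */
    };

    static void page_read(struct ra_state *ra, long index)
    {
        if (index == ra->lookahead) {
            /* Extend the window by another chunk and move the mark
             * partway into the newly submitted pages.  Growing or
             * shrinking the window is left out of this sketch. */
            long new_start = ra->start + ra->size;
            ra->start = new_start;
            ra->lookahead = new_start + ra->size / 2;
            printf("readahead pages %ld..%ld, next mark at page %ld\n",
                   new_start, new_start + ra->size - 1, ra->lookahead);
        }
    }

    int main(void)
    {
        struct ra_state ra = { 0, 16, 8 };   /* 16-page window, mark halfway */
        for (long i = 0; i < 100; i++)
            page_read(&ra, i);
        return 0;
    }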

While the on-demand patch appears to do relatively little, it does have the advantage of removing a bunch of complexity from the current readahead code. It is able to make its decisions without the overhead of trying to track events like an attempted readahead of pages which are already in the cache. The checks for sequential access are made less strict as well, causing readahead to stay active in situations where the current code would turn it off. The result, according to some benchmarks posted with the patch, is improvements in application speed of between 0.1% and 8% or so - with performance regressions in some cases. Interestingly, some of the best results come with a benchmark running on a MySQL database, which is not where one would normally expect to see a lot of sequential activity.

This patch set is clearly simple enough to be reviewed; in the absence of any strong objections, it could conceivably be ready for 2.6.23. Then, perhaps, Fengguang can start working on adding some of the more complex logic which makes up the full adaptive readahead mechanism.



Can swap-in be improved?

Posted May 24, 2007 7:44 UTC (Thu) by mgb (guest, #3226) [Link]

The place where readahead is really needed is when swapping in a long-idle application.

Anyone with a modest desktop will have experienced the problem of switching back to an application that has sat idle for half an hour. The drive thrashes away for minutes as pages are apparently faulted back in one at a time. This often takes much longer than starting the application from scratch.

I wonder if the kernel could detect this situation, perhaps by noticing an unusually high sustained rate of reads from the swap file for a given process, and try temporarily reading some larger number of contiguous pages instead of one.

There are undoubtedly corner cases where this would make things worse but I suspect that overall something like this would be a considerable improvement.
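
A rough user-space sketch of what such a heuristic might look like (all names and thresholds are invented for illustration; nothing here reflects existing kernel code):

    /* Hypothetical heuristic: if a task has taken many swap-in faults in a
     * short interval, fetch a cluster of contiguous swap pages instead of a
     * single page.  All thresholds here are arbitrary. */
    #include <stdio.h>

    struct swapin_stats {
        long faults_in_window;   /* swap-in faults seen in this interval */
        long window_start_ms;    /* when the current interval began */
    };

    static long swapin_cluster_pages(struct swapin_stats *s, long now_ms)
    {
        const long interval_ms = 1000;
        const long burst_threshold = 32;    /* faults per interval */

        if (now_ms - s->window_start_ms > interval_ms) {
            s->window_start_ms = now_ms;    /* start a new interval */
            s->faults_in_window = 0;
        }
        s->faults_in_window++;

        /* Sustained burst: read 16 contiguous pages at once; otherwise
         * just the single faulting page. */
        return s->faults_in_window > burst_threshold ? 16 : 1;
    }

    int main(void)
    {
        struct swapin_stats s = { 0, 0 };
        /* Simulate a process taking a swap-in fault every 10 ms. */
        for (long ms = 0; ms < 500; ms += 10)
            printf("t=%3ldms: read %ld page(s)\n",
                   ms, swapin_cluster_pages(&s, ms));
        return 0;
    }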

Can swap-in be improved?

Posted May 24, 2007 10:03 UTC (Thu) by MathFox (guest, #6104) [Link]

Swap-in consists of non-sequential reads, so it's hard to predict which pages to read ahead. OTOH, memory is pretty cheap nowadays (1 GB ~ $50); buying a DIMM may quickly give you a more responsive system. Professional system administrators don't want their systems to swap under normal load.

Can swap-in be improved?

Posted May 24, 2007 14:43 UTC (Thu) by i3839 (guest, #31386) [Link]

More RAM isn't necessary to avoid swapping. If you have swap usage you don't have a modest desktop, but a decadent, corpulent, bloated one. That, or you're running something that hogs memory, like video encoding apps, but that's beyond modest too. I've been doing fine with 256 MB for years, and although RAM is dirt cheap, I simply see no reason to buy more of it.

mgb: You might be interested in Con's swap prefetch.
See http://members.optusnet.com.au/ckolivas/kernel/

Can swap-in be improved?

Posted Jul 12, 2007 18:18 UTC (Thu) by xorbe (subscriber, #3165) [Link]

Yeah, but if you are not using an app, Linux throws out your program to cache more of your hard drive. Unavoidable.

Can swap-in be improved?

Posted May 24, 2007 14:44 UTC (Thu) by mgb (guest, #3226) [Link]

DIMMs are cheap in this part of the world, but laptops that hold larger DIMMs are not. In some circumstances it is economical to have sufficient RAM to hold one or two apps but to use swap when three or four are open and temporarily idle.

The thought was that one could overcome the slow random swap-in reads by reading ahead pages from swap which had been used earlier and would probably therefore be used again.

Can swap-in be improved?

Posted May 24, 2007 18:32 UTC (Thu) by NAR (subscriber, #1313) [Link]

When I leave work, I don't turn off the laptop, just lock the screen. The next morning when I unlock the screen, it takes about 10-15 seconds for the browser or OpenOffice to be swapped back in. Probably a cronjob needs some memory overnight, which is why they get swapped out, but it's still annoying.

Bye, NAR

Can swap-in be improved?

Posted May 25, 2007 1:34 UTC (Fri) by interalia (subscriber, #26615) [Link]

A plausible culprit is the cronjob that runs updatedb so that your 'locate' database is up-to-date. That churns the cache quite a bit since it involves a lot of disk seeking.

Can swap-in be improved?

Posted May 26, 2007 2:10 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

One thing that would improve this and various other things is actual swapping. What we call swapping in Linux is really just virtual memory paging. Swapping is a much older technology in which you remove an entire process from memory to let other processes run, then bring the entire process back when its turn comes around again. It's also known as long-term scheduling.

If Linux could notice that a process hasn't accessed any of its pages (hasn't run) in a long time, it could write all the resident pages out to contiguous swap space. The next time the process runs, it could read it all back.

Furthermore, if Linux is stealing a page that was accessed quite recently, it ought to steal them all and then wait a while before swapping them all back in again and swapping out somebody else.

By the way, it would be a mistake to believe that all that paging activity when you sit down to a long-idle session is Linux "swapping." A lot of it is reading in memory mapped program files. Speeding that up isn't as easy.

Can swap-in be improved?

Posted May 31, 2007 17:04 UTC (Thu) by anton (guest, #25547) [Link]

> A lot of it is reading in memory mapped program files. Speeding that up isn't as easy.

I think it's not very hard. You just need some history-based prediction mechanism. A student of mine did his Master's thesis (in German) on speeding up the startup time of programs by recording the blocks accessed when starting a program, and then prefetching them (Windows XP does something like that for the OS startup). Similarly, one could record the pages or blocks accessed soon after a major page fault, and then prefetch all of them the next time that page faults.
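
The replay half of such a scheme can even be approximated from user space with existing interfaces. A minimal sketch, assuming a previously recorded log of offset/length pairs (the log file and its format are invented here; posix_fadvise() is the only real interface used):

    /* Replay a previously recorded access pattern for a file by asking the
     * kernel to prefetch those regions.  The log file and its format
     * (one "offset length" pair per line) are invented for this sketch;
     * the recording side is not shown. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <file> <access.log>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        FILE *log = fopen(argv[2], "r");
        if (fd < 0 || log == NULL) {
            perror("open");
            return 1;
        }

        long long off, len;
        while (fscanf(log, "%lld %lld", &off, &len) == 2) {
            /* Ask the kernel to start readahead for this region in the
             * background; later real reads should then hit the cache. */
            posix_fadvise(fd, (off_t)off, (off_t)len, POSIX_FADV_WILLNEED);
        }

        fclose(log);
        close(fd);
        return 0;
    }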

Can swap-in be improved?

Posted Jun 2, 2007 2:30 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

That's actually different from, and a little harder than, what we're talking about here. We're talking about resuming a process that has been idle for a while. So all the history you need is the list of pages that were resident the last time the process ran.

That by itself can't get you to the level of speed that we discussed with anonymous pages, where all those pages were contiguous on disk.

I know I've seen a system somewhere in which you don't ordinarily start a program, but rather reload and resume an entire process. In that case, the two prefetching problems are probably the same.
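
For a single mmap()ed region, the "prefetch what was resident" idea can at least be approximated from user space today. A minimal sketch using mincore() and madvise() (the surrounding program and its choices are purely illustrative; a real version would live in the kernel and cover a whole address space):

    /* Remember which pages of a mapping are resident, then later ask the
     * kernel to bring exactly those pages back in. */
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
        size_t len = 64 * pagesize;
        size_t npages = len / pagesize;

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        memset(p, 1, len / 2);               /* touch half the pages */

        /* Snapshot residency: one byte per page, low bit = resident. */
        unsigned char *vec = malloc(npages);
        if (!vec || mincore(p, len, vec) != 0)
            return 1;

        /* ...time passes, pages get evicted or swapped out... */

        /* Prefetch only the pages that were resident in the snapshot. */
        for (size_t i = 0; i < npages; i++)
            if (vec[i] & 1)
                madvise(p + i * pagesize, pagesize, MADV_WILLNEED);

        free(vec);
        munmap(p, len);
        return 0;
    }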

Can swap-in be improved?

Posted Jun 7, 2007 13:29 UTC (Thu) by anton (guest, #25547) [Link]

Yes, fetching all the pages that were resident is certainly an approach that would work for this problem.

Using a history-based prefetcher might have the following advantages, though:

  • It could fetch fewer pages (e.g., skipping pages that were loaded on startup and are no longer needed, but were still resident when the process was swapped out), which would be faster and less disruptive for other processes.
  • The prefetcher would also be useful in other situations, like on startup, or when a process enters a new phase (and therefore needs many new pages from executable or data files).
That being said, a well-balanced history-based prefetcher is certainly a much larger project than loading the once-resident pages.

Can swap-in be improved?

Posted Jul 25, 2007 23:00 UTC (Wed) by Blaisorblade (guest, #25465) [Link]

> A lot of it is reading in memory mapped program files. Speeding that up
> isn't as easy.

Surely swap prefetch won't help here, but good readahead would.

About swapping: the reason it's an old technique that is no longer used is that, as Rik van Riel (and another reader below) puts it, disks have become much slower relative to memory sizes than they were ages ago. Would you ever swap out firefox? Or X?

> Furthermore, if Linux is stealing a page that was accessed quite
> recently, it ought to steal them all and then wait a while before
> swapping them all back in again and swapping out somebody else.

Well, the swap token was introduced by Rik van Riel to implement something similar, based on a research paper. Actually it's mostly a means to prevent thrashing (which can be what you describe, depending on the times involved). It didn't work at first and was disabled, but by now it should have been fixed; a new variation of the algorithm was implemented in 7602bdf2fd14a40dd9b104e516fdc05e1bd17952 (in 2.6.20-rc1). LWN described it well; see the Kernel Page Index for more details.

Can swap-in be improved?

Posted May 25, 2007 8:58 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

There is another set of patches, sent out in the last week or so, that addresses readahead for swap-in.

The reports on it were extremely favorable.

On-demand readahead

Posted May 24, 2007 9:00 UTC (Thu) by tajyrink (subscriber, #2750) [Link]

Considering that seek times have remained about the same on desktop hard drives for the last 15 years, while transfer speeds have increased maybe 50-fold and capacity is a thousand times greater than back then, there are huge gains to be had from even simple improvements if seeks can be reduced. Readahead may help to some extent, though mainly the problem lies in applications: you really shouldn't need to open 2000 files when logging in to, e.g., certain modern desktop environments.

On-demand readahead

Posted May 24, 2007 15:08 UTC (Thu) by kleptog (subscriber, #1183) [Link]

I'm not surprised MySQL gets the benefit. Databases tend to do a lot of almost-but-not-quite sequential reads. There have been reports of people getting substandard performance on postgres because they run afoul of some assumption the kernel makes with respect to readahead. Increasing the readahead buffer tends to help, but it'd be better if the system were slightly smarter.

Given the current cost of seeking, it's really better to read in 10 times the amount of data, even if you're almost sure you're not going to need it. The cost of coming back later is very, very high.
