Fixing writeback from direct reclaim
Posted Jul 23, 2010 21:55 UTC (Fri) by giraffedata (subscriber, #1954)
But remember that the "free list" is not the list of pages available for use. It's much more complicated than that, and in fact nearly all the pages are available for use because of page stealing.
Some pages are more expensive to steal than others, and that's what makes the problem so hard. But a clean page with evidence that it isn't going to be accessed soon is about as cheap to use to satisfy an allocation as a member of the free list.
The free list is a list of pages that don't contain data the kernel has any way of using in the future, and a large free list represents memory waste.
Posted Jul 25, 2010 13:34 UTC (Sun) by i3839 (guest, #31386)
But a large free list is useful for handling sudden memory-pressure situations without destroying latency. So if you have a spiky load, a bigger free list seems like a good idea to me. If you make sudden huge allocations, a bigger free list won't help much; what you want then is little dirty data hanging around.
Posted Jul 25, 2010 15:13 UTC (Sun) by giraffedata (subscriber, #1954)
I don't know that there is any noticeable difference between the time it takes to get a page from the free list and the time it takes to steal a clean inactive page that isn't on it. But maybe there is, since it does take a certain number of CPU cycles to forget what used to be in the page.
I believe the only point of a minimum free list size is to provide some reserve for use in contexts where stealing is not possible. In particular, when the page stealer itself needs memory in order to make progress, it can't wait for a page steal to get it. I.e., the parameter in question is for adjusting the probability of deadlock, and maybe of OOM.
Posted Jul 26, 2010 19:29 UTC (Mon) by i3839 (guest, #31386)
The page stealer can't possibly need memory itself; that would be too stupid. It can't really wait anyway, because it's generally called from interrupt context, when a page fault happens (or maybe the call is deferred to the process causing the fault). And if it waits, then it's a lot slower than the page allocator.
Deadlock shouldn't be possible whatever value you set. OOM is only more likely with higher values, because the kernel only OOMs when it can't allocate memory for itself.
Posted Jul 27, 2010 21:12 UTC (Tue) by giraffedata (subscriber, #1954)
The page stealer does need memory itself. I've always hated the lack of strict resource ordering in Linux, such that it avoids deadlock only by parameters being set so that it's really unlikely, but that's the way it is. The kmalloc pool sits above and below many other layers. The page stealer is more complex than you're probably thinking, because it can involve, for example, writing the contents of a page to its backing store on a network filesystem.
There is a flag (PF_MEMALLOC) on a memory request that says, "this request is part of memory allocation itself" or, equivalently, "don't wait for memory under any circumstance." The requester is supposed to have some way to respond to a failed memory allocation that is better than a deadlock. For example, it could try to find an easier page to steal.
Page fault handling does happen in process context. It normally requires I/O, so interrupt context is pretty much out of the question.
I remember a similar discussion some years ago, in which someone as an experiment set his minimum free list size to zero, and the system froze.
Of course, everything here must be taken with a grain of salt, because this stuff changes frequently, so what's true of one particular kernel version isn't necessarily true of another. I do remember being tormented by the network subsystem's requests for memory as part of page stealing, and then someone later doing something to ameliorate that.
Posted Jul 28, 2010 19:01 UTC (Wed) by i3839 (guest, #31386)
Anyway, those cases come from cleaning dirty pages. The deadlock comes into view when that cleaning is triggered by the page stealer, yet the writeout paths allocate memory and trigger the page stealer again. I wouldn't really say the page stealer is the one needing memory, but rather that stealing dirty pages sometimes needs memory, especially with network filesystems. The main problem there is that they need to receive packets to make forward progress in the dirty-data writeout (and you don't know, before receiving a packet, whether it's a critical one or not). I think they added a memory pool to mostly fix this case.
Most page faults don't require I/O, just memory-mapping updates. When a process allocates memory, it gets virtual memory; real memory is only allocated when the process actually uses it. In that case a page fault occurs, and a page needs to be allocated and mapped. The same goes for copy-on-write handling. Looking at the code, it seems the kernel just has to enable interrupts and can continue handling the page fault in process context without doing much special.
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds