Avoiding the OOM killer with mem_notify

Having applications that use up all the available memory can be a fairly painful experience. For Linux systems, it generally means a visit from the out-of-memory (OOM) killer, which will try to find processes to kill. As one would guess, coming up with rules governing which process to kill is challenging; someone, somewhere, will always be unhappy with a choice the OOM killer makes. Avoiding it altogether is the goal of the mem_notify patch.

When memory gets tight, it is quite possible that applications have memory allocated (often caches for better performance) that they could free. After all, it is generally better to lose some performance than to face the consequences of being chosen by the OOM killer. But, currently, there is no way for a process to know that the kernel is feeling memory pressure. The patch provides a way for interested programs to monitor the /dev/mem_notify file to be notified if memory starts to run low.

/dev/mem_notify is a character device that signals memory pressure by becoming readable. Interested programs can open the file and then use poll() or select() to monitor the file descriptor. Alternatively, signal-driven I/O can be enabled via the FASYNC flag and the system will deliver a SIGIO signal to the process when the device becomes readable. If it becomes readable, the process should free any memory that it can afford to give up. If enough memory is freed this way, the kernel will have no need to call in the OOM killer.

The crux of the patch is how to decide that memory pressure is occurring. mem_notify modifies shrink_active_list() to look for movement of an anonymous page to the inactive list, which is an indication that some will likely be swapped out soon. When that occurs, memory_pressure_notify() (with the pressure flag set to 1) will be called for that zone.
When the number of free pages for the zone increases above a threshold (based on pages_high and lowmem_reserve for the zone), memory_pressure_notify() is called again, but with the pressure flag set to 0, effectively ending the memory pressure event for that zone.

If there are numerous processes waiting for a memory pressure notification, it could be counterproductive to wake them all at once: the "thundering herd" problem. To combat this, the patch set adds the ability to wake fewer processes than are waiting on the poll event, by adding the poll_wait_exclusive() function. poll_wait_exclusive() will in turn call add_wait_queue_exclusive() so that a member of the wake_up() family can be used that will limit the number of processes woken up. Previously, only poll_wait() was available; it uses add_wait_queue(), which does not provide this ability. Also, to reduce the frequency of processes waking up to reclaim memory, memory_pressure_notify() will only do that once every five seconds.

The /proc/zoneinfo output has been changed to include the mem_notify status. This can be used by a human for diagnostic purposes or by a program to check the current status of zones for memory pressure.

The embedded community has a lot of interest in seeing this feature get added to the kernel. Devices like phones and PDAs are often running close to their memory limits and the OOM killer is currently unavoidable when the user opens yet another application. With this patch in place, programs that use a lot of memory, but could get by with less, can be changed to free up their caches and the like when memory gets tight. As memory-hungry programs get changed, other users will benefit as well.

The patch, submitted by Kosaki Motohiro, has been through several iterations on linux-kernel. The work was originally started by Marcelo Tosatti, with the fifth version recently posted by Kosaki.
Previous versions have been well received and with relatively few comments on this iteration, it would seem to be getting close to being merged.
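
The monitoring pattern the article describes (open the device, wait for it to become readable, then free what can be spared) might look roughly like this from user space. It is only a sketch: /dev/mem_notify is the device name proposed by the patch and will not exist on a kernel without it, so the demonstration substitutes a pipe for the device. The Cache class and the function names are hypothetical, and Python is used here purely for brevity.

```python
import os
import select

# Device name taken from the article; present only on a kernel carrying the patch.
MEM_NOTIFY_PATH = "/dev/mem_notify"


def wait_for_pressure(fd, timeout_ms):
    """Return True if fd became readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(mask & select.POLLIN for _fd, mask in events)


class Cache:
    """Hypothetical stand-in for memory an application could give up."""

    def __init__(self):
        self.entries = {}

    def shrink(self):
        """Drop every cached entry and report how many were dropped."""
        dropped = len(self.entries)
        self.entries.clear()
        return dropped


def demo():
    # A pipe stands in for the device: it is unreadable until written to,
    # much as the device is described as unreadable until pressure occurs.
    read_end, write_end = os.pipe()
    cache = Cache()
    cache.entries.update({i: bytes(1024) for i in range(100)})

    quiet = wait_for_pressure(read_end, 10)      # nothing signalled yet
    os.write(write_end, b"x")                    # simulate a pressure event
    signalled = wait_for_pressure(read_end, 10)
    dropped = cache.shrink() if signalled else 0

    os.close(read_end)
    os.close(write_end)
    return quiet, signalled, dropped


if __name__ == "__main__":
    print(demo())
```

A real program would open MEM_NOTIFY_PATH instead of creating a pipe and would fold the descriptor into its existing event loop rather than blocking on it separately.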
Avoiding the OOM killer with mem_notify Posted Jan 31, 2008 3:34 UTC (Thu) by salimma (subscriber, #34460) [Link] Presumably the OOM killer looks at applications' changes in memory footprint when picking a victim? Otherwise there'd be no incentive for an app to play nice and voluntarily relinquish memory.
Avoiding the OOM killer with mem_notify Posted Jan 31, 2008 9:09 UTC (Thu) by njs (subscriber, #40338) [Link] It's possible to take this metaphor of processes fighting over memory too far. I don't think in practice app writers consider themselves to have "won" if they've managed to partially crash the user's system but keep their own process running while they did it. The goal is to not invoke the capricious god OOM at all. Interesting possibility enabled by this patch: userspace OOM killer. You don't *have* to reduce memory by freeing caches -- killing other processes is quite effective too :-). And if you have a relatively integrated environment like a phone UI, you may know perfectly well from userspace that killing that java game is better than killing the windowing system, which is better than killing the gsm daemon. (Even on desktops, one knows that killing X will also automatically kill all its clients -- so one should always start by killing those clients first, because if that works, then you've managed to escape the OOM situation with strictly less damage.)
Requesting 'real' memory Posted Jan 31, 2008 11:08 UTC (Thu) by epa (subscriber, #39769) [Link] The OOM killer is needed because the kernel has overallocated memory. Surely for critical processes there is a way to request 'hard' memory, where you can be sure that it really exists either as RAM or swap space, and you can be certain you're not going to be arbitrarily killed later for using the memory you requested. The tradeoff is that a memory allocation request can fail - but better to have malloc() return 0 where the app can handle it sensibly than to have it pretend to work and then randomly kill your process later. Can you turn off overallocation (and OOM killing) on a per-process basis?
Requesting 'real' memory Posted Jan 31, 2008 11:43 UTC (Thu) by njs (subscriber, #40338) [Link] >and you can be certain you're not going to be arbitrarily killed later for using the memory you requested Well, here's what makes designing the OOM-killer hard -- attempting to use memory that the system doesn't have actually *doesn't* kill you, it just wakes up the OOM-killer and then it's perfectly possible that the OOM-killer will go after someone else. Imagine a scenario where one app allocates 99% of the system's memory (and not with some virtually allocated overallocation bs, like they actually touch the pages or whatever), and then stops. Then you try to run "ls", and it overflows that last 1% of memory and wakes up the OOM-killer. The OOM-killer tries to identify and then attack the runaway giant process, not ls, even though ls was the one who tried to get more memory. So it doesn't actually help much for you, personally, to make sure your memory is not overallocated -- if anything it will hurt, since it increases memory pressure overall and also makes your process a bigger target. You can turn off overallocation globally, but that doesn't necessarily help either, since in the scenario above it just means that ls (and every other program you try to run) fails, while the runaway monster just sits there. Google say you can disable the OOM killer on a per-process basis, though, which I hadn't known: http://linux-mm.org/OOM_Killer
Requesting 'real' memory Posted Feb 1, 2008 20:21 UTC (Fri) by giraffedata (subscriber, #1954) [Link]
> So it doesn't actually help much for you, personally, to make sure your memory is not overallocated
Right, that's as dodgy as expecting to avoid the OOM Killer pseudo-crash by having the kernel notify users that memory is tight. What you want is a combination of the two: a process turns off overallocation for itself, and in exchange, is made immune to the OOM Killer. That way, processes that need determinism can have it while processes that don't want to waste swap space can have that. Linux doesn't have this. AIX does.
Requesting 'real' memory Posted Feb 1, 2008 23:55 UTC (Fri) by njs (subscriber, #40338) [Link] > What you want is a combination of the two: a process turns off overallocation for itself, and in exchange, is made immune to the OOM Killer. I don't see the connection between these. Turning off overallocation just means that you get a different error handling API. It certainly doesn't stop you from running the system out of memory. Making a process immune from the OOM killer is clearly a root-level operation; all you have to do to force allocation is to touch pages after you allocate them, obviously not a root level sort of ability.
Requesting 'real' memory Posted Feb 2, 2008 2:00 UTC (Sat) by giraffedata (subscriber, #1954) [Link]
> Turning off overallocation just means that you get a different error handling API
How is getting killed by the OOM Killer an error handling API? Turning off overallocation means you get an error handling API where you had none before.
> It certainly doesn't stop you from running the system out of memory.
Turning off overallocation for one process doesn't stop you from running the system out of memory; that's why the OOM Killer is still there. But he only kills other processes. The connection is that if Process X is not overallocating memory (swap space), then the OOM Killer is guaranteed to be able to relieve memory pressure without having to kill Process X. You can't say that about an overallocating process. Think of it as two separate pools of swap space; one managed by simple allocation; the other with optimistic overallocation and an OOM Killer. A process decides which one works best for it.
> all you have to do to force allocation is to touch pages after you allocate them,
No, that's not enough. In overallocating mode, the swap space does not get allocated until the kernel decides to steal a page frame. By then, the process is in no position to be able to deal with the fact that there's not enough swap space for him.
Requesting 'real' memory Posted Feb 3, 2008 1:39 UTC (Sun) by njs (subscriber, #40338) [Link] Ah, I see, that's not what overallocation means. It has nothing to do with swap. As far as the memory manager is concerned, the total amount of memory available in the system is RAM + swap -- if you have 1G ram and 2G swap, then you have 3G total memory. This 3G total is distributed among processes. If overallocation is turned off, then each time a process calls malloc() (well, really mmap()/sbrk()/fork()/etc., but never mind), either some pages from that 3G are shaved off and reserved for that process's use, or if there are not enough pages remaining then the syscall fails. If overallocation is turned on, then malloc() never actually allocates memory. What it does instead is set up some virtual pages in the process's address space, and then the first time the process tries to write anywhere on each of those not-really-there pages, the process takes a fault, the memory manager allocates one of those 3G of pages, sticks it in place of the fake page, and then finally allows the write to continue. The upside of this is that if a process malloc()'s a big chunk of memory and then only uses part of it, it's as if they only malloc()'ed exactly what they ended up needing. The downside is that since the actual allocation is now happening somewhere in the middle of a single user-space cpu instruction, there's no way to signal an error back to the process if the allocation fails, and so in that case the only thing you can do is wake up an OOM-killer. > The connection is that if Process X is not overallocating memory (swap space), then the OOM Killer is guaranteed to be able to relieve memory pressure without having to kill Process X. Which means that this just isn't true. 
Memory pressure is caused by actually-allocated pages, and the only difference between a process using overallocation and one that isn't is that the overallocating process may have some virtual pages set up to trigger allocation sometime in the future. Whether such pages exist has no bearing whatsoever on memory pressure *now*. The only way the OOM-killer can relieve memory pressure is to kill off processes that are using memory, and Process X qualifies.
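
The first-touch behaviour described in the comment above can be observed from user space. The sketch below maps an anonymous region and compares the process's resident page count before and after writing to each page. It assumes a Linux /proc/self/statm and returns None where that file is unavailable; the function names are hypothetical. Note that it shows when pages become resident, not how the kernel's commit accounting treats the mapping.

```python
import mmap

PAGE = mmap.PAGESIZE


def resident_pages():
    """Resident set size in pages from /proc/self/statm, or None if unavailable."""
    try:
        with open("/proc/self/statm") as f:
            return int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None


def lazy_allocation_demo(size=32 * 1024 * 1024):
    """Return (before, after_map, after_touch) resident page counts, or None."""
    before = resident_pages()
    if before is None:
        return None

    # Setting up the mapping only reserves address space.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_map = resident_pages()

    # Writing one byte per page forces the kernel to supply a real page for each.
    for offset in range(0, size, PAGE):
        region[offset] = 1
    after_touch = resident_pages()

    region.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    print(lazy_allocation_demo())
```

On a typical Linux system the second and third numbers should differ by roughly the number of pages in the region, while the first and second stay close together.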
Requesting 'real' memory Posted Feb 3, 2008 2:29 UTC (Sun) by giraffedata (subscriber, #1954) [Link] Though you start out saying overallocation has nothing to do with swap, your second sentence shows that it is strongly related to swap, saying that swap space is half the equation in determining how much memory is available to allocate. But what overallocation are you talking about? You describe it like some well-defined Linux function. Are you talking about something that Linux implements? AFAIK, Linux does not implement a per-process overallocation mode, and we were talking about what should be. It's clear how the mode should work: The same way it does in AIX, which is what I described. "Turning off" overallocation has to mean that once you've allocated memory, you can use it and can't be killed for lack of memory. Otherwise, why bother having the mode? And the way to do that is to allocate swap space to the process at the moment you allocate the virtual addresses to it. You could alternatively permanently allocate some real memory to the process, but that would be really wasteful (and we already have a means to do that: mlockall()). BTW, it's not helpful to talk about memory not being actually allocated by malloc (brk/mmap/etc). malloc() does actually allocate virtual memory. Allocating swap space and allocating real memory are separate things that exist in support of using virtual memory which has been allocated. I also don't think "virtual" amounts to "fake." It's a different form of memory.
> Memory pressure is caused by actually-allocated pages
I assume you mean pages of real memory. Filling up of real memory is the primary cause of memory pressure, but it's easy to relieve that pressure: just push data you aren't using out to swap space and free up the real memory. When swap space is full, that's when the pressure backs up into the real memory and the OOM Killer is needed. That's why I say swap space is the key to giving guaranteed-usable virtual memory to a process.
Requesting 'real' memory Posted Feb 3, 2008 4:06 UTC (Sun) by njs (subscriber, #40338) [Link]
Obviously we're totally talking past each other, but I'll try one more time...
> Though you start out saying overallocation has nothing to do with swap, your second sentence shows that it is strongly related to swap, saying that swap space is half the equation in determining how much memory is available to allocate.
Well, sure, swap is, by any measure, an important part of a VM system, but that doesn't mean it's "strongly related" to any other particular part of a VM system. My point is that from the point of view of overallocation, the difference between swap and physical RAM is just irrelevant.
> But what overallocation are you talking about? You describe it like some well-defined Linux function. Are you talking about something that Linux implements?
Yes. I'm talking about "overallocation" or "overcommit", which in this context is a technical term with a precise meaning. When people are talking about it in this thread, they are referring to a particular policy implemented by the Linux kernel and enabled by default. Evidently you haven't encountered this particular design before, which is why I described it in my previous comment...
> AFAIK, Linux does not implement a per-process overallocation mode, and we were talking about what should be.
No, AFAIK it doesn't, but it does support a global overallocation/no-overallocation switch, and it's obvious what it would mean to take that switch and make it process-granular. Maybe there's yet another policy that Linux should implement, but if you want to talk about that then trying to redefine an existing term to do so will just confuse people.
> And the way to do that is to allocate swap space to the process at the moment you allocate the virtual addresses to it.
Huh, so is this how traditional Unix works?
Is this tight coupling between memory allocation policy and swap management policy the original source of that old advice to make your swap space = 2xRAM? I've long wondered where that "rule" came from. I guess I've heard before that in traditional Unix all RAM pages are backed on disk somewhere, either via the filesystem or via swap, but I hadn't thought through the consequences before. I'm guessing this is the original source of confusion. Linux doesn't work like that at all; I'm not sure there's any modern OS that does. In a system where RAM is always swap-backed, having 1G RAM and 2G of swap means that all processes together can use 2G total; in Linux, they can use 3G total, because if something is in RAM it doesn't need to be in swap, and vice-versa. (What happens in your scheme if someone is running without swap? I bet there are people in this thread who both disable overallocation and run without swap (hi Zooko!).) "Allocated memory" in my comment really just means "anonymous transient data that the kernel has committed to storing on the behalf of processes". It can arrange for it to be stored in RAM, or in swap, or whatever. There is no such thing as "allocated swap" or "allocated RAM" in Linux. (Except via mlockall(), I guess, if you want to call it that, but I don't think calling it that is conceptually useful -- it's more of a way to pin some "allocated memory" Does that make my previous comment make more sense?
Requesting 'real' memory Posted Feb 4, 2008 9:43 UTC (Mon) by giraffedata (subscriber, #1954) [Link]
> but it does support a global overallocation/no-overallocation switch, and it's obvious what it would mean to take that switch and make it process-granular.
That would be only slightly different from the scheme I described. The only difference is that the global switch lets you add a specified amount of real memory to the size of swap space in calculating the quota. If you could stop the kernel from locking up that amount of real memory for other things, you could have the OOM-proof process we're talking about, with less swap space. I think the only reason I haven't seen it done that way is that swap space is too cheap to make it worthwhile to bring in the complexity of allocating the real memory. If I were to use the Linux global switch, I would just tell it to consider 0% of the real memory and throw some extra disk space at it, for that reason.
> What happens in your scheme if someone is running without swap? I bet there are people in this thread who both disable overallocation and run without swap
They don't have that option. The price they pay to have zero swap space is that nothing is ever guaranteed to be free from being OOM-killed. Which is also the case for the Linux users today who disable overallocation and run without swap.
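
The quota discussed in the two comments above can be worked through numerically. The sketch below assumes the formula given in the kernel's overcommit documentation for strict mode (swap plus a configurable percentage of RAM, the vm.overcommit_ratio setting) and ignores huge pages and other adjustments; the function name is hypothetical. It uses the 1G RAM / 2G swap figures from earlier in the thread.

```python
GIB = 1024 ** 3


def commit_limit(ram_bytes, swap_bytes, overcommit_ratio):
    """Approximate strict-mode commit limit: swap plus a percentage of RAM."""
    return swap_bytes + ram_bytes * overcommit_ratio // 100


if __name__ == "__main__":
    # Ratio 0: only swap counts, as in the "consider 0% of the real memory" setup.
    print(commit_limit(1 * GIB, 2 * GIB, 0) / GIB)
    # Ratio 100: all of RAM plus swap, the "3G total" case.
    print(commit_limit(1 * GIB, 2 * GIB, 100) / GIB)
    # Ratio 50: half of RAM plus swap.
    print(commit_limit(1 * GIB, 2 * GIB, 50) / GIB)
```

With a ratio of 0 and no swap at all the limit would be zero, which matches the point that a swapless system cannot offer this guarantee under that configuration.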
Requesting 'real' memory Posted Feb 5, 2008 13:44 UTC (Tue) by filipjoelsson (subscriber, #2622) [Link]
> But what overallocation are you talking about? You describe it like some well-defined Linux function. Are you talking about something that Linux implements? AFAIK, Linux does not implement a per-process overallocation mode, and we were talking about what should be.
I think the overallocation he talks about is on the userspace level. When I'm programming an application to read, store and analyze data on a laptop (a field application - the user wants to have preliminary analysis right away), I can make a big fat allocation to use for cache. I store the data in a database, which also uses a big fat cache. Until now, I have had to make both caches no larger than a quarter the size of the RAM (roughly), since swapping out really defeats the point of the cache. If I had bigger caches, I could only hope that the sum of the actually used memory would not be larger than RAM (or else it'd start swapping out, or if I run without swap - trigger the OOM). With this patch, I can safely overcommit - and when I get the notification, I can cut down on the caches and survive. The analysis step of the app will be slower, but my user will not have been near a crash - and the analysis would have been slower anyway because of swapping. One difference is that he can run without swap in a much safer manner. Anyway, this is not at all a case of a partial crash. You could argue that it is a case of sloppy programming, but why should I reinvent memory management? I'd much rather let the kernel do that for me. Oh, and lastly - on embedded platforms, you often have to do without swap. Swapping to flash does not strike me as a good idea.
Requesting 'real' memory Posted Feb 8, 2008 5:00 UTC (Fri) by goaty (guest, #17783) [Link] Don't forget about stack! For heavily multi-threaded processes on Linux, stack is usually the biggest user of virtual memory. As an aside, I know some operating systems use a "guard page" stack implementation, where writes into the first page off the bottom (top?) of the stack trigger a page fault, which allocates another page worth of real memory, and also moves the "guard page". The benefit being that virtual memory use is much closer to actual memory use, and turning off overcommit is much more viable. The downside is that the ABI requires the compiler to generate code to probe every page when it needs to allocate a stack frame larger than the page size. Which ironically can end up using more real memory than Linux's virtual-memory-hungry approach.
Requesting 'real' memory Posted Feb 8, 2008 21:23 UTC (Fri) by nix (subscriber, #2304) [Link] Linux has used a guard-page stack implementation since forever. It's transparent to userspace: the compiler doesn't need to do a thing. (Obviously when threading things get more complex.)
Requesting 'real' memory Posted Feb 1, 2008 0:01 UTC (Fri) by zooko (subscriber, #2589) [Link] You can set sys.vm.overcommit_memory to policy #2. Unfortunately, it isn't entirely clear that this will banish the OOM killer entirely, or if it will just make it very rare. http://www.linuxinsight.com/proc_sys_vm_overcommit_memory...
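
The setting mentioned above is exposed on Linux as the sysctl vm.overcommit_memory, readable at /proc/sys/vm/overcommit_memory. A small sketch for checking which policy is in force follows; the function names and the wording of the descriptions are hypothetical, and the reader returns None where the file cannot be read.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def describe_overcommit(raw):
    """Map the raw file contents (for example '2\\n') to a description."""
    return OVERCOMMIT_MODES.get(int(raw.strip()), "unknown")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Describe the running system's policy, or None if it cannot be read."""
    try:
        with open(path) as f:
            return describe_overcommit(f.read())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(current_overcommit())
```

Changing the value requires root (for example through sysctl); the sketch only reads it.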
Requesting 'real' memory Posted Feb 1, 2008 1:21 UTC (Fri) by zlynx (subscriber, #2285) [Link] I ran my Linux laptop with strict overcommit enabled for a while. Unfortunately, it does not help. Almost all desktop applications expect memory allocation to succeed. From some of the application errors I saw, developers seem to have become very lax about checking for NULL from malloc. C++ and Python applications did better, because they get an exception, and they have to do *something* with it.
Requesting 'real' memory Posted Feb 1, 2008 3:28 UTC (Fri) by zooko (subscriber, #2589) [Link] Even if what you say is true, I would think that this would make the effects of memory exhaustion more deterministic/reproducible/predictable. C++ and Python apps, and also C apps that use malloc sparingly, would be less likely to crash than others, I guess. Perhaps this degree of predictability isn't enough to be useful.
Requesting 'real' memory Posted Feb 1, 2008 17:56 UTC (Fri) by zlynx (subscriber, #2285) [Link] I did not notice any extra predictability. The effect was that the desktop programs crash apparently randomly. It was much like the OOM killer. And just like the OOM killer, it was generally the big stuff that blew up, like Evolution and Open Office. I lost gnome-terminal a few times. The C++ and Python apps still crashed, they were simply more polite about it. By the way, I don't read it that way, but your phrasing "Even if what you say is true" *could* be offensive. It seems to be saying that I wrote untruthfully. Even if you don't see the same effect on your system, I did see it just the way I described it on mine.
Requesting 'real' memory Posted Feb 1, 2008 19:34 UTC (Fri) by giraffedata (subscriber, #1954) [Link] Desktop applications aren't where I would expect to see deterministic memory allocation exploited. Allocation failures and crashes aren't such a big deal with these applications because if things fall apart, there's a user there to pick up the pieces. Overallocation and OOM Killer may well be the optimum memory management scheme for desktop systems. Where it matters is business-critical automated servers. For those, application writers do spend time considering running out of memory -- at least they do in cases where an OOM killer doesn't make it all pointless anyway. They check the success of getting memory and do it at a time when there is some reasonable way to respond to not getting it. And they shouldn't spend time worrying about freeing up swap space for other processes (i.e. mem_notify is no good). That resource management task belongs to the kernel and system administrator.
Requesting 'real' memory Posted Feb 1, 2008 20:14 UTC (Fri) by giraffedata (subscriber, #1954) [Link]
> You can set sys.vm.overcommit_memory to policy #2. Unfortunately, it isn't entirely clear that this will banish the OOM killer entirely, or if it will just make it very rare.
It's entirely clear to me that it banishes the OOM killer entirely. The only reason the OOM killer exists is that sometimes the processes use more virtual memory than there is swap space to put its contents. With Policy 2, virtual memory isn't created in the first place unless there is a place to put the contents.
Requesting 'real' memory Posted Feb 1, 2008 20:47 UTC (Fri) by zooko (subscriber, #2589) [Link] But doesn't the kernel itself dynamically allocate memory? And when it does so, can't it thereby use up memory so that some user process will be unable to use memory that it has already malloc()'ed? Or do I misunderstand?
Requesting 'real' memory Posted Feb 1, 2008 21:26 UTC (Fri) by giraffedata (subscriber, #1954) [Link] The kernel reserves at least one page frame for anonymous virtual memory (actually, it's a whole lot more than that, but in theory one frame is enough for all the processes to access all their virtual memory as long as there is adequate swap space). So any kernel real memory allocation can fail, and the code is painstakingly written to allow it to handle that failure gracefully (more gracefully than killing an arbitrary process). It allocates memory ahead of time so as to avoid deadlocks and failures at a time that there is no graceful way to handle it.
Requesting 'real' memory Posted Feb 1, 2008 21:50 UTC (Fri) by zooko (subscriber, #2589) [Link] Right, but I wasn't asking about the kernel's memory allocation failing -- I was asking about the kernel's virtual memory allocation succeeding by using memory that had already been offered to a process as the result of malloc(). Oh -- perhaps I misunderstood and you were answering my question. Are you saying that the kernel will fail to dynamically allocate memory rather than allocate memory which has already been promised to a process (when overcommit_memory == 2)? Thanks, Zooko
Requesting 'real' memory Posted Feb 1, 2008 23:05 UTC (Fri) by giraffedata (subscriber, #1954) [Link] The kernel doesn't use virtual memory at all (well, to be precise let's just say it doesn't use paged memory at all). The kernel's memory is resident from the moment it is allocated, it can't ever be swapped out, and the kernel uses no swap space.
Requesting 'real' memory Posted Feb 5, 2008 23:05 UTC (Tue) by dlang (subscriber, #313) [Link] the problem that you will have when you disable overallocating memory is that when your 200M firefox process tries to spawn a 2k program (to handle some mime type) it first forks, and will need 400M of ram, even though it will immediately exec the 2k program and never touch the other 399.99M of ram. with overallocation enabled this will work. with it disabled you have a strong probability of running out of memory instead. yes, it's more reproducible, but it's also a completely avoidable failure.
Requesting 'real' memory Posted Feb 6, 2008 3:25 UTC (Wed) by giraffedata (subscriber, #1954) [Link] I wonder why we still have fork. As innovative as it was, fork was immediately recognized, 30 years ago, as impractical. vfork took most of the pain away, but there is still this memory resource allocation problem, and some others, and fork gives us hardly any value. A fork-and-exec system call would fix all that. Meanwhile, if you have the kind of system that can't tolerate even an improbable crash, and it has processes with 200M of anonymous virtual memory, putting up an extra 200M of swap space which will probably never be used is a pretty low price for the reliability of guaranteed allocation.
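
A combined create-and-exec interface of the kind wished for above exists in POSIX as posix_spawn(), which Python exposes as os.posix_spawn(). The sketch below launches a child without the caller issuing a separate fork() and exec(). Whether this avoids duplicating the parent's commit charge depends on how the C library implements the call, which this example does not verify; the helper name is hypothetical.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] with the given arguments and return its exit status."""
    pid = os.posix_spawn(argv[0], argv, dict(os.environ))
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was terminated by a signal: report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "print('hello from child')"])
    print("child exited with", code)
```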
Requesting 'real' memory Posted Feb 6, 2008 5:26 UTC (Wed) by dlang (subscriber, #313) [Link] many people would disagree with your position that vfork is better than fork. (the issue came up on the lkml within the last week and was dismissed with something along the lines of 'vfork would avoid this, but the last thing we want to do is to push more people to use vfork') I agree that a fexec (fork-exec) or similar call would be nice to have, but it wouldn't do much good for many years (until a significant amount of software actually used it) as for your comment of just add swap space to avoid problems with strict memory allocation. overcommit will work in every case where strict allocation will work without giving out-of-memory errors, and it will work in many cases where strict allocation would result in errors. overcommit will also work in many (but not all) cases where strict allocation would result in out of memory errors. if it's trivial to add swap space to avoid the OOM errors in strict allocation, that same swap space can be added along with overcommit and the system will continue to work in even more cases. the only time strict allocation will result in a more stable system is when your resources are fixed and your applications are fairly well behaved (and properly handle OOM conditions), even then the scenario of one app allocating 99% of your ram, preventing you from running other apps, is still a very possible situation. the only difference is that the timing of the OOM error is more predictable (assuming that you can predict what software will be run when in the first place)
Requesting 'real' memory Posted Feb 7, 2008 0:35 UTC (Thu) by giraffedata (subscriber, #1954) [Link]
> Many people would disagree with your position that vfork is better than fork
No, they wouldn't, because I was talking about the early history of fork and comparing the original fork with the original vfork. The original fork physically copied memory. The original vfork didn't, making it an unquestionable improvement for most forks. A third kind of fork, with copy-on-write, came later and obsoleted both. I didn't know until I looked it up just now that a distinct vfork still exists on modern systems.
> the only time strict allocation will result in a more stable system is when your resources are fixed and your applications are fairly well behaved (and properly handle OOM conditions)
The most important characteristic of a system that benefits from strict allocation is that there be some meaningful distinction between a small failure and a catastrophic one. If all your memory allocations must succeed for your system to meet requirements, then it's not better to have a fork fail than to have some process randomly killed, and overallocation is better because it reduces the probability of failure. But there are plenty of applications that do make that distinction. When a fork fails, such an application can reject one piece of work with a "try again later" and a hundred of those is more acceptable than one SIGKILL.
Avoiding the OOM killer with mem_notify Posted Jan 31, 2008 15:12 UTC (Thu) by salimma (subscriber, #34460) [Link] Nokia has something similar on their Linux-based Maemo platform -- run it without swap, start a bunch of applications, and a lot of the built-in applications would enter a reduced-memory-usage mode -- noticeable because it takes much longer to switch to them than it normally would. I wonder whether the apps currently just poll the system to find out how much memory is left, or they have their own mechanism, though.
You forget about higher god! Posted Jan 31, 2008 9:47 UTC (Thu) by khim (subscriber, #9252) [Link] Applications are always under threat from the OOM-killer, but there is another, more powerful god: the end user! If an application does not play nice and forces other applications to be killed by the OOM-killer (one way or another), then eventually this information reaches the user and the application is either silenced forever or fixed. So the applications (or rather the application writers) have every incentive to play well with others...
memory congestion avoidance Posted Jan 31, 2008 10:01 UTC (Thu) by sasha (subscriber, #16070) [Link] This whole topic looks to me like the TCP congestion avoidance protocol. Currently, we are at the very start of its development -- we are going to get a notification that the congestion exists. However, it is not enough to have a notification system, because different applications are going to play by different rules. So, I'm looking forward to some clear rules for "memory congestion avoidance" written down as a standard and implemented in most high-level languages, especially languages with garbage collectors.
Avoiding the OOM killer with mem_notify Posted Jan 31, 2008 17:06 UTC (Thu) by im14u2c (subscriber, #5246) [Link]
How effective can this be, though, for many C programs? If I malloc a bunch of memory, perhaps as caches, and then am asked to free it, that doesn't magically release pages back to the OS. Now, if malloc uses mmap for some of the larger allocations, those can be released back to the OS by munmap. But, for the general sbrk managed heap, I have to free stuff near the end of the heap before I can ask for my brk to be lowered. There's no guarantee I can do that.
For this to be useful, whatever I malloc needs to have an additional level of indirection in user space, so I can move the objects I wish to keep and then compact the heap. Otherwise, simply freeing stuff up won't be enough.
It may be useful to compare/contrast this to the HURD's approach, which is simply to force user space to do its own VM management. There, the kernel and user space dicker about physical pages only, and user space figures out how best to handle the burden when a given app wants more pages than the OS can give it. The answer could be garbage collection, discarding caches, swapping or whatever makes sense to a given application. The main thing is that the app knows way ahead of time that real RAM is in short supply, and avoids getting into the overcommitted state entirely.
And since the kernel isn't doing the swapping, it seems like you wouldn't get into situations where you need to free memory so you have enough memory so that you can write out pages and the like. Example: Imagine that to wake an app so it can free some pages, you have to bring it in from swap, but swap is too full to write any dirty anonymous pages out. If your policy is that each app self-swaps, this should never happen since the OS guarantees it'll have enough pages to do its work, and user space will just muddle along with what it's given.
(In theory, it seems like a user space app could get by with just a few pages... a couple executable pages and a couple data pages.) I'm guessing mem_notify will try to wake apps sufficiently far ahead that it can avoid those "need RAM to free RAM" situations in practice, but setting proper thresholds seems like it ought to be rather tricky.
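As an illustration of the mmap/munmap case above, here is a minimal sketch in C of a cache block that bypasses malloc entirely, so that releasing it really does hand pages back to the kernel. The cache_block names are made up; mmap() and munmap() are the only real interfaces used, and Linux-style MAP_ANONYMOUS is assumed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

struct cache_block {
    void  *base;   /* NULL when the cache has been released */
    size_t len;    /* size of the mapping in bytes */
};

/* Map len bytes of anonymous memory; returns 0 on success, -1 on failure. */
int cache_block_init(struct cache_block *c, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        c->base = NULL;
        c->len = 0;
        return -1;
    }
    c->base = p;
    c->len = len;
    return 0;
}

/* Hand the whole mapping back to the kernel; safe to call twice. */
int cache_block_release(struct cache_block *c)
{
    if (c->base == NULL)
        return 0;
    if (munmap(c->base, c->len) != 0)
        return -1;
    c->base = NULL;
    c->len = 0;
    return 0;
}
```

This only covers a cache that lives in one contiguous region. Objects scattered through the sbrk heap still have the compaction problem described above.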
Avoiding the OOM killer with mem_notify Posted Feb 1, 2008 3:03 UTC (Fri) by vomlehn (subscriber, #45588) [Link]
> I'm guessing mem_notify will try to wake apps sufficiently far ahead that it can avoid those "need RAM to free RAM" situations in practice, but setting proper thresholds seems like it ought to be rather tricky
I don't see how the kernel can possibly know enough for it to notify applications far enough ahead about the need to free memory; the memory allocation behavior of applications is just too unpredictable. An approach that seems like it would be better would be to notify the kernel that certain pages in your application are being used to cache data. The kernel is then free to simply grab them if it needs them. If your application decides it needs the data later, it uses a system call to notify the kernel that the pages are no longer being used as a cache. If the kernel didn't need the pages, they would still have their old data and the application could use them directly.
On the other hand, if the kernel did have to grab the pages in the interim, the system call used to grab the pages back would return an error. Your application would then know it needs to remap the pages and regenerate the data. Of course, it's possible the pages can't be remapped because memory is too low. The application would handle that as though the data wasn't cached and it couldn't get the memory to read it. It already has to be able to do this, so this doesn't add to the application's complexity.
The advantages of this approach are that the pages are immediately available to the kernel without having to wake the process up. No need to figure out complex thresholds, no need to allocate enough memory for the process to run, no delay in making the needed memory available. You could even allow for priorities when telling the kernel the pages are being used for cache so that the kernel would grab lower priority pages first.
I wish I had the time to code this and submit it because I think that mem_notify is an awful botch that will cause unending pain as people add patch on patch to try to make it work. But that's just my personal opinion...
Avoiding the OOM killer with mem_notify Posted Feb 1, 2008 19:47 UTC (Fri) by zlynx (subscriber, #2285) [Link]
Applications can already see if they're missing memory pages by using mincore().
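For reference, a minimal sketch in C of that mincore() check, assuming Linux and a page-aligned region such as one returned by mmap(). resident_pages() is a made-up helper name, and the answer is only a snapshot: a page can be evicted right after the call returns.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes mincore() under strict -std= modes */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count how many pages of [addr, addr + len) are resident in RAM.
 * addr must be page-aligned. Returns -1 on error.
 */
long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec;
    long count = 0;

    if (len == 0)
        return 0;
    vec = malloc(npages);            /* one status byte per page */
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;         /* low bit set means "resident" */
    free(vec);
    return count;
}
```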
You may not have been carefully reading the mem_notify patch descriptions.
What it does is trigger on memory pages going into the inactive list. This is what happens to
prepare memory pages that are good candidates for swapping.
Here is the Changelog from version 5 of the mem_notify patch, see the v3 changes:
Changelog
-------------------------------------------------
v4 -> v5 (by KOSAKI Motohiro)
o rebase to 2.6.24-rc8-mm1
o change display order of /proc/zoneinfo
o ignore very small zone
o support fcntl(F_SETFL, FASYNC)
o fix some trivial bugs.
v3 -> v4 (by KOSAKI Motohiro)
o rebase to 2.6.24-rc6-mm1
o avoid wake up all.
o add judgement point to __free_one_page().
o add zone awareness.
v2 -> v3 (by Marcelo Tosatti)
o changes the notification point to happen whenever
the VM moves an anonymous page to the inactive list.
o implement notification rate limit.
v1(oom notify) -> v2 (by Marcelo Tosatti)
o name change
o notify timing change from just swap thrashing to
just before thrashing.
o also works with swapless device.
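The "notification rate limit" item from v3 can be pictured with a small user-space sketch in C. This is not the patch's code (the kernel would work in jiffies inside memory_pressure_notify()); the notify_limiter names are made up, and the five-second interval is the figure given in the article.

```c
#include <stdbool.h>
#include <time.h>

#define NOTIFY_INTERVAL_SEC 5   /* interval quoted in the article */

struct notify_limiter {
    bool   armed;   /* false until the first notification is sent */
    time_t last;    /* time of the most recent notification */
};

/* Return true if a notification may be sent at time now, and record it. */
bool notify_allowed(struct notify_limiter *l, time_t now)
{
    if (l->armed && now - l->last < NOTIFY_INTERVAL_SEC)
        return false;   /* still inside the quiet period */
    l->armed = true;
    l->last = now;
    return true;
}
```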
Avoiding the OOM killer with mem_notify Posted Feb 1, 2008 4:57 UTC (Fri) by ikm (subscriber, #493) [Link] > When memory gets tight, it is quite possible that applications have memory allocated—often caches for better performance—that they could free. Not many programs actually have any adjustable caches. More often it would be some useless unreclaimed junk (and of course i'm talking java here). So this change should primarily benefit them, I guess.
Avoiding the OOM killer with mem_notify Posted Feb 1, 2008 13:27 UTC (Fri) by nix (subscriber, #2304) [Link] I don't think I've ever written a substantial program that didn't have *some* sort of caching in it, to trade off space against time somewhere where `spend the time, every time' was undesirable. Often these caches are not expected to be terribly large, and have the exciting expiration policy `never', but it would be fairly trivial to respond to a mem_notify signal by just ditching the entire contents of all of those caches.
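A minimal sketch in C of that "ditch everything" response, assuming the behavior the article describes: the /dev/mem_notify descriptor becomes readable under memory pressure. The example_cache and drop_caches_on_pressure() names are made up, and the device exists only on a kernel carrying the patch, so the function takes any pollable descriptor.

```c
#include <poll.h>
#include <stdlib.h>

/* Hypothetical stand-in for an application cache that never expires. */
static void  *example_cache;
static size_t example_cache_len;

void example_cache_fill(size_t len)
{
    free(example_cache);
    example_cache = malloc(len);
    example_cache_len = example_cache ? len : 0;
}

void example_cache_drop(void)
{
    free(example_cache);
    example_cache = NULL;
    example_cache_len = 0;
}

/*
 * Wait up to timeout_ms for fd to become readable and, if it does,
 * call drop_all(). Returns 1 if caches were dropped, 0 on timeout,
 * -1 on a poll() error.
 */
int drop_caches_on_pressure(int fd, int timeout_ms, void (*drop_all)(void))
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    if (rc == 0 || !(pfd.revents & POLLIN))
        return 0;
    drop_all();
    return 1;
}
```

In a real program the descriptor would come from open("/dev/mem_notify", O_RDONLY) and the call would sit in the main event loop. Note that free() alone may not return pages to the kernel, as discussed elsewhere in this thread.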
Avoiding the OOM killer with mem_notify Posted Feb 1, 2008 21:36 UTC (Fri) by droundy (subscriber, #4559) [Link] There are certain obscure programs like firefox and gimp that have very large caches which could be dumped under pressure.
Avoiding swap IO with mem_notify Posted Feb 2, 2008 17:49 UTC (Sat) by riel (subscriber, #3142) [Link] The patch series is indeed designed primarily to increase system performance by avoiding the IO penalty of swapping out (and back in) memory that contains data that is useless or can be easily recalculated. Decompressing (part of) a jpeg just has to be faster than swapping in something from disk, simply because disk seek times are on the order of 10ms. Avoiding the OOM killer is a secondary goal. I am not sure why that is the headline of the article...
Avoiding swap IO with mem_notify Posted Feb 3, 2008 4:14 UTC (Sun) by njs (subscriber, #40338) [Link] Oh! This makes *much* more sense. (Especially the otherwise unintelligible part of the original article that talks about pages getting swapped out, which has nothing to do with OOM.) In fairness, though, the LKML patch announcement just talks about it being good to avoid the OOM.
Avoiding swap IO with mem_notify Posted Feb 3, 2008 21:02 UTC (Sun) by oak (subscriber, #2786) [Link]
The article also talks about embedded systems. Those use flash, which doesn't suffer from the seek problem like hard disks do. On embedded systems memory usage is much more of a problem, though, and the kernel gets pretty slow too on devices without swap when memory gets really tight (all the kernel does is page read-only pages from disk to memory and then discard them again until it finally does an OOM-kill).
I thought the point of the patch is for user-space to be able to do the memory management in *manageable places* in code. As mentioned earlier, a lot of user-space code[1] doesn't handle memory allocation failures. And even if it's supposed to, it can be hard to verify (test) that the failures are handled properly in *all* cases. If user-space can get a pre-notification of a low-memory situation, it can free memory at a suitable place in the code so that further allocations will succeed (with higher probability).
That also allows doing something like what maemo does. If the system gets notified about a kernel low-memory condition, it kills processes which have notified it that they are in a "background-killable" state (saved their UI state, able to restore it and not currently visible to the user). I think it also notifies applications (currently) through D-BUS about the low-memory condition. Applications visible to the user or otherwise not background-killable are then supposed to free their caches and/or disable features that could take a lot of additional memory. If the caches are from the heap instead of memory mapped, it's less likely to help, though, because of heap fragmentation and because it requires more work/time.
[1] Glib and anything built on top of it, like Gtk, assume that if the process is still running, it got the memory; otherwise it's aborted.
Avoiding the OOM killer with mem_notify Posted Feb 7, 2008 16:04 UTC (Thu) by ringerc (guest, #3071) [Link]
As a (fairly bad) application developer I'm not sure I understand how I can use this effectively.
Say I'm notified that the kernel is running low on RAM. Furthermore, say I have a 256MB block - a single allocation - and a bunch of small heap allocations before and after it in the heap. Assuming I can afford to throw away the 256MB allocation I can delete() or free() it; however, since I have more memory both higher and lower in the heap I don't see how the OS can reclaim the RAM. I presume it relies on the huge allocation being a separate memory mapping (where anonymous mmap() has been used by operator new or by malloc() ) that can be unmapped as per /proc/self/maps?
So ... what about if my cache is a tree of heap-allocated objects of variable sizes (say, a tree of polymorphic subtype instances)? Free()ing these objects is unlikely to help, since many pages will have other things allocated on them too, and in any case the OS has no way to know if a given page is free. This is especially likely when using the libstdc++ new(), which (I understand) internally pre-allocates chunks of memory and parcels them out without invoking lower level memory management. My understanding was that in this case all the kernel can do is swap out the inactive page(s).
I presume that to benefit from mem_notify I'd need to modify the application to perform single large allocations dedicated only to use for this particular cache, presumably either managing it with a C++ allocator interface (with the STL/Boost/etc) or manually managing allocations within the block?
We all know it's often cleaner to use a container or an allocator in C++ when you want to manage lots of small allocations of a particular type (especially when each is a fixed size but they happen at unpredictable times) ... but many people don't, and in fact don't understand them at all.
It doesn't help that the STL and core language provide absolutely no flexible pool allocators etc; you have to go to Boost for those, and many OSS / Free Software projects are strangely reluctant to add a dependency on Boost. What might help would be if there was a tiny portable C/C++ library (suitable to be compiled directly into apps) that provided entirely standard and highly portable routines for non-Linux platforms, and on Linux provided mem-pressure-aware C++ allocators and C memory allocation calls + notifier callback hooks. All with identical interfaces, of course, though the non-Linux ones would never actually detect and respond to memory pressure (unless other platform support was added). A set of C++ "cache allocators" for various usage patterns that could monitor memory pressure and automatically notify interested parties then invalidate and release themselves would be particularly cool. -- Craig Ringer
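A minimal sketch in C of the "single large allocation dedicated to the cache" idea: a bump allocator over one anonymous mapping that can be discarded wholesale. The cache_arena names are made up, Linux behavior is assumed for madvise(MADV_DONTNEED) on a private anonymous mapping (pages are dropped and read back as zeros), and there is no per-object free.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes madvise() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>

struct cache_arena {
    char  *base;   /* start of the anonymous mapping */
    size_t size;   /* total bytes mapped */
    size_t used;   /* bytes handed out so far */
};

int arena_init(struct cache_arena *a, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Bump allocation; returns NULL when the arena is full. */
void *arena_alloc(struct cache_arena *a, size_t n)
{
    size_t aligned = (n + 15) & ~(size_t)15;   /* round up to 16 bytes */
    void *p;

    if (aligned < n || aligned > a->size - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Invalidate every cached object and let the kernel reclaim the pages. */
int arena_discard(struct cache_arena *a)
{
    a->used = 0;
    return madvise(a->base, a->size, MADV_DONTNEED);
}

int arena_destroy(struct cache_arena *a)
{
    return munmap(a->base, a->size);
}
```

Calling arena_discard() from a memory-pressure handler releases the whole cache in one step and keeps the mapping ready for reuse. Every pointer previously returned by arena_alloc() must be treated as invalid afterwards.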
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds