Toward a smarter OOM killer
Posted Nov 4, 2009 17:02 UTC (Wed) by holstein (guest, #6122) In reply to: Toward a smarter OOM killer by mjthayer
Parent article: Toward a smarter OOM killer
For a server, that would let the sysadmin log the event, react to it, etc.
For a desktop, one can imagine a popup warning the user, perhaps with a list of memory-hog processes. This will let the user use the session support of Firefox to restart its browsing session...
In many cases, restarting a single guilty process can be enough to prevent an OOM killing spree.
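As a rough illustration of the desktop case, a notifier could walk /proc and list the biggest resident processes. The sketch below does just that by reading the VmRSS line of /proc/<pid>/status; the 100 MB threshold and the output format are arbitrary choices for illustration, not anything proposed above.

    /* Hypothetical sketch: list processes by resident set size, the sort of
     * information a "memory hog" warning popup could present.  Reads
     * /proc/<pid>/status, which reports VmRSS in kB on Linux. */
    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
        DIR *proc = opendir("/proc");
        struct dirent *ent;

        if (!proc) {
            perror("opendir /proc");
            return 1;
        }
        while ((ent = readdir(proc)) != NULL) {
            char path[288], line[256], name[64] = "?";
            long rss_kb = -1;
            FILE *f;

            if (!isdigit((unsigned char)ent->d_name[0]))
                continue;                       /* not a PID directory */
            snprintf(path, sizeof(path), "/proc/%s/status", ent->d_name);
            f = fopen(path, "r");
            if (!f)
                continue;                       /* process may have exited */
            while (fgets(line, sizeof(line), f)) {
                sscanf(line, "Name: %63s", name);
                sscanf(line, "VmRSS: %ld", &rss_kb);
            }
            fclose(f);
            if (rss_kb > 100 * 1024)            /* arbitrary 100 MB threshold */
                printf("%6s  %8ld kB  %s\n", ent->d_name, rss_kb, name);
        }
        closedir(proc);
        return 0;
    }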
Posted Nov 4, 2009 17:22 UTC (Wed)
by mjthayer (guest, #39183)
[Link] (37 responses)
I think it might be doable by making the algorithm that chooses pages to swap out smarter, so that each process is guaranteed a certain amount of resident memory (depending on the number of other processes and logged-in users and probably a few other factors), and a process that tried to hog memory would end up swapping out its own pages once the other running processes dropped to their guaranteed minimum. If I ever have time, I will probably even try coding that up.
Posted Nov 4, 2009 18:12 UTC (Wed)
by jabby (guest, #2648)
[Link] (36 responses)
STOP USING the old "TWICE RAM" guideline!
That dates back to when 64MB of RAM was considered "beefy". I seriously had to provision a server last night with 16GB of physical memory and the customer wanted a 32GB swap partition!! Seriously?! If your system is still usable after you're 2 to 4 gigs into your swap, I'd be shocked.
Posted Nov 4, 2009 18:21 UTC (Wed)
by mjthayer (guest, #39183)
[Link] (1 responses)
Not sure that 32GB of swap would be appropriate even then though...
Posted Nov 6, 2009 11:40 UTC (Fri)
by patrick_g (subscriber, #44470)
[Link]
Posted Nov 4, 2009 18:31 UTC (Wed)
by clugstj (subscriber, #4020)
[Link]
It all depends on the workload.
Posted Nov 4, 2009 18:36 UTC (Wed)
by ballombe (subscriber, #9523)
[Link] (4 responses)
Posted Nov 5, 2009 6:54 UTC (Thu)
by gmaxwell (guest, #30048)
[Link] (3 responses)
Or using tmpfs for /tmp
Posted Nov 5, 2009 11:11 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link]
Posted Nov 5, 2009 18:25 UTC (Thu)
by khc (guest, #45209)
[Link] (1 responses)
Posted Nov 5, 2009 18:46 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Nov 4, 2009 18:49 UTC (Wed)
by knobunc (guest, #4678)
[Link] (2 responses)
Posted Nov 4, 2009 19:59 UTC (Wed)
by zlynx (guest, #2285)
[Link] (1 responses)
I don't know, I haven't run a Linux laptop in almost a year now since an X.org bug killed my old laptop by overheating it.
Posted Nov 5, 2009 18:22 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Nov 4, 2009 21:43 UTC (Wed)
by drag (guest, #31333)
[Link] (11 responses)
I expect the only time it'll want to use swap in a busy system is if the active amount of used memory exceeds the amount of main memory.
Posted Nov 4, 2009 22:54 UTC (Wed)
by mjthayer (guest, #39183)
[Link] (10 responses)
The algorithm is roughly as follows.
* Assign each process a contingent of main memory, e.g. by dividing the total available by the number of users with active running processes, and giving each process an equal share of the contingent of the user running it.
* When a page of memory is to be evicted to the swap file, make sure that it has either not been accessed for a certain minimum length of time, or that the process owning it is over its contingent, or that it is owned by the process on whose behalf it is to be swapped out. If not, search for a new page to evict.
This should mean that if a process starts leaking memory badly or whatever, after a while it will just loop evicting its own pages and not trouble the other processes on the system. It should also mean that all not-too-large processes on the system should stay reasonably snappy, making it easier to find and kill the out-of-control process.
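A toy model of the two rules above, written as ordinary user-space C rather than anything resembling kernel code; the structures, field names and numbers are all invented purely for illustration.

    #include <stdbool.h>
    #include <stdio.h>

    struct task {                 /* hypothetical per-process accounting */
        int  id;
        long resident_pages;      /* current resident set, in pages */
        long contingent_pages;    /* guaranteed share of main memory */
    };

    struct page {
        struct task *owner;
        long idle_ticks;          /* time since the page was last accessed */
    };

    #define IDLE_THRESHOLD 1000   /* arbitrary "not accessed recently" cutoff */

    /* May this page be evicted on behalf of 'requester'? */
    static bool may_evict(const struct page *pg, const struct task *requester)
    {
        if (pg->idle_ticks >= IDLE_THRESHOLD)
            return true;          /* cold page: fair game */
        if (pg->owner->resident_pages > pg->owner->contingent_pages)
            return true;          /* owner is over its contingent */
        if (pg->owner == requester)
            return true;          /* a hog ends up evicting its own pages */
        return false;             /* hot page of a well-behaved process: keep looking */
    }

    int main(void)
    {
        struct task hog    = { 1, 5000, 2000 };   /* over its contingent */
        struct task polite = { 2, 1000, 2000 };   /* within its contingent */
        struct page hot_of_hog    = { &hog,    10 };
        struct page hot_of_polite = { &polite, 10 };

        printf("evict the hog's hot page for polite?  %d\n",
               may_evict(&hot_of_hog, &polite));
        printf("evict polite's hot page for the hog?  %d\n",
               may_evict(&hot_of_polite, &hog));
        return 0;
    }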
Posted Nov 4, 2009 22:56 UTC (Wed)
by mjthayer (guest, #39183)
[Link]
Posted Nov 7, 2009 1:21 UTC (Sat)
by giraffedata (guest, #1954)
[Link] (8 responses)
Algorithms for this were popular in the 1970s for batch systems. Unix systems were born as interactive systems where the idea of not dispatching a process at all for ten seconds was less palatable than making the user kill some stuff or reboot, but with Unix now used for more diverse things, I'm surprised Linux has never been interested in long term scheduling to avoid page thrashing.
Posted Nov 7, 2009 3:39 UTC (Sat)
by tdwebste (guest, #18154)
[Link]
On embedded devices I have constructed processing states with runit to control the running processes. This simple but effective long-term scheduling to avoid running out of memory and swapping works well when you know in advance what processes will be running on the device.
Posted Nov 7, 2009 10:01 UTC (Sat)
by dlang (guest, #313)
[Link] (1 responses)
but back in the 70's they realized that most of the time most programs don't use all their memory at any one time. so the odds are pretty good that the page of ram that you swap out will not be needed right away.
and the poor programming practices that are common today make this even more true
Posted Nov 7, 2009 17:06 UTC (Sat)
by giraffedata (guest, #1954)
[Link]
I think you didn't follow the scenario. We're specifically talking about a page that is likely to be needed right away. It's a page that the normal page replacement policy would have left alone because it expected it to be needed soon -- primarily because it was accessed recently.
But the proposed policy would steal it anyway, because the process that is expected to need it is over its quota and the policy doesn't want to harm other processes that aren't.
What was known in the 70s was that at any one time, a program has a subset of memory it accesses a lot, which was dubbed its working set. We knew that if we couldn't keep a process' working set in memory, it was wasteful to run it at all. It would page thrash and make virtually no progress. Methods abounded for calculating the working set size, but the basic idea of keeping the working set in memory or nothing was constant.
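For reference, the usual 1970s formulation (Denning's working set) can be written as:

    % Denning's working set: the pages process p referenced in the last \tau time units
    W_p(t,\tau) \;=\; \{\, x \mid \text{page } x \text{ of } p \text{ was referenced during } (t-\tau,\ t] \,\}

and the long-term scheduling rule was to run p only while |W_p(t, tau)| page frames could be kept resident for it.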
Posted Nov 9, 2009 9:02 UTC (Mon)
by mjthayer (guest, #39183)
[Link] (4 responses)
While we are on the subject, does anyone reading this know where RSS quotas are handled in the current kernel code? I was able to find the original patches enabling them, but the code seems to have changed out of recognition since then.
Posted Nov 9, 2009 12:34 UTC (Mon)
by hppnq (guest, #14462)
[Link]
Personally, I would hate to think that my system spends valuable resources managing runaway processes. ;-)
Posted Nov 13, 2009 22:32 UTC (Fri)
by efexis (guest, #26355)
[Link] (1 responses)
So, what I would want is something that assumes that most of the system is being well behaved, but will quickly chop off anything that is not, and will stop the badly behaved stuff from dragging the well behaved stuff down with it. The well behaved stuff quite simply doesn't need managing; that's my job. The badly behaved stuff needs taking care of quickly, by something that your idea seems to reflect *perfectly* (it's not often you read someone's ideas and your brain flips "that's -exactly- what I need").
How would I find out if you do get a chance to hammer out the code that achieves this? Is there a non-LKML route to watch this (please don't say twitter :-p )?
Posted Nov 16, 2009 13:45 UTC (Mon)
by mjthayer (guest, #39183)
[Link]
Posted Nov 16, 2009 13:50 UTC (Mon)
by mjthayer (guest, #39183)
[Link]
Posted Nov 4, 2009 23:49 UTC (Wed)
by jond (subscriber, #37669)
[Link] (2 responses)
Posted Nov 6, 2009 8:46 UTC (Fri)
by iq-0 (subscriber, #36655)
[Link]
Posted Nov 18, 2009 16:12 UTC (Wed)
by pimlottc (guest, #44833)
[Link]
Posted Nov 5, 2009 17:52 UTC (Thu)
by sbergman27 (guest, #10767)
[Link]
I do use the twice ram rule. I'd rather the system get slow than crash or have the OOM killer running loose on it.
Posted Nov 5, 2009 21:14 UTC (Thu)
by anton (subscriber, #25547)
[Link]
Posted Nov 6, 2009 6:35 UTC (Fri)
by motk (subscriber, #51120)
[Link] (1 responses)
You were saying? :)
Posted Nov 6, 2009 8:43 UTC (Fri)
by mjthayer (guest, #39183)
[Link]
Posted Nov 18, 2009 16:47 UTC (Wed)
by pimlottc (guest, #44833)
[Link] (2 responses)
Posted Nov 18, 2009 18:37 UTC (Wed)
by dlang (guest, #313)
[Link] (1 responses)
nowadays it depends on your system and how you use it.
if you use the in-kernel suspend code, the act of suspending will write all your ram into the swap space, so swap must be > ram
if you don't use the in-kernel suspend code you need as much swap as you intend to use. How much swap you are willing to use depends very much on your use case. for most people a little bit of swap in use doesn't hurt much, and by freeing up additional ram it results in an overall faster system. for other people the unpredictable delays in applications due to the need to pull things from swap are unacceptable. In any case, having a lot of swap activity is pretty much unacceptable for anyone.
note that if you disable overcommit you need more swap, or allocations (like when a large program forks) will fail, so you need additional swap space > the max memory footprint of any process you intend to allow to fork (potentially multiples of this). With overcommit disabled I could see you needing swap significantly higher than 2x ram in some conditions.
my recommendation is that if you are using a normal hard drive (usually including the SSD drives that emulate normal hard drives), allocate a 2G swap partition and leave overcommit enabled (and that's probably a lot larger than you will ever use)
if you are using a system that doesn't have a normal hard drive (usually this sort of thing has no more than a few gig of flash as its drive) you probably don't want any swap, and definitely want to leave overcommit on.
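If you want to sanity-check a machine against the suspend-to-disk rule of thumb above, a quick sketch like this will do; it simply compares the MemTotal and SwapTotal lines of /proc/meminfo (both reported in kB).

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[128];
        long mem_kb = 0, swap_kb = 0;

        if (!f) {
            perror("/proc/meminfo");
            return 1;
        }
        while (fgets(line, sizeof(line), f)) {
            sscanf(line, "MemTotal: %ld", &mem_kb);
            sscanf(line, "SwapTotal: %ld", &swap_kb);
        }
        fclose(f);

        printf("RAM %ld kB, swap %ld kB\n", mem_kb, swap_kb);
        if (swap_kb < mem_kb)
            printf("swap is smaller than RAM: an in-kernel suspend image may not fit\n");
        return 0;
    }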
Posted Nov 19, 2009 16:46 UTC (Thu)
by nye (subscriber, #51576)
[Link]
FWIW, I agree, except that I'd make it a file instead of a partition - it's just as fast, and it leaves some flexibility just in case.
I use a 2GB swapfile on machines ranging from 256MB to 8GB of RAM - it may be overkill but that much disk space costs next to nothing. I wouldn't want to set it higher, because if I'm really using swap to that extent, the machine's probably past the point of usability anyway.
Posted Nov 19, 2009 13:29 UTC (Thu)
by makomk (guest, #51493)
[Link] (2 responses)
Posted Nov 19, 2009 18:33 UTC (Thu)
by dlang (guest, #313)
[Link] (1 responses)
in both cases you have to read from disk to continue, the only difference is if you are reading from the swap space or the initial binary (and since both probably require seeks, it's not even a case of random vs sequential disk access)
Posted Dec 5, 2009 17:19 UTC (Sat)
by misiu_mp (guest, #41936)
[Link]
Thrashing is what happens when processes get their pages continuously swapped in and out as the system schedules them to run. That's when everything grinds to a halt, because each context switch or memory access needs to swap out some memory in order to make room for other memory to be read in from swap or from the binary.
Using up swap alone does not affect performance much if you don't access what's in swap. If you continuously do that, that's thrashing.
Posted Nov 5, 2009 13:45 UTC (Thu)
by hppnq (guest, #14462)
[Link]
Just as an exercise, of course. ;-)
death by swap
>>> I think Ubuntu still do that by default :)
Yes, Ubuntu still does that by default... despite many bug reports like mine.
Note that my bug report is old, very old (pre-Gutsy time) => https://bugs.launchpad.net/ubuntu/+source/partman-auto/+bug/134505
No reaction at all from the Ubuntu devs....very discouraging.
death by swap
doesn't mean you *must*.
death by swap
amount of swap space will dictate how likely or how much swap space the Linux kernel wants to use.
death by swap
If you get to the point that you're stealing a page from a process simply because that process is over its quota of real memory, you should steal ALL that process' pages. It can't fit its working set into memory, so it isn't going to make decent progress, so the memory you do give it is wasted. You're also wasting the swap I/O it's doing. After a while, after other processes have had a chance to progress, you can swap them out and give the first process the memory it needs. If you can't do that because it's run amok and simply demands more memory than you can afford, that's when you kill that process.
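A toy model of the policy described above, again as plain user-space C with invented names and numbers rather than anything resembling kernel code: once a process cannot fit its working set within its quota, swap it out wholesale instead of letting it thrash, and only bring it back when the whole working set fits again.

    #include <stdbool.h>
    #include <stdio.h>

    struct proc {
        const char *name;
        long working_set;   /* pages it needs resident to make progress */
        long quota;         /* pages it is currently allowed */
        bool deactivated;   /* true once swapped out wholesale */
    };

    static long deactivate_if_over_quota(struct proc *p, long free_pages)
    {
        if (!p->deactivated && p->working_set > p->quota) {
            p->deactivated = true;
            free_pages += p->working_set;   /* reclaim everything at once */
            printf("deactivating %s, reclaiming %ld pages\n", p->name, p->working_set);
        }
        return free_pages;
    }

    static long reactivate_if_room(struct proc *p, long free_pages)
    {
        if (p->deactivated && p->working_set <= free_pages) {
            p->deactivated = false;
            free_pages -= p->working_set;
            printf("reactivating %s\n", p->name);
        }
        return free_pages;
    }

    int main(void)
    {
        struct proc hog = { "hog", 4000, 1000, false };
        long free_pages = 500;

        free_pages = deactivate_if_over_quota(&hog, free_pages);
        free_pages += 3600;   /* later, other processes finish and free memory */
        free_pages = reactivate_if_room(&hog, free_pages);
        printf("free pages now: %ld\n", free_pages);
        return 0;
    }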
death by swap
I suppose I see three cases here. One is that the page was part of the process' working set at an earlier point in time, but no longer is. In that case swapping it out is the right thing to do. The second is that the process is under control, but its working set is bigger than the available memory. Then I agree that there is a good case for putting it on hold until enough memory is available, although that is a non-trivial problem which is somewhat outside of the scope of what I am trying to do. And the third case is the one that I am interested in - a runaway process which will eventually be OOMed. In this case, the quota will stop it from trampling on the working set of every other process in memory in the meantime.
You may want to look at Documentation/cgroups/memory.txt. Otherwise, it seems there is no way to enforce RSS limits. Rik van Riel wrote a patch a few years ago but it seems to have been dropped.
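For the record, the memory controller described in Documentation/cgroups/memory.txt can already cap a group of processes from user space. A rough sketch follows; the mount point /sys/fs/cgroup/memory and the group name "capped" are assumptions, it needs the memory controller mounted and sufficient privileges, and note that the limit covers the group's RSS plus page cache, so it is close to, but not exactly, an RSS limit.

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int write_str(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f)
            return -1;
        fputs(val, f);
        return fclose(f);
    }

    int main(void)
    {
        char pid[32];

        mkdir("/sys/fs/cgroup/memory/capped", 0755);
        /* Cap the group's memory (RSS plus page cache) at 256 MB. */
        write_str("/sys/fs/cgroup/memory/capped/memory.limit_in_bytes", "268435456");
        /* Move the current process into the group. */
        snprintf(pid, sizeof(pid), "%d", (int)getpid());
        write_str("/sys/fs/cgroup/memory/capped/tasks", pid);

        /* Anything exec'd from here on inherits the limit. */
        execlp("cat", "cat", "/proc/self/cgroup", (char *)NULL);
        perror("execlp");
        return 1;
    }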
death by swap
** Encouragement encouragement encouragement **
death by swap
>I suppose I see three cases here. One is that the page was part of the process' working set at an earlier point in time, but no longer is. In that case swapping it out is the right thing to do. The second is that the process is under control, but its working set is bigger than the available memory. Then I agree that there is a good case for putting it on hold until enough memory is available, although that is a non-trivial problem which is somewhat outside of the scope of what I am trying to do. And the third case is the one that I am interested in - a runaway process which will eventually be OOMed. In this case, the quota will stop it from trampling on the working set of every other process in memory in the meantime.
Actually case 2 could be handled to some extent by lowering the priority of a process that kept on swapping for too long.
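From user space the "lower its priority" part is just a renice; detecting "kept on swapping for too long" is the hard part and is left open in this sketch, which simply renices a PID that some external heuristic has already singled out.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <sys/types.h>

    int main(int argc, char **argv)
    {
        pid_t pid;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        pid = (pid_t)atoi(argv[1]);
        /* Nice value 19 is the weakest scheduling priority. */
        if (setpriority(PRIO_PROCESS, (id_t)pid, 19) != 0) {
            perror("setpriority");
            return 1;
        }
        printf("reniced %d to nice 19\n", (int)pid);
        return 0;
    }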
death by swap
lots of swap, especially in cheapo VMs. There's a whole raft of programs that you cannot start with say, 256M RAM and little swap without overcommit. Mutt and irssi are two that spring to mind. Lots of swap lets you "overcommit" with the risk being you end up swapping rather than you end up going on a process killing spree.
death by swap
The only reason this couldn't be a sane default is that on systems with 32MB an overcommit_ratio of 1000% is still too small (but even so, if you have 32MB and no swap, you're probably still better off with this limit)
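As far as I understand strict overcommit (vm.overcommit_memory=2), the limit it enforces is roughly

    \mathrm{CommitLimit} \;\approx\; \mathrm{SwapTotal} \;+\; \mathrm{MemTotal} \times \frac{\mathrm{overcommit\_ratio}}{100}

so 32MB of RAM, no swap and a ratio of 1000% gives only about 320MB of committable address space, which is indeed not much.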
death by swap
fine even though swap space usage was usually 8+ GB. We ran this way for months with no complaints. Just because you're using a lot of swap space doesn't mean you are paging excessively. Note that if a page is paged out and then brought back into memory, it stays written in swap to save writing it again if that page gets paged out again. You can't tell a whole lot about how much swap you are *really* using by looking at the swap used number. sysstat monitoring and sar -W are more useful than the swap used number for assessing swapping.
running loose on it.
I have seen several cases where a process slowly consumed more and more memory, but apparently always had a small working set, so it eventually consumed all the swap space and the OOM killer killed it (sometimes it killed other processes first, though). The machine was so usable during this that I did not notice that anything was amiss until some process was missing. IIRC one of these cases happened on a machine with 24GB RAM and 48GB swap; there it took several days until the swap space was exhausted.
death by swap
Memory: 64G real, 2625M free, 62G swap in use, 40G swap free
death by swap
STOP USING the old "TWICE RAM" guideline!
You know, I've been hearing this lately, but the problem is there seems to be no consensus on what the guideline should be. Some swear by no swap at all, while others say running without at least some is dangerous. No one seems to agree on what an appropriate amount is. Until there is a new accepted rule of thumb, everyone will keep using the old one, even if it's wrong.
death by swap
I would presume that executables do not make up much of the used memory. So reusing their pages will probably not be much gain.
That can possibly happen when the total working set (actively used memory) of the busy processes exceeds the amount of ram, or more realistically, when the swap (nearly) runs out so there is nowhere to evict unused pages to free up ram - leaving space for only small chunks to run at a time.
Usually (in my desktop experience) soon after that the OOM killer starts kicking in, which causes the system to thrash even more (as the OOM killer has needs too) and it takes hours for it to be done.
When it happens I usually have no choice but to reboot, losing some data, so for me the OOM killer has been useless and overcommitment the root of all evil.
What could work for you is to run a dummy process (allocate memory as you like) and have that killed first by the OOM killer (use Evgeniy Polyakov's patch), so it would 1) notify the administrator that the system has run into this problem, and 2) free up enough memory so something can actually be done about it.
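A minimal sketch of such a ballast process (my own guess at an implementation, not Evgeniy Polyakov's patch): it touches a chunk of memory, volunteers as the preferred OOM victim through the old /proc/self/oom_adj knob (newer kernels use oom_score_adj), and waits. When the OOM killer takes it, its pages come free and the kill shows up in the kernel log.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BALLAST_BYTES (256UL * 1024 * 1024)   /* arbitrary 256 MB of ballast */

    int main(void)
    {
        FILE *f = fopen("/proc/self/oom_adj", "w");
        char *ballast;

        if (f) {
            fputs("15\n", f);     /* 15 = most likely to be picked by the OOM killer */
            fclose(f);
        }
        ballast = malloc(BALLAST_BYTES);
        if (!ballast) {
            perror("malloc");
            return 1;
        }
        memset(ballast, 1, BALLAST_BYTES);   /* touch it so the pages are really ours */

        for (;;)
            pause();              /* sit and wait to be sacrificed */
    }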
