
Taming the OOM killer

Posted Feb 5, 2009 21:06 UTC (Thu) by dlang (subscriber, #313)
In reply to: Taming the OOM killer by michaeljt
Parent article: Taming the OOM killer

the problem is that a system that goes heavily into swap may not come back out for hours or days.

if you are willing to hit reset in this condition then you should be willing to deal with the OOM killer killing the box under the same conditions.



Taming the OOM killer

Posted Feb 6, 2009 8:00 UTC (Fri) by michaeljt (subscriber, #39183)

As I said, perhaps some work could be put into improving that situation rather than improving the OOM killer - for instance, using the same heuristics being developed for the killer to choose processes to freeze and move completely into swap, freeing up memory for other processes. That is somewhat easier to correct if the heuristics go wrong (unless they go badly wrong, of course, and take down the X server or whatever) than if the process is just shot down.
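
A minimal sketch of the "freeze" half of that idea, assuming the simplest possible mechanism (SIGSTOP/SIGCONT - picking *which* pid to freeze is exactly the heuristic problem under discussion, and is not shown):

    /* freeze.c - hypothetical sketch: stop a process so the VM can page
     * its memory out under pressure, or resume it again later. */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s stop|cont <pid>\n", argv[0]);
            return 1;
        }
        pid_t pid = (pid_t)atoi(argv[2]);
        int sig = (argv[1][0] == 's') ? SIGSTOP : SIGCONT;

        if (kill(pid, sig) != 0) {
            perror("kill");
            return 1;
        }
        return 0;
    }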

Taming the OOM killer

Posted Feb 12, 2009 19:14 UTC (Thu) by efexis (guest, #26355)

There's no reason for the OOM killer to kick in while swap is still available: pages can simply be swapped out. (Swapping itself may need memory, in which case you set a watermark and force swapping before free memory drops below that point, to ensure that swapping can still happen.) OOM means exactly what it says - you're out of memory, and whether it's silicon or magnetic makes no difference.
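
A small test program makes the overcommit side of this visible (a sketch, assuming the default vm.overcommit_memory = 0 heuristic): moderate allocations keep succeeding long after their sum exceeds RAM plus swap, and the system only runs out - and the OOM killer only fires - once the pages are actually touched:

    /* overcommit.c - sketch: reserve far more memory than exists, then
     * touch it. Under heuristic overcommit the mallocs all succeed;
     * committing the pages is what eventually exhausts RAM + swap and
     * may invoke the OOM killer. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t chunk = 256 * 1024 * 1024;   /* 256 MB per allocation */
        char *blocks[64];                   /* up to 16 GB in total */
        int i, n;

        for (n = 0; n < 64; n++) {
            blocks[n] = malloc(chunk);
            if (blocks[n] == NULL)          /* strict accounting stops here */
                break;
        }
        printf("reserved %d chunks without committing a page\n", n);

        for (i = 0; i < n; i++)
            memset(blocks[i], 1, chunk);    /* commit for real; expect trouble */
        return 0;
    }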

Personally I have swap disabled or set very low, because a runaway process otherwise means losing contact with the server - unable to log in or anything - until it has finished chewing through all available memory *and* swap (causing I/O starvation, and I/O is exactly what I need in order to log in and kill the offending task), at which point it finally hits the limit and gets killed.

Everything important is set to be restarted, either directly from init, or indirectly from daemontools or equivalent, which is restarted by init should it go down (which has never happened).

Taming the OOM killer

Posted Feb 13, 2009 23:33 UTC (Fri) by michaeljt (subscriber, #39183)

I have been thinking about this a bit more, since my system was just swapped to death again (and no, the OOM killer did not kick in). Has anyone tried setting a per-process memory limit as a percentage of total physical RAM? That would help limit the damage done by runaway processes without stopping large processes from forking.
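
Something like the following hypothetical wrapper would implement the idea (the 80% figure and the choice of RLIMIT_AS are illustrative assumptions, not an existing kernel default):

    /* cap80.c - hypothetical wrapper: limit a command's address space to
     * 80% of physical RAM before running it. */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return 1;
        }
        long pages = sysconf(_SC_PHYS_PAGES);
        long psize = sysconf(_SC_PAGESIZE);
        rlim_t cap = (rlim_t)pages * (rlim_t)psize / 5 * 4;  /* 80% of RAM */
        struct rlimit rl = { .rlim_cur = cap, .rlim_max = cap };

        if (setrlimit(RLIMIT_AS, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }
        execvp(argv[1], argv + 1);          /* run the workload under the cap */
        perror("execvp");
        return 1;
    }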

Taming the OOM killer

Posted Feb 14, 2009 0:03 UTC (Sat) by dlang (subscriber, #313)

if you swapped to death and OOM didn't kick in, you have probably allocated more swap than you are willing to have used.

how much swap did you allocate? any idea how much was used?

enabling overcommit with small amounts of swap will allow large programs to fork without problems, but will limit runaway processes. it's about the textbook case for using overcommit.
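The fork case is easy to demonstrate with a sketch like this (the 1 GB heap is an arbitrary illustration): with overcommit enabled the child shares the parent's pages copy-on-write, so the fork costs almost nothing; under strict accounting (vm.overcommit_memory = 2) with little swap, the same fork fails with ENOMEM:

    /* forktest.c - sketch: allocate a large heap, then fork. With
     * overcommit the child shares pages copy-on-write and fork succeeds;
     * strict accounting must reserve a full copy and may refuse. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = (size_t)1 << 30;       /* 1 GB heap */
        char *p = malloc(len);

        if (p == NULL) {
            perror("malloc");
            return 1;
        }
        memset(p, 1, len);                  /* make the pages real */

        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");                 /* ENOMEM under strict accounting */
            return 1;
        }
        if (pid == 0)
            _exit(0);                       /* child exits without touching the heap */
        waitpid(pid, NULL, 0);
        puts("fork succeeded");
        return 0;
    }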

Taming the OOM killer

Posted Feb 16, 2009 9:04 UTC (Mon) by michaeljt (subscriber, #39183)

> how much swap did you allocate? any idea how much was used?

Definitely too much (1 GB for 2 GB of RAM), as I realised after reading this: http://kerneltrap.org/node/3202. That page was also what prompted my last comment. It seems a bit strange to me that increasing the swap size should so badly affect system performance in this situation, and I wondered whether this could be fixed with the right tweak, such as limiting the amount of virtual memory available to processes - say to a default of 80 percent of physical RAM. That would still allow large processes to fork, but might catch runaway processes a bit earlier. If I find some time, I will try to work out how to do that (assuming you don't answer in the meantime to tell me why it is a really bad idea, or that there already is such a setting).

Taming the OOM killer

Posted Feb 16, 2009 15:38 UTC (Mon) by dlang (subscriber, #313)

have you looked into setting the appropriate values in ulimit?

Taming the OOM killer

Posted Feb 17, 2009 8:23 UTC (Tue) by michaeljt (subscriber, #39183)

> have you looked into setting the appropriate values in ulimit?

Indeed. I set ulimit -v 1600000 (given that I have 2 GB of physical RAM) and launched a known bad process (gnash on a page I know it can't cope with). gnash crashed after a few minutes, without even slowing down my system. I just wonder why this is not done by default. Of course, one could argue that this is a user or distribution problem, but given that knowledgeable people can change the value anyway, why not set a default in the kernel? (Again, say 80% of physical RAM. I tried with 90% and gnash caused a noticeable performance degradation.) This is not a rhetorical question; I am genuinely curious.
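
A quick way to watch the limit doing its job (a sketch: run it under "ulimit -v 1600000" and malloc should start failing around 1.6 GB instead of dragging the machine into swap):

    /* chew.c - sketch: allocate and touch memory in 100 MB chunks until
     * malloc fails. With RLIMIT_AS set via ulimit -v, malloc returns
     * NULL at the cap and the process fails cleanly rather than thrash. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t chunk = 100 * 1024 * 1024;   /* 100 MB per step */
        size_t total = 0;
        char *p;

        while ((p = malloc(chunk)) != NULL) {
            memset(p, 1, chunk);            /* actually commit the pages */
            total += chunk;
            printf("allocated %zu MB so far\n", total >> 20);
        }
        printf("malloc failed after about %zu MB\n", total >> 20);
        return 0;
    }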

Taming the OOM killer

Posted Feb 17, 2009 8:29 UTC (Tue) by dlang (subscriber, #313)

simple: the kernel doesn't know what is right for you. how can it know that you really don't want this program that you start to use all available ram (even at the expense of other programs)?

the distro is in the same boat. if they configured it to do what you want, they would have other people screaming at them that they would rather see the computer slow down than have programs die (you even see people here arguing that)

Taming the OOM killer

Posted Feb 17, 2009 14:27 UTC (Tue) by michaeljt (subscriber, #39183)

> simple: the kernel doesn't know what is right for you. how can it know that you really don't want this program that you start to use all available ram (even at the expense of other programs)?

It does take a decision though - allowing all programmes to allocate as much RAM as they wish by default, even when that memory is not actually there, is very definitely a policy decision. Interestingly, Wine fails to start if I set ulimit -v in this way (I can guess why). I wonder whether disabling overcommit would also prevent it from working?
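
One way to probe that question (a sketch; the 3 GB figure is arbitrary) is to reserve a large anonymous mapping without touching it. Under the default heuristic overcommit this normally succeeds; with vm.overcommit_memory = 2 it is refused up front, which is presumably what a large address-space reservation in Wine would run into, much as it does under a tight ulimit -v:

    /* reserve.c - sketch: reserve a large private mapping and commit
     * nothing. Heuristic overcommit usually grants it; strict accounting
     * (vm.overcommit_memory = 2) refuses it with ENOMEM. */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = (size_t)3 << 30;       /* 3 GB, arbitrary large size */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) {
            perror("mmap");                 /* expected under strict accounting */
            return 1;
        }
        puts("reservation succeeded; no pages committed yet");
        munmap(p, len);
        return 0;
    }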

