Taming the OOM killer

Posted Feb 6, 2009 1:26 UTC (Fri) by dlang (✭ supporter ✭, #313)
In reply to: Taming the OOM killer by martinfick
Parent article: Taming the OOM killer

With malloc you can check the return value to see whether the allocation failed, and handle the error.
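
A minimal sketch of that check, for illustration only. The helper name `make_buffer` is hypothetical, and the sketch assumes the allocator actually reports failure by returning NULL; on a Linux system that overcommits memory, malloc can succeed and the failure may only show up later, when the pages are touched.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a zeroed buffer of `len` bytes (len > 0),
 * or NULL if malloc reports failure. The caller decides how to recover. */
char *make_buffer(size_t len)
{
    char *buf = malloc(len);
    if (buf == NULL) {
        /* This is the error path that stack growth has no equivalent of. */
        fprintf(stderr, "make_buffer: could not allocate %zu bytes\n", len);
        return NULL;
    }
    memset(buf, 0, len);
    return buf;
}
```

A caller would test the result (`if (make_buffer(n) == NULL) ...`) and back off, retry, or exit cleanly. A local variable declaration offers no such return value to test.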

How would you propose that programmers handle an error when they declare a local variable (which is one way to grow the stack)?


Taming the OOM killer

Posted Feb 6, 2009 1:38 UTC (Fri) by brouhaha (guest, #1698) [Link]

The process should get a segfault or equivalent signal. If there is a handler for the signal, but the handler can't be invoked due to lack of stack space, the process should be killed. If the mechanism to signal the process in a potential out-of-stack situation is too complex to be practically implemented in the kernel, then the process should be killed without attempting to signal it.

At no point should the OOM killer become involved, because there is no reason to propagate the error outside the process (other than by another process noticing that the process in question has exited). A principle of reliable systems is confining the consequences of an error to the minimum area necessary, and killing some other randomly-selected (or even heuristically-selected) process violates that principle.

Taming the OOM killer

Posted Feb 6, 2009 5:26 UTC (Fri) by njs (guest, #40338) [Link]

> At no point should the OOM killer become involved, because there is no reason to propagate the error outside the process (other than by another process noticing that the process in question has exited).

This makes sense on the surface, but memory is a shared resource, so everything is horribly coupled no matter what, and life isn't that simple.

You have 2 gigs of memory.

Process 1 and process 2 are each using 50 megabytes of RAM.

Then Process 1 allocates another 1948 megabytes.

Then Process 2 attempts to grow its stack by 1 page, but there is no memory.

The reason the OOM killer exists is that it makes no sense to blame Process 2 for this situation. And if you did blame Process 2, then the system would still be hosed, and a few minutes later you'd have to kill off Process 3, Process 4, etc., until you got lucky and hit Process 1.
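
The arithmetic of that scenario can be spelled out in a few lines. This is a toy model, not the kernel's accounting: the `struct proc` table and the "largest user" policy are assumptions made for illustration, and the real OOM killer's selection heuristic is more involved.

```c
#include <stddef.h>

#define TOTAL_MB 2048L  /* the "2 gigs" in the example */

struct proc {
    const char *name;
    long used_mb;
};

/* Memory not claimed by any process in the table. */
long free_mb(const struct proc *procs, size_t n)
{
    long used = 0;
    for (size_t i = 0; i < n; i++)
        used += procs[i].used_mb;
    return TOTAL_MB - used;
}

/* Toy selection policy: index of the process using the most memory. */
size_t largest_user(const struct proc *procs, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (procs[i].used_mb > procs[best].used_mb)
            best = i;
    return best;
}
```

Starting from two processes at 50 MB each, `free_mb` is 1948. After Process 1 takes those 1948 MB it is 0, so Process 2's one-page request fails even though Process 2 holds only 50 of the 2048 MB. `largest_user` points at Process 1, not at the process that happened to make the failing request.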

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds