Taming the OOM killer

Posted Feb 5, 2009 22:22 UTC (Thu) by epa (subscriber, #39769)
In reply to: Taming the OOM killer by martinfick
Parent article: Taming the OOM killer

There may be some middle ground between eliminating overcommit altogether and continuing with the status quo. For example, fork() calls could continue to pretend that memory is available, on the assumption that the child process will soon exec() something or otherwise reduce its footprint; but if physical memory is tight then it's okay for the kernel to refuse memory allocation requests from a process (which is then passed up through the C library as a null return from malloc()). This might be more reliable than always having malloc() succeed, whether the OOM killer is turned on or not.

For those making embedded or high-availability systems who want to try harder and turn off overcommit altogether, fork-then-exec could be replaced in user space with posix_spawn or vfork-then-exec or similar.
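
For illustration, a minimal sketch of what that replacement might look like (the run() helper is hypothetical, not from any particular system):

    #include <spawn.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>

    extern char **environ;

    /* Spawn a program directly instead of duplicating the parent with
       fork() first, so no second copy of the parent's address space
       ever has to be accounted for. */
    int run(const char *path, char *const argv[])
    {
        pid_t pid;
        int status;
        int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
        if (err != 0) {
            fprintf(stderr, "posix_spawn: %s\n", strerror(err));
            return -1;
        }
        waitpid(pid, &status, 0);   /* a real program might reap later */
        return status;
    }

Called as, say, run("/bin/true", (char *[]){ "true", NULL }).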



Taming the OOM killer

Posted Feb 5, 2009 23:11 UTC (Thu) by martinfick (subscriber, #4455) [Link]

The OOM killer does not come into play when malloc is called. If malloc is called when there is no memory, there is no need to kill any processes: malloc simply fails and returns the appropriate error code.

The OOM killer kicks in when memory has been overcommitted through COW. Two processes are sharing the same memory region and one of them decides to write to a shared COW page, requiring the page to be copied. There is no memory allocation happening, simply a write to a memory page which is already allocated to a process (two of them, actually).

Again, the fork-then-exec shortcut is not really the big deal; the problem is processes that fork, do not exec, and then eventually write to a COW page.
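
A toy illustration of the difference (buffer size arbitrary):

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 512 * 1024 * 1024;   /* arbitrary large buffer */
        char *buf = malloc(len);
        if (buf == NULL)
            return 1;                     /* allocation CAN fail cleanly */
        memset(buf, 1, len);              /* parent now owns real pages */

        if (fork() == 0) {
            /* The child shares buf copy-on-write, so the fork() itself
               needed almost no new memory.  This write forces the kernel
               to find real pages, and there is no error return to give
               if it can't -- this is where the OOM killer steps in. */
            memset(buf, 2, len);
            _exit(0);
        }
        return 0;
    }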

Taming the OOM killer

Posted Feb 6, 2009 0:51 UTC (Fri) by nix (subscriber, #2304) [Link]

The OOM killer comes into play if memory is requested and is not
available, and the request is not failable. Several such allocations
spring to mind:

- when a process stack is grown

- when a fork()ed process COWs

- when a page in a private file-backed mapping is written to for the
first time

- when a nonswappable kernel resource needs to be allocated (other than a
cache) which cannot be discarded when memory pressure is high

- if overcommit_memory permits overcommit, when a page from the heap or an
anonymous mmap() is touched for the first time

So the OOM killer is *always* needed, even if overcommitting were disabled
as much as possible. (You can overcommit disk space, too: thanks to sparse
files, you can run out of disk space writing to the middle of a file. With
some filesystems, e.g. NTFS, you can run out of disk space by renaming a
file, triggering a tree rebalance and node allocation when there's not
enough disk space left. NTFS maintains an emergency pool for this
situation, but it's only so large...)
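
The sparse-file case is easy to reproduce; a rough sketch (file name and
sizes arbitrary):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char block[4096] = { 0 };
        int fd = open("sparse.dat", O_CREAT | O_RDWR | O_TRUNC, 0644);
        if (fd < 0)
            return 1;

        lseek(fd, 1024 * 1024 * 1024, SEEK_SET);   /* a 1GB hole ...      */
        write(fd, "", 1);                          /* ... costs no blocks */

        lseek(fd, 512 * 1024 * 1024, SEEK_SET);    /* middle of the hole  */
        if (write(fd, block, sizeof block) < 0)    /* can fail: ENOSPC    */
            return 1;
        close(fd);
        return 0;
    }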

Taming the OOM killer

Posted Feb 6, 2009 0:58 UTC (Fri) by martinfick (subscriber, #4455) [Link]

Why when a process' stack is grown? In this case the process should fail (die?) just like when malloc would fail, but there should be no reason to upset other processes in the system!

Taming the OOM killer

Posted Feb 6, 2009 1:26 UTC (Fri) by dlang (subscriber, #313) [Link]

with malloc you can check the return code to see if it failed or not and handle the error

how would you propose that programmers handle an error when they allocate a variable? (which is one way to grow the stack)
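
something like this trivial sketch:

    #include <stdlib.h>

    void heap_alloc(size_t n)
    {
        char *p = malloc(n);
        if (p == NULL) {
            /* the failure is visible here: report it, shed load, ... */
            return;
        }
        p[0] = 0;
        free(p);
    }

    void stack_alloc(void)
    {
        /* no return code to check: if growing the stack for 'big'
           fails, the process simply faults */
        char big[256 * 1024];
        big[0] = 0;
    }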

Taming the OOM killer

Posted Feb 6, 2009 1:38 UTC (Fri) by brouhaha (subscriber, #1698) [Link]

The process should get a segfault or equivalent signal. If there is a handler for the signal, but the handler can't be invoked due to lack of stack space, the process should be killed. If the mechanism to signal the process in a potential out-of-stack situation is too complex to be practically implemented in the kernel, then the process should be killed without attempting to signal it.

At no point should the OOM killer become involved, because there is no reason to propagate the error outside the process (other than by another process noticing that the process in question has exited). A principle of reliable systems is confining the consequences of an error to the minimum area necessary, and killing some other randomly-selected (or even heuristically-selected) process violates that principle.
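
For what it's worth, POSIX already provides most of that machinery: a handler can run on an alternate stack set up with sigaltstack(), so exhaustion of the normal stack need not prevent its invocation. A minimal sketch:

    #include <signal.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void on_segv(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)si; (void)ctx;
        /* Runs on the alternate stack, so running out of normal
           stack does not prevent this handler from being invoked. */
        static const char msg[] = "out of stack\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);
        _exit(1);
    }

    int main(void)
    {
        stack_t ss;
        ss.ss_sp = malloc(SIGSTKSZ);
        ss.ss_size = SIGSTKSZ;
        ss.ss_flags = 0;
        sigaltstack(&ss, NULL);

        struct sigaction sa;
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* ... program proper ... */
        return 0;
    }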

Taming the OOM killer

Posted Feb 6, 2009 5:26 UTC (Fri) by njs (guest, #40338) [Link]

> At no point should the OOM killer become involved, because there is no reason to propagate the error outside the process (other than by another process noticing that the process in question has exited).

This makes sense on the surface, but memory being a shared resource means that everything is horribly coupled no matter what and life isn't that simple.

You have 2 gigs of memory.

Process 1 and process 2 are each using 50 megabytes of RAM.

Then Process 1 allocates another 1948 megabytes.

Then Process 2 attempts to grow its stack by 1 page, but there is no memory.

The reason the OOM killer exists is that it makes no sense to blame Process 2 for this situation. And if you did blame Process 2, the system would still be hosed, and a few minutes later you'd have to kill off Process 3, Process 4, etc., until you got lucky and hit Process 1.

Taming the OOM killer

Posted Feb 7, 2009 17:52 UTC (Sat) by oak (guest, #2786) [Link]

Stack is actually yet another reason for overcommit. Current Linux
kernels map by default 8MB of stack for each thread (and usually threads
use only something like 4-8KB of that). Without overcommit, a process
with 16 threads couldn't run in 128MB of RAM unless you changed this
limit. I think you can change it only in the kernel source, and that it
applies to all processes/threads in the system?

Taming the OOM killer

Posted Feb 8, 2009 15:26 UTC (Sun) by nix (subscriber, #2304) [Link]

setrlimit (RLIMIT_STACK,...);
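
Fleshed out a little (the 64KB figure is arbitrary; glibc's pthreads take
RLIMIT_STACK as the default thread stack size, and
pthread_attr_setstacksize() gives per-thread control):

    #include <pthread.h>
    #include <sys/resource.h>

    static void shrink_default_stacks(void)
    {
        /* glibc's pthreads use RLIMIT_STACK as the default thread
           stack size; the 64KB figure is arbitrary */
        struct rlimit rl = { 64 * 1024, 64 * 1024 };
        setrlimit(RLIMIT_STACK, &rl);
    }

    static void per_thread_stack(pthread_attr_t *attr)
    {
        /* or set it per thread, leaving the process limit alone */
        pthread_attr_init(attr);
        pthread_attr_setstacksize(attr, 64 * 1024);
    }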

Taming the OOM killer

Posted Feb 12, 2009 14:32 UTC (Thu) by epa (subscriber, #39769) [Link]

> The OOM killer does not come into play when malloc is called. If malloc is called when there is no memory, there is no need to kill any processes: malloc simply fails and returns the appropriate error code.

Ah, I didn't realize that. From the way people talk, it sounded as though malloc() would always succeed and then the process would just blow up trying to use the memory. If the only memory overcommit is COW due to fork(), then it's not so bad (though I still think some kind of vfork() would be a more hygienic practice).

Taming the OOM killer

Posted Feb 12, 2009 15:23 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

malloc() isn't implemented by the kernel; you might do better to listen to someone who knows what they're talking about. :/

Taming the OOM killer

Posted Feb 12, 2009 16:06 UTC (Thu) by dlang (subscriber, #313) [Link]

vfork tends to be strongly discouraged nowadays. it can be used safely, but it's easy to misuse.

there are a growing number of such functions in C nowadays, as people go back, figure out where programmers commonly get things wrong, and provide functions that are much harder to misuse (the cases that spring to mind are the string manipulation routines)
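
e.g. the size-checked routines versus the classic ones; a trivial sketch:

    #include <stdio.h>

    void copy_name(char *dst, size_t dstlen, const char *src)
    {
        /* the classic strcpy(dst, src) overflows dst whenever src is
           too long; here the destination size is part of the call and
           the result is always NUL-terminated */
        snprintf(dst, dstlen, "%s", src);
    }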

