Taming the OOM killer
Posted Feb 5, 2009 10:17 UTC (Thu) by epa (subscriber, #39769). In reply to: Taming the OOM killer by dlang.
Parent article: Taming the OOM killer
"any process that forks and execs allocates more memory than it needs."

Quite. Which is why a single fork-and-exec-child-process system call is needed. With that, there would be much less need to overcommit memory and so a better chance of avoiding hacks like the OOM killer.
The classical Unix design of separate fork() and exec() is elegant at first glance, but in practice it has caused various unpleasant kludges to cope with memory overcommit. (Another was vfork(), which IIRC was a fork call that used less memory but only worked as long as you promised to call exec() immediately afterwards. Why they made this crufty interface rather than a single fork-plus-exec primitive eludes me.)
Posted Feb 5, 2009 11:16 UTC (Thu) by iq-0 (subscriber, #36655)
I don't know if closing file handles is allowed after vfork(), but that would also be a great help in ensuring the right file handles are passed (something that would be tedious, or really complex and verbose, with a single fork-and-exec call).
Posted Feb 5, 2009 14:08 UTC (Thu) by epa (subscriber, #39769)
But this is a small minority of the cases where an external command is run, and running an external command accounts for a large proportion of total fork()s. I'm just suggesting making the common case more robust, avoiding the need to overcommit memory.
Posted Feb 5, 2009 17:03 UTC (Thu) by nix (subscriber, #2304)
Posted Feb 5, 2009 22:05 UTC (Thu) by epa (subscriber, #39769)
Perhaps another way to avoid the need for memory allocation would be to use a new vfork-like call (heck, it could even be called vfork) that has a fixed memory budget set as a matter of policy system-wide. So when you vfork(), the memory is set up as copy-on-write, but the child process has a budget of at most 1000 pages it can scribble on. That should be enough to set up the necessary file descriptors, but if it tries to dirty more than its allowance it is summarily killed.
That way, there is some upper limit to the amount of memory that needs to be allocated - when vfork()ing the kernel just needs to ensure 1000 free pages - and the kernel doesn't have to make a (possibly untrustworthy) promise that the whole process address space is available for normal use.
Posted Feb 5, 2009 11:49 UTC (Thu) by alonz (subscriber, #815)
This isn't really simpler (except on block diagrams…)
Posted Feb 5, 2009 12:52 UTC (Thu) by mjthayer (guest, #39183)
Posted Feb 5, 2009 20:00 UTC (Thu) by quotemstr (subscriber, #45331)
Posted Feb 5, 2009 16:55 UTC (Thu) by nix (subscriber, #2304)
Posted Feb 5, 2009 17:40 UTC (Thu) by martinfick (subscriber, #4455)
Posted Feb 5, 2009 22:22 UTC (Thu) by epa (subscriber, #39769)
For those making embedded or high-availability systems who want to try harder and turn off overcommit altogether, fork-then-exec could be replaced in user space with posix_spawn or vfork-then-exec or similar.
Posted Feb 5, 2009 23:11 UTC (Thu) by martinfick (subscriber, #4455)
The OOM killer kicks in when memory has been overcommitted through COW: two processes are sharing the same memory region, and one of them writes to a shared COW page, requiring the page to be copied. There is no memory allocation happening from the process's point of view, simply a write to a page that is already allocated to a process (to two of them, actually).

Again, the fork-then-exec shortcut is not really the big deal; the problem is processes that fork, do not exec, and then eventually write to a COW page.
Posted Feb 6, 2009 0:51 UTC (Fri) by nix (subscriber, #2304)
The OOM killer is needed whenever a page must be allocated but none is available, and the request is not failable. Several such allocations spring to mind:

- when a process stack is grown
- when a fork()ed process COWs
- when a page in a private file-backed mapping is written to for the first time
- when a nonswappable kernel resource needs to be allocated (other than a cache) which cannot be discarded when memory pressure is high
- if overcommit_memory is set, if a page from the heap or an anonymous mmap() is requested for the first time

So the OOM killer is *always* needed, even if overcommitting were disabled as much as possible. (You can overcommit disk space, too: thanks to sparse files, you can run out of disk space writing to the middle of a file. With some filesystems, e.g. NTFS, you can run out of disk space by renaming a file, triggering a tree rebalance and node allocation when there's not enough disk space left. NTFS maintains an emergency pool for this situation, but it's only so large...)
Posted Feb 6, 2009 0:58 UTC (Fri) by martinfick (subscriber, #4455)
Posted Feb 6, 2009 1:26 UTC (Fri) by dlang (guest, #313)
How would you propose that programmers handle an error when they allocate a local variable (which is one way to grow the stack)?
Posted Feb 6, 2009 1:38 UTC (Fri) by brouhaha (subscriber, #1698)
At no point should the OOM killer become involved, because there is no reason to propagate the error outside the process (other than by another process noticing that the process in question has exited). A principle of reliable systems is confining the consequences of an error to the minimum area necessary, and killing some other randomly-selected (or even heuristically-selected) process violates that principle.
Posted Feb 6, 2009 5:26 UTC (Fri) by njs (subscriber, #40338)
This makes sense on the surface, but memory being a shared resource means that everything is horribly coupled no matter what and life isn't that simple.
You have 2 gigs of memory.
Process 1 and process 2 are each using 50 megabytes of RAM.
Then Process 1 allocates another 1948 megabytes.
Then Process 2 attempts to grow its stack by 1 page, but there is no memory.
The reason the OOM killer exists is that it makes no sense to blame Process 2 for this situation. And if you did blame Process 2, the system would still be hosed, and a few minutes later you'd have to kill off Process 3, Process 4, etc., until you got lucky and hit Process 1.
Posted Feb 7, 2009 17:52 UTC (Sat) by oak (guest, #2786)
Posted Feb 8, 2009 15:26 UTC (Sun) by nix (subscriber, #2304)
Posted Feb 12, 2009 14:32 UTC (Thu) by epa (subscriber, #39769)
Posted Feb 12, 2009 15:23 UTC (Thu) by tialaramex (subscriber, #21167)
Posted Feb 12, 2009 16:06 UTC (Thu) by dlang (guest, #313)
There are a growing number of such functions in C nowadays, as people go back, figure out where programmers commonly get it wrong, and provide functions that are much harder to misuse (the case that springs to mind is the string manipulation routines).
Taming the OOM killer
This way you can even implement a simple 'if exec(a) fails try exec(b) or otherwise flag error with (possibly program specific) meaningful data'...
Oh sure, sometimes you will want to do something more complex like that. In these cases vfork() (as in classic BSD) doesn't work, because the child process is using memory belonging to the parent. The traditional fork() then exec() is best.
Taming the OOM killer
for which you need at least dup()s and filehandle manipulation between
fork() and exec(): and the nature of such manipulation differs for each
invoker...
Taming the OOM killer
<sarcasm>
So you would prefer the VMS/Win32 style “CreateProcess” system call, with its 30+ arguments, just in order to accommodate all possible behaviors expected from the parent process?
</sarcasm>
(From SUSv2 / POSIX draft.)
The vfork() function has the same effect as fork(2), except that the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit(2) or one of the exec(3) family of functions.
On the other hand, pid_t child = clone(run_child_func, run_child_stack, CLONE_VM, run_child_data)
would do the trick. The child would share memory with the parent, making overcommit unnecessary, but would have a different file descriptor table, allowing pipelines to be set up easily.
Taming the OOM killer
and piping. The call you want exists in POSIX, with the inelegant name
posix_spawn*(), but it's a nightmare of overdesign and a huge family of
functions precisely because it has to model all the things people usually
want to do between fork() and exec(). Its only real use is in embedded
MMUless systems in which fork() is impractical or impossible to implement.
Taming the OOM killer
Taming the OOM killer
The process should get a segfault or equivalent signal. If there is a handler for the signal, but the handler can't be invoked due to lack of stack space, the process should be killed. If the mechanism to signal the process in a potential out-of-stack situation is too complex to be practically implemented in the kernel, then the process should be killed without attempting to signal it.
Taming the OOM killer
Kernels map by default 8MB of stack for each thread (and usually threads use only something like 4-8KB of that). Without overcommit, a process with 16 threads couldn't run in 128MB RAM unless you change this limit. I think you can change it only from kernel source, and it applies to all processes/threads in the system?
Taming the OOM killer
The OOM killer does not come into play when malloc is called. If malloc is called when there is no memory, there is no need to kill any processes: malloc simply fails and returns the appropriate error code.
Ah, I didn't realize that. From the way people talk it sounded as though malloc() would always succeed and then the process would just blow up trying to use the memory. If the only memory overcommit is COW due to fork() then it's not so bad (though I still think some kind of vfork() would be a more hygienic practice).