The existence of OOM is one of the few really stupid things in Linux

Posted Nov 6, 2009 8:55 UTC (Fri) by epa (subscriber, #39769)
In reply to: The existence of OOM is one of the few really stupid things in Linux by nix
Parent article: Toward a smarter OOM killer

At the moment, if a 500 MB process forks, the kernel has no idea whether the child is about to call exec(), in which case almost all of those pages will be discarded, or whether the child will carry on running, in which case the pages will be needed and may well be written to. That ambiguity leads to a default policy of letting the fork succeed; when that judgement turns out to be wrong, the OOM killer has to run.
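The optimistic default that lets such a fork succeed is the kernel's overcommit policy, selected by /proc/sys/vm/overcommit_memory. A small sketch for interpreting that setting, assuming a Linux host; the helper names are illustrative, not an existing API:

```python
from pathlib import Path

# Meanings of the values accepted by /proc/sys/vm/overcommit_memory.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit: obvious overcommits are refused, others allowed (the default)",
    1: "always overcommit: allocations are never refused for lack of memory",
    2: "strict accounting: total commit is capped at swap plus a share of RAM",
}


def describe_overcommit(raw: str) -> str:
    """Turn the contents of overcommit_memory into a readable description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit_policy(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the live setting; only meaningful on Linux."""
    return describe_overcommit(Path(path).read_text())
```

On a stock kernel, current_overcommit_policy() typically reports mode 0, which is the heuristic that waves the large fork through.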

It would be better for applications to give the kernel more clues about their intention, so the kernel can make better decisions on memory management.

I agree that posix_spawn, like almost anything that comes out of a committee, is a complicated monster. A better answer might be to refine the distinction between fork() and vfork(), or to introduce a new fork-like call, fork_intend_to_exec_soon(). Then the kernel would know that for an ordinary fork() it has to be cautious and check that all the required memory is available, while fork_intend_to_exec_soon() would keep the current optimistic behaviour.
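For comparison, posix_spawn() already lets a program state the "create a child that will run another program right away" intent in a single call, which is roughly what the hypothetical fork_intend_to_exec_soon() would signal. A minimal sketch, assuming Linux and Python 3.9+; os.posix_spawn wraps the C library's posix_spawn(), which glibc implements on Linux with a vfork-style clone so the parent's pages are not duplicated. spawn_and_wait and DEMO_ARGV are illustrative names:

```python
import os
import sys


def spawn_and_wait(argv: list[str]) -> int:
    """Run argv[0] (a full path) with posix_spawn() and return its exit code."""
    # One call says "start a child that runs this program": there is no
    # window in which the child executes the parent's own code.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


# Example target guaranteed to exist: the running Python interpreter.
DEMO_ARGV = [sys.executable, "-c", "raise SystemExit(3)"]
```

spawn_and_wait(DEMO_ARGV) starts a fresh interpreter that exits with status 3 and returns that code.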

Posted Nov 6, 2009 13:43 UTC (Fri) by nix (subscriber, #2304)

fork_intend_to_exec_soon() should be the default, because *most* forks are
rapidly followed by exec()s. Whatever you choose, getting it used by much
software would be a long slow slog :/
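The common case looks like this: fork(), then exec() immediately in the child, so almost none of the pages copied at fork time are ever touched. A minimal sketch, assuming Linux; fork_then_exec and DEMO_ARGV are illustrative names:

```python
import os
import sys


def fork_then_exec(argv: list[str]) -> int:
    """fork(), exec argv[0] (a full path) in the child, return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: throw away the inherited copy of the parent's address
        # space by replacing it with a new program image.
        try:
            os.execv(argv[0], argv)
        finally:
            # Reached only if execv() failed: exit without running any more
            # of the parent's code (127 is the shell's "not found" status).
            os._exit(127)
    # Parent: reap the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


# Example target guaranteed to exist: the running Python interpreter.
DEMO_ARGV = [sys.executable, "-c", "raise SystemExit(5)"]
```

If the program cannot be executed, the child exits with 127 instead of falling back into the parent's code path.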

Posted Nov 6, 2009 19:14 UTC (Fri) by dlang (guest, #313)

And if an application misuses this by calling fork_intend_to_exec_soon() and then not exec'ing soon, what would the penalty be?

If applications can misuse this without a penalty, they will never get it right (especially when using it wrong lets their app keep running in cases where the fork would otherwise fail).

But forget the fork-then-exec situation for a moment and consider the real fork situation. For a large app, most of the memory will never be modified by either process, so even there it will almost never use the 2x memory that has been reserved for it.
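The reason a forked child rarely needs the full 2x is copy-on-write: after fork() both processes share the same physical pages, and the kernel copies a page only when one side writes to it. A minimal sketch of the visible side of that, assuming Linux; child_writes_are_private is an illustrative name. Note that CPython's reference counting also writes to object headers, so a real Python child dirties more pages than an equivalent C program would.

```python
import os


def child_writes_are_private(size: int = 1 << 20) -> tuple[bytes, bytes]:
    """Fork after filling a buffer; the child overwrites its first byte.

    Returns (first byte as seen by the parent, first byte as seen by the child).
    """
    buf = bytearray(b"A") * size
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        # Writing here makes the kernel give the child its own copy of the
        # page holding buf[0]; untouched pages stay shared with the parent.
        buf[0] = ord("B")
        os.write(write_fd, bytes(buf[:1]))
        os._exit(0)
    os.close(write_fd)
    seen_by_child = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(buf[:1]), seen_by_child
```

The parent still reads b"A" while the child reported b"B": each side only paid for the pages it actually wrote.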

Posted Nov 8, 2009 21:21 UTC (Sun) by epa (subscriber, #39769)

I don't think it matters much if a few slightly-buggy applications use the wrong variant. If 90% of userspace including the most important programs such as shells passes the right hint to the kernel, the kernel can make better decisions than it does now, and the need for the OOM killer will be reduced. It's a similar situation with raw I/O, for example: a disk-heavy program such as a database server might know that it will scan through a large file just once. Ordinarily this file's contents might clog up the page cache and evict more useful things. To help get more consistent performance, apps can be coded to hint to the kernel that it needn't bother to cache a particular I/O request. The default is still to cache it, and it's not catastrophic if one or two userspace programs haven't been tuned to use the new fancy hinting mechanism.
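The caching hint described here exists as posix_fadvise(). A minimal sketch, assuming Linux and Python's os.posix_fadvise wrapper; scan_once and scan_demo are illustrative names. It advises sequential access before a one-pass scan, then POSIX_FADV_DONTNEED afterwards so the file's clean pages can be dropped from the page cache instead of crowding out more useful data. Opening the file with O_DIRECT is the heavier alternative that bypasses the cache entirely.

```python
import os
import tempfile


def scan_once(path: str, chunk_size: int = 1 << 16) -> int:
    """Read a file once, front to back, and hint that it need not stay cached."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, len=0 means "the whole file": expect sequential reads,
        # so aggressive read-ahead is worthwhile.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        while chunk := os.read(fd, chunk_size):
            total += len(chunk)  # a real scan would process the data here
        # Done with the data: let the kernel drop these cached pages now.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return total
    finally:
        os.close(fd)


def scan_demo(size: int) -> int:
    """Usage example: write a throwaway file of `size` bytes and scan it once."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * size)
        tmp.flush()
        return scan_once(tmp.name)
```

As with the fork hint, a program that never calls posix_fadvise() simply gets the default caching behaviour.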

> But forget the fork-then-exec situation for a moment and consider the real fork situation. For a large app, most of the memory will never be modified by either process, so even there it will almost never use the 2x memory that has been reserved for it.

Very true, but of course there's no way for the kernel to know this. I expect most apps would prefer the fork to either succeed for sure, or fail at once if not enough memory can be guaranteed. There may be a few where optimistically hoping for the best and perhaps killing a random process later is the ideal behaviour.
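Linux already offers the "succeed for sure or fail at once" behaviour as strict accounting (vm.overcommit_memory = 2): a fork either fits under the commit limit or fails immediately with ENOMEM. The kernel reports that limit (by default swap plus 50% of RAM) and the current total commitment in /proc/meminfo as CommitLimit and Committed_AS. A rough sketch of the check, assuming the child's new commitment is about the parent's private writable memory; the real accounting is per mapping, so this is only an estimate, and the function names are hypothetical:

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style lines into {field: first numeric value}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are in kB
    return fields


def strict_fork_would_fail(meminfo_text: str, child_commit_kb: int) -> bool:
    """Estimate whether a fork would get ENOMEM under overcommit_memory=2."""
    info = parse_meminfo(meminfo_text)
    headroom_kb = info["CommitLimit"] - info["Committed_AS"]
    return child_commit_kb > headroom_kb
```

On a live system the first argument would be the text of /proc/meminfo; for the 500 MB process above, child_commit_kb would be roughly 512000.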

Posted Nov 8, 2009 23:32 UTC (Sun) by nix (subscriber, #2304)

Yeah, but forking to exec immediately afterwards is the common case. If
you make that some weird nonportable new variant, 90% of programs are
never going to use it, and none of the rest will until some considerable
time has passed (time for this call to percolate down into the kernel and
glibc --- and try getting this call past Ulrich, ho ho.)

(Anyway, we *have* fork_to_exec_soon(). It's called vfork().)