
Taming the OOM killer

Posted Feb 5, 2009 15:44 UTC (Thu) by hppnq (guest, #14462)
In reply to: Taming the OOM killer by michaeljt
Parent article: Taming the OOM killer

> I would rather have my system slow to an unusable crawl if I was confident that it would come out of it again at some point. Even then, I can still press the reset button, which is what I have usually ended up doing in OOM situations anyway.

On your home system this makes some sense, but all this goes out the window once you have to take service levels into account.



Taming the OOM killer

Posted Feb 6, 2009 7:46 UTC (Fri) by michaeljt (subscriber, #39183)

Granted, but then you don't want random processes dying either. That can also have adverse effects on service levels. In that case you are more likely to want a system that will stop allocating memory in time.

Taming the OOM killer

Posted Feb 6, 2009 8:58 UTC (Fri) by dlang (subscriber, #313)

it's actually far easier to deal with processes dying than with the entire machine effectively locking up in a swap storm.

you probably already have tools in place to detect processes dying and either restart them (if the memory pressure is temporary) or fail over to another box (gracefully for all the other processes on the box)

Taming the OOM killer

Posted Jul 15, 2014 2:27 UTC (Tue) by bbulkow (guest, #87167)

When the random process is SSHD, few tools continue to function. Yes, I've seen this in production multiple times. I wish that most server distributions did not allow overcommit, and/or that SSHD were protected. I also wish the OOM killer's system messages were clearer.
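
One mechanism Linux offers for shielding a daemon is the per-process /proc/<pid>/oom_score_adj file, where a value of -1000 exempts the process from the OOM killer. The following is a minimal sketch only: the helper names are hypothetical, and lowering the value normally requires privileges (CAP_SYS_RESOURCE).

```c
#include <stdio.h>

/* Hypothetical helper: read this process's OOM score adjustment.
 * Returns 0 on success and stores the value (-1000..1000) in *adj. */
int get_oom_score_adj(int *adj)
{
    FILE *f = fopen("/proc/self/oom_score_adj", "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%d", adj) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: set the adjustment; -1000 asks the kernel never to
 * pick this process. Returns 0 on success, -1 on failure. */
int set_oom_score_adj(int adj)
{
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (f == NULL)
        return -1;
    int ok = (fprintf(f, "%d\n", adj) > 0);
    /* The kernel may reject the write only at flush time, so check fclose. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A privileged daemon could call set_oom_score_adj(-1000) early in startup; an unprivileged caller should expect that call to fail.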

Taming the OOM killer

Posted Jul 15, 2014 2:52 UTC (Tue) by dlang (subscriber, #313)

turning off overcommit would cause more memory allocation failures (because the memory system would say that it couldn't guarantee memory that ends up never being used)

True, it would happen at malloc() time instead of randomly, but given that most programs don't check return codes, this would help less than it should
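
For illustration, checking the return code looks roughly like the sketch below. The helper name copy_buffer is hypothetical; the point is only that the failure is reported to the caller instead of being discovered later as a NULL dereference.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `len` bytes of `src`.
 * Returns a buffer the caller must free(), or NULL if allocation failed. */
char *copy_buffer(const char *src, size_t len)
{
    /* malloc(0) may legally return NULL, so always request at least 1 byte. */
    char *dst = malloc(len ? len : 1);
    if (dst == NULL) {
        /* With overcommit disabled, this is where the failure surfaces. */
        fprintf(stderr, "copy_buffer: cannot allocate %zu bytes\n", len);
        return NULL;
    }
    memcpy(dst, src, len);
    return dst;
}
```

With the default overcommit policy the NULL branch is rarely taken for ordinary sizes, which is part of why such checks tend to go untested.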

Taming the OOM killer

Posted Jul 15, 2014 9:41 UTC (Tue) by dgm (subscriber, #49227)

> but given that most programs don't check return codes

IMHO, this should be treated like a bug.

> the memory system would say that it couldn't guarantee memory that ends up *never being used*

This too.

Taming the OOM killer

Posted Jul 15, 2014 19:11 UTC (Tue) by dlang (subscriber, #313)

>> but given that most programs don't check return codes

> IMHO, this should be treated like a bug.

you have a right to your opinion, but in practice, your opinion doesn't matter that much

>> the memory system would say that it couldn't guarantee memory that ends up *never being used*

> This too.

exactly how would you expect the Linux kernel to know that the application that just forked is never going to touch some of the memory of the parent and therefore doesn't need it to be duplicated (at least in allocation)?

this is especially important for large programs that are forking so that the child can then exec some other program. In this case you may have a multi-GB allocation that's not needed because the only thing the child does is to close some file descriptors and exec some other program. With the default overcommit and Copy-on-Write, this 'just works', but with overcommit disabled, the kernel needs to allocate the multiple GB of RAM (or at least virtual memory) just in case the application is going to need it. This will cause failures if the system doesn't have a few extra GB around to handle these wasteful allocations.

not to mention that there's overhead in updating the global allocations, so allocating and then deallocating memory like that has a cost.
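
The fork-then-exec pattern described above can be sketched as follows. The function name and the fd_to_close parameter are hypothetical; the sketch only shows how little of the parent's memory the child touches before the exec.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` in a child and return its exit
 * status, or -1 on error. `fd_to_close` is a descriptor the child should not
 * inherit (pass -1 for none). */
int spawn_and_wait(const char *path, char *const argv[], int fd_to_close)
{
    pid_t pid = fork();   /* pages are shared copy-on-write, not copied */
    if (pid < 0)
        return -1;        /* e.g. ENOMEM when the kernel refuses to commit */

    if (pid == 0) {
        /* Child: touch almost nothing, then replace the process image. */
        if (fd_to_close >= 0)
            close(fd_to_close);
        execv(path, argv);
        _exit(127);       /* only reached if execv failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With strict accounting, the fork() call is the step that can fail in a large parent, even though the child never writes to most of the inherited pages.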

Taming the OOM killer

Posted Jul 16, 2014 11:23 UTC (Wed) by dgm (subscriber, #49227)

> exactly how would you expect the Linux kernel to know that the application that just forked is never going to touch some of the memory of the parent and therefore doesn't need it to be duplicated (at least in allocation)?

What about telling it that you're just about to call execv, so it doesn't need to? What about auto-detecting this by simply watching what the first syscall after fork is?

Not bad for just 15 seconds of thinking about it, isn't it?

Taming the OOM killer

Posted Jul 16, 2014 12:05 UTC (Wed) by JGR (subscriber, #93631)

The very first syscall after fork is not necessarily execv, fds are often closed/set up just beforehand.
Even if execv is called immediately (for some value of immediately), the parent may well have scribbled over the memory which holds the parameters to be passed to execv in the child, before the child has called execv.
If it's really essential that nothing should be duplicated, you can still use vfork.
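
A minimal sketch of the vfork route, assuming Linux with glibc (the helper name is hypothetical). The child borrows the parent's address space, so it must do nothing except exec or _exit.

```c
#define _DEFAULT_SOURCE   /* exposes vfork() in glibc under strict -std modes */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` via vfork() and return the child's exit
 * status, or -1 on error. */
int run_with_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();  /* parent is suspended; nothing is duplicated */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only exec or _exit are safe here. It must not return from
         * this function or modify the parent's data. */
        execv(path, argv);
        _exit(127);       /* only reached if execv failed */
    }

    /* The parent resumes once the child has exec'd or exited. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent is stopped until the exec, it cannot scribble over the arguments in the meantime, but any descriptor juggling in the child also runs in the shared address space and needs care.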

Taming the OOM killer

Posted Jul 16, 2014 12:18 UTC (Wed) by dgm (subscriber, #49227)

Not to mention that overcommit is not the same as CoW. You can keep CoW and still disable overcommit (there's even a knob for that).
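
The knob is presumably the vm.overcommit_memory sysctl (an assumption; the comment does not name it), readable at /proc/sys/vm/overcommit_memory, where 0 is the heuristic default, 1 always overcommits, and 2 enables strict accounting. A small sketch that reports the current setting, with hypothetical helper names:

```c
#include <stdio.h>

/* Hypothetical helper: return the current overcommit mode (0, 1 or 2),
 * or -1 if the file cannot be read (e.g. not Linux, or /proc not mounted). */
int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Hypothetical helper: describe a mode value for log output. */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "heuristic overcommit (default)";
    case 1:  return "always overcommit";
    case 2:  return "strict accounting (overcommit disabled)";
    default: return "unknown";
    }
}
```

Something like printf("%s\n", overcommit_mode_name(read_overcommit_mode())) would show which policy a box is running.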

Taming the OOM killer

Posted Jul 16, 2014 15:07 UTC (Wed) by nybble41 (subscriber, #55106)

> Not to mention that overcommit is not the same as CoW. You can keep CoW and still disable overcommit....

CoW is still a form of overcommit, even if it's not referred to as such. In the one case you commit to allocating a new page in the future, on the first write, and pre-filling it with a copy of an existing page. In the other case you commit to allocating a new page in the future, probably on the first write, and pre-filling it with zeros. In both cases you're writing an IOU for memory which may not actually exist when it's needed.

You could pre-allocate memory for CoW while deferring the actual copy, but that would only be a performance optimization. You'd still have the problem that fork() may fail in a large process for lack of available memory even though the child isn't going to need most of it.

Taming the OOM killer

Posted Jul 16, 2014 14:06 UTC (Wed) by mpr22 (subscriber, #60784)

    close(0); close(1); close(2);
    dup2(childendofsocket, 0); dup2(childendofsocket, 1); dup2(childendofsocket, 2);
    close(parentendofsocket);
    execve(/*args*/);
    _exit(255);

Taming the OOM killer

Posted Jul 16, 2014 18:50 UTC (Wed) by nix (subscriber, #2304)

Even if you checked allocator return codes perfectly, it still wouldn't help: you can OOM calling a function if there isn't enough memory to expand the stack, even in the absence of overcommit. Nothing you can do about *that* (other than to 'pre-expand' the stack with a bunch of do-nothing function calls early in execution, and hope like hell you expanded it enough).
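
A rough sketch of the pre-expansion idea, using one large local array instead of a chain of do-nothing calls (a swapped-in variant, not necessarily what the comment had in mind). The reserve size and page size are assumptions, and the helper name is hypothetical.

```c
#include <stddef.h>

#define STACK_RESERVE (256 * 1024)  /* assumed worst-case stack need, in bytes */
#define PAGE_STEP     4096          /* assumed page size */

/* Hypothetical helper: touch one byte per page of a large local array so
 * the kernel maps those stack pages now rather than at some later call.
 * Returns the number of pages touched. */
size_t prefault_stack(void)
{
    volatile char pad[STACK_RESERVE];
    size_t touched = 0;

    /* Walk from the highest address down, the direction the stack grows on
     * common architectures. */
    for (size_t i = STACK_RESERVE; i > 0; i -= PAGE_STEP) {
        pad[i - 1] = 0;
        touched++;
    }
    return touched;
}
```

Called once early in main(), this fails up front if the pages cannot be provided. Picking STACK_RESERVE is guesswork: too small and the later OOM is still possible, too large and the helper itself overruns the stack limit.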

Taming the OOM killer

Posted Jul 17, 2014 14:22 UTC (Thu) by mathstuf (subscriber, #69389)

> other than to 'pre-expand' the stack with a bunch of do-nothing function calls early in execution, and hope like hell you expanded it enough

Also that you don't expand it too much and crash in your stack_balloon function.

Taming the OOM killer

Posted Jul 15, 2014 14:44 UTC (Tue) by raven667 (subscriber, #5198)

sshd should be auto-restarted by systemd, which should help save the system if the OOM killer is running rampant.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds