
Respite from the OOM killer

Posted Sep 30, 2004 15:32 UTC (Thu) by hppnq (subscriber, #14462)
In reply to: Respite from the OOM killer by copsewood
Parent article: Respite from the OOM killer

However, system admins who want a more reliably-performing system have long known that they need to provide adequate memory and disk resources in any case.

And that they should forbid overcommitting memory. ;-)


Respite from the OOM killer

Posted Oct 2, 2004 20:35 UTC (Sat) by giraffedata (subscriber, #1954)

However, system admins who want a more reliably-performing system have long known that they need to provide adequate memory and disk resources in any case.
And that they should forbid overcommitting memory. ;-)

Does forbidding overcommitting memory make for a more reliably performing system? When you forbid overcommitting memory, all you do is make a different process fail at a different time. A process that's minding its own business, using a small amount of memory and doing something very important, fails when its fork() returns "out of memory" (ENOMEM). And this happens even though there's only a 1% chance that letting the fork() go through would lead to trouble. And it happens to dozens of applications while one broken application sucks up all the virtual memory resources.

But in the overcommitting case, the program would work fine, and 1% of the time some other process dies instead: one that is likely to be doing something unimportant and/or to be the cause of the memory shortage.

I think you could say the overcommitting system is performing more reliably.

Respite from the OOM killer

Posted Oct 4, 2004 17:22 UTC (Mon) by im14u2c (subscriber, #5246)

...and both are broken.

But they're like my car, which currently idles rough, has an exhaust leak, and has its "service engine" light on: it still gets me to and from work.

The difference is in the failure mode. Do you degrade gracefully, or do you start to blow up at the first sign of error? If you're a user, you probably want graceful degradation--you can tolerate some excessive swapping up to a point, and if it gets too bad, you reboot. At least OpenOffice.org (OOo) didn't implode, taking your document with it. If you're a developer, you probably want to know as soon as possible that something's wrong so you can fix it.

Thankfully, my car still runs (albeit not entirely happily), rather than flashing "service engine" and shutting down.

Respite from the OOM killer

Posted Oct 5, 2004 3:50 UTC (Tue) by mbp (guest, #2737)

OK, graceful degradation is nice. But it's hard to tell whether overcommit helps or hurts.

Even car designers have this problem: some modern cars will refuse to start if the engine is getting into a state where there is a chance of permanent damage. If oil pressure is approaching zero, I think I would rather have an electronic cutout than an engine seizure.

Respite from the OOM killer

Posted Oct 6, 2004 15:51 UTC (Wed) by giraffedata (subscriber, #1954)

If you're a user, you probably want graceful degradation--you can tolerate some excessive swapping to a point, and if it gets too bad, you reboot. At least OOo didn't implode taking your document with it.

True, but that's not an option with either of the cases being discussed -- overcommit or no overcommit. This choice comes into play only when there's no place left to swap to.

The no-overcommit case can cause OOo to fail more gracefully. If OOo is written to tolerate a failed fork() (or other failed allocation) and give you the chance to kill some other process and then save your document, then no-overcommit could be a good thing for OOo.

On the other hand, if you don't have the technical skills to find the right process to kill, you're going to have to reboot anyway and lose your document. By contrast, with overcommit, you probably wouldn't have lost the document. Either there never would have been a crisis in the first place, or the OOM killer would have killed some other process and let OOo continue normally.

Respite from the OOM killer

Posted Oct 8, 2004 20:05 UTC (Fri) by tmcguire (guest, #25295)

You know, back in the old days when I was working with AIX (3.2.5, if that means anything to you), it had the policy of overcommitting memory and then randomly killing processes when it discovered that it was out.

Of course, the process that it seemed to kill first was always inetd, which made the system completely useless and didn't use many resources anyway. So AIX had to go on and kill other stuff, too.

And naturally system calls like sbrk would never fail, so no application had any opportunity to gracefully handle any problems. But then, no developer ever had the interest or the incentive to actually handle errors, so the situation was nicely symmetrical.

One of the programs best known for allocating large amounts of memory and then not using it was the X server, so turning off overallocation wasn't really an option. Fixing *that* bug probably wasn't an option either.

It's always nice to see modern systems learning from their elders, so to speak. But if Linux is going to repeat previous mistakes, it really should go all the way. It's much more fun. Or has someone introduced SIGDANGER*, and I've just missed the memo?

* The SIGDANGER signal was sent to all processes just before the OOM killer started to work. Theoretically, a process handling SIGDANGER could reduce its memory allocation. If it had time. And, if the programmer wanted to. The inetd maintainers apparently didn't. Or, a process could have a handler for SIGDANGER and then just ignore it---the OOM killer would skip any process that handled SIGDANGER.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds