
Taming the OOM killer

Taming the OOM killer

Posted Feb 5, 2009 8:00 UTC (Thu) by brouhaha (subscriber, #1698)
In reply to: Taming the OOM killer by dlang
Parent article: Taming the OOM killer

Sure, but there's no reason not to do both Copy On Write AND count the memory that theoretically might be needed by the fork() as committed. If there isn't enough memory/swap to handle that, the fork() should fail. The user should have enough memory and/or swap that this isn't a problem.

Assuming that when a big process does a fork() it is going to exec() a small process is completely absurd. Sure, that may happen fairly often, but the case where a small process does a fork() and then an exec() of a large process also happens fairly often.

Usually somewhere in this discussion someone says "but what about embedded systems", which I claim actually supports my position, because in an embedded system it is even MORE important to (1) have sufficient memory/swap, and (2) not let the OOM killer nuke some random process if there isn't.
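[A minimal sketch of the recovery path argued for here, assuming overcommit is disabled so that fork() itself reports the failure; the load-shedding response is illustrative, not something the comment prescribes:]

    /* With strict accounting, fork() of a large process can fail with
       ENOMEM (or EAGAIN); the error surfaces here, where the program
       can still react, instead of as a later OOM kill. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();
        if (pid < 0) {
            if (errno == ENOMEM || errno == EAGAIN)
                fprintf(stderr, "fork failed: shedding load instead of dying\n");
            return EXIT_FAILURE;
        }
        if (pid == 0) {
            execl("/bin/true", "true", (char *)0);
            _exit(127);           /* reached only if the exec fails */
        }
        return EXIT_SUCCESS;
    }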



Taming the OOM killer

Posted Feb 5, 2009 8:06 UTC (Thu) by dlang (subscriber, #313) [Link]

swap isn't free, especially on embedded systems, so providing enough memory+swap to handle your total allocation may not be reasonable.

also, actually _using_ swap can be extremely painful, sometimes more painful than simply crashing the system (at least a crash is something you can have watchdogs for, and failover/reboot)

in theory you are right: the system is unpredictable with overcommit enabled. in practice it is reliable enough for _many_ uses.

Taming the OOM killer

Posted Feb 5, 2009 8:22 UTC (Thu) by brouhaha (subscriber, #1698) [Link]

> swap isn't free, especially on embedded systems
I have yet to use an embedded Linux system that didn't have substantially more "disk" than RAM, except one case in which there was no disk but plenty of RAM. I somewhat question the wisdom of designing an embedded Linux system for which there is little RAM and even less disk.

In an embedded system, I wouldn't expect there to be a large number of processes sleeping between a fork() and exec(). If there were, that would most likely be a sign of serious problems, so having a fork() fail under such circumstances seems like a good thing.

> also, actually _using_ swap can be extremely painful, sometimes more painful than simply crashing the system
Sure, but with copy-on-write handling the fork()/exec() case that people seem most worried about, the swap won't actually be used. It will just be reserved until the exec().

If you're concerned about system performance degrading because of excessive swap usage, there's no reason why you can't have a user-space process to act as a watchdog for that problem, which may occur for reasons other than memory "overcommit".
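[A user-space watchdog of the sort described could be as small as a loop over /proc/meminfo; the 64 MB watermark and the reaction below are illustrative assumptions:]

    /* Sketch: poll /proc/meminfo and flag trouble when SwapFree drops
       below a chosen watermark. What "take action" means is
       system-specific: kill a known offender, fail over, reboot... */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        for (;;) {
            FILE *f = fopen("/proc/meminfo", "r");
            char line[128];
            long swap_free_kb = -1;
            if (f == NULL)
                return 1;
            while (fgets(line, sizeof line, f))
                if (sscanf(line, "SwapFree: %ld kB", &swap_free_kb) == 1)
                    break;
            fclose(f);
            if (swap_free_kb >= 0 && swap_free_kb < 64 * 1024)
                fprintf(stderr, "swap nearly exhausted: take action\n");
            sleep(10);
        }
    }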

I've been involved in engineering a number of embedded products that had to have high reliability in the field, and I would not dream of shipping such a product with the kernel configured to allow memory overcommitment. Even though you can _usually_ get away with it, "usually" isn't good enough. There simply needs to be enough memory (and/or swap) to handle the worst-case requirements of the system. Otherwise it _will_ fail in the field, and thus not be as reliable as intended.

Taming the OOM killer

Posted Feb 5, 2009 8:33 UTC (Thu) by dlang (subscriber, #313) [Link]

one point I was trying to make (and apparently failed to) is that even on systems where you have the disk space for swap, having things actually use that swap can be a big problem

if you could allocate the address space but then tell the kernel "don't really use it" you may be ok, but how is that different from the current overcommit?

you _are_ overcommitting (compared to what is acceptable for the system's performance) and counting on the efficiencies of COW to keep you from actually using the swap space you have committed.

the only difference is that you overcommit up to a given point (at which time your allocations start failing, which may also cause the system to 'fail' as far as the user is concerned)

I fully agree that there are situations where disabling overcommit is the right thing to do. However, I am also seeing other cases where allowing overcommit is the right thing to do.

Taming the OOM killer

Posted Feb 5, 2009 9:07 UTC (Thu) by brouhaha (subscriber, #1698) [Link]

> if you could allocate the address space but then tell the kernel "don't really use it"
I'm not telling the kernel "don't use it". If the kernel needs to, it will use it. For the primary case people seem concerned with, the time between fork() and exec(), it will be committed but due to COW, it won't actually get used. It may still get used for other cases, and within reason that's a good thing, but a user-space daemon can take some system-specific corrective action if it gets out of hand. This provides a whole lot more flexibility in error handling than a user-space daemon that would only control the behavior of the OOM killer.
> you may be ok, but how is that different from the current overcommit?
It's different because the kernel is NEVER going to kill an unrelated process selected by a heuristic. It is going to fail an allocation or fork, and the software can take some reasonable recovery action.

The system should not be designed or configured such that the kernel can fail to provide memory that has been committed, because there is NO reasonable recovery mechanism for that. It is far easier to handle memory allocation problems gracefully when the error is reported at the time of the attempt to commit the memory, rather than at some random future time.

Taming the OOM killer

Posted Feb 5, 2009 9:16 UTC (Thu) by dlang (subscriber, #313) [Link]

so you would rather have the system slow to an unusable crawl if it actually tries to use all the memory that has been asked for than have _anything_ killed under _any_ conditions?

there are times for that, but there are also times when 99.999% reliability is good enough.

Taming the OOM killer

Posted Feb 5, 2009 10:23 UTC (Thu) by epa (subscriber, #39769) [Link]

The current setup (where memory is overcommitted and there is an OOM killer) is also quite capable of slowing the system to an unusable crawl if you have swap space in use. So I don't think that turning off overcommit and allocating a slightly larger amount of swap would make the situation any worse.

(On a related note, the kernel is free to refuse any request for extra memory, and can do so for its own reasons. So for example if a process needs to fork() then the memory allocation would normally succeed, on the assumption that the extra memory probably won't be used, but provided there is enough swap space to back it up just in case. Whereas an explicit memory allocation 'I want ten gigabytes' could, as a matter of policy, be denied if the system doesn't have that much physical RAM.)
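[Under the heuristic mode Linux ships by default (vm.overcommit_memory = 0), something like this already happens: a single obviously unsatisfiable request is refused up front while modest ones succeed optimistically. A sketch; whether a given size is refused depends on the machine's RAM and swap:]

    /* Heuristic overcommit refuses allocations that obviously exceed
       what the machine could ever back, but grants smaller ones
       without reserving backing store. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        void *p = malloc(10ULL << 30);   /* "I want ten gigabytes" */
        if (p == NULL)
            perror("malloc");            /* refused up front */
        else
            free(p);                     /* granted, but not yet backed */
        return 0;
    }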

Taming the OOM killer

Posted Feb 5, 2009 21:03 UTC (Thu) by dlang (subscriber, #313) [Link]

I'm not talking about 'I need 10G of memory' allocations, I'm talking about cases where lots of small programs end up using individually small amounts of memory, but the total is large.

but if you have large programs that may need to fork, it's not necessarily the case that it's 'a slightly larger amount of swap'. I've seen people arguing your point of view casually suggest that a large system should be willing to dedicate a full 1TB drive just to swap so that it can turn overcommit off. in practice, if you end up using more than a gig or so of swap, your system slows to a crawl.

Taming the OOM killer

Posted Feb 5, 2009 22:26 UTC (Thu) by epa (subscriber, #39769) [Link]

I think it would be useful to add swap space for 'emergency only' use. So once all physical RAM is committed, the kernel starts refusing user-space requests for more memory. However, if a process wants to fork(), the kernel can let it succeed, knowing that in the worst case there is swap space to back its promises.

It is rather a problem that merely making swap space available means it can then be used by applications just as willingly as physical RAM. Perhaps a per-process policy flag could say whether an app may have its memory allocation requests start going to swap (as opposed to getting 0 from malloc() when physical RAM is exhausted). Then sysadmins could switch this flag on for particular processes that need it.

Taming the OOM killer

Posted Feb 6, 2009 0:45 UTC (Fri) by nix (subscriber, #2304) [Link]

The problem is that the system is more dynamic than that. Swap space is moved to and from physical memory on demand; there is almost never much free physical memory, because free memory is wasted memory, so the first sign you get that you're about to run out of memory is when you're out of *swap* and still allocating more (reducing the various caches and paging text pages out as you go).

Taming the OOM killer

Posted Feb 5, 2009 12:42 UTC (Thu) by michaeljt (subscriber, #39183) [Link]

I would rather have my system slow to an unusable crawl if I were confident that it would come out of it again at some point. Even then, I can still press the reset button, which is what I have usually ended up doing in OOM situations anyway. And just as you can tune the behaviour of the OOM killer, you could also tune which applications the system tries to keep responsive, so that you can reasonably quickly manually kill (or just stop) the offending processes.

Taming the OOM killer

Posted Feb 5, 2009 15:44 UTC (Thu) by hppnq (guest, #14462) [Link]

> I would rather have my system slow to an unusable crawl if I were confident that it would come out of it again at some point. Even then, I can still press the reset button, which is what I have usually ended up doing in OOM situations anyway.

On your home system this makes some sense, but all this goes out the window once you have to take service levels into account.

Taming the OOM killer

Posted Feb 6, 2009 7:46 UTC (Fri) by michaeljt (subscriber, #39183) [Link]

Granted, but then you don't want random processes dying either. That can also have adverse effects on service levels. In that case you are more likely to want a system that stops allocating memory in time.

Taming the OOM killer

Posted Feb 6, 2009 8:58 UTC (Fri) by dlang (subscriber, #313) [Link]

it's actually far easier to deal with processes dying than with the entire machine effectively locking up in a swap storm.

you probably already have tools in place to detect processes dying and either restart them (if the memory pressure is temporary) or fail over to another box (gracefully, for all the other processes on the box)

Taming the OOM killer

Posted Jul 15, 2014 2:27 UTC (Tue) by bbulkow (guest, #87167) [Link]

When the random process is sshd, few tools continue to function. Yes, I've seen this in production multiple times. I wish that most server distributions did not allow overcommit, and/or that sshd were protected. I also wish the OOM killer system messages were clearer.
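[For what it's worth, a process can be exempted from the OOM killer through the /proc knobs the parent article describes. A minimal sketch; needs root or CAP_SYS_RESOURCE, and kernels before 2.6.36 use /proc/<pid>/oom_adj with -17 instead:]

    /* Sketch: write -1000 to oom_score_adj so the OOM killer will
       never pick this process (e.g. sshd could do this itself, or an
       init script could write the file for it). */
    #include <stdio.h>

    int oom_protect_self(void) {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (f == NULL)
            return -1;
        fputs("-1000", f);
        return fclose(f);
    }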

Taming the OOM killer

Posted Jul 15, 2014 2:52 UTC (Tue) by dlang (subscriber, #313) [Link]

turning off overcommit would cause more memory allocation failures (because the memory system would say that it couldn't guarantee memory that ends up never being used)

True, it would happen at malloc() time instead of randomly, but given that most programs don't check return codes, this would help less than it should.

Taming the OOM killer

Posted Jul 15, 2014 9:41 UTC (Tue) by dgm (subscriber, #49227) [Link]

> but given that most programs don't check return codes

IMHO, this should be treated like a bug.

> the memory system would say that it couldn't guarantee memory that ends up *never being used*

This too.

Taming the OOM killer

Posted Jul 15, 2014 19:11 UTC (Tue) by dlang (subscriber, #313) [Link]

>> but given that most programs don't check return codes

> IMHO, this should be treated like a bug.

you have a right to your opinion, but in practice, your opinion doesn't matter that much

>> the memory system would say that it couldn't guarantee memory that ends up *never being used*

> This too.

exactly how would you expect the Linux kernel to know that the application that just forked is never going to touch some of the memory of the parent and therefore doesn't need it to be duplicated (at least in allocation)?

this is especially important for large programs that are forking so that the child can then exec some other program. In this case you may have a multi-GB allocation that's not needed, because the only thing the child does is close some file descriptors and exec some other program. With the default overcommit and Copy-on-Write, this 'just works', but with overcommit disabled, the kernel needs to allocate the multiple GB of RAM (or at least virtual memory) just in case the application is going to need it. This will cause failures if the system doesn't have a few extra GB around to handle these wasteful allocations.

not to mention that there's overhead in updating the global allocations, so allocating and then deallocating memory like that has a cost.
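[The pattern being described, roughly; with overcommit disabled, the fork() line is where a multi-GB parent needs its whole writable address space to be reservable, even though the child only closes descriptors and execs. "helper" is a stand-in name:]

    /* fork-then-exec from a large parent: under strict accounting the
       kernel must reserve a copy of every writable page at fork(),
       although the child touches almost none of them before the exec. */
    #include <unistd.h>

    void spawn_helper(int keep_fd) {
        pid_t pid = fork();               /* reservation happens here */
        if (pid == 0) {
            for (int fd = 3; fd < 1024; fd++)
                if (fd != keep_fd)
                    close(fd);            /* close some file descriptors */
            execlp("helper", "helper", (char *)0);
            _exit(127);                   /* exec failed */
        }
    }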

Taming the OOM killer

Posted Jul 16, 2014 11:23 UTC (Wed) by dgm (subscriber, #49227) [Link]

> exactly how would you expect the linux kernel to know that the application that just forked is never going to touch some of the memory of the parent and therefor doesn't need it to be duplicated (at least in allocation)?

What about telling it that you're just about to call execv, so it doesn't need to? What about auto-detecting this by simply watching what the first syscall after fork is?

Not bad for just 15 seconds of thinking about it, is it?

Taming the OOM killer

Posted Jul 16, 2014 12:05 UTC (Wed) by JGR (subscriber, #93631) [Link]

The very first syscall after fork is not necessarily execv; fds are often closed/set up just beforehand.
Even if execv is called immediately (for some value of immediately), the parent may well have scribbled over the memory which holds the parameters to be passed to execv in the child, before the child has called execv.
If it's really essential that nothing should be duplicated, you can still use vfork.
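[vfork() is effectively the "an exec is coming" promise discussed above: the child borrows the parent's address space while the parent is suspended, so nothing is copied or reserved. A sketch, with the caveat that the child may safely do little besides exec or _exit:]

    /* vfork: no duplication and no reservation, because the child
       runs in the parent's own address space until it execs. */
    #include <unistd.h>

    void spawn_with_vfork(void) {
        pid_t pid = vfork();
        if (pid == 0) {
            execl("/bin/true", "true", (char *)0);
            _exit(127);   /* reached only if the exec fails */
        }
    }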

Taming the OOM killer

Posted Jul 16, 2014 12:18 UTC (Wed) by dgm (subscriber, #49227) [Link]

Not to mention that overcommit is not the same as CoW. You can keep CoW and still disable overcommit (there's even a knob for that).
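[The knob in question is vm.overcommit_memory: 0 is the heuristic default, 1 overcommits unconditionally, and 2 enables strict accounting (scaled by vm.overcommit_ratio) while CoW keeps working exactly as before. It can be set with sysctl from a shell, or programmatically:]

    /* Sketch: switch the system to strict accounting (mode 2).
       Equivalent to 'sysctl vm.overcommit_memory=2'; needs root. */
    #include <stdio.h>

    int disable_overcommit(void) {
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
        if (f == NULL)
            return -1;
        fputs("2", f);
        return fclose(f);
    }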

Taming the OOM killer

Posted Jul 16, 2014 15:07 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> Not to mention that overcommit is not the same as CoW. You can keep CoW and still disable overcommit....

CoW is still a form of overcommit, even if it's not referred to as such. In the one case you commit to allocating a new page in the future, on the first write, and pre-filling it with a copy of an existing page. In the other case you commit to allocating a new page in the future, probably on the first write, and pre-filling it with zeros. In both cases you're writing an IOU for memory which may not actually exist when it's needed.

You could pre-allocate memory for CoW while deferring the actual copy, but that would only be a performance optimization. You'd still have the problem that fork() may fail in a large process for lack of available memory even though the child isn't going to need most of it.

Taming the OOM killer

Posted Jul 16, 2014 14:06 UTC (Wed) by mpr22 (subscriber, #60784) [Link]

close(0); close(1); close(2);
dup2(childendofsocket, 0);
dup2(childendofsocket, 1);
dup2(childendofsocket, 2);
close(parentendofsocket);
execve(/*args*/);
_exit(255);

Taming the OOM killer

Posted Jul 16, 2014 18:50 UTC (Wed) by nix (subscriber, #2304) [Link]

Even if you checked allocator return codes perfectly, it still wouldn't help: you can OOM calling a function if there isn't enough memory to expand the stack, even in the absence of overcommit. Nothing you can do about *that* (other than to 'pre-expand' the stack with a bunch of do-nothing function calls early in execution, and hope like hell you expanded it enough).
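[The pre-expansion trick might look like the sketch below (what the next comment calls a stack_balloon); the depth and page size are assumptions, and whether they are enough is exactly the problem being pointed out:]

    /* Touch stack pages now, while failure is still survivable,
       instead of at some arbitrary later call. One page per frame;
       the write after the recursive call keeps the compiler from
       turning this into a tail call that reuses the frame. */
    static void stack_balloon(unsigned depth) {
        volatile char pad[4096];
        pad[0] = (char)depth;
        if (depth > 0)
            stack_balloon(depth - 1);
        pad[0] = 0;
    }

    /* e.g. stack_balloon(256); early in main() maps ~1 MB of stack */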

Taming the OOM killer

Posted Jul 17, 2014 14:22 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> other than to 'pre-expand' the stack with a bunch of do-nothing function calls early in execution, and hope like hell you expanded it enough

Also that you don't expand it too much and crash in your stack_balloon function.

Taming the OOM killer

Posted Jul 15, 2014 14:44 UTC (Tue) by raven667 (subscriber, #5198) [Link]

sshd should be auto-restarted by systemd, which should help save the system if the OOM killer is running rampant.

Taming the OOM killer

Posted Feb 5, 2009 21:06 UTC (Thu) by dlang (subscriber, #313) [Link]

the problem is that a system that goes heavily into swap may not come back out for hours or days.

if you are willing to hit reset in this condition then you should be willing to deal with the OOM killer killing the box under the same conditions.

Taming the OOM killer

Posted Feb 6, 2009 8:00 UTC (Fri) by michaeljt (subscriber, #39183) [Link]

As I said, perhaps some work could be put into improving this situation rather than improving the OOM killer. Like using the same heuristics being developed for the killer to determine which processes to freeze and move completely into swap, freeing up memory for other processes. That is of course somewhat easier to correct if the heuristics go wrong (unless they go badly wrong, of course, and take down the X server or whatever) than if the process is just shot down.

Taming the OOM killer

Posted Feb 12, 2009 19:14 UTC (Thu) by efexis (guest, #26355) [Link]

There's no reason for the OOM killer to kick in if there's swap available; stuff can just be swapped out (swapping may itself need memory, in which case you set a watermark so that swapping is forced before free memory drops below that point, to ensure that swapping can happen). OOM means exactly what it says: you're out of memory; silicon or magnetic makes no difference.

Personally I have swap disabled or set very low, as a runaway process basically means I lose contact with a server, unable to log in or anything, until it has finished chewing through all available memory *and* swap (causing IO starvation, IO being the thing I need in order to log in and kill the offending task) and finally hits the limit and gets killed.

Everything important is set to be restarted, either directly from init, or indirectly from daemontools or equivalent, which is restarted by init should it go down (which has never happened).

Taming the OOM killer

Posted Feb 13, 2009 23:33 UTC (Fri) by michaeljt (subscriber, #39183) [Link]

I have been thinking about this a bit more, since my system was just swapped to death again (and no, the OOM killer did not kick in). Has anyone tried setting a per-process memory limit as a percentage of the total physical RAM? That would help limit the damage done by runaway processes without stopping large processes from forking.

Taming the OOM killer

Posted Feb 14, 2009 0:03 UTC (Sat) by dlang (subscriber, #313) [Link]

if you swapped to death and OOM didn't kick in, you have probably allocated more swap than you are willing to have used.

how much swap did you allocate? any idea how much was used?

enabling overcommit with small amounts of swap will allow large programs to fork without problems, but will limit runaway processes. it's about the textbook case for using overcommit.

Taming the OOM killer

Posted Feb 16, 2009 9:04 UTC (Mon) by michaeljt (subscriber, #39183) [Link]

> how much swap did you allocate? any idea how much was used?

Definitely too much (1 GB for 2 GB of RAM), as I realised after reading this: http://kerneltrap.org/node/3202. That page was also what prompted my last comment. It seems a bit strange to me that increasing swap size should so badly affect system performance in this situation, and I wondered whether this could be fixed with the right tweak, such as limiting the amount of virtual memory available to processes, say to a default of 80 percent of physical RAM. This would still allow for large processes to fork, but might catch runaway processes a bit earlier. I think that if I find some time, I will try to work out how to do that (assuming you don't answer in the meantime to tell me why that is a really bad idea, or that there already is such a setting).

Taming the OOM killer

Posted Feb 16, 2009 15:38 UTC (Mon) by dlang (subscriber, #313) [Link]

have you looked into setting the appropriate values in ulimit?

Taming the OOM killer

Posted Feb 17, 2009 8:23 UTC (Tue) by michaeljt (subscriber, #39183) [Link]

> have you looked into setting the appropriate values in ulimit?

Indeed. I set ulimit -v 1600000 (given that I have 2GB of physical RAM) and launched a known bad process (gnash on a page I know it can't cope with). gnash crashed after a few minutes, without even slowing down my system. I just wonder why this is not done by default. Of course, one could argue that this is a user or distribution problem, but given that knowledgeable people can change the value anyway, why not ship a sane default in the kernel? (Again, say 80% of physical RAM. I tried with 90% and gnash caused a noticeable performance degradation.) This is not a rhetorical question, I am genuinely curious.
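[For reference, the shell's ulimit -v counts kilobytes; the same cap from inside a program is setrlimit() with RLIMIT_AS, which counts bytes:]

    /* Programmatic equivalent of 'ulimit -v 1600000': cap this
       process's virtual address space at ~1.6 GB; allocations beyond
       the cap fail with ENOMEM instead of pushing the system into swap. */
    #include <sys/resource.h>

    int cap_address_space(void) {
        struct rlimit rl;
        rl.rlim_cur = rl.rlim_max = 1600000UL * 1024;
        return setrlimit(RLIMIT_AS, &rl);
    }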

Taming the OOM killer

Posted Feb 17, 2009 8:29 UTC (Tue) by dlang (subscriber, #313) [Link]

simple: the kernel doesn't know what is right for you. how can it know that you really don't want this program you just started to use all available RAM (even at the expense of other programs)?

the distro is in the same boat. if they configured it to do what you want, they would have other people screaming at them that they would rather see the computer slow down than have programs die (you even see people here arguing that)

Taming the OOM killer

Posted Feb 17, 2009 14:27 UTC (Tue) by michaeljt (subscriber, #39183) [Link]

> simple, the kernel doesn't know what is right for you. how can it know that you really don't want this program that you start to use all available ram (even at the expense of other programs)

It does take a decision, though: allowing all programs to allocate as much RAM as they wish by default, even if it is not present, is very definitely a policy decision. Interestingly, Wine fails to start if I set ulimit -v in this way (I can guess why). I wonder whether disabling overcommit would also prevent it from working.

