
Blocking on allocation failure - WTF?

Posted Mar 10, 2011 8:27 UTC (Thu) by epa (subscriber, #39769)
Parent article: Delaying the OOM killer

Writing "1" to that file will disable the OOM killer within that group. Should an out-of-memory situation come about, the processes in the affected group will simply block when attempting to allocate memory until the situation improves somehow.
I understand some of the reasons for overcommitting memory when it's not known how much is really available or needed, but blocking on a definite out-of-memory seems just plain daft. If the kernel knows that no more memory is available why can't it pass that information on to user space?
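
For context, the file the article refers to is the memory cgroup's memory.oom_control knob. A minimal sketch of disabling the OOM killer for one group might look like the following; the mount point and group name are assumptions and depend on where the cgroup filesystem is mounted.

/* Minimal sketch: disable the OOM killer for one memory cgroup by
 * writing "1" to its memory.oom_control file.  The path below
 * ("/sys/fs/cgroup/memory/mygroup") is an assumption; it depends on
 * how and where the cgroup filesystem is mounted on a given system. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *path = "/sys/fs/cgroup/memory/mygroup/memory.oom_control";
    FILE *f = fopen(path, "w");

    if (f == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    fputs("1\n", f);            /* 1 = disable OOM killing in this group */
    if (fclose(f) != 0) {
        perror("fclose");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}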



Blocking on allocation failure - WTF?

Posted Mar 10, 2011 8:28 UTC (Thu) by epa (subscriber, #39769) [Link]

why can't it pass that information on to user space?
I should have said, why can't it pass the info back to the process that requested more memory? (rather than a third userspace OOM-killer process)

Blocking on allocation failure - WTF?

Posted Mar 10, 2011 11:06 UTC (Thu) by dgm (subscriber, #49227) [Link]

I think much the same. I see value in having higher-level control over which processes get killed, for instance in systems composed of many processes where some are more expendable than others.

But you can "easily" simulate this with conventional means: in the precious programs, use a custom version of malloc() that, on error, sends a kill signal to some of the processes on the expendable list (a sketch of this idea follows below).

Or am I missing something?
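
A rough sketch of that wrapper idea, under the assumption that the "expendable list" is simply an array of PIDs the precious program already knows about; precious_malloc() and the array are made-up names.

/* Rough sketch: a malloc() wrapper that, on failure, kills a process
 * from an "expendable" list and retries.  How the PIDs are discovered
 * is left to the application; the names here are hypothetical. */
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t expendable[16];      /* PIDs of expendable processes */
static size_t n_expendable = 0;   /* filled in by the application  */

void *precious_malloc(size_t size)
{
    void *p;

    while ((p = malloc(size)) == NULL && n_expendable > 0) {
        /* Out of memory: sacrifice one expendable process and retry. */
        kill(expendable[--n_expendable], SIGKILL);
        /* Give the kernel a moment to reclaim the victim's memory. */
        sleep(1);
    }
    return p;   /* may still be NULL if nothing was left to kill */
}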

Blocking on allocation failure - WTF?

Posted Mar 10, 2011 11:56 UTC (Thu) by alonz (subscriber, #815) [Link]

Yes, you are missing something.

What you miss is that in modern Linux, malloc() practically never fails. It only allocates virtual space, which is not yet backed by physical memory.

Memory is only actually allocated when the process first touches it—which can be arbitrarily late. (Plus there are many other kinds of memory allocation: breaking of COW pages created by fork(), allocation of page structures in the kernel, skbs, …)
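
A small illustration of that behaviour, assuming a 64-bit system with overcommit enabled: the malloc() below normally succeeds instantly, and physical pages are only allocated as the loop touches them.

/* With overcommit, this malloc() of a large region will normally
 * succeed immediately, because only address space is handed out.
 * Physical pages are allocated one by one as the loop first touches
 * them; it is that touching, not the malloc(), that can eventually
 * trigger the OOM killer.  Assumes a 64-bit system. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t size = (size_t)8 * 1024 * 1024 * 1024;   /* 8 GiB of address space */
    char *p = malloc(size);

    if (p == NULL) {        /* rare with overcommit enabled */
        perror("malloc");
        return EXIT_FAILURE;
    }
    printf("malloc of %zu bytes succeeded; now touching pages...\n", size);

    for (size_t i = 0; i < size; i += 4096)
        p[i] = 1;           /* each first touch faults in a physical page */

    free(p);
    return EXIT_SUCCESS;
}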

Blocking on allocation failure - WTF?

Posted Mar 10, 2011 14:39 UTC (Thu) by epa (subscriber, #39769) [Link]

What you miss is that in modern Linux, malloc() practically never fails.
Right, but here is an instance where it does fail to allocate memory. It is known that the memory is not available right now (although of course it might become available at some point in the future). The malloc() API has provision for letting that be known to the application, by returning null. If the application wants to just hang until memory becomes available, that can easily be implemented in user-space (perhaps at the cost of a little busy-waiting or sleeping); but on the other hand if the application would like to find out when no more memory is available and do something, it's impossible to implement that on top of a malloc() interface that just blocks indefinitely.

Blocking on allocation failure - WTF?

Posted Mar 11, 2011 14:22 UTC (Fri) by droundy (subscriber, #4559) [Link]

The point is that malloc never does block indefinitely with overcommit enabled. It (essentially) always succeeds, and what blocks indefinitely is when you try to write to that shiny new virtual address space that was provided to you by malloc. At this point, the kernel realizes that there aren't any pages to provide you with, but it's too late for malloc to do anything to help you out, since malloc already succeeded.

Blocking on allocation failure - WTF?

Posted Mar 10, 2011 16:15 UTC (Thu) by dgm (subscriber, #49227) [Link]

I see. So much complexity... but is it worth it? Is anyone really using overcommit?

In a production environment it has to be a maintenance nightmare to have random tasks killed without warning. I would disable it right away, and the same on desktops. I just looked, and Ubuntu disables it by default.

Memory overcommit

Posted Mar 10, 2011 16:48 UTC (Thu) by rvfh (subscriber, #31018) [Link]

It's not just worth it, it's practically impossible to live without it. If you needed real memory backing the total amount you malloc(), most systems would need twice as much RAM.

Think about memory pools for example...

Memory overcommit

Posted Mar 10, 2011 21:49 UTC (Thu) by epa (subscriber, #39769) [Link]

If you needed to have the total amount of real memory you malloc, most systems would need twice as much RAM.
Or just twice as much swap space - which is not an issue on a typical desktop or laptop hard disk these days. (On an SSD or mobile phone it may be different.)

If you have a huge swapfile but no overcommit, and if some applications allocate lots of memory and then unexpectedly start using all the memory they asked for, then your system will start swapping and running slowly. If you just overcommit memory, then when apps start using all the memory they allocated the system will become unstable and processes will be killed without warning by the OOM killer. It's clear which is preferable.

Memory overcommit

Posted Mar 11, 2011 0:37 UTC (Fri) by droundy (subscriber, #4559) [Link]

If you just overcommit memory, then when apps start using all the memory they allocated the system will become unstable and processes will be killed without warning by the OOM killer. It's clear which is preferable.

Surely you've run into the situation where you've been unable to log into a machine because it's swapping like crazy, and are thus unable to kill the offending process? Is that really preferable to being able to go in and fix things immediately? Of course, things are easier when you've got a desktop and are already logged in, but even then I've seen situations where just switching to a terminal took many, many minutes, let alone opening a new terminal.

The large majority of OOM-killer experiences I've had have been situations where there was a memory leak involved. In such cases, the OOM killer is usually quite good at identifying the culprit and killing it. If you add enough swap, then the system freezes up indefinitely (or until you're able to get a killall typed into a terminal). Not a huge improvement in my book. In any case, it's not clear which is preferable.

Memory overcommit

Posted Mar 16, 2011 12:17 UTC (Wed) by epa (subscriber, #39769) [Link]

You're right, it is sometimes preferable to have applications be killed rather than being unable to log into the machine. But even here the OOM killer seems like a useful sticking plaster rather than fixing the real problem. It would be better to have 5% of physical memory reserved for root (or for an interactive 'task manager' that lets you kill misbehaving apps), in the same way that 5% of disk space was traditionally reserved.

If the I/O scheduler were a bit smarter, then the swapping activities of processes would count against their I/O usage, so a single bloated Firefox process would not be able to monopolize the disk to the exclusion of everything else. Similarly there could be more fairness in physical RAM allocation, so it wouldn't be possible for one app to consume all the physical memory pushing everything else into swap; it would be limited to say 80%. (This is reasonable for desktop systems, of course for servers or number-crunching you don't care so much about interactive performance so you'd increase that figure.)

Memory overcommit

Posted Mar 16, 2011 17:43 UTC (Wed) by nix (subscriber, #2304) [Link]

I'd say that you want to reserve the 5% or whatever for root-owned apps *that belong to a terminal*. This stops a maddened root-owned daemon from bloating up and leaving you unable to log in again.

Memory overcommit

Posted Mar 11, 2011 7:10 UTC (Fri) by rvfh (subscriber, #31018) [Link]

Why use loads of swap space when I have 3GB of RAM to run a bunch of services?

A quick look at top:
Mem: 3095608k total, 2405948k used, 689660k free, 292560k buffers
Swap: 722920k total, 43508k used, 679412k free, 1034276k cached

So there are 689 MB of unused memory! Not to mention 1 GB+ of cached stuff for the system to grab in case of need.

Now let's see the memory usage (top two; columns are VIRT, RES, SHR):
238m 126m 26m /usr/bin/systemsettings
294m 95m 23m /usr/lib/firefox-3.6.14/firefox-bin

So the two main memory users (systemsettings!?!) have committed more than half a gig of RAM, but use less than a quarter (I know these numbers are not perfect, but the idea remains).

And you want me to get rid of overcommit???
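
One way to see the same VIRT/RSS gap for a single process is to compare the VmSize and VmRSS lines in /proc/self/status; a quick sketch:

/* Quick illustration: print this process's committed address space
 * (VmSize) and resident memory (VmRSS) from /proc/self/status.  The
 * field names are standard; everything else is just for the example. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return 1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}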

Memory overcommit

Posted Mar 16, 2011 12:19 UTC (Wed) by epa (subscriber, #39769) [Link]

Isn't that the point? You have three gigabytes of RAM, more than enough to give every application all it needs and do so in a guaranteed way - not 'you can probably have this but you might be randomly killed later on depending on what else happens'.

Blocking on allocation failure - WTF?

Posted Mar 10, 2011 21:50 UTC (Thu) by nix (subscriber, #2304) [Link]

My firewall is a swapless embedded system with 512MB of RAM. Do I want overcommit turned on? I thought for perhaps as long as a microsecond before deciding 'hell, yes'.

Blocking on allocation failure - WTF?

Posted Mar 11, 2011 1:05 UTC (Fri) by dgm (subscriber, #49227) [Link]

I hate to reply to myself, but I got it wrong: Ubuntu uses the kernel default for overcommit, which is to overcommit. As an experiment I turned it off (by setting /proc/sys/vm/overcommit_memory to 2), and after that I couldn't start a new instance of Chromium on a lightly loaded system (a couple of terminals and Nautilus).

So apparently it's not only useful but critical.
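
For reference, a sketch of that experiment in code, writing the policy value to /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = strict accounting); it has to run as root.

/* Sketch of the experiment above: switch the overcommit policy by
 * writing to /proc/sys/vm/overcommit_memory.  In mode 2 the commit
 * limit is swap plus a percentage of RAM set by
 * /proc/sys/vm/overcommit_ratio, which is why strict mode is so
 * restrictive on a machine with little swap. */
#include <stdio.h>
#include <stdlib.h>

static int write_sysctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");

    if (f == NULL || fputs(value, f) == EOF || fclose(f) != 0) {
        perror(path);
        return -1;
    }
    return 0;
}

int main(void)
{
    /* 2 = strict accounting ("no overcommit"), as in the experiment. */
    return write_sysctl("/proc/sys/vm/overcommit_memory", "2\n")
        ? EXIT_FAILURE : EXIT_SUCCESS;
}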

Blocking on allocation failure - WTF?

Posted Mar 11, 2011 7:19 UTC (Fri) by Darkmere (subscriber, #53695) [Link]

Blame Chrome there. It allocates a fair chunk of memory up front for its JavaScript memory pool (which wants to be contiguous, IIRC).

Basically, Chrome is designed around overcommit, as are a few other apps.

Blocking on allocation failure - WTF?

Posted Mar 11, 2011 23:57 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

So apparently it's not only useful but critical.

Isn't the problem just that you adjusted /proc/sys/vm/overcommit_memory without making the corresponding adjustment in the amount of swap space?

Blocking on allocation failure - WTF?

Posted Mar 10, 2011 14:17 UTC (Thu) by Tuna-Fish (guest, #61751) [Link]

How would you pass that information? Remember that memory is not allocated on malloc, but when you access a page. Any memory access can fail due to lack of memory -- how do you handle that in userspace?

Blocking on allocation failure - WTF?

Posted Mar 10, 2011 14:42 UTC (Thu) by epa (subscriber, #39769) [Link]

How would you pass that information?
By returning null, as is the documented interface for malloc().
Any memory access can fail due to lack of memory -- how do you handle that in userspace?
This is true: you can get errors when accessing memory that was previously 'allocated'. But that's not a reason for doing the wrong thing in this specific case. If it is quite certain that the memory isn't available, why not report that to the application through the documented interface? If the app wants to just sleep waiting for memory to become available, it can choose to do so.

Blocking on allocation failure - WTF?

Posted Mar 12, 2011 0:15 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

This is true, you can get errors on accessing memory that was previously 'allocated'. But that's not a reason for doing the wrong thing in this specific case.

Which specific case are you referring to? The article is about the OOM killer, which comes into play when a process accesses memory (for example, executes a STORE instruction), not when a process does malloc(). In that case, how would you have the kernel notify the application program?

And even if we're talking about a case where the user space program could be told that the system is out of memory and given the option to do something other than block, that wouldn't be acceptable because the user space programs have already been written. We want to do the best possible thing given existing programs. And even if we were talking about programs not yet written, there's something to be said for freeing the coder from worrying about these tedious, extremely rare situations.

Blocking on allocation failure - WTF?

Posted Mar 16, 2011 12:22 UTC (Wed) by epa (subscriber, #39769) [Link]

Which specific case are you referring to?
I was referring to this from the article (my italics):
Should an out-of-memory situation come about, the processes in the affected group will simply block when attempting to allocate memory until the situation improves somehow.
Rather than blocking indefinitely on malloc(), it would make more sense to just return null when there is no new memory available. The application can then decide whether it wants to keep retrying indefinitely, report the error to the user, just die in a big cloud of smoke, or do something else like freeing some of its own data (e.g. the JVM could do a garbage collection pass). If malloc() just blocks forever, the app doesn't have that choice.
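
A sketch of the kind of policy a NULL return makes possible; free_some_cache() is a hypothetical hook that a real application would implement.

/* Sketch of what a NULL return from malloc() enables: the application
 * decides whether to shed its own caches and retry, or give up
 * gracefully.  free_some_cache() is a hypothetical hook. */
#include <stdio.h>
#include <stdlib.h>

static int free_some_cache(void)
{
    /* Hypothetical hook: a real application would release internal
     * caches here and return nonzero if it managed to free anything. */
    return 0;
}

void *alloc_or_shrink(size_t size)
{
    void *p;

    while ((p = malloc(size)) == NULL) {
        if (!free_some_cache()) {
            /* Nothing left to shed: report and let the caller decide. */
            fprintf(stderr, "out of memory allocating %zu bytes\n", size);
            return NULL;
        }
    }
    return p;
}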

Blocking on allocation failure - WTF?

Posted Mar 16, 2011 15:22 UTC (Wed) by giraffedata (subscriber, #1954) [Link]

Should an out-of-memory situation come about, the processes in the affected group will simply block when attempting to allocate memory until the situation improves somehow.
Right. The process here does not block in malloc(). It blocks typically on a store instruction, but also on any of various system calls, such as open(). A malloc() at this time would succeed.

The process is attempting to allocate memory, as it is the process that is doing the system call or triggering the page fault in which kernel code attempts to allocate physical memory. malloc(), in contrast, doesn't, from the kernel's point of view, allocate memory — just addresses for it.

The article probably should have made a clearer distinction between memory as seen by user space code and memory as seen by the kernel.

Blocking on allocation failure - WTF?

Posted Mar 17, 2011 11:51 UTC (Thu) by epa (subscriber, #39769) [Link]

Thanks for the explanation.

Blocking on allocation failure - WTF?

Posted Mar 12, 2011 0:22 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

I understand some of the reasons for overcommitting memory when it's not known how much is really available or needed, but blocking on a definite out-of-memory seems just plain daft. If the kernel knows that no more memory is available why can't it pass that information on to user space?

The idea is that the out of memory situation is only temporary. The OOM killer or one of its user space henchmen will make more memory available eventually, probably by killing some process the administrator didn't say should be immune.

