User-space out-of-memory handling

By Jonathan Corbet
March 26, 2014
2014 LSFMM Summit
While opinions on how the kernel should respond to out-of-memory (OOM) situations vary, almost everybody seems to agree that what the kernel does now is in need of improvement. A session on the topic during the memory management track at the 2014 Linux Storage, Filesystem, and Memory Management Summit covered some possible improvements, but reached no real conclusions.

David Rientjes used the session to talk about his user-space OOM handling patches and to ask for a green light for their inclusion. He spent a while talking about how these patches work; this introduction can be found in David's article on the subject and will not be repeated here. David has been pushing this work for the last year or so, but it seems clear that the community is still not completely sold on it.

Sasha Levin asked whether it might be better to use the vmpressure mechanism, which sends notifications when memory is getting tight, rather than waiting for a full OOM situation and hoping that user space can handle it. The problem with that approach, as Rik van Riel put it, is that there is no limit to how quickly a system can consume its memory. David added that the vmpressure mechanism does not work as well as one might think. As an illustration of the problem, consider a process that locks many pages into memory; it will consume much of the available memory, but no pressure notifications will result because no reclaim is yet happening. The system can then go from a "no pressure" state to "out of memory" almost instantaneously once reclaim starts; there simply is no opportunity for user space to respond.
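For readers unfamiliar with the vmpressure interface mentioned above: in the version-1 memory controller, a process asks for pressure notifications by creating an eventfd and writing a registration line to the group's cgroup.event_control file. The Python sketch below shows roughly what that looks like. The helper names and the cgroup directory passed in are illustrative assumptions rather than anything presented in the session, and os.eventfd() requires Python 3.10 or later.

```python
import os

# Pressure levels accepted by the cgroup-v1 memory.pressure_level interface.
LEVELS = ("low", "medium", "critical")


def registration_line(event_fd, pressure_fd, level):
    """Build the string written to cgroup.event_control.

    The format is "<event_fd> <fd of memory.pressure_level> <level>".
    """
    if level not in LEVELS:
        raise ValueError(f"unknown pressure level: {level!r}")
    return f"{event_fd} {pressure_fd} {level}"


def wait_for_pressure(cgroup_dir, level="critical"):
    """Block until the kernel reports memory pressure for cgroup_dir.

    cgroup_dir is a hypothetical path to a v1 memory cgroup, for example
    a directory somewhere under /sys/fs/cgroup/memory.
    """
    efd = os.eventfd(0)
    pfd = os.open(os.path.join(cgroup_dir, "memory.pressure_level"), os.O_RDONLY)
    try:
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(registration_line(efd, pfd, level))
        # Returns only after the kernel signals the eventfd at least once.
        return os.eventfd_read(efd)
    finally:
        os.close(pfd)
        os.close(efd)
```

Note that a listener like this wakes up only once reclaim is under way, which is exactly the limitation David described: in the locked-pages case there may be no time left to act when the event finally arrives.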

As the discussion went on, it became clear that the most discomfort existed around the use of a user-space handler to deal with global OOM situations. If a single control group under the memory controller (a "memcg") runs out of memory, it makes sense to have a user-space handler respond. But, Michal Hocko asked, do we really want to handle global OOM situations (where the system as a whole is out of memory) in user space? He agreed that the current code does not work for everybody, but, he said, pushing responsibility into user space opens up a can of worms and would be hard to maintain in the long term. It would be better, he suggested, to improve the global OOM killer in the kernel instead.

Tim Hockin, speaking about his work at Google (which has driven the user-space OOM handler development), talked about the problems they have had with OOM-handling requirements that have changed over time. Google has a hard time deciding what it wants to have happen in OOM situations; it seems hard to expect the kernel developers to anticipate where those requirements might go in the future. That has led to the desire to push the policy into user space where it can be changed without the need to build and deploy a new kernel — a process which does not happen quickly at Google. He would be happy with an in-kernel mechanism that allowed policies to be changed, but only if it is possible to effect a change without building a new kernel.

Robert Haas agreed that moving the policy into user space gives users the ability to make changes without having to change the kernel itself. Kernel developers, he said, simply are not smart enough to come up with all possible policies. But David said he was willing to try if that was how it would be done, though he suggested that the community might not be happy about the "hundreds of patches" implementing all of the possible policies that would result.

There was also some unhappiness about David's use of the memcg mechanism for global OOM handling. That mechanism will only work if control groups are built into the kernel, but there are still plenty of users who prefer not to enable control groups at all. The motivation for using that interface was to allow per-memcg and global OOM handlers to work with the same interface and be coded the same way. Peter Zijlstra suggested that the same control files could be placed in /proc for global OOM handling, providing something very close to the same interface without needing to enable control groups.

David asked for some guidance on how he could make progress in this area. It has been hard to get a consensus on his user-space OOM handling patches, but no viable alternatives have come forward. So he is somewhat stuck. Unfortunately, no consensus emerged in this session either, so there is still no clear path forward for this project.

[Your editor would like to thank the Linux Foundation for supporting his travel to the Summit.]



User-space out-of-memory handling

Posted Mar 28, 2014 13:57 UTC (Fri) by pedrocr (guest, #57415) [Link]

Have there been any recent attempts at eliminating overcommit?

There was a post a while ago[1] by a Windows kernel developer commenting on the lack of advances in the NT kernel and he mentioned that the Linux OOM situation was just crazy.

I remember Alan Cox having some patches to eliminate overcommit a long time ago. Any other efforts? It seems so much time is spent on handling OOM instead of solving this once and for all.

[1] http://blog.zorinaq.com/?e=74

User-space out-of-memory handling

Posted Mar 28, 2014 14:41 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

Eliminating overcommit does not eliminate the need for an OOM mechanism more sophisticated than "ENOMEM all allocations when memory is exhausted", because that mechanism iteratively kills random programs whose authors are not professional paranoids who always make a massive alloca()-and-memset() to ensure they will have enough stack space for all non-pathological cases, rather than killing the runaway process that is hoovering up all of RAM and swap into its heap.

Overcommit

Posted Mar 28, 2014 14:43 UTC (Fri) by corbet (editor, #1) [Link]

It's been possible to turn off overcommit for a long long time; see /proc/sys/vm/overcommit_memory. But few people do that because it's actually a real pain to have to provide every bit of memory that every process thinks it might ever use.
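As a small illustration of that knob, here is a Python sketch that reads the current setting and describes it. The three values (0, 1, 2) are the documented modes of vm.overcommit_memory; the function names are made up for this example.

```python
# Meaning of the values accepted by /proc/sys/vm/overcommit_memory.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit, never refuse an allocation",
    2: "strict accounting, no overcommit beyond the commit limit",
}


def describe_overcommit(raw):
    """Translate the raw file contents (such as "0\n") into a description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read and describe the running kernel's setting (Linux only)."""
    with open(path) as f:
        return describe_overcommit(f.read())


if __name__ == "__main__":
    print(current_overcommit())
```

Writing 2 to the same file (as root) turns overcommit off; from then on the kernel refuses allocations that would push the total committed address space past the commit limit.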

Overcommit

Posted Mar 28, 2014 15:22 UTC (Fri) by pedrocr (guest, #57415) [Link]

Wouldn't that be a case of "we need to fix userspace" like with the power problems that led to the creation of powertop? It should be easy to create a memorytop program to list the big offenders in allocated but never used memory.

Overcommit

Posted Mar 28, 2014 16:18 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

That sounds good on paper. Unfortunately, one of the sources of "big offenders" is the result of the following scenario:

  • Emacs / Firefox / Kdenlive / Eclipse / GIMP / (your favorite application software with a large writable working set) calls fork().
  • Child process touches a very small subset of its copy of the working set, in order to perform some set of operations which for whatever reason cannot be encoded as arguments to posix_spawn() (and anyway you can't rely on posix_spawn() being implemented as a syscall).
  • Child process calls execve().

This is a rather less tractable problem than "this, that, and the other program are spending lots of time busywaiting and thus preventing the CPU from going into a C-state".
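The scenario above can be sketched in a few lines of Python. This is only an illustration of the fork-tweak-exec pattern, not code from any of the applications named; the function name and the choice of an environment change as the "small subset" the child touches are assumptions made for the example.

```python
import os
import sys


def spawn_with_tweak(argv, extra_env):
    """fork(), adjust a little state in the child, then exec argv.

    Between fork() and exec the child nominally owns a copy-on-write
    copy of the parent's entire writable working set, yet it touches
    almost none of it. With strict accounting, that whole copy would
    have to be charged at fork() time.
    """
    pid = os.fork()
    if pid == 0:
        # Child: touch a tiny amount of memory, then replace the image.
        try:
            os.environ.update(extra_env)
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; never return into parent code.
            os._exit(127)
    # Parent: wait for the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    code = spawn_with_tweak(
        [sys.executable, "-c", "import os; print(os.environ['DEMO'])"],
        {"DEMO": "set in child"},
    )
    print("child exited with", code)
```

A large parent running this with overcommit disabled needs enough free commit for a second copy of itself, even though the child discards that copy almost immediately.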

Overcommit

Posted Apr 7, 2014 14:21 UTC (Mon) by nix (subscriber, #2304) [Link]

... and anyway nobody ever uses posix_spawn() unless they have to because it's a horror to do anything in it. It's meant for programs that have to run on MMU-less systems that are so primitive that they cannot implement any sort of fork(), and such systems rarely run big fork-happy text editors!

User-space out-of-memory handling

Posted Mar 31, 2014 9:19 UTC (Mon) by dgm (subscriber, #49227) [Link]

Perhaps what's needed is a policy-space that is neither user nor kernel. Maybe it's time to polish and unify some of the existing in-kernel virtual machines for this kind of purpose.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds