Avoiding the OOM killer with mem_notify

Posted Jan 31, 2008 17:06 UTC (Thu) by jzbiciak (guest, #5246)
Parent article: Avoiding the OOM killer with mem_notify

How effective can this be, though, for many C programs? If I malloc a bunch of memory, perhaps as caches, and am then asked to free it, calling free() doesn't magically release pages back to the OS. Now, if malloc uses mmap for some of the larger allocations, those can be released back to the OS by munmap. But for the general sbrk-managed heap, I have to free stuff near the end of the heap before I can ask for my brk to be lowered. There's no guarantee I can do that.
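
To make the mmap/munmap point concrete, here is a minimal sketch, assuming a Linux-like system that supports anonymous mappings. The `cache_region` type and both function names are hypothetical, made up for illustration. A cache kept in its own mapping can be handed back to the OS in one call, no matter what else is sitting on the sbrk heap.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical descriptor: one anonymous mapping per cache. */
struct cache_region {
    void  *base;   /* start of the mapping, NULL when released */
    size_t len;    /* length of the mapping in bytes */
};

/* Back the cache with its own mapping instead of the malloc heap.
 * Returns 0 on success, -1 on failure. */
static int cache_region_alloc(struct cache_region *c, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    c->base = p;
    c->len = len;
    return 0;
}

/* Give the pages straight back to the OS.
 * Returns 0 on success, -1 if munmap fails. */
static int cache_region_release(struct cache_region *c)
{
    if (c->base == NULL)
        return 0;                 /* nothing mapped */
    if (munmap(c->base, c->len) != 0)
        return -1;
    c->base = NULL;
    c->len = 0;
    return 0;
}
```

On a low-memory notification the app would call `cache_region_release()` on whichever caches it can afford to lose. Nothing comparable exists for an object buried in the middle of the brk heap.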

For this to be useful, whatever I malloc needs to have an additional level of indirection in user space, so I can move the objects I wish to keep and then compact the heap. Otherwise, simply freeing stuff up won't be enough.
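
A rough sketch of that extra level of indirection, under my own assumptions: a fixed-size arena, integer handles instead of raw pointers, and a deliberately naive compaction pass. Every name here (`arena`, `arena_alloc`, `arena_compact`, and so on) is hypothetical. The point is only that callers hold handles, so the objects worth keeping can slide toward the start of the arena and the tail becomes free.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_BYTES 4096
#define MAX_HANDLES 64

/* One entry per handle: where the object currently lives. */
struct slot { size_t off; size_t len; int live; };

struct arena {
    unsigned char mem[ARENA_BYTES];
    size_t top;                       /* first unused byte */
    struct slot slots[MAX_HANDLES];
};

/* Bump-allocate len bytes; returns a handle, or -1 if out of room. */
static int arena_alloc(struct arena *a, size_t len)
{
    if (len > ARENA_BYTES - a->top)
        return -1;
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!a->slots[h].live) {
            a->slots[h] = (struct slot){ a->top, len, 1 };
            a->top += len;
            return h;
        }
    }
    return -1;
}

/* Callers must re-derive the pointer after any compaction. */
static void *arena_deref(struct arena *a, int h)
{
    return a->mem + a->slots[h].off;
}

static void arena_free(struct arena *a, int h)
{
    a->slots[h].live = 0;
}

/* Slide live objects toward offset 0, lowest address first, and
 * update the handle table so existing handles stay valid. */
static void arena_compact(struct arena *a)
{
    int done[MAX_HANDLES] = {0};
    size_t dst = 0;

    for (;;) {
        int next = -1;
        for (int h = 0; h < MAX_HANDLES; h++) {
            if (a->slots[h].live && !done[h] &&
                (next < 0 || a->slots[h].off < a->slots[next].off))
                next = h;
        }
        if (next < 0)
            break;
        memmove(a->mem + dst, a->mem + a->slots[next].off,
                a->slots[next].len);
        a->slots[next].off = dst;
        dst += a->slots[next].len;
        done[next] = 1;
    }
    a->top = dst;   /* everything from top onward can now be given back */
}
```

After `arena_compact()`, everything past `top` is unused, which is the condition you need before lowering brk or unmapping the tail. The cost is that no caller may hold a raw pointer across a compaction.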

It may be useful to compare/contrast this to the HURD's approach, which is simply to force user space to do its own VM management. There, the kernel and user-space dicker about physical pages only, and user space figures out how best to handle the burden when a given app wants more pages than the OS can give it. The answer could be garbage collection, discarding caches, swapping or whatever makes sense to a given application.

The main thing is that the app knows way ahead of time that real RAM is in short supply, and avoids getting into the overcommitted state entirely. And since the kernel isn't doing the swapping, it seems like you wouldn't get into situations where you need to free memory just to have enough memory to write out pages and the like. Example: Imagine that to wake an app so it can free some pages, you have to bring it in from swap, but swap is too full to write any dirty anonymous pages out. If your policy is that each app self-swaps, this should never happen, since the OS guarantees it'll have enough pages to do its work, and user space will just muddle along with what it's given. (In theory, it seems like a user-space app could get by with just a few pages... a couple executable pages and a couple data pages.)

I'm guessing mem_notify will try to wake apps sufficiently far ahead that it can avoid those "need RAM to free RAM" situations in practice, but setting proper thresholds seems like it ought to be rather tricky.

Posted Feb 1, 2008 3:03 UTC (Fri) by vomlehn (guest, #45588)

> I'm guessing mem_notify will try to wake apps sufficiently far ahead that it can avoid those
> "need RAM to free RAM" situations in practice, but setting proper thresholds seems like it
> ought to be rather tricky

I don't see how the kernel can possibly know enough for it to notify applications far enough
ahead about the need to free memory; the memory allocation behavior of applications is just
too unpredictable.

A better approach, it seems to me, would be to notify the kernel that certain pages in your
application are being used to cache data. The kernel is then free to simply grab them if it
needs them. If your application decides it needs the data later, it uses a system call to
notify the kernel that the pages are no longer being used as a cache. If the kernel didn't
need the pages, they would still have their old data and the application could use them
directly.

On the other hand, if the kernel did have to grab the pages in the interim, the system call
used to grab the pages back would return an error. Your application would then know it needs
to remap the pages and regenerate the data. Of course, it's possible the pages can't be
remapped because memory is too low. The application would handle that as though the data
wasn't cached and it couldn't get the memory to read it. It already has to be able to do this,
so this doesn't add to the application's complexity.
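
The protocol described above might look roughly like this from the application's side. This is a user-space mock only: no such system calls are being claimed here, and every name (`vcache`, `vcache_offer`, `vcache_take_back`, `mock_kernel_take`, `example_regen`) is hypothetical. `mock_kernel_take()` stands in for the kernel grabbing the pages, and malloc stands in for remapping them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache buffer the "kernel" is allowed to take. */
struct vcache {
    void  *data;         /* cached bytes, NULL once taken */
    size_t len;
    bool   discardable;  /* app has said "these pages only hold cache" */
};

static int vcache_init(struct vcache *c, size_t len)
{
    c->data = malloc(len);
    c->len = len;
    c->discardable = false;
    return c->data ? 0 : -1;
}

/* App -> kernel: these pages are just cache, grab them if needed. */
static void vcache_offer(struct vcache *c)
{
    c->discardable = true;
}

/* Stand-in for the kernel taking the pages under memory pressure.
 * Note that the owning process is never woken up. */
static void mock_kernel_take(struct vcache *c)
{
    if (c->discardable) {
        free(c->data);
        c->data = NULL;
    }
}

/* App -> kernel: I want the pages back.
 * Returns 0 if the old contents survived, -1 if they were taken. */
static int vcache_take_back(struct vcache *c)
{
    c->discardable = false;
    return c->data ? 0 : -1;
}

/* Example regeneration routine; a real app would re-read or recompute. */
static void example_regen(void *buf, size_t len)
{
    memset(buf, 0x42, len);
}

/* Application access path: use the cache if intact, else rebuild it.
 * Returns NULL if memory is too low, same as the uncached case. */
static void *vcache_get(struct vcache *c, void (*regen)(void *, size_t))
{
    if (vcache_take_back(c) == 0)
        return c->data;
    c->data = malloc(c->len);
    if (c->data == NULL)
        return NULL;
    regen(c->data, c->len);
    return c->data;
}
```

The only new branch in the application is the failure return from `vcache_take_back()`, and it leads into the regenerate path the application needs anyway for data that was never cached.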

The advantages of this approach are that the pages are immediately available to the kernel
without having to wake the process up. No need to figure out complex thresholds, no need to
allocate enough memory for the process to run, no delay in making the needed memory available.
You could even allow for priorities when telling the kernel the pages are being used for cache
so that the kernel would grab lower priority pages first.

I wish I had the time to code this and submit it because I think that mem_notify is an awful
botch that will cause unending pain as people add patch on patch to try to make it work. But
that's just my personal opinion...

Posted Feb 1, 2008 19:47 UTC (Fri) by zlynx (guest, #2285)

Applications can already see if they're missing memory pages by using mincore().
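
For reference, a minimal sketch of such a check, assuming Linux, where mincore() takes an `unsigned char` vector and the low bit of each byte reports residency. The helper name `resident_pages` is made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for mincore() and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages of [addr, addr + len) are resident in RAM.
 * addr must be page-aligned. Returns -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* mincore() writes one byte per page in the range. */
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;

    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* low bit set means resident */
    }
    free(vec);
    return count;
}
```

The answer is only a snapshot: a page reported as resident can be evicted again before the application touches it.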

You may not have been carefully reading the mem_notify patch descriptions.

What it does is trigger when memory pages move to the inactive list, which is where the VM
puts pages that are good candidates for swapping.

Here is the Changelog from version 5 of the mem_notify patch; see the v3 changes:
Changelog
-------------------------------------------------
  v4 -> v5 (by KOSAKI Motohiro)
    o rebase to 2.6.24-rc8-mm1
    o change display order of /proc/zoneinfo
    o ignore very small zone
    o support fcntl(F_SETFL, FASYNC)
    o fix some trivial bugs.

  v3 -> v4 (by KOSAKI Motohiro)
    o rebase to 2.6.24-rc6-mm1
    o avoid wake up all.
    o add judgement point to __free_one_page().
    o add zone awareness.

  v2 -> v3 (by Marcelo Tosatti)
    o changes the notification point to happen whenever
      the VM moves an anonymous page to the inactive list.
    o implement notification rate limit.

  v1(oom notify) -> v2 (by Marcelo Tosatti)
    o name change
    o notify timing change from just swap thrashing to
      just before thrashing.
    o also works with swapless device.