


Posted Jul 21, 2011 12:10 UTC (Thu) by ebirdie (guest, #512)
In reply to: Swapping by epa
Parent article: Checkpoint/restart (mostly) in user space

Cool. Possibly a way for the OOM killer to morph into an OOM dumper? And a way to get a better handle on OOM situations.



Posted Jul 22, 2011 18:38 UTC (Fri) by jeremiah (subscriber, #1221) [Link]

I like this idea a lot. I like the idea that, once memory pressure is down, you could conceivably have the dumped process restart. I have had issues where routine maintenance did something that caused a high-memory service to be killed, when it should have been the maintenance process. It would have been great for the service to be restarted after the maintenance finished. Or even better, to be able to pause/dump some processes, run a maintenance routine, then un-pause the processes when done.


Posted Jul 22, 2011 22:09 UTC (Fri) by mhelsley (guest, #11324) [Link]

The difficult part is that you really want to know, at OOM time, the amount of memory necessary to dump the process. If you don't have the memory to start a new process, much less do the checkpoint, then you can't use this method of checkpointing to avoid OOM kills. And if the amount of memory needed to do a dump grows with the size of the program being checkpointed, well, large programs are exactly the ones you're trying to dump during OOM!


Posted Jul 23, 2011 5:55 UTC (Sat) by jeremiah (subscriber, #1221) [Link]

I would think you could reserve an amount large enough to start a process that spools the checkpoint out to disk. It might not be the most efficient approach, but it might just get the job done. I obviously haven't looked at this or heard enough to have an informed opinion; it just seemed like a cool idea. I'm curious, though, why you need to know the amount of memory required.


Posted Jul 28, 2011 9:21 UTC (Thu) by mhelsley (guest, #11324) [Link]

Precisely to know how much memory to reserve, as you suggested. Unless you're clever at managing it while keeping it available during OOM, that reserved memory is wasted memory. So you'd want to be careful not to waste too much.


Posted Jul 28, 2011 17:10 UTC (Thu) by jeremiah (subscriber, #1221) [Link]

Right, I guess I misread something. Obviously you have to reserve enough memory for the checkpoint process, but that seems like it would be a fixed and predictable size, probably even fairly small. I was under the impression that they were trying to figure out how big the process being killed was; as long as it fits on disk, which seems pretty likely, you don't really care. As long as there is always more free space on the drive than there is total memory on the system, you never have an out-of-disk-space problem.


Posted Jul 30, 2011 17:54 UTC (Sat) by oak (guest, #2786) [Link]

That's what cgroups are for. You make sure that things OOM early enough that the rest of the system has enough memory to handle it gracefully.

As for the kernel swapping the OOMed program back into RAM from swap as you read the dump file: with a cgroup setup that retains enough memory for the rest of the system (and the kernel) while the OOMing container group is frozen, that shouldn't be a problem either.


Posted Jul 31, 2011 3:50 UTC (Sun) by slashdot (guest, #22014) [Link]

Honestly, a dynamically resizing swap file does the same thing, better.

You can't use any checkpoint/restart system to swap processes, because none can guarantee to perfectly restore them.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds