Ok, now I get what the problem is and what (theoretically) the solution would be. Makes me even more sad, considering this issue bites me about twice a day, forcing me to do a hard reset :(
Posted Mar 24, 2011 20:49 UTC (Thu) by giraffedata (subscriber, #1954)
[Link]
this issue bites me about twice a day, forcing me to do a hard reset :(
(mostly firefox and kmail eating too much ram)
You at least should have a memory rlimit on those processes, then. That way, when the program tries to grab an impractical amount of memory, it just dies immediately.
Rlimits are full of holes (your quota gets replicated every time you fork; I/O address space and shared memory get counted against it) and are hard to set (you have to set them from inside the process), but they help a lot. I run with an address space rlimit of half my real memory size on every process (with some special exceptions). Most people use the default, which is no limit on anything.
The shell 'ulimit' command is the common way to set an rlimit. I don't know what it takes to set that up for your firefox and kmail processes.
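As a rough illustration of the 'ulimit' approach, here is a minimal sketch of a launcher that sets an address space limit in a subshell and then execs the program. The function names run_limited and half_mem_kb are made up for this example, and reading MemTotal from /proc/meminfo assumes Linux. 'ulimit -v' takes its value in kilobytes in bash and dash.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from the comment above.

# Print half of physical RAM in KiB (assumes Linux /proc/meminfo).
half_mem_kb() {
    awk '/^MemTotal:/ { print int($2 / 2) }' /proc/meminfo
}

# run_limited LIMIT_KB CMD [ARGS...]
# Set the address space rlimit in a subshell so the calling shell keeps
# its own limits, then replace the subshell with the command.
run_limited() {
    limit_kb=$1
    shift
    (
        ulimit -v "$limit_kb" || exit 1
        exec "$@"
    )
}
```

A wrapper script could then end with something like `run_limited "$(half_mem_kb)" firefox "$@"`. Because the limit is inherited across fork, each child gets the same quota again rather than sharing one.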
How to maintain access to the machine
Posted Mar 26, 2011 9:40 UTC (Sat) by efexis (guest, #26355)
[Link]
Memory cgroups will solve that problem for you; they are the number 1 thing I have found that improves system stability in *years*. They are very simple to implement. Assume cgroups is mounted under /cgroup with the memory controller enabled (or, for separate control, I mount my memory controller under /cgroup/memory so I can put tasks under memory control groups without also putting them under others).
Create shell script wrapper for what you want to run:
That puts it into a 1200 MB group. No matter how many processes it forks, the entire lot cannot go over that 1200 MB, and if they try, an OOM killer will kick in within only that group. You can also put similar lines at the top of scripts in /etc/init.d, for example (obviously not needing the 'exec' line if you're adding to an existing startup script).
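A wrapper along the lines described could look roughly like this. It is a sketch under assumptions, not the poster's own script: it assumes the cgroup v1 memory controller (files memory.limit_in_bytes and tasks), a mount point such as /cgroup/memory, and enough privilege to write there. The function name enter_memory_cgroup and the group name are hypothetical.

```shell
#!/bin/sh
# enter_memory_cgroup CGROUP_ROOT GROUP LIMIT
# Create the group if needed, set its memory limit, and move the
# current shell into it. Anything exec'd or forked afterwards stays
# in the same group.
enter_memory_cgroup() {
    cgroup_root=$1   # e.g. /cgroup/memory (assumed mount point)
    group=$2         # e.g. firefox (hypothetical group name)
    limit=$3         # e.g. 1200M (cgroup v1 accepts K/M/G suffixes)
    mkdir -p "$cgroup_root/$group" || return 1
    echo "$limit" > "$cgroup_root/$group/memory.limit_in_bytes" || return 1
    echo $$ > "$cgroup_root/$group/tasks" || return 1
}

# Typical use at the end of a wrapper script:
#   enter_memory_cgroup /cgroup/memory firefox 1200M && exec firefox "$@"
```

On systems using the unified cgroup v2 hierarchy the file names differ (memory.max and cgroup.procs), so the paths above would need adjusting.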
As long as you don't give any group 100% of memory (I tend to put everything in 80% groups by default), no single runaway process or set of processes can ever bring the entire system down, because there's always that 20% left that it cannot touch.
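To turn a percentage like that 80% into a byte value for a group limit, something like the following could be used. It is only a sketch: the function name percent_of_ram_bytes is made up, and it assumes a Linux-style meminfo file where MemTotal is reported in kB.

```shell
#!/bin/sh
# percent_of_ram_bytes PERCENT [MEMINFO_FILE]
# Print PERCENT% of total RAM in bytes. MEMINFO_FILE defaults to
# /proc/meminfo and exists as a parameter mainly to ease testing.
percent_of_ram_bytes() {
    percent=$1
    meminfo=${2:-/proc/meminfo}
    total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo") || return 1
    # MemTotal is in KiB; convert to bytes, then take the percentage.
    echo $(( total_kb * 1024 * percent / 100 ))
}
```

For example, `percent_of_ram_bytes 80` prints a value that could be written to a group's limit file.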