How to maintain access to the machine

Posted Mar 18, 2011 7:31 UTC (Fri) by rvfh (subscriber, #31018)
In reply to: group scheduling and I/O by giraffedata
Parent article: A group scheduling demonstration

Interesting read indeed, thanks for that.
I think this has been discussed before, but we would basically need two things to keep access to the machine then:
* the processes we need to be pinned to the physical memory (ssh, bash, ...)
* the other processes to stop I/O when the processes above are being accessed, or better, our processes to not need I/O at all

Correct? If yes, then we could have a process group that we declare unswappable, unstoppable (better be very reliable then!), or would I still be missing a very difficult-to-solve part?

How to maintain access to the machine

Posted Mar 18, 2011 17:02 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

It doesn't really have to be that hard. It isn't necessary to identify a set of high-priority processes; all it takes is a more intelligent scheduler that senses thrashing and makes different choices about which process to run next. It places some processes on hold for a few seconds at a time, in rotation. That way, the shell from which you're trying to kill that runaway process would have a response time of several seconds on some commands instead of several minutes, and that's good enough to get the job done.

This kind of scheduling was common ages ago with batch systems. It's easier there, because if you put a batch job on ice for even ten minutes, nobody is going to complain. In modern interactive computers, it's much less common. Maybe because very slow response is almost as bad as total stoppage due to thrashing in most cases (just not in the one we're talking about).

The key point is that thrash avoidance isn't about giving some processes priority for scarce resources; it's about making every process in the system run faster.
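The rotation idea above can be roughly imitated from user space. The following is only a sketch under stated assumptions, not the in-kernel scheduler being described: it does not detect thrashing at all, and the function name hold_in_rotation is made up for illustration. It simply pauses each given process in turn with SIGSTOP for a fixed time and then resumes it with SIGCONT.

```shell
#!/bin/bash
# Crude user-space imitation of "put processes on hold in rotation".
# Assumption: the caller has already decided which PIDs are fighting
# over memory; nothing here senses thrashing.

# Usage: hold_in_rotation HOLD_SECONDS PID...
hold_in_rotation() {
    local hold_seconds="$1"; shift
    local pid
    for pid in "$@"; do
        # Skip PIDs that have already exited or that we may not signal.
        kill -STOP "$pid" 2>/dev/null || continue
        sleep "$hold_seconds"
        kill -CONT "$pid" 2>/dev/null
    done
}
```

Something like `hold_in_rotation 5 1234 5678` would freeze each of the two (hypothetical) PIDs for five seconds, one after the other, leaving the rest of the system that much less contended in the meantime.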

How to maintain access to the machine

Posted Mar 24, 2011 19:49 UTC (Thu) by jospoortvliet (subscriber, #33164) [Link]

Ok, now I get what the problem is and what (theoretically) the solution would be. Makes me even more sad, considering this issue bites me about twice a day, forcing me to do a hard reset :(

(mostly firefox & kmail eating too much ram)

How to maintain access to the machine

Posted Mar 24, 2011 20:49 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

> this issue bites me about twice a day, forcing me to do a hard reset :( (mostly firefox and kmail eating too much ram)

You should at least have a memory rlimit on those processes, then. That way, when the program tries to grab an impractical amount of memory, it just dies immediately.

Rlimits are full of holes (your quota gets replicated every time you fork; I/O address space and shared memory get counted against it) and are hard to set (you have to set them from inside the process), but they help a lot. I run with an address space rlimit of half my real memory size on every process (with some special exceptions). Most people use the default, which is no limit on anything.

The shell 'ulimit' command is the common way to set an rlimit. I don't know what it takes to set that up for your firefox and kmail processes.
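As a minimal sketch of the "half my real memory" policy mentioned above, a wrapper can read MemTotal from /proc/meminfo and apply it with bash's `ulimit -v` before starting the program. The function names half_mem_kb and run_limited are hypothetical, and this is one possible way to set it up rather than the setup used in the comment.

```shell
#!/bin/bash
# Sketch: run a command under an address-space rlimit of half of physical RAM.

# Print half of MemTotal in KiB. The meminfo path is a parameter
# (defaulting to /proc/meminfo) so the function can be tried on a sample file.
half_mem_kb() {
    local meminfo="${1:-/proc/meminfo}"
    local total_kb
    total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    echo $(( total_kb / 2 ))
}

# Usage: run_limited LIMIT_KB COMMAND [ARGS...]
# The limit is set in a subshell, so the calling shell stays unlimited;
# the command and everything it forks inherit the limit.
run_limited() {
    local limit_kb="$1"; shift
    (
        ulimit -v "$limit_kb" || exit 1
        exec "$@"
    )
}
```

A launcher line such as `run_limited "$(half_mem_kb)" firefox` would then start the browser with the limit in place. As noted above, each forked child gets its own copy of the limit, so this caps individual processes and not the total.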

How to maintain access to the machine

Posted Mar 26, 2011 9:40 UTC (Sat) by efexis (guest, #26355) [Link]

Memory cgroups will solve that problem for you; they are the number 1 thing I have found that improves system stability in *years*. They are very simple to set up. Assume cgroups is mounted under /cgroup with the memory controller enabled (for separate control, I mount my memory controller under /cgroup/memory so I can put tasks under memory control groups without also putting them under others).

Create a shell script wrapper for what you want to run:

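#!/bin/bash
# Directory of the memory cgroup; "browser" is a placeholder name (assumption).
CG="/cgroup/browser"
[[ -d "$CG" ]] || {
    mkdir -p "$CG"
    echo $(( 1048576 * 1200 )) > "$CG/memory.limit_in_bytes"   # 1200 MiB
}
echo $$ > "$CG/tasks"        # move this shell into the group
exec /path/to/browser "$@"   # the browser inherits the group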

That puts it into a 1200 MB group. No matter how many processes it forks, the entire lot cannot go over that 1200 MB; if they try, the OOM killer kicks in within that group only. You can also put similar lines at the top of scripts in /etc/init.d, for example (obviously without the 'exec' line if you're adding to an existing startup script).

As long as you don't give any group 100% memory (I tend to put everything in 80% groups by default) no single runaway process or set of processes can ever bring the entire system down because there's always that 20% left it cannot touch.
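The 80% default could be computed instead of hard-coded. The sketch below assumes the same cgroup v1 memory controller files used in the wrapper above (memory.limit_in_bytes); the function names and the group path in the usage note are made up for illustration.

```shell
#!/bin/bash
# Sketch: create a memory cgroup capped at a percentage of physical RAM.

# Usage: mem_percent_bytes PERCENT [MEMINFO_FILE]
# Prints PERCENT of MemTotal, in bytes.
mem_percent_bytes() {
    local percent="$1" meminfo="${2:-/proc/meminfo}"
    local total_kb
    total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    echo $(( total_kb * 1024 * percent / 100 ))
}

# Usage: make_mem_group GROUP_DIR LIMIT_BYTES
# Creates the group directory if needed and writes its memory limit.
make_mem_group() {
    local group_dir="$1" limit_bytes="$2"
    mkdir -p "$group_dir" || return 1
    echo "$limit_bytes" > "$group_dir/memory.limit_in_bytes"
}
```

For instance, `make_mem_group /cgroup/default "$(mem_percent_bytes 80)"` would create a group limited to 80% of RAM, into which tasks can then be moved by writing their PIDs to its tasks file as in the wrapper.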

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds