Kernel Summit: Class-based Kernel Resource Management
This article is part of LWN's 2004 Kernel Summit coverage.
CKRM performs two basic functions:

- It organizes processes and sockets into classes. This classification can be done in a number of ways using pluggable classifier modules; typical schemes group tasks by the user on whose behalf they are running or by the program being run.

- It applies a policy on how much of the system each class is allowed to use. Controllable resources include CPU time, memory, I/O bandwidth, etc.
The whole thing is configured through a virtual filesystem; creating a new group is just a matter of making a new directory in that filesystem. The normal Unix permissions apply in this directory; depending on how they are set, non-root users can make changes to some or all resource policies.
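As a rough sketch of what that filesystem interface implies (the /rcfs mount point, the "shares" attribute file, and the key=value syntax below are assumptions for illustration, not confirmed details of the CKRM patches), creating a class and giving it a CPU guarantee could look like:

```python
import os

# Assumed mount point for CKRM's configuration filesystem; the real
# path and attribute-file names depend on the patch as merged.
RCFS = "/rcfs/taskclass"

def make_class(name, cpu_guarantee):
    """Create a resource class (a directory) and set its CPU share
    by writing to an attribute file inside that directory."""
    class_dir = os.path.join(RCFS, name)
    os.makedirs(class_dir, exist_ok=True)   # mkdir == create the class

    # Hypothetical attribute syntax; the point is only that policy
    # is set by writing ordinary files.
    with open(os.path.join(class_dir, "shares"), "w") as f:
        f.write(f"res=cpu,guarantee={cpu_guarantee}\n")

make_class("databases", 50)    # half of the CPU shares
make_class("interactive", 30)  # 30 shares for interactive work

# Since classes are plain directories, handing a policy over to a
# non-root user is just chown/chmod on the class directory.
```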
Uses for this mechanism include workload consolidation (restricting parts of the system's workloads to a given amount of resource usage), quality of service guarantees for network services or individual users, etc. CKRM can limit the amount of memory used by OpenOffice (something has to do that) or give a database manager process priority access to the machine.
Linus objected to the term "guarantees," claiming that any attempt to provide resource guarantees will lead to poor performance, deadlocks, or both.
Alternatives to CKRM were quickly presented. Virtualization works for some sorts of resource-limitation tasks, but it does not work well on the desktop and can suffer from latency problems. Various user-mode solutions, such as "zapper daemons," are unable to respond to quick surges in resource use.
There were various objections to the CKRM implementation; some called it over-engineered. Linus would like to see the general resource classes split into separate classes for each type of resource being controlled: it may be desirable to put a process into one class for its CPU usage, but into another altogether for controlling its I/O bandwidth. Various other implementation changes were requested as well. CKRM will likely find its way into the kernel at some point, but it will probably need another iteration or two through the developer review process first.
Index entries for this article:
Kernel: Class-based resource management
Kernel: Resource management
This would be INCREDIBLE

Posted Jul 29, 2004 11:14 UTC (Thu) by ringerc (subscriber, #3071)

I'm the sysadmin of a small to medium business network. We run a dual Xeon
server to host file services (NetATalk, Samba, and NFS), intranet web
services, LTSP thin clients, and mail services. As you can imagine, this
does not always go smoothly ... but it works OK overall, and it could go a
lot more smoothly than it does now.
I'm going to present a wishlist that attempts to briefly explain what I'd
find useful and why I think it'd be good. I'm not claiming that any of it
is easy, or even that it's a good idea in the interests of the system as a
whole - I lack the knowledge to evaluate that. All I can say is "I think
this would be very useful..."
It would be very helpful to be able to control resource allocation to
processes in a more flexible and CONSISTENT way than provided by `nice`
and `ulimit`. In particular:
- Disk I/O QoS, so we could (say) configure "user" applications to
get priority for quick, brief disk access while limiting the IO ops/second
and throughput of file services to just below the disk's ability. Another
example might be limiting the disk throughput and IO/sec usage of a large,
low priority copy operation like archiving an LVM snapshot to removable
storage or cloning an experimental version of a database.
- Per-process disk usage monitoring. "Dammit, why is the /home
array thrashing..."
- Memory limits that start paging processes out instead of killing
them if they exceed the limit, so it's possible to say (for example) that
the group 'users' may collectively consume no more than 50% of system
memory.
- Making CPU and memory limitation consistent. I'm not convinced the CPU time ulimit makes any sense in a modern computing environment, and while I find the 'kill when exceeded' memory ulimit great for limiting the damage done by crashing processes, it'd be nice to have less drastic control over the system resources they use as well (a sketch of today's hard-limit behaviour appears after this list). See above point.
- Memory priorities for processes. "If you have to free up some
space, get rid of the database cache first, please DO NOT page out the
binaries of my interactive applications, my thin client users happen to be
using those..."
- The ability to do large, one-off copies without driving
everything remotely useful out of the in-memory disk cache. This is a
MAJOR problem in my experience. It is nigh impossible to do any seriously
large copies on an active Linux server (in my experience) if there are any
reasonably interactive tasks. Even if the interactive tasks don't normally
even touch the disk(s) you're using for the big copy, they'll quickly get
sluggish and start swapping or having to repeatedly load parts of
libraries and files from disk. This appears to be because all the
previously cached data - silly things like glibc and the program binaries
- are being pushed out of the disk cache in favour of data from the copy
operation that will never be re-used. This issue makes things like backups of live servers a much higher-impact affair than they could be, especially combined with the apparent lack of any way of rate-limiting copy operations (a sketch of one possible mitigation appears after this list).
- The ability to configure and control disk, memory, CPU access,
and other forms of resource limitation and QoS from a single consistent
interface (say, sysfs). Ideally policies might be applied to a group of
related processes (as `ulimit` and friends do currently) or to all processes owned by a particular user or group. Imagine "No single user may use more than 80% of the network bandwidth" or "This group of virtual machines is limited to 50% of physical system memory (and will begin paging out instead of crashing if it exceeds it)".
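To make the 'kill when exceeded' memory-ulimit complaint above concrete: with the current interface, a process that hits its address-space limit simply has its allocations refused; there is no way to ask for paging instead. A minimal sketch of today's behaviour, using Python's standard resource module (the 64MB figure is an arbitrary illustration):

```python
import resource

# Cap this process's address space at 64 MB (soft and hard limits).
# On Linux, RLIMIT_AS is the memory limit that is actually enforced;
# RLIMIT_RSS is accepted but ignored by 2.6 kernels.
limit = 64 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

try:
    # Ask for far more than the limit allows. The kernel refuses the
    # underlying mmap/brk, so the allocation fails outright rather
    # than pushing the process out to swap.
    hog = bytearray(256 * 1024 * 1024)
except MemoryError:
    print("allocation refused at the hard limit -- no paging offered")
```

A group-wide "page out past 50% of memory" policy, as wished for above, has no equivalent in this interface; limits are per-process and hard.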
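And for the cache-eviction problem with large one-off copies: one per-file workaround available to applications is posix_fadvise(POSIX_FADV_DONTNEED), which tells the kernel the copied pages will not be reused. This is not the global policy knob wished for above, and the sketch assumes a Python new enough to expose os.posix_fadvise (the paths are made-up examples):

```python
import os

CHUNK = 1024 * 1024  # copy in 1 MB chunks

def copy_without_caching(src_path, dst_path):
    """Copy a file while asking the kernel to drop the copied pages
    from the page cache, so cached binaries and libraries that
    interactive users depend on stay resident."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        copied = 0
        while True:
            data = os.read(src, CHUNK)
            if not data:
                break
            os.write(dst, data)
            copied += len(data)
            # Drop what we just read from the cache. For the
            # destination, flush first so DONTNEED can discard the
            # (otherwise dirty) pages as well.
            os.posix_fadvise(src, 0, copied, os.POSIX_FADV_DONTNEED)
            os.fsync(dst)
            os.posix_fadvise(dst, 0, copied, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(src)
        os.close(dst)

copy_without_caching("/srv/snapshots/db.img", "/mnt/removable/db.img")
```

The per-chunk fsync() also acts as a crude rate limiter on the write side, which brushes against the other half of the complaint, at the cost of a slower copy.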
Currently, it seems very hard to get different sorts of services to play
well together on the same server. For good utilisation and to limit the
number of servers that need to be managed, it would be nice to change
this. Many of the things needed to make (say) a thin client server and a mail server live acceptably well on the same box will no doubt also
benefit virtualisation schemes like LVS and UML. After all, they, too,
want ways to prevent different VMs that might be doing different things
from treading on each other too badly.
In fact, I'd eventually love to be able to move my thin client services
into one virtual space (think BSD Jail, LVS, UML, etc), mail into another,
etc. with minimal resource overheads. The management benefits would be pleasant - upgrade your terminal server environment regularly to get shiny
new GUI improvements, while keeping your mail server environment unchanged
for as long as humanly possible.
I think Linux is already ahead of some contenders when it comes to many
things efficiently sharing one system, but in my opinion there's still a
lot of room for improvement.
Of course, I'm just a lowly sysadmin and probably don't understand the
complexity of what I'm talking about. What the heck - these are wants and
needs, and I'm interested in how they translate into anything that could
make its way into reality outside an IBM mainframe.
Comments appreciated.
--
Craig Ringer
craig <at> postnewspapers [dot] com >d0t< au