Class-based Kernel Resource Management
[Posted September 3, 2003 by corbet]
The Class-based Kernel Resource Management
(CKRM) project is an effort at IBM to provide the hooks for better control
over resource consumption by processes. The CKRM project sees the existing
resource management tools (nice, ulimit) as not being up
to the task. So the CKRM hackers have set out to provide a whole new
infrastructure for process control. The ideas were presented at the Ottawa
Linux Symposium last July; now, the first set of patches has been posted.
The overview posting describes the other
patches in the set and gives some pointers to further information.
The core concept behind CKRM is the division of processes into distinct
classes, each of which has a separate set of policies applied to it. A
kernel API has been provided which enables the loading of classifier
modules, enabling different sites to have entirely different ways of
classifying processes. Most sites would likely stick with the rule-based
classifier, which is provided with the CKRM patch set; it allows
classification based on various task structure fields. So, for example,
processes can be classified based on their UID, which program they are
running, etc.
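
To make the rule-based idea concrete, here is a minimal user-space sketch in Python. It is not the CKRM classifier API; the Task fields, the rule format, and the class names ("system", "web", "interactive") are all assumptions chosen for illustration. It only shows the general shape: an ordered list of rules that inspect task fields, with the first match deciding the class.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Task:
    # Hypothetical stand-in for the task structure fields a classifier reads.
    pid: int
    uid: int
    gid: int
    exe: str

# A rule pairs a predicate on the task with a class name (assumed format).
Rule = Tuple[Callable[[Task], bool], str]

def classify(task: Task, rules: List[Rule], default: str = "default") -> str:
    """Return the class of the first matching rule, or the default class."""
    for matches, class_name in rules:
        if matches(task):
            return class_name
    return default

# Illustrative site policy: classify by UID and by the program being run.
rules: List[Rule] = [
    (lambda t: t.uid == 0, "system"),
    (lambda t: t.exe.endswith("/httpd"), "web"),
    (lambda t: 1000 <= t.uid < 2000, "interactive"),
]

web_task = Task(pid=101, uid=48, gid=48, exe="/usr/sbin/httpd")
print(classify(web_task, rules))
```

A site with different needs would, in this picture, swap in a different rule list (or a different classify function entirely), which is roughly the flexibility the loadable-classifier design is after.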
Tasks can be reclassified any number of times over their lifetime. The CKRM core patch places hooks in the logical
spots where a process could change classification: when a user or group ID
is changed, when a program calls exec(), when a new process is forked,
etc. There is also a plan for a system call allowing a process to request
reclassification at any time, but that call does not appear to be present
in the current patches.
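
The hook placement described above can be pictured with a small Python model. This is a sketch under assumed names (Event, reclassify, do_fork, do_exec, do_setuid are all hypothetical), not the code in the core patch: each operation that could change a task's class calls back into the classifier. Group ID changes would be handled the same way and are left out for brevity.

```python
from enum import Enum, auto

class Event(Enum):
    # Points at which a task's classification might change.
    FORK = auto()
    EXEC = auto()
    SETUID = auto()

class Task:
    def __init__(self, uid, exe):
        self.uid = uid
        self.exe = exe
        self.cls = None
        self.history = []   # (event, class) pairs, kept only for the demo

def classify(task):
    # Toy policy, assumed for illustration only.
    if task.uid == 0:
        return "system"
    if task.exe == "/usr/sbin/httpd":
        return "web"
    return "default"

def reclassify(task, event):
    """Hook invoked wherever classification could change."""
    task.cls = classify(task)
    task.history.append((event, task.cls))

def do_fork(parent):
    child = Task(parent.uid, parent.exe)
    reclassify(child, Event.FORK)
    return child

def do_exec(task, path):
    task.exe = path
    reclassify(task, Event.EXEC)

def do_setuid(task, uid):
    task.uid = uid
    reclassify(task, Event.SETUID)

init = Task(uid=0, exe="/sbin/init")
server = do_fork(init)               # still root: "system"
do_exec(server, "/usr/sbin/httpd")   # still root: "system"
do_setuid(server, 48)                # drops privileges: "web"
print([cls for _, cls in server.history])
```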
Once a task is classified, the system can apply policies to that task. So,
for example, the CPU control patch enforces
CPU usage policies on processes. Essentially, each class (as a whole) can
be both limited to and guaranteed an administrator-specified
percentage of the available processor time. To implement this policy, the
patch modifies the scheduler by creating a new run queue for each class.
Before the scheduler picks a new process to run, it first decides which
class has the highest-priority claim on the CPU. The process to run can
then be chosen from that class's queue in the usual way.
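
The two-level decision can be sketched as a toy simulation in Python. The details here are assumptions, not the patch's algorithm: "highest-priority claim" is modeled as the runnable class that has used the smallest fraction of its share, and "the usual way" within a class is replaced by simple round-robin. CpuClass, pick_next, and simulate are hypothetical names.

```python
from collections import deque
from fractions import Fraction

class CpuClass:
    def __init__(self, name, share):
        self.name = name
        self.share = share        # administrator-specified share (percent)
        self.used = 0             # ticks consumed so far
        self.runqueue = deque()   # per-class run queue

def pick_next(classes):
    """Pick a class first, then a task from that class's own queue."""
    runnable = [c for c in classes if c.runqueue]
    if not runnable:
        return None
    # Assumed claim metric: least CPU consumed relative to share goes first.
    cls = min(runnable, key=lambda c: Fraction(c.used, c.share))
    task = cls.runqueue.popleft()
    cls.runqueue.append(task)     # round-robin stand-in for the real scheduler
    cls.used += 1
    return cls.name, task

def simulate(classes, ticks):
    """Run for a number of ticks and count how many each class received."""
    counts = {c.name: 0 for c in classes}
    for _ in range(ticks):
        picked = pick_next(classes)
        if picked is None:
            break
        counts[picked[0]] += 1
    return counts

web = CpuClass("web", share=75)
batch = CpuClass("batch", share=25)
web.runqueue.extend(["httpd-1", "httpd-2"])
batch.runqueue.append("backup")
print(simulate([web, batch], ticks=100))
```

With every task permanently runnable, the tick counts in this model track the configured shares; a class with an empty queue is simply skipped, so its unused time goes to the others.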
The memory control patch, by contrast,
implements policies stating how much physical memory each class can use.
The patch hooks into the page reclamation code, making that code rather
more selective in how it chooses pages to kick out of main memory. Whenever
possible, the page reclaimer only chooses pages from classes which are going
over their maximum allowed share of physical memory. As memory gets
tighter, each class will be trimmed down to its minimum share, as set up by
the administrator. If there is no real pressure on memory, however,
processes are allowed to grow beyond the bounds set for their class.
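
The reclaim policy just described can be modeled in a few lines of Python. This is a sketch of the policy as stated here, not of the patch itself; MemClass, reclaim, and the page counts are hypothetical. The first pass takes pages only from classes above their maximum; if that is not enough, a second pass trims classes toward their minimum. Nothing is reclaimed unless a caller asks for pages, which is how classes can grow past their limits when memory is plentiful.

```python
class MemClass:
    def __init__(self, name, min_pages, max_pages, resident):
        self.name = name
        self.min_pages = min_pages   # guaranteed share
        self.max_pages = max_pages   # limit enforced only under pressure
        self.resident = resident     # pages currently held

def reclaim(classes, needed):
    """Take up to `needed` pages; return how many came from each class."""
    taken = {c.name: 0 for c in classes}
    # Pass 1 trims classes down to their maximum, pass 2 down to their minimum.
    for floor in ("max_pages", "min_pages"):
        by_excess = sorted(classes,
                           key=lambda c: c.resident - getattr(c, floor),
                           reverse=True)
        for c in by_excess:
            if needed == 0:
                return taken
            excess = max(c.resident - getattr(c, floor), 0)
            n = min(excess, needed)
            c.resident -= n
            taken[c.name] += n
            needed -= n
    return taken

a = MemClass("a", min_pages=10, max_pages=50, resident=80)
b = MemClass("b", min_pages=20, max_pages=60, resident=40)
print(reclaim([a, b], needed=20))   # mild pressure: only the over-limit class pays
```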
The memory control problem is complicated by shared pages: what happens
when pages are shared between processes in different classes? The
documentation on the CKRM web site describes an elaborate mechanism where
classes are set up in a hierarchy and shared pages are divided across the
appropriate parts of that hierarchy. What the current code appears to do,
however, is to simply assign shared pages to the class with the largest
share of physical memory.
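
The simpler behavior attributed to the current code amounts to a one-line rule, sketched below in Python. The function name and the share table are hypothetical, and the sketch assumes "largest share" refers to the administrator-configured share rather than current usage.

```python
def owner_of_shared_page(sharing_classes, memory_share):
    """Charge a shared page to the sharing class with the largest share."""
    return max(sharing_classes, key=lambda name: memory_share[name])

# Assumed configured shares of physical memory, in percent.
memory_share = {"web": 50, "db": 30, "batch": 20}

# A page mapped by tasks in "db" and "batch" is charged to "db".
print(owner_of_shared_page(["db", "batch"], memory_share))
```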
The CKRM team also describes mechanisms which allow control over the disk
I/O bandwidth used by each class and the number of incoming network
connections each class can be handling at a given time. The I/O
limitations are implemented by adding per-class queues to the disk I/O
scheduler and merging requests into a single dispatch queue with the
bandwidth policies taken into account. The networking policies involve the
creation of yet another set of class-specific queues; in this case,
incoming connections are divided into classes through the use of iptables
rules. Patches for I/O bandwidth and incoming network connection control
have not been released at this time, however.
CKRM is clearly a work in progress; much of the structure is in place, but
not everything has been implemented and the code is full of "this needs to
be cleaned up" comments. The CKRM hackers hope to get their work into 2.7,
however, so they have some time yet to work things into shape.