The cpuset mechanism

A set of patches implementing a concept known as a "cpuset" has been making the rounds for the last month or so. A cpuset is simply an arbitrary collection of processors in an SMP system; cpusets can be used to partition a large system into smaller virtual machines whose boundaries can be drawn as needed. This patch was originally posted by Simon Derr; more recent versions (found in the "patches" section, below) have been sent out by Stephen Hemminger at OSDL.

Internally, the patch creates a hierarchy of cpusets. At boot time, the root set is created containing all of the system's processors. System calls can then be used to create child sets. The creation of a cpuset is not a privileged task, but no process can expand beyond the set of processors initially assigned to it. Thus, for example, the system administrator can create a cpuset for a particular group of processes which will be confined to the designated processors. Those processes can, however, further partition the set for their own purposes.
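The containment rule described above can be sketched in user space. What follows is only a model under stated assumptions, not code from the patch: the names (cpu_mask_t, model_cpuset, model_root(), model_child(), model_set_cpus()) are hypothetical, a set is represented as a 64-bit mask, and the root set is treated as fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model, not the patch's data structure:
 * one bit per processor, so this sketch handles at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

struct model_cpuset {
    const struct model_cpuset *parent; /* NULL for the root set */
    cpu_mask_t cpus;                   /* processors in this set */
};

/* Root set, as created at boot: contains every processor. */
static struct model_cpuset model_root(unsigned ncpus)
{
    assert(ncpus >= 1 && ncpus <= 64);
    struct model_cpuset root = { NULL, 0 };
    root.cpus = (ncpus == 64) ? ~(cpu_mask_t)0
                              : (((cpu_mask_t)1 << ncpus) - 1);
    return root;
}

/* A child starts out with the same processors as its parent. */
static struct model_cpuset model_child(const struct model_cpuset *parent)
{
    struct model_cpuset child = { parent, parent->cpus };
    return child;
}

/* Change a set's processors. The request is refused if it names any
 * processor the parent lacks; the root is fixed in this model. */
static bool model_set_cpus(struct model_cpuset *set, cpu_mask_t wanted)
{
    if (set->parent == NULL)
        return false;
    if ((wanted & ~set->parent->cpus) != 0)
        return false; /* would expand beyond the parent */
    set->cpus = wanted;
    return true;
}
```

In this model, an administrator's set restricted to CPUs 0-3 on an eight-processor machine can be split further by its processes (say, into CPUs 0-1), but a request for CPUs 4-5 from inside it fails.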

In normal use, one would expect cpusets to correspond to the underlying hardware; all processors in a set would normally be part of the same NUMA node, for example. There is nothing in the patch that requires users to do things that way, however; cpusets can be any arbitrary subset of the available processors. Processors can also belong to multiple cpusets, so cpusets can overlap each other in arbitrary ways. There is, however, a "strict" flag which can be set to disallow the sharing of processors in this way.
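One possible reading of the "strict" flag can be expressed as an overlap check. This is a guess at the semantics for illustration only: the sketch_set structure and sketch_may_coexist() are hypothetical, the check is assumed to apply between sets that are not parent and child, and the patch itself may define the flag differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation: one bit per processor (at most 64). */
struct sketch_set {
    uint64_t cpus;
    bool strict; /* assumed meaning: this set refuses to share processors */
};

/* Could `candidate` exist alongside the n sets in `existing`?
 * Overlap is allowed unless either of the overlapping sets is strict. */
static bool sketch_may_coexist(const struct sketch_set *existing, size_t n,
                               const struct sketch_set *candidate)
{
    assert(candidate != NULL && (existing != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        bool overlap = (existing[i].cpus & candidate->cpus) != 0;
        if (overlap && (existing[i].strict || candidate->strict))
            return false;
    }
    return true;
}
```

Under these assumptions, two non-strict sets may share CPUs 0-1 freely, while any set touching a processor owned by a strict set is rejected.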

There are a few new system calls created by this patch:

cpuset_create()
Creates a new cpuset as a child of the process's current cpuset, containing the same processors as the parent.

cpuset_destroy()
Destroys the given cpuset.

cpuset_attach()
Attaches a process to a particular cpuset.

cpuset_alloc()
Changes the set of processors belonging to a cpuset. The name of this call is a little misleading, since it can also release processors from a cpuset. In fact, removing CPUs will be the normal usage: a new cpuset starts with all of its parent's processors, and a cpuset cannot contain processors which are not also contained in its parent.

cpuset_getfreecpus()
Returns a list of processors which are not part of the current cpuset, but which could be added.
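The article does not give the signatures of these calls, so none are reproduced here. As a rough illustration of what cpuset_getfreecpus() is described as reporting, the following sketch assumes that "could be added" means "present in the parent but not in the set itself"; demo_free_cpus() and demo_mask_to_list() are hypothetical helpers operating on 64-bit masks, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Processors that could still be added to a set, assuming the only
 * constraint is the parent: those in the parent but not in the set. */
static uint64_t demo_free_cpus(uint64_t parent_cpus, uint64_t set_cpus)
{
    return parent_cpus & ~set_cpus;
}

/* Write the CPU numbers present in `mask` into out[] in ascending
 * order, up to `cap` entries; return how many were written. */
static unsigned demo_mask_to_list(uint64_t mask, unsigned out[], unsigned cap)
{
    assert(out != NULL || cap == 0);
    unsigned count = 0;
    for (unsigned cpu = 0; cpu < 64 && count < cap; cpu++) {
        if (mask & ((uint64_t)1 << cpu))
            out[count++] = cpu;
    }
    return count;
}
```

For a set holding CPUs 0-3 inside a parent holding CPUs 0-7, this yields the list 4, 5, 6, 7; a freshly created set, which mirrors its parent, has nothing free to add.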

Processes running within a cpuset have no view of the processors which are not contained within that set. Processors in a cpuset are renumbered so that they appear to be the only processors on the system; thus, for example, system calls like sched_setaffinity() can only bind processes to processors within their own cpuset.
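The renumbering can be pictured as a translation from the CPU numbers a process sees to physical processors. The sketch below assumes that processors are renumbered in ascending physical order starting from zero, which the article does not state; renum_to_physical() and renum_affinity() are hypothetical names, and sets are again 64-bit masks.

```c
#include <assert.h>
#include <stdint.h>

/* Map a CPU number as seen inside the set to a physical CPU number.
 * Returns -1 if the set has no such processor. */
static int renum_to_physical(uint64_t set_cpus, unsigned virtual_cpu)
{
    for (unsigned cpu = 0; cpu < 64; cpu++) {
        if (set_cpus & ((uint64_t)1 << cpu)) {
            if (virtual_cpu == 0)
                return (int)cpu;
            virtual_cpu--;
        }
    }
    return -1;
}

/* Translate an affinity mask expressed in the set's own numbering
 * into physical bits. Bits naming processors the set does not have
 * are dropped, so the result never leaves the set. */
static uint64_t renum_affinity(uint64_t set_cpus, uint64_t virtual_mask)
{
    assert(set_cpus != 0); /* an empty set has nothing to bind to */
    uint64_t physical = 0;
    unsigned next_virtual = 0;
    for (unsigned cpu = 0; cpu < 64; cpu++) {
        if (!(set_cpus & ((uint64_t)1 << cpu)))
            continue;
        if (virtual_mask & ((uint64_t)1 << next_virtual))
            physical |= (uint64_t)1 << cpu;
        next_virtual++;
    }
    return physical;
}
```

With a set made of physical CPUs 4-7, a process asking for "CPUs 0 and 1" would, in this model, end up bound to physical CPUs 4 and 5.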

This patch has generated a certain amount of interest in the large-systems community. It clearly does not fall within the 2.6.0-test "stability patches only" mandate, but there may be pressure to get it into the kernel not long after 2.6.0 is released.



The cpuset mechanism

Posted Oct 23, 2003 13:08 UTC (Thu) by hazelsct (guest, #3659) [Link]

Slightly off-topic, but does NUMA consist of just two levels of CPU hierarchy, i.e. "all" and "node"? Or are there other ways of dividing a large machine into subclusters with good communication?

I'm curious because I have a vague hope that these cool NUMA algorithms can someday be used over a transport such as TCP/IP, so Beowulf clusters can become single-image shared-memory machines with better resource sharing than Mosix. In very large clusters, one has "nodes" with two or more CPUs, and switches with 23 or 31 or 47 nodes, and hierarchies of such switches; or flat topologies with multiple switches such that each node can see all the others (as pioneered by KLAT2). As machines like Altix scale to 512 and 1024 processors, I imagine that there too it would be nice for cpusets to reflect this type of hierarchy.

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds