|| ||Simon Derr <Simon.Derr@bull.net>|
|| ||[RFC] cpuset proposal|
|| ||Mon, 13 Oct 2003 15:07:43 +0200 (DFT)|
|| ||Sylvain Jeaugey <email@example.com>|
We'd like to introduce our 'CPUSET' Linux feature to the world. It has
been discussed a few weeks ago and received some positive feedback on
linux-ia64 and LSE. Since then we have ported it to i386 (as suggested by
Stephen Hemminger) and would appreciate to receive comments from a wider
Any opinion, comment, rant, flame, mark of interest or despise is more
For a more complete description or source code see
We'd be also curious about whether a feature of this kind or addressing
the same issue would be likely to make it into a future version of the
Simon & Sylvain.
What are CPUSETS ?
CPUSETs are lightweight objects in the linux kernel that enable users to
partition their multiprocessor machine by creating execution areas. This
has been somewhat inspired by the pset or cpumemset patches existing for
Linux 2.4. A virtualization layer has been added so it becomes possible to
split a machine in terms of CPUs.
The main motivation of this patch is to give the linux kernel full
administration capabilities concerning CPUs. CPUSETs are strong jails, and
a process running inside this predefined area won't be able to run on
other processors than those given to him. Some domains in which it can be
* Web Servers running multiple instances of the same web application.
* Servers running different applications (for instance, a web server
and a database).
* HPC applications, especially in NUMA machines.
CPUSETS allow to:
1. create sets of CPUs on the system, and bind applications to them
2. translate the masks of CPUs given to sched_setaffinity() so they
stay inside the set of CPUs. With this mechanism, processors are
virtualized, for the use of sched_setaffinity() and /proc information.
Thus, any former application using this syscall to bind processes to
processors will work with virtual CPUs without any change.
3. provide a way to create sets of cpus *inside* a set of cpus : hence
a system administrator can partition a system among users, and users can
partition their partition among their applications.
4. Change on the fly the execution area of a whole set of processes (to
give more resources to a critical application, for example).
These features have been implemented as a kernel patch for Linux 2.6 and a
suite of userland tools.
Typical CPUSET Usage:
* CPU-bound applications : Many applications (as it is often the case
for HPC apps) use to have a "one process on one processor" policy. They
can use sched_setaffinity() to do so, but what if we have to run several
such apps at the same time ? One can do this by creating a cpuset for each
* Web Serving : A server containing a web server (apache for instance)
and a database (MySQL, Oracle) may want to have one part of the machine
dedicated for each task. A cpuset can be created for each task. If the
server is used to run two instances of this web-system, cpusets can be
used to split the system in two, and then in each partition create one
area for the web server and one area for the database.
* User administration : an administrator may give fixed numbers of
CPUs to some classes of users and leave the rest to other users, for
instance. He can do so by creating cpusets and assigning them to users.
* Critical applications : processors inside strict areas may not be
used by other areas. Thus, a critical application (real time ...) may be
run inside an area and be sure that other processes won't use its CPU.
This implies that others applications won't be able to lower its
reactivity. This can be done by creating a cpuset for the critical
application, and another for all the other tasks.
* In the future, other features such as associating a memory allocation
policy (such as local node, or round robin) to a set of cpus might be
* We are looking forward to the inclusion of the cpuset feature in a
more visible project...
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/