Splitting the kernel stack
[Posted June 5, 2002 by corbet]
The Linux kernel has, for years, run with an 8KB (two page) stack in each
process's address space (at least, on i386 systems). That stack holds the
"task structure" (the kernel's information about the process) and provides
space for automatic variables and call frames when the system is running in
kernel mode. The 8KB stack works, of course, but it is not optimal. The
biggest problem, perhaps, is the need to find two adjacent pages for a new
stack every time a new process is created. On a busy system, memory can get
badly fragmented, and allocating two pages together can be a challenge.
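
The two pages have to be adjacent because, on i386, the kernel finds the task structure by rounding the stack pointer down to the start of the stack area, which only works if that area is one contiguous block aligned to its own size. The following is a minimal user-space sketch of that arithmetic, not kernel code; the function name and the example addresses are hypothetical.

```python
PAGE_SIZE = 4096              # i386 page size
STACK_SIZE = 2 * PAGE_SIZE    # the 8KB (two page) stack described above


def stack_base(sp, stack_size=STACK_SIZE):
    """Round a stack pointer down to the start of its stack area.

    This is only correct when the area is aligned to its own size and that
    size is a power of two, which is why two separate, non-adjacent pages
    would not do.
    """
    return sp & ~(stack_size - 1)


# Hypothetical stack pointer somewhere inside an 8KB-aligned stack area.
sp = 0xC1235FC0

# With an 8KB stack, the task structure is found at the 8KB boundary.
task_8k = stack_base(sp)

# With a single-page stack, the same trick needs only a 4KB boundary.
task_4k = stack_base(sp, PAGE_SIZE)
```

With a one-page stack the mask simply gets narrower, so any free page will serve and no multi-page allocation is required.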
So Ben LaHaise has posted a patch which splits the kernel stack into two 4KB
stacks. One of them holds the task structure and is used by normal kernel
code (handling system calls, for example). The other stack is set aside and
is used only when the kernel is handling interrupts.
A separate interrupt stack is not a particularly new idea; many operating
systems have had interrupt stacks for decades. There are numerous
advantages to doing things this way. Only one interrupt stack (per CPU) is
needed, so one page of memory per process is freed up. The interrupt stack
is also more likely to stay in the processor cache, improving performance.
Interrupt handlers need not worry about other kernel code having consumed
most of the stack when they get invoked. And, of course, it is no longer
necessary to perform a two-page allocation to set up the regular kernel
stack.
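
The memory saving is easy to estimate. The sketch below compares the two layouts for a given number of processes and CPUs; the function name and the sample numbers (1000 processes on a two-processor system) are made up for illustration.

```python
PAGE_KB = 4   # one i386 page, in kilobytes


def stack_memory_kb(processes, cpus, split):
    """Total memory used for kernel stacks, in KB.

    Unsplit: every process has a two-page stack.
    Split:   every process has a one-page stack, plus one interrupt
             stack page per CPU.
    """
    if split:
        return processes * PAGE_KB + cpus * PAGE_KB
    return processes * 2 * PAGE_KB


before = stack_memory_kb(1000, 2, split=False)
after = stack_memory_kb(1000, 2, split=True)
saved = before - after
```

The per-CPU interrupt stacks are a fixed cost, so the saving approaches one page per process as the process count grows.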
The biggest downside, perhaps, is that non-interrupt kernel code must now
fit into a single 4KB stack, which it still shares with the task structure.
Some kernel code is not particularly careful about the size of its automatic
variables, and risks overflowing the new, smaller stack. As a way of
tracking down such code, Ben has also posted a stack checker (followed by a
brown paper bag fix) which monitors stack usage and raises the alarm when
available space on the stack gets too low. The two patches are probably
best used together.
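
The idea behind such a check can be sketched in a few lines. This is not Ben's patch, only a user-space model of the arithmetic: the stack grows downward toward the task structure at the bottom of the page, and the alarm goes off when the gap gets too small. The function names, the 512-byte threshold, and the reserved size are all hypothetical.

```python
STACK_SIZE = 4096   # one 4KB stack, as in the split-stack patch
LOW_WATER = 512     # hypothetical alarm threshold, in bytes


def stack_free(sp, reserved=0, stack_size=STACK_SIZE):
    """Bytes left between the stack pointer and the bottom of the stack.

    'reserved' is the space taken at the bottom of the area by the task
    structure. The stack pointer is assumed to lie strictly inside the
    area, which must be aligned to its own (power-of-two) size.
    """
    base = sp & ~(stack_size - 1)
    return sp - (base + reserved)


def stack_is_low(sp, reserved=0, low_water=LOW_WATER, stack_size=STACK_SIZE):
    """Return True when the remaining stack space is below the threshold."""
    return stack_free(sp, reserved, stack_size) < low_water


# Hypothetical stack pointers: one near the top of the page, one near the bottom.
shallow = stack_is_low(0xC1235F00)
deep = stack_is_low(0xC1235100, reserved=64)
```

A real checker would have to run at well-chosen points in the kernel and report the offending call chain; the sketch only shows the comparison itself.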