Moving things on and off the kernel stack
[Posted June 19, 2002 by corbet]
The Linux kernel stack is a limited resource; it must fit into two pages of
memory, which it shares with some process information. Overflowing the
kernel stack can be a catastrophic event, and it can happen at surprising
times, such as in interrupt handlers. After a recent Stanford Checker
posting pointing out numerous places where large structures have been
allocated on the stack, and with proposals to consider reducing the size of
the stack, there has been an increase in interest in minimizing kernel
stack usage.
One bit of code that caught Andries Brouwer's eye was the resolution of
symbolic links. In the process of symlink resolution, the kernel can
encounter new links which must also be resolved; this is handled by a
recursive call into the resolution code. Each call, of course, requires
kernel stack space, so recursive calls must be looked at with care - unless
the recursion is carefully bounded, it can easily overflow the kernel
stack. The symlink code handles this constraint by limiting the symlink
depth to five.
Andries has posted a new symlink
implementation that eliminates the recursion. Instead, it maintains
its own stack - allocated separately - which contains the current state of
symlink resolution. In this way, the five-level limit can be lifted
without fear of overrunning the kernel stack. Of course, it is extremely
rare that anybody actually hits the five-level limit; there are special cases, however, where users do
interesting things with symbolic links.
Not all developments are oriented toward reducing kernel stack usage, however.
Andi Kleen has posted a patch which does the
opposite in order to make the select and poll system
calls perform better. These calls (which share most of an internal
implementation) allocate a couple of pages of kernel memory to hold the
requisite data structures; they are sized to be able to handle situations
where large numbers of file descriptors are being waited on. In reality,
however, many (if not most) select and poll calls are
given only a small number of file descriptors, so much of that memory is
wasted.
Andi's patch works by setting up a separate fast path for when only a small
number of file descriptors are in use. Rather than allocate those two
pages, the fast path uses a small, in-stack array. The stack space usage
is limited to 256 bytes, which will fit easily even on a reduced-size
stack. The new implementation not only saves a couple of kernel pages for
each process calling select (and there can be many on a typical Linux
system), it's faster as well. The patch has been included in 2.5.23-dj2,
and will likely find its way into the mainline before too long.
(
Log in to post comments)