LWN.net Logo

Moving things on and off the kernel stack

The Linux kernel stack is a limited resource; it must fit into two pages of memory, which it shares with some process information. Overflowing the kernel stack can be a catastrophic event, and it can happen at surprising times, such as in interrupt handlers. After a recent Stanford Checker posting pointing out numerous places where large structures have been allocated on the stack, and with proposals to consider reducing the size of the stack, there has been an increase in interest in minimizing kernel stack usage.

One bit of code that caught Andries Brouwer's eye was the resolution of symbolic links. In the process of symlink resolution, the kernel can encounter new links which must also be resolved; this is handled by a recursive call into the resolution code. Each call, of course, requires kernel stack space, so recursive calls must be looked at with care - unless the recursion is carefully bounded, it can easily overflow the kernel stack. The symlink code handles this constraint by limiting the symlink depth to five.

Andries has posted a new symlink implementation that eliminates the recursion. Instead, it maintains its own stack - allocated separately - which contains the current state of symlink resolution. In this way, the five-level limit can be lifted without fear of overrunning the kernel stack. Of course, it is extremely rare that anybody actually hits the five-level limit; there are special cases, however, where users do interesting things with symbolic links.

Not all developments are oriented toward reducing kernel stack usage, however. Andi Kleen has posted a patch which does the opposite in order to make the select and poll system calls perform better. These calls (which share most of an internal implementation) allocate a couple of pages of kernel memory to hold the requisite data structures; they are sized to be able to handle situations where large numbers of file descriptors are being waited on. In reality, however, many (if not most) select and poll calls are given only a small number of file descriptors, so much of that memory is wasted.

Andi's patch works by setting up a separate fast path for when only a small number of file descriptors are in use. Rather than allocate those two pages, the fast path uses a small, in-stack array. The stack space usage is limited to 256 bytes, which will fit easily even on a reduced-size stack. The new implementation not only saves a couple of kernel pages for each process calling select (and there can be many on a typical Linux system), it's faster as well. The patch has been included in 2.5.23-dj2, and will likely find its way into the mainline before too long.


(Log in to post comments)

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds