Unexporting the system call table
[Posted October 9, 2002 by corbet]
A linux-kernel reader recently complained that Red Hat had applied a patch
to the kernel in its 8.0 distribution which made the sys_call_table data
structure unavailable to
modules. He will not have been pleased with the 2.5.41 kernel release,
which did the same thing.
sys_call_table is a special table used to dispatch system calls
within the kernel. It is a simple array, indexed by the system call number
passed in from user space. The reason for wanting this array to be
exported, of course, is to allow modules to add or modify system calls. A
classic example is a module implementing the "streams" interface, which is
unlikely to ever be part of the mainline kernel. Some users need streams,
though; an exported system call table allows them to load a module and have
the streams call work as expected.
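As a rough illustration of the mechanism, here is a user-space model of a dispatch table indexed by call number, along with the kind of pointer swap that an exported table makes possible. Every name in it (demo_call_table, demo_dispatch, demo_hook, and so on) is invented for this sketch; the real table's layout and entry conventions differ from one architecture to the next.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
typedef long (*syscall_fn)(long arg);

enum { NR_DEMO_GETVAL = 0, NR_DEMO_DOUBLE = 1, NR_DEMO_CALLS = 2 };

static long demo_getval(long arg) { (void)arg; return 42; }
static long demo_double(long arg) { return arg * 2; }

/* The table itself: a plain array of function pointers. */
static syscall_fn demo_call_table[NR_DEMO_CALLS] = {
    [NR_DEMO_GETVAL] = demo_getval,
    [NR_DEMO_DOUBLE] = demo_double,
};

/* Dispatch: the call number passed in by the caller is the array index. */
static long demo_dispatch(long nr, long arg)
{
    if (nr < 0 || nr >= NR_DEMO_CALLS || demo_call_table[nr] == NULL)
        return -ENOSYS;
    return demo_call_table[nr](arg);
}

/*
 * What access to the table allows: a "module" swaps in its own handler
 * and keeps the old pointer. The caller must pass a valid nr, and nothing
 * here serializes the store against a concurrent demo_dispatch().
 */
static syscall_fn demo_hook(long nr, syscall_fn replacement)
{
    syscall_fn old = demo_call_table[nr];
    demo_call_table[nr] = replacement;
    return old;
}
```

The unsynchronized store in demo_hook() is the point of interest: any caller running demo_dispatch() at the same moment may see either the old or the new handler.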
So why would this capability be taken away? The stated reason is that
tweaking the system call table is nonportable and unsafe. Each
architecture has a different system call table format, so code which wants
to be portable has to understand how each architecture does things. There
is also no locking mechanism for the system call table, so run-time changes
are subject to race conditions. And finally, there are even errata
problems on some processors; changing a table used the way
sys_call_table is used can have unfortunate and unexpected
results.
Many of these problems could be worked around with a bit of coding. But
the simple fact is that many kernel developers do not want loadable modules
to be able to add or change system calls. Binary modules are tolerated as
long as they stick to the "published" interfaces and implement
straightforward features (such as device drivers and filesystems). A
module which can add or change system calls can go well beyond that
interface. Removing access to the system call table keeps modules in their
place.
Working around this problem is not all that difficult for modules which need
to do so. A patch was quickly posted which
made streams work again, for example. The solution is to have a set of
stub system calls wired into the kernel; when the associated module is
loaded, the stubs can make the appropriate calls with the necessary locks.
Otherwise they return an ENOSYS error.
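The stub approach might look something like the following user-space sketch. It is not the posted patch: the names (demo_stub_call, demo_register_handler, and friends) are hypothetical, and a pthread mutex stands in for whatever locking primitive the kernel code actually uses.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space model of a stub system call; not kernel code. */
typedef long (*demo_handler_fn)(long arg);

static pthread_mutex_t demo_hook_lock = PTHREAD_MUTEX_INITIALIZER;
static demo_handler_fn demo_handler;    /* NULL until the "module" loads */

/* Called by the module at load time; refuses a second registration. */
static int demo_register_handler(demo_handler_fn fn)
{
    int ret = 0;

    pthread_mutex_lock(&demo_hook_lock);
    if (demo_handler != NULL)
        ret = -EBUSY;
    else
        demo_handler = fn;
    pthread_mutex_unlock(&demo_hook_lock);
    return ret;
}

/* Called by the module at unload time. */
static void demo_unregister_handler(void)
{
    pthread_mutex_lock(&demo_hook_lock);
    demo_handler = NULL;
    pthread_mutex_unlock(&demo_hook_lock);
}

/*
 * The stub that would be wired permanently into the call table. The lock
 * is held across the call, so the handler cannot be unregistered while a
 * call into it is still in progress.
 */
static long demo_stub_call(long arg)
{
    long ret = -ENOSYS;

    pthread_mutex_lock(&demo_hook_lock);
    if (demo_handler != NULL)
        ret = demo_handler(arg);
    pthread_mutex_unlock(&demo_hook_lock);
    return ret;
}

/* Stand-in for the implementation a loaded module would supply. */
static long demo_module_impl(long arg) { return arg + 1; }
```

Holding a single mutex across the handler serializes every call, which is acceptable in a sketch; real code would more plausibly use a reader/writer lock or a reference count so that concurrent calls do not block one another.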