How to speed up system calls
[Posted December 18, 2002 by corbet]
It all started with an observation that system calls on a modern
Pentium 4 processor are far slower than on older CPUs. It seems that, for
whatever reason, software interrupts generated with the int instruction
are very slow with the P4 processor. Since x86 Linux invokes system calls
with "int $0x80", that slowness makes itself felt - especially with system
calls (like getpid()) that would, otherwise, be very fast.
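The cost of a trivial system call can be measured directly. The following is a minimal sketch, not the benchmark used in the original discussion: the function name ns_per_getpid() is made up for illustration, and the code assumes a Linux system where syscall() and clock_gettime() are available. It goes through syscall() rather than getpid() so that any caching in the C library wrapper does not hide the kernel entry cost.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration */
#endif
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * getpid() system call, measured over the given number of iterations.
 */
double ns_per_getpid(long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);    /* enter the kernel on every iteration */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
    return elapsed_ns / iterations;
}
```

Calling ns_per_getpid(1000000) and printing the result gives a rough per-call figure; the number depends heavily on the processor and on how the C library chooses to enter the kernel.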
There is an obvious solution to this problem: use the sysenter
instruction instead. sysenter is quite a bit faster on modern
Pentium processors. There are just a couple of problems: not all x86
processors support sysenter, and sysenter steps on
registers in ways that can be hard to work around.
The lack of across-the-board support for sysenter is a problem.
The kernel maintains a set of flags telling it what capabilities a given
processor has; other processor-specific options are set at configuration
time. System calls, however, are not invoked from the kernel - that is the
C library's job. The last thing glibc needs is to be trying to figure out,
at run time, the right way to invoke system calls.
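To give a sense of the kind of run-time probing involved, here is a rough sketch of a capability check. It assumes GCC or Clang, whose <cpuid.h> header provides __get_cpuid(); the function name cpu_has_sep() is invented for this example. The CPUID instruction reports sysenter support as the SEP flag, bit 11 of EDX in leaf 1. A real implementation would need more care than this, which is part of why pushing the job onto the C library is unattractive.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>              /* GCC/Clang wrapper around CPUID */
#endif

/*
 * Hypothetical helper: returns 1 if the CPU advertises sysenter (SEP),
 * 0 if it does not, and -1 when not built for an x86 processor.
 */
int cpu_has_sep(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1 holds the basic feature flags; a zero return means the
     * leaf is not available on this processor. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (edx >> 11) & 1;     /* SEP is bit 11 of EDX */
#else
    return -1;
#endif
}
```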
Linus's solution to this problem is a patch
which brings back a variant of an old idea. As of 2.5.53, the kernel will
map a global, read-only page at the top of every process's address space.
That page contains the optimal code for executing a system call on the
current processor. Whenever glibc needs to call into the system, it simply
sets up the registers and, rather than doing the old
int $0x80, it jumps into the new page. The C library still
needs to do a runtime test (since older kernels will lack this "vsyscall"
page), but it need not concern itself with the detailed capabilities of
different processors.
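The run-time test and the jump into the kernel-provided code might look something like the sketch below. It rests on an assumption that goes beyond the patch described here: that the kernel advertises the entry point through the AT_SYSINFO entry of the ELF auxiliary vector, as later 32-bit x86 kernels do, and that the C library offers getauxval(). The function names are made up for illustration. On anything other than 32-bit x86 the sketch simply falls back to the library's syscall().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <elf.h>                /* AT_SYSINFO */
#include <sys/auxv.h>           /* getauxval() */
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_SYSINFO
#define AT_SYSINFO 32
#endif

/* Address of the kernel-provided system call entry, or 0 if the
 * running kernel does not advertise one (assumed mechanism). */
unsigned long find_syscall_entry(void)
{
    return getauxval(AT_SYSINFO);
}

/* Hypothetical getpid() that prefers the kernel-provided entry point. */
long getpid_via_kernel_entry(void)
{
#if defined(__i386__)
    unsigned long entry = find_syscall_entry();
    long ret;

    if (entry) {
        /* System call number in %eax, then call into the kernel's page. */
        __asm__ volatile ("call *%1"
                          : "=a" (ret)
                          : "r" (entry), "0" (SYS_getpid)
                          : "memory");
        return ret;
    }
    /* Older kernel: fall back to the traditional software interrupt. */
    __asm__ volatile ("int $0x80"
                      : "=a" (ret)
                      : "0" (SYS_getpid)
                      : "memory");
    return ret;
#else
    return syscall(SYS_getpid); /* not 32-bit x86: let libc decide */
#endif
}
```

The point of the design is visible in the structure: the caller tests only whether the entry point exists, and never needs to know whether the code behind it uses sysenter, int $0x80, or something else.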
Keeping the registers straight turned out to be a trickier problem. The
way sysenter steps on registers makes it hard to invoke system
calls with more than five parameters. Various schemes were looked at,
including creating a new "extra argument block" or simply requiring that
six-argument system calls be invoked the old way. Linus finally came up
with a tricky solution that makes it all work, however; those of you who
like digging through x86 assembly may want to peek at his "absolutely wonderfully disgusting solution" to
the problem. "I'm a disgusting pig, and proud of it to boot."
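For a concrete feel of what a six-argument system call looks like, mmap() is the classic case. With the int $0x80 convention on 32-bit x86, its arguments occupy ebx, ecx, edx, esi, edi, and ebp, which leaves no register to spare. The sketch below only illustrates such a call from C via syscall(); it does not reproduce Linus's trick, and the function name map_anonymous_page() is invented for the example. It assumes a Linux system, and uses mmap2 where the architecture defines it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous, private, read-write page by
 * issuing the six-argument mmap system call directly.
 * Returns NULL on failure.
 */
void *map_anonymous_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
#ifdef SYS_mmap2
    long nr = SYS_mmap2;        /* 32-bit variant; offset is in pages */
#else
    long nr = SYS_mmap;
#endif
    /* Six arguments: address hint, length, protection, flags, fd, offset. */
    long ret = syscall(nr, 0L, page,
                       (long) (PROT_READ | PROT_WRITE),
                       (long) (MAP_PRIVATE | MAP_ANONYMOUS),
                       -1L, 0L);

    return ret == -1 ? NULL : (void *) ret;
}
```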
The result of all this: the gettimeofday() system call runs in
just over half the time on a P4 processor. The speedup on the Pentium 3
is smaller - a factor of 1.2 - but is still worthwhile.
Now that the vsyscall page is in place, will it be used for other things,
such as implementing gettimeofday() entirely in user space? The
answer, for now, appears to be "no". Getting a user-space
gettimeofday() right is, seemingly, harder than it looks; there
are synchronization issues, especially on some SMP systems where the clocks
may not be synchronized by the hardware. So a user-space
gettimeofday() appears to not be in the works, for now at least.