How to speed up system calls

[Posted December 18, 2002 by corbet]

It all started with an observation that system calls on a modern Pentium 4 processor are far slower than on older CPUs. It seems that, for whatever reason, software interrupts generated with the int instruction are very slow with the P4 processor. Since x86 Linux invokes system calls with "int $0x80", that slowness makes itself felt - especially with system calls (like getpid()) that would, otherwise, be very fast.
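
For readers who want to see what the old entry path looks like, here is a minimal sketch that invokes getpid() without the C library's wrapper. The function name raw_getpid() is ours. On 32-bit x86 it executes int $0x80 directly, with the system call number in %eax and the result coming back in %eax. On any other architecture it falls back to glibc's generic syscall() helper, so it still builds and runs there.

```c
#define _GNU_SOURCE          /* for syscall() under strict -std= modes */
#include <sys/syscall.h>     /* SYS_getpid */
#include <sys/types.h>
#include <unistd.h>          /* syscall(), getpid() */

/* Hypothetical helper: call getpid directly, bypassing the libc wrapper. */
long raw_getpid(void)
{
#if defined(__i386__)
    long ret;
    /* Legacy 32-bit entry: number in %eax, software interrupt 0x80,
     * return value (or -errno) left in %eax. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"(SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Not 32-bit x86: let glibc pick the native entry instruction. */
    return syscall(SYS_getpid);
#endif
}
```

Calling raw_getpid() should return the same value as getpid().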

There is an obvious solution to this problem: use the sysenter instruction instead. sysenter is quite a bit faster on modern Pentium processors. There are just a couple of problems: not all x86 processors support sysenter, and sysenter steps on registers in ways that can be hard to work around.
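
Whether a processor supports sysenter is reported by the CPUID instruction: leaf 1 sets the SEP bit (bit 11 of %edx). The sketch below assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid(); the helper name cpu_has_sysenter() and the FEATURE_EDX_SEP constant are hypothetical. It also applies the quirk described in Intel's documentation and handled in the Linux kernel: some early Pentium Pro parts (family 6, model below 3, stepping below 3) set SEP without really supporting the instructions. This only reports what the CPU claims; whether an operating system can use sysenter in a given mode (AMD processors, for example, do not support it in 64-bit long mode) is a separate question.

```c
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>                  /* __get_cpuid() (GCC/Clang) */
#endif

/* CPUID leaf 1, %edx bit 11: SEP (SYSENTER/SYSEXIT present). */
#define FEATURE_EDX_SEP (1u << 11)

/* Hypothetical helper: does this CPU claim usable sysenter support? */
bool cpu_has_sysenter(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;               /* CPUID leaf 1 not available */
    if (!(edx & FEATURE_EDX_SEP))
        return false;

    unsigned int family   = (eax >> 8) & 0xf;
    unsigned int model    = (eax >> 4) & 0xf;
    unsigned int stepping = eax & 0xf;
    if (family == 6)
        model |= ((eax >> 16) & 0xf) << 4;   /* add extended model bits */

    /* Early Pentium Pro: SEP is set but the instructions are unusable. */
    if (family == 6 && model < 3 && stepping < 3)
        return false;
    return true;
#else
    return false;                   /* not x86: no sysenter instruction */
#endif
}
```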

The lack of across-the-board support for sysenter is a problem. The kernel maintains a set of flags telling it what capabilities a given processor has; other processor-specific options are set at configuration time. System calls, however, are not invoked from the kernel - that is the C library's job. The last thing glibc needs is to be trying to figure out, at run time, the right way to invoke system calls.

Linus's solution to this problem is a patch which brings back a variant of an old idea. As of 2.5.53, the kernel will map a global, read-only page at the top of every process's address space. That page contains the optimal code for executing a system call on the current processor. Whenever glibc needs to call into the system, it simply sets up the registers and, rather than executing the old int $0x80, calls into the new page. The C library still needs to do a runtime test (since older kernels will lack this "vsyscall" page), but it need not concern itself with the detailed capabilities of different processors.
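
The article does not say how glibc finds the page. In 2.6-era and later kernels, the kernel reports it to each process in the ELF auxiliary vector: AT_SYSINFO gives the 32-bit x86 entry point, and AT_SYSINFO_EHDR gives the base of the page itself. The sketch below models the runtime test with getauxval(), which only appeared in glibc 2.16, long after this article; the enum and function names are hypothetical. On x86-64, AT_SYSINFO is not supplied at all (64-bit code enters the kernel with the syscall instruction), so there the sketch simply reports the fallback path.

```c
#include <elf.h>          /* AT_SYSINFO, AT_SYSINFO_EHDR */
#include <sys/auxv.h>     /* getauxval(): glibc 2.16 and later */

/* Hypothetical labels for the two ways a C library could enter the kernel. */
enum syscall_method {
    SYSCALL_VIA_INT80,      /* no kernel-supplied entry: use int $0x80 */
    SYSCALL_VIA_VSYSCALL    /* call the entry point in the kernel's page */
};

/* Decide once, at startup, how to make system calls. getauxval()
 * returns 0 for entries the kernel did not supply: the "older kernel" case. */
enum syscall_method pick_syscall_method(unsigned long *entry)
{
    *entry = getauxval(AT_SYSINFO);
    return *entry != 0 ? SYSCALL_VIA_VSYSCALL : SYSCALL_VIA_INT80;
}

/* Base address of the kernel-supplied page, or 0 if there is none. */
unsigned long kernel_page_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}
```

A C library would call something like pick_syscall_method() once during startup and store the result, rather than repeating the test on every system call.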

Keeping the registers straight turned out to be a trickier problem. sysenter saves neither the user-space return address nor the user stack pointer, so the stack pointer has to be handed to the kernel in a general-purpose register. But the other registers are already spoken for: %eax holds the system call number, and %ebx, %ecx, %edx, %esi, %edi, and %ebp carry up to six arguments. That makes it hard to invoke system calls with more than five parameters. Various schemes were looked at, including creating a new "extra argument block" or simply requiring that six-argument system calls be invoked the old way. Linus finally came up with a tricky solution that makes it all work, however; those of you who like digging through x86 assembly may want to peek at his "absolutely wonderfully disgusting solution" to the problem. "I'm a disgusting pig, and proud of it to boot."
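
To make the register juggling concrete, here is a toy model in C. It is not kernel code: struct cpu, push(), pop(), and both functions are hypothetical. It mimics, as we understand it, the __kernel_vsyscall stub found in 2.6-era kernels. The stub pushes %ecx, %edx, and %ebp, copies the stack pointer into %ebp, and executes sysenter. The kernel then finds the user stack through %ebp and reads the sixth argument back from it. The exact code in 2.5.53 may differ.

```c
#include <stdint.h>

#define STACK_WORDS 8

/* Toy register file plus a tiny user stack; esp is an index into stack[]
 * and the stack grows downward, as on x86. */
struct cpu {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp;
    unsigned esp;
    uint32_t stack[STACK_WORDS];
};

static void push(struct cpu *c, uint32_t v) { c->stack[--c->esp] = v; }
static uint32_t pop(struct cpu *c) { return c->stack[c->esp++]; }

/* "Kernel" side: sysenter saved no user stack pointer, but the stub left
 * it in %ebp, and the original %ebp (argument 6) sits at that address. */
static void kernel_collect_args(const struct cpu *c, uint32_t args[6])
{
    args[0] = c->ebx; args[1] = c->ecx; args[2] = c->edx;
    args[3] = c->esi; args[4] = c->edi;
    args[5] = c->stack[c->ebp];
}

/* "User" side: a model of the stub in the kernel-provided page. */
void vsyscall_stub(struct cpu *c, uint32_t args_seen[6])
{
    push(c, c->ecx);
    push(c, c->edx);
    push(c, c->ebp);          /* sixth argument now lives on the stack */
    c->ebp = c->esp;          /* %ebp now tells the kernel where that is */

    /* sysenter: switch to the kernel stack; nothing user-side is saved. */
    kernel_collect_args(c, args_seen);

    /* sysexit uses %ecx and %edx, so their values are lost on return... */
    c->ecx = 0xdeadbeef;
    c->edx = 0xdeadbeef;
    /* ...and the stub restores everything it saved. */
    c->ebp = pop(c);
    c->edx = pop(c);
    c->ecx = pop(c);
}
```

After the call, the kernel side has seen all six arguments and the caller's registers and stack pointer are back where they started.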

The result of all this: the gettimeofday() system call runs in just over half the time on a P4 processor. The speedup on Pentium 3 processors is smaller (a factor of 1.2) but still worthwhile.
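
Figures like these come from timing many back-to-back system calls. The sketch below (ns_per_syscall() is a hypothetical name) does that for getpid, which does almost no work inside the kernel, so the result mostly reflects the cost of entering and leaving it. It calls syscall(SYS_getpid) rather than getpid() in case the C library caches the PID, as some glibc versions did. The article's numbers were for gettimeofday(); on a modern system the measured path is whatever entry mechanism the running kernel uses, so results are not comparable with the 2002 figures.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>    /* SYS_getpid */
#include <time.h>           /* clock_gettime(), CLOCK_MONOTONIC */
#include <unistd.h>         /* syscall() */

/* Average wall-clock cost, in nanoseconds, of one getpid system call. */
double ns_per_syscall(long iterations)
{
    struct timespec start, end;

    if (iterations <= 0)
        return 0.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        (void)syscall(SYS_getpid);      /* one kernel entry and exit */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

For example, printf("%.1f ns\n", ns_per_syscall(1000000)) prints the average cost per call. Run it several times, since a single measurement is noisy.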

Now that the vsyscall page is in place, will it be used for other things, such as implementing gettimeofday() entirely in user space? The answer, for now, appears to be "no". Getting a user-space gettimeofday() right is harder than it looks: there are synchronization issues, especially on some SMP systems where the hardware does not keep the processors' clocks synchronized.
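
One part of the synchronization problem is that the kernel updates the time while user space may be reading it, so a reader could see a half-updated value. A common lock-free fix for that is a sequence counter (the kernel's "seqlock" idea): the writer makes the counter odd before updating and even afterward, and the reader retries if the counter was odd or changed underneath it. The sketch below uses C11 atomics and assumes a single writer; struct shared_time and both functions are hypothetical. It does nothing about the other problem mentioned above, processors whose clocks disagree, which no amount of careful reading can fix.

```c
#include <stdatomic.h>

/* Hypothetical time record that a kernel could export read-only. */
struct shared_time {
    atomic_uint  seq;        /* odd while an update is in progress */
    atomic_llong sec;
    atomic_long  usec;
};

/* Writer (the "kernel"); assumes only one writer at a time. */
void time_update(struct shared_time *t, long long sec, long usec)
{
    unsigned s = atomic_load_explicit(&t->seq, memory_order_relaxed);
    atomic_store_explicit(&t->seq, s + 1, memory_order_relaxed);   /* odd */
    atomic_thread_fence(memory_order_release);  /* odd seq before data */
    atomic_store_explicit(&t->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&t->usec, usec, memory_order_relaxed);
    atomic_store_explicit(&t->seq, s + 2, memory_order_release);   /* even */
}

/* Reader (user space): retry until it sees a consistent snapshot. */
void time_read(struct shared_time *t, long long *sec, long *usec)
{
    unsigned s1, s2;
    do {
        s1 = atomic_load_explicit(&t->seq, memory_order_acquire);
        *sec  = atomic_load_explicit(&t->sec, memory_order_relaxed);
        *usec = atomic_load_explicit(&t->usec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* data before recheck */
        s2 = atomic_load_explicit(&t->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);
}
```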

User space gettimeofday()

Posted Dec 19, 2002 3:54 UTC (Thu) by kbob (guest, #1770)

Gettimeofday() may not be practical in user space, but
getpid() certainly is. Other syscalls without side effects
that return infrequently changing information could also be
put into user space.

User space getpid()

Posted Dec 19, 2002 5:53 UTC (Thu) by ncm (guest, #165)

While getpid() could certainly be put in user space, the
number of programs that it would speed up noticeably
could probably be counted on a pig's hoof. It would be
pretty surprising if any syscall used frequently enough
to make a difference could be handled in user space.

gettimeofday() is called so frequently in real programs
that any speedup matters, at least for some programs.
Few other syscalls have that property.

User space gettimeofday()

Posted Dec 20, 2002 0:19 UTC (Fri) by acristianb (guest, #1702)

What are the side effects of gettimeofday?

User space gettimeofday()

Posted Dec 20, 2002 20:54 UTC (Fri) by jmshh (guest, #8257)

None. But there were two conditions, the other being "infrequently
changing information". Updates require cache invalidation, and that is
better done only when the data is actually needed.
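
The cache-invalidation point applies to getpid() as well: a process ID never changes during a process's lifetime, but a child created by fork() must not inherit its parent's cached value. A minimal sketch of a user-space cache (cached_getpid() and its helpers are hypothetical names) resets the cache from a pthread_atfork() child handler. Its weakness shows why this is harder than it looks: a raw clone() system call bypasses pthread_atfork() handlers. glibc did cache the PID for years and dropped that cache in version 2.25.

```c
#include <pthread.h>      /* pthread_once(), pthread_atfork() */
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>       /* getpid() */

/* Hypothetical cache: 0 means "not cached yet". */
static _Atomic pid_t cached_pid;

/* Runs in the child after fork(): the inherited value is now wrong. */
static void invalidate_pid_cache(void)
{
    atomic_store(&cached_pid, 0);
}

static void register_fork_handler(void)
{
    pthread_atfork(NULL, NULL, invalidate_pid_cache);
}

pid_t cached_getpid(void)
{
    static pthread_once_t once = PTHREAD_ONCE_INIT;
    pthread_once(&once, register_fork_handler);

    pid_t pid = atomic_load(&cached_pid);
    if (pid == 0) {               /* first call, or first call after fork() */
        pid = getpid();           /* the only real system call */
        atomic_store(&cached_pid, pid);
    }
    return pid;
}
```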

Does anyone know...

Posted Dec 20, 2002 8:52 UTC (Fri) by wolfrider (guest, #3105)

If this will affect AMD and other processors?

Does anyone know...

Posted Dec 20, 2002 11:53 UTC (Fri) by hjernemadsen (subscriber, #5676)

If I remember correctly, the sysenter instruction is not present on AMD
processors. But it really doesn't matter, as the normal int $0x80 approach
on an AMD is faster than sysenter on a P4...

Does anyone know...

Posted Dec 20, 2002 16:05 UTC (Fri) by proski (subscriber, #104)

According to the AMD Processor Recognition Application Note, the SYSENTER and SYSEXIT instructions are supported on AMD Athlon and Duron, but not on K6 (see the tables at the end).