From: Ulrich Drepper <firstname.lastname@example.org>
Subject: More documentation: system call how-to
Date: Wed, 1 Aug 2007 14:06:57 -0400
How about adding the attached text to the Documentation directory? Over
the years I have had to correct one or another system call design
problem. Other problems could no longer be corrected, and we have to
live with them. Maybe spelling out the rules explicitly will help a bit.
I've added the few rules I could think of right now. What should be
added as well is a rule for 64-bit parameters on 32-bit platforms. I'll
leave this to the s390 people, who have the biggest restrictions when
it comes to this.
Signed-off-by: Ulrich Drepper <email@example.com>
Rules for designing new system calls
1. Do not use multiplexing system calls.
A practical argument is that multiplexing invariably reduces the
number of parameters available to each sub-function, because one
argument is consumed by the sub-function code. This will haunt people
who have to care about architectures with a limited set of registers
reserved for passing system call arguments.
Another aspect is that it is most likely slower. In most cases the
caller knows exactly which sub-function of the system call is needed.
If the choice of sub-function is made dynamically, the computation of
the sub-function code could just as well be a computation of a system
call number. The difference lies in the kernel, where the multiplexing
always has to happen, even if the required sub-function is known to
the caller ahead of time.
Adding new system calls is much cheaper: it is a word in a table.
This is much less code and data than the switch statement or
if-cascade needed to implement the multiplexer.
Bad examples: sys_socketcall on x86, sys_futex, and several more.
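To make the cost difference concrete, here is a user-space sketch under stated assumptions: the names frob_get, frob_set, FROB_GET, FROB_SET, sys_frob_multiplexed and dispatch are all hypothetical, and the array only models a system call table. It is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical sub-functions standing in for real system call bodies. */
static long frob_get(long a, long b) { return a + b; }
static long frob_set(long a, long b) { return a - b; }

/* Multiplexed design: one entry point whose first argument selects the
 * sub-function. The switch runs on every call, on top of the normal
 * table dispatch, and the selector uses up an argument register that
 * could otherwise carry real data. */
enum { FROB_GET = 1, FROB_SET = 2 };

static long sys_frob_multiplexed(long op, long a, long b)
{
	switch (op) {
	case FROB_GET:
		return frob_get(a, b);
	case FROB_SET:
		return frob_set(a, b);
	default:
		return -ENOSYS;	/* invalid sub-function, see rule #2 */
	}
}

/* Separate system calls: each one is just an entry in the table, and
 * dispatch is a bounds check plus an indexed call. */
typedef long (*syscall_fn)(long, long);

static const syscall_fn syscall_table[] = {
	frob_get,	/* hypothetical system call number 0 */
	frob_set,	/* hypothetical system call number 1 */
};

static long dispatch(size_t nr, long a, long b)
{
	if (nr >= sizeof(syscall_table) / sizeof(syscall_table[0]))
		return -ENOSYS;	/* system call not implemented */
	return syscall_table[nr](a, b);
}
```

With separate calls both sub-functions keep all their argument slots, and adding a third costs one more table entry rather than another case in the switch.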
2. Use of ENOSYS
The runtime has to be able to distinguish system calls that do not
exist because the kernel is too old from error conditions in an
implemented system call. This means ENOSYS must never be returned for
an error condition once a system call is implemented.
Example: In sys_fallocate, if the file system does not implement the
fallocate operation, return EOPNOTSUPP and not ENOSYS.
There is one exception to the rule: if rule #1 is violated and a
multiplexer system call is used, invalid sub-function codes should
be signaled using ENOSYS.
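The following sketch models both sides of this rule in user space, assuming a hypothetical system call sys_frob, a hypothetical per-filesystem hook table struct frob_fs_ops, and a hypothetical library helper frob_classify. It only illustrates why the two error codes must stay distinct; it is not the real fallocate code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-filesystem operations table. */
struct frob_fs_ops {
	long (*frob)(long arg);	/* NULL if the filesystem lacks support */
};

/* Kernel side of an implemented system call: a missing filesystem
 * hook is an ordinary error condition, so report EOPNOTSUPP. */
static long sys_frob(const struct frob_fs_ops *ops, long arg)
{
	if (ops->frob == NULL)
		return -EOPNOTSUPP;
	return ops->frob(arg);
}

/* Runtime side: how a library wrapper might interpret the result. */
enum frob_status { FROB_OK, FROB_OLD_KERNEL, FROB_FS_UNSUPPORTED, FROB_ERROR };

static enum frob_status frob_classify(long ret)
{
	if (ret >= 0)
		return FROB_OK;
	if (ret == -ENOSYS)
		return FROB_OLD_KERNEL;	/* call absent: fall back to emulation */
	if (ret == -EOPNOTSUPP)
		return FROB_FS_UNSUPPORTED;
	return FROB_ERROR;
}

/* Hypothetical filesystem that does implement the operation. */
static long examplefs_frob(long arg) { return arg * 2; }
```

If sys_frob returned ENOSYS for the missing hook, frob_classify could no longer tell an old kernel from an unsupported filesystem.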
3. Choose parameters for growth
It no longer makes sense to implement a system call that restricts
values indicating file sizes or offsets to 32 bits, even on 32-bit
machines. 64-bit values should be used.
Example: sys_fadvise64, which should have been defined from day 1.
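A small sketch of what the 32-bit restriction costs, assuming a hypothetical call sys_frob_range whose prototype appears only as a comment. The helper offset_fits_32 is likewise hypothetical and simply checks whether an offset survives a signed 32-bit parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical prototype following rule #3: offset and length are
 * 64 bits wide on every architecture, not only on 64-bit ones.
 *
 *   long sys_frob_range(int fd, int64_t offset, int64_t len);
 */

/* Would this file offset fit into a signed 32-bit parameter? */
static bool offset_fits_32(int64_t off)
{
	return off >= INT32_MIN && off <= INT32_MAX;
}
```

Any offset at or beyond 2 GiB (2^31 bytes) fails this check, so a 32-bit parameter makes such files unreachable through the call.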
Similarly, a timeout granularity of seconds is no longer adequate.
Most interfaces use nanosecond resolution, and a common way to
specify such times and intervals is the timespec structure.
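As an illustration, the sketch below converts a nanosecond count into a normalized struct timespec, the form a timespec-based timeout argument would take. The helper name ns_to_timespec and the NSEC_PER_SEC macro defined here are assumptions for this example; it also assumes a non-negative input that fits in time_t.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC INT64_C(1000000000)

/* Split a non-negative nanosecond count into whole seconds and the
 * remaining nanoseconds, giving 0 <= tv_nsec < 1000000000. */
static struct timespec ns_to_timespec(int64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}
```

For example, 1500000000 ns becomes tv_sec = 1 and tv_nsec = 500000000; a seconds-only interface could not express the half second at all.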
4. 32-bit compatibility
Kernels for architectures like x86-64 and PPC64 have to be able to
execute 32-bit binaries as well. The implementation of the actual
system calls is of course shared. The types for the system call
parameters and return values on 32-bit and 64-bit systems can be
different. This is where compatibility wrappers come in.
These functions, usually named compat_sys_XYZ for a system call
sys_XYZ, are only needed when a system call parameter is a pointer
to a structure that has a different representation in 32- and 64-bit
mode. Differences in the size of integer or pointer arguments do not
require a compatibility wrapper.
Example: compat_sys_utimensat, which has to convert timespec
structures from the 32-bit to the 64-bit layout. See also rule #3.
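A user-space model of what such a wrapper does, under stated assumptions: struct timespec32 stands in for the kernel's own compat type, and do_set_times and compat_set_times are hypothetical stand-ins for the shared implementation and its compat entry point. A real wrapper would first copy the structures from user memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Layout of struct timespec as seen by a 32-bit process (hypothetical
 * name; the kernel has its own compat type for this). */
struct timespec32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Widen one 32-bit structure into the native layout. */
static void timespec32_to_native(const struct timespec32 *src,
				 struct timespec *dst)
{
	dst->tv_sec = (time_t)src->tv_sec;
	dst->tv_nsec = (long)src->tv_nsec;
}

/* Hypothetical shared implementation taking native timestamps. */
static long do_set_times(const struct timespec times[2])
{
	for (int i = 0; i < 2; i++)
		if (times[i].tv_nsec < 0 || times[i].tv_nsec >= 1000000000L)
			return -EINVAL;
	return 0;
}

/* Hypothetical compat entry point: convert, then call the shared code. */
static long compat_set_times(const struct timespec32 times32[2])
{
	struct timespec times[2];

	timespec32_to_native(&times32[0], &times[0]);
	timespec32_to_native(&times32[1], &times[1]);
	return do_set_times(times);
}
```

Only the conversion is duplicated per ABI; the validation and the actual work stay in one shared function.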