A new system call restart mechanism
[Posted December 10, 2002 by corbet]
System calls often have to wait for things - I/O completion, availability
of a resource, or simply for a timeout to expire, for example. Normally
the process making the system call becomes unblocked at the appropriate time, and the
call completes its work and returns to user space. What happens, though,
if a signal is queued for the process while it is waiting? In that case,
the system call needs to abort its work and allow the actual delivery of
the signal. For this reason, kernel code which sleeps tends to follow the
sleep with a test like:
    if (signal_pending(current))
        return -ERESTARTSYS;
After the signal has been handled, the system call will be restarted (from
the beginning), and the user-space application need not deal with
"interrupted system call" errors. For cases where restarting is not
appropriate, a -EINTR return status will cause a (post-signal)
return to user space without restarting the system call.
In general, this mechanism works reasonably well. But what about cases
where the system call should not just be restarted from the beginning? The
case which raised that question is the nanosleep() system call,
which puts the process to sleep for a (potentially) short time. By the
POSIX standard, nanosleep() should not return early as a result of
a signal if the process has no handler for that signal. So the call
should be restarted. The problem is that the argument to
nanosleep() tells how long the process wants to sleep - not when
it wants to wake up. When the call is restarted, it must take into account
how long the process had slept before the signal, and how long it took to
deal with the signal, and adjust the sleep time accordingly. In other
words, it should save the absolute time when the process wanted to wake up,
and the restarted call should sleep until that time (or just return if the
time has already passed). But there is no easy place for a system call to
save that sort of information.
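The arithmetic behind that adjustment can be sketched in a few lines of user-space C. This is only an illustration, not kernel code: the helper names deadline_from_request() and remaining_ns() are made up for this example, and times are plain nanosecond counts rather than the kernel's own time representation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): turn a relative sleep
 * request into an absolute wakeup time, computed once on the first call.
 */
static int64_t deadline_from_request(int64_t now_ns, int64_t requested_ns)
{
    assert(requested_ns >= 0);
    return now_ns + requested_ns;
}

/*
 * Hypothetical helper: how long a restarted call should still sleep.
 * Time spent asleep before the signal and time spent handling it are
 * both accounted for, because only the absolute deadline is compared
 * against the current time. A deadline in the past yields zero, so the
 * restarted call simply returns.
 */
static int64_t remaining_ns(int64_t deadline_ns, int64_t now_ns)
{
    return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}
```

With a 500ns request made at time 1000, the deadline is 1500. A restart at time 1200 would sleep 300ns more, while a restart at time 2000 would return immediately.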
To solve this problem, Linus added a new
mechanism to the 2.5.51 kernel, based on work by George Anzinger. This
mechanism allows interrupted system calls to specify a different function
to run when the call is restarted, along with information to be passed to
that function.
Specifically, the thread_info structure now includes a
restart_block structure. A system call needing different restart
behavior can put a restart handler function into that structure, along with
some arguments for that function. Then, if interrupted, the system call
should return -ERESTART_RESTARTBLOCK. After the signal is
dispatched, and if there was no handler specified by the process (and the
process still lives), the function in the restart block will be called,
with the block itself as an argument.
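The flow described above can be modeled with a small user-space simulation. Everything here is an assumption made for illustration: the sim_ names, the structure layouts, the fake clock, and the return codes are invented and do not match the real kernel definitions. The sketch also assumes that a call interrupted by a signal which does have a handler returns an "interrupted" status instead of restarting.

```c
#include <assert.h>
#include <stddef.h>

/* Invented status codes for the simulation, not the kernel's values. */
enum { SIM_OK = 0, SIM_EINTR = -4, SIM_RESTART_VIA_BLOCK = -1000 };

/* Simplified stand-in for a restart block: a handler plus one argument. */
struct sim_restart_block {
    long (*fn)(struct sim_restart_block *rb);
    long arg0;                      /* here: absolute wakeup time */
};

/* Simplified stand-in for per-thread state holding the restart block. */
struct sim_thread {
    struct sim_restart_block restart;
    int has_handler;                /* nonzero if the signal has a handler */
};

static long sim_now;                /* fake clock, in arbitrary ticks */

/* Restart handler: sleep until the saved wakeup time, if still ahead. */
static long sim_sleep_restart(struct sim_restart_block *rb)
{
    if (sim_now < rb->arg0)
        sim_now = rb->arg0;
    return SIM_OK;
}

/*
 * First entry into the simulated sleep. A non-negative interrupted_after
 * shorter than the duration models a signal arriving mid-sleep.
 */
static long sim_sleep(struct sim_thread *t, long duration, long interrupted_after)
{
    long wake = sim_now + duration;

    if (interrupted_after >= 0 && interrupted_after < duration) {
        sim_now += interrupted_after;
        t->restart.fn = sim_sleep_restart;  /* what to run on restart */
        t->restart.arg0 = wake;             /* and what it needs to know */
        return SIM_RESTART_VIA_BLOCK;
    }
    sim_now = wake;
    return SIM_OK;
}

/* Models the return path after signal dispatch, which took signal_cost ticks. */
static long sim_after_signal(struct sim_thread *t, long ret, long signal_cost)
{
    sim_now += signal_cost;
    if (ret != SIM_RESTART_VIA_BLOCK)
        return ret;
    if (t->has_handler)
        return SIM_EINTR;           /* assumption: no restart in this case */
    assert(t->restart.fn != NULL);
    return t->restart.fn(&t->restart);
}
```

In this model, a 50-tick sleep started at tick 100 and interrupted after 20 ticks still ends at tick 150, no matter how long the signal took to dispatch, because the restart handler works from the saved absolute time and not from the original duration.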
nanosleep(), which is currently the only user of this mechanism,
need only save the wakeup time in the restart block, along with pointers to
the user arguments. Interrupted sleeps will now be handled properly. It
is not clear how many other system calls will make use of the new restart
system; in most complicated situations it is better to just return -EINTR.
But for cases where you really need to see the
operation through, the new mechanism should help.