By Jonathan Corbet
June 29, 2010
The notion that one should be liberal in what one accepts while being
conservative in what one sends is often expressed in the networking field,
but it shows up in a number of other areas as well. Often, though, it can
make more sense to be conservative on the accepting side; the condition of
many web pages would have been far better had early browsers not been so
forgiving of bad HTML. The tradeoff between being accepting and insisting
on correctness recently came up in a discussion of a proposed API change
for the futex() system call; "conservative" appears to be the
winning approach in this case.
The futex() system call provides fast locking operations to user
space. Callers will normally block until a lock becomes available, but
they can also provide a struct timespec value specifying the
maximum amount of time to wait:
    struct timespec {
        long tv_sec;   /* seconds */
        long tv_nsec;  /* nanoseconds */
    };
The interpretation of the timeout value is a little strange. For a
FUTEX_WAIT command, the timeout is relative to the current time;
for any other command, it is either ignored or treated as an absolute
time. In particular, operations like FUTEX_WAIT_BITSET and
FUTEX_LOCK_PI use absolute timeouts.
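The difference between the two interpretations matters to any caller that thinks in terms of "wait this long" but must use a command taking an absolute time. A minimal sketch of the conversion follows; the helper name deadline_from_relative() is hypothetical, and the caller is assumed to have obtained "now" from something like clock_gettime(). Which clock applies to a given futex operation is not covered here.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: turn a relative timeout into an absolute
 * deadline by adding it to the current time. Both inputs are
 * assumed to be normalized (0 <= tv_nsec < NSEC_PER_SEC).
 */
static struct timespec deadline_from_relative(struct timespec now,
                                              struct timespec rel)
{
    struct timespec abs_time;

    assert(now.tv_nsec >= 0 && now.tv_nsec < NSEC_PER_SEC);
    assert(rel.tv_nsec >= 0 && rel.tv_nsec < NSEC_PER_SEC);

    abs_time.tv_sec = now.tv_sec + rel.tv_sec;
    abs_time.tv_nsec = now.tv_nsec + rel.tv_nsec;

    /* Carry at most one second out of the nanosecond field. */
    if (abs_time.tv_nsec >= NSEC_PER_SEC) {
        abs_time.tv_sec += 1;
        abs_time.tv_nsec -= NSEC_PER_SEC;
    }
    return abs_time;
}
```

A relative value would be passed unchanged to FUTEX_WAIT, while the result of a conversion like this one is what an absolute-timeout command would expect to see.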
Oleg Nesterov recently came to the kernel
mailing list with an interesting glibc problem. If the tv_sec
portion of the timeout is negative, the kernel will fail the
futex() call with an EINVAL error. The POSIX thread code
is not prepared for that to happen and shows its anger by going into an
infinite loop - behavior which is not normally appreciated by user-space
programmers. The glibc developers have concluded that this behavior is a
kernel bug; to them, a negative absolute time value indicates a time before the
epoch. Since the epoch is, for all practical purposes, the beginning of
time, the response to a pre-epochal time should be ETIMEDOUT,
which the library is prepared to deal with.
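The disagreement comes down to which error a negative tv_sec should produce. The two policies can be sketched as a pair of user-space functions; this is an illustration only, not the kernel's or glibc's code. The function names are hypothetical, and the range check on tv_nsec is an added assumption that is not part of the discussion described here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Strict policy: a negative absolute time is simply invalid. */
static int check_timeout_strict(const struct timespec *ts)
{
    assert(ts != NULL);
    if (ts->tv_sec < 0)
        return EINVAL;
    if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
        return EINVAL;  /* assumption: malformed nanoseconds */
    return 0;
}

/*
 * Lenient policy: a negative absolute time lies before the epoch,
 * so the deadline is treated as having already passed.
 */
static int check_timeout_lenient(const struct timespec *ts)
{
    assert(ts != NULL);
    if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
        return EINVAL;  /* assumption: malformed nanoseconds */
    if (ts->tv_sec < 0)
        return ETIMEDOUT;
    return 0;
}
```

The only difference between the two is the value returned for tv_sec < 0; everything else about the call would behave the same way.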
This position was not well received. Thomas Gleixner responded that times before the epoch cannot be
programmed into the system clock and, thus, are not accepted by any Linux system
call which deals with absolute times. Since some system calls cannot
possibly accept such values, Thomas says, none should: "I'm strictly
against having different definitions of sanity for different
syscalls."
Linus, too, opposes accepting negative
times, but for slightly different reasons:
    A positive time_t value is well-defined. In contrast, a negative
    tv_sec value is inherently suspect. Traditionally, you couldn't
    even know if time_t was a signed quantity to begin with! And on
    32-bit machines, a negative time_t is quite often the result of
    overflow (no, you don't have to get to 2038 to see it - you can
    get overflows from simply doing large relative timeouts etc).
In other words, a negative time value is an indication that something,
somewhere has gone wrong. In such situations, rejecting the value may well
be the best thing to do.
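The overflow case is easy to reproduce on paper. The sketch below models a signed 32-bit time_t and shows the value that a wrapping addition of a large relative timeout would yield; the function name is hypothetical, and the arithmetic is done in 64 bits so that the example itself does not depend on undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model "now + rel" as it would come out of a wrapping signed
 * 32-bit addition. Both inputs are assumed to be non-negative.
 */
static int32_t add_seconds_wrapping32(int32_t now, int32_t rel)
{
    int64_t sum;

    assert(now >= 0 && rel >= 0);

    sum = (int64_t)now + (int64_t)rel;
    if (sum > INT32_MAX)
        sum -= (int64_t)1 << 32;  /* wrap into the negative range */
    return (int32_t)sum;
}
```

With a current time of 1277769600 (mid-2010) and a relative timeout of 1000000000 seconds, the sum exceeds INT32_MAX and the modeled result is -2017197696: a negative "absolute time" that has nothing to do with any moment before the epoch.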
That leaves the glibc developers in the position of having to fix their
code to deal with this (previously) unexpected return value. The good
news, such as it is, is that they'll be working on that code anyway. It
seems that the same function will also loop if it gets EFAULT back
from futex(), and that is clearly a user-space bug.
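One way a caller might avoid looping forever is to retry only on errors known to be transient and to treat everything else, EINVAL and EFAULT included, as a reason to stop. The sketch below is not the glibc fix; the enum and function names are hypothetical, and the choice of EINTR and EAGAIN as the retryable errors is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>

enum wait_action {
    WAIT_RETRY,      /* transient condition: call futex() again */
    WAIT_TIMED_OUT,  /* deadline passed: report the timeout */
    WAIT_FAIL        /* anything else: stop and report the error */
};

/* Decide what to do with the errno value left by a failed futex(). */
static enum wait_action classify_futex_errno(int err)
{
    assert(err > 0);  /* errno values are positive */

    switch (err) {
    case EINTR:
    case EAGAIN:
        return WAIT_RETRY;
    case ETIMEDOUT:
        return WAIT_TIMED_OUT;
    default:
        /* EINVAL, EFAULT, and any error not anticipated above. */
        return WAIT_FAIL;
    }
}
```

Making failure the default case means that an error code the library never anticipated ends the loop instead of feeding it.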