
What makes a valid timespec?

By Jonathan Corbet
June 29, 2010
The notion that one should be liberal in what one accepts while being conservative in what one sends is often expressed in the networking field, but it shows up in a number of other areas as well. Often, though, it can make more sense to be conservative on the accepting side; the condition of many web pages would have been far better had early browsers not been so forgiving of bad HTML. The tradeoff between being accepting and insisting on correctness recently came up in a discussion of a proposed API change for the futex() system call; "conservative" appears to be the winning approach in this case.

The futex() system call provides fast locking operations to user space. Callers will normally block until a lock becomes available, but they can also provide a struct timespec value specifying the maximum amount of time to wait:

    struct timespec {
        long    tv_sec;     /* seconds */
        long    tv_nsec;    /* nanoseconds */
    };

The interpretation of the timeout value is a little strange. For a FUTEX_WAIT command, the timeout is relative to the current time; for any other command, it is either ignored or treated as an absolute time. In particular, operations like FUTEX_WAIT_BITSET and FUTEX_LOCK_PI use absolute timeouts.
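To make the relative/absolute distinction concrete, here is a minimal sketch of how a caller might turn a relative timeout into the absolute deadline that the absolute-timeout commands expect. The helper name deadline_from_relative() is hypothetical, and the caller is assumed to have read "now" from whichever clock the command in question measures against.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: add a relative timeout to a clock reading.
 * Both inputs are assumed to be normalized, that is,
 * 0 <= tv_nsec < NSEC_PER_SEC.
 */
struct timespec deadline_from_relative(struct timespec now,
                                       struct timespec rel)
{
    struct timespec deadline;

    deadline.tv_sec = now.tv_sec + rel.tv_sec;
    deadline.tv_nsec = now.tv_nsec + rel.tv_nsec;
    if (deadline.tv_nsec >= NSEC_PER_SEC) {
        /* Carry the excess nanoseconds into the seconds field. */
        deadline.tv_sec += 1;
        deadline.tv_nsec -= NSEC_PER_SEC;
    }
    return deadline;
}
```

A relative value can be handed to FUTEX_WAIT unchanged; a conversion of this kind is only needed when the command wants an absolute time.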

Oleg Nesterov recently came to the kernel mailing list with an interesting glibc problem. If the tv_sec portion of the timeout is negative, the kernel will fail the futex() call with an EINVAL error. The POSIX thread code is not prepared for that to happen and shows its anger by going into an infinite loop - behavior which is not normally appreciated by user-space programmers. The glibc developers have concluded that this behavior is a kernel bug; to them, a negative absolute time value indicates a time before the epoch. Since the epoch is, for all practical purposes, the beginning of time, the response to a pre-epochal time should be ETIMEDOUT, which the library is prepared to deal with.

This position was not well received. Thomas Gleixner responded that times before the epoch cannot be programmed into the system clock and, thus, are not accepted by any Linux system call which deals with absolute times. Since some system calls cannot possibly accept such values, Thomas says, none should: "I'm strictly against having different definitions of sanity for different syscalls."

Linus, too, opposes accepting negative times, but for slightly different reasons:

> A positive time_t value is well-defined. In contrast, a negative tv_sec value is inherently suspect. Traditionally, you couldn't even know if time_t was a signed quantity to begin with! And on 32-bit machines, a negative time_t is quite often the result of overflow (no, you don't have to get to 2038 to see it - you can get overflows from simply doing large relative timeouts etc).

In other words, a negative time value is an indication that something, somewhere has gone wrong. In such situations, rejecting the value may well be the best thing to do.
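A short sketch may make the argument concrete. The validity check below is modeled on the kind of test described above (non-negative seconds, nanoseconds below one billion); it is not the kernel's code. Both function names are made up for this example, and 64-bit arithmetic stands in for a 32-bit time_t so that the overflow can be detected without actually being triggered.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Accept only non-negative seconds and a normalized nanosecond field. */
bool timespec_is_valid(const struct timespec *ts)
{
    return ts->tv_sec >= 0 &&
           ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC;
}

/*
 * Would now_sec + rel_sec still fit in a signed 32-bit seconds counter?
 * The sum is computed in 64 bits; the inputs are assumed to be small
 * enough that the 64-bit addition itself cannot overflow.
 */
bool deadline_fits_in_32_bits(int64_t now_sec, int64_t rel_sec)
{
    return now_sec + rel_sec <= INT32_MAX;
}
```

With a "now" of 1277769600 (midnight UTC on June 29, 2010), a one-hour timeout fits comfortably, but a relative timeout of one billion seconds does not; stored in a signed 32-bit field, that sum would typically come out negative, which is exactly the kind of value the first check rejects.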

That leaves the glibc developers in the position of having to fix their code to deal with this (previously) unexpected return value. The good news, such as it is, is that they'll be working on that code anyway. It seems that the same function will also loop if it gets EFAULT back from futex(), and that is clearly a user-space bug.
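What a more defensive wait loop could look like is sketched below. This is not glibc's code: wait_fn is a hypothetical callback standing in for the futex wait, assumed to return zero or an errno value, and the two fake_wait functions exist only to exercise the loop.

```c
#include <errno.h>

/* Hypothetical stand-in for a futex wait: returns 0 or an errno value. */
typedef int (*wait_fn)(void *ctx);

/*
 * Retry only on errors known to be transient. Anything else, including
 * EINVAL and EFAULT, is handed back to the caller rather than retried.
 */
int wait_until_done(wait_fn wait, void *ctx)
{
    for (;;) {
        int err = wait(ctx);

        switch (err) {
        case 0:          /* woken up normally */
        case ETIMEDOUT:  /* the deadline passed */
            return err;
        case EINTR:      /* interrupted by a signal: try again */
        case EAGAIN:     /* futex word changed before sleeping: try again */
            continue;
        default:         /* EINVAL, EFAULT, ...: report, do not loop */
            return err;
        }
    }
}

/* Demonstration stub: counts calls and always fails with EINVAL. */
int fake_wait_einval(void *ctx)
{
    int *calls = ctx;

    ++*calls;
    return EINVAL;
}

/* Demonstration stub: interrupted on the first two calls, then succeeds. */
int fake_wait_interrupted_twice(void *ctx)
{
    int *calls = ctx;

    return ++*calls <= 2 ? EINTR : 0;
}
```

The design choice is simply that the retry list is explicit: an error code nobody anticipated falls through to the default case and ends the loop.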



What makes a valid timespec?

Posted Jul 1, 2010 13:49 UTC (Thu) by marcH (subscriber, #57642)

I do not find that receiving an incorrect error code is an excuse good enough to go into an infinite loop! Not wanting to work around someone else's bug sounds fine; amplifying it does not.

What makes a valid timespec?

Posted Jul 1, 2010 13:53 UTC (Thu) by marcH (subscriber, #57642)

> The notion that one should be liberal in what one accepts while being conservative in what one sends is often expressed in the networking field, but it shows up in a number of other areas as well. Often, though, it can make more sense to be conservative on the accepting side;...

This is indeed a very frequent topic in networking. Being too liberal promotes buggy code.
<http://en.wikipedia.org/wiki/Robustness_principle#Interpr...>

What makes a valid timespec?

Posted Jul 1, 2010 21:19 UTC (Thu) by dmadsen (guest, #14859)

On the contrary, quoting the Wikipedia article that's quoting RFC1122:

> Be liberal in what you accept, and conservative in what you send
>
> Software should be written to deal with every conceivable error, no matter how unlikely; [...] unless the software is prepared, chaos can ensue. [...] This assumption will lead to suitable protective design, [...]

The text is saying to be prepared to handle "every conceivable error" on the receiving side. This means that you write extra code to handle conditions that you just know "can't happen".

Being liberal implies do something sensible, no matter what. It says the opposite of "it's ok to write bad buggy code".

In my opinion, the best way to write a spec is to try to find a way to make all input values meaningful in some way. Sometimes this involves expanding the definition of a field. As a loose example off the top of my head, perhaps a positive value could mean absolute time but a negative value could mean relative time instead of "error". (Whether this actually makes sense for this situation, I don't know, but this is the kind of idea I mean).

If this can be done, it's a win, because this means there is no error possible, and the caller has to check for fewer error return codes. Sure, the caller could accidentally generate the wrong value, but that's a different problem.

Of course, this can't be done all the time, and even here, "sensible" does not mean "violate the spec". The concept of "fail gracefully" is always applicable. I'm pretty sure that sloppy code won't be graceful in its failure.

On the sending side, it says "be conservative". It does not say "it's OK to be sloppy". Just because I know the receiver is tolerant gives me no excuse to test that tolerance. There is no excuse to relax one's efforts to write excruciatingly correct code. (Doesn't mean I'll always succeed in that, but to stop trying is inexcusable.)

What makes a valid timespec?

Posted Jul 1, 2010 21:48 UTC (Thu) by marcH (subscriber, #57642)

> Being liberal implies do something sensible, no matter what. It says the opposite of "it's ok to write bad buggy code".

You misunderstood me, I meant: being liberal promotes the buggy code OF OTHERS.

What makes a valid timespec?

Posted Jul 2, 2010 10:23 UTC (Fri) by farnz (guest, #17727)

The problem marcH is alluding to is that if I'm liberal in what I accept, and you are buggy, you have no incentive to fix things; indeed, you may not even notice that you're buggy. Worse, if you're big enough, the rest of us end up forced to accept your bad output - with people saying "well, it works with their system, and it works with foobar, so farnz's system must be at fault".

As a really, really simple example; imagine a network protocol where lines end CRLF (say SMTP). Imagine an SMTP server that accepts any of the 4 line endings (CR, LF, CRLF, LFCR). If you accidentally code your SMTP client to do LFCR (\n\r instead of \r\n), it works with that SMTP server. Add in a little bit of human nature ("it works with foobar SMTP server, so it must be your server that's faulty"), and you have a recipe for forcing everyone to accept bad input and interpret it in a particular way.

What makes a valid timespec?

Posted Jul 7, 2010 3:48 UTC (Wed) by dmadsen (guest, #14859)

I understand that what you're saying is that if you're too flexible in handling malformed input, then over time that malformed input becomes part of the standard simply through wide usage.

The way to combat that is to point to the real standard, the one that says "CRLF". The person responsible for the error has a choice, either to fix the problem or not.

And if there are multiple wrong implementations, then they all need to be fixed: widespread wrong doesn't make it right.

If the buggy code is in the 800-pound chair-throwing gorilla's code, then I'll bet that it won't be fixed anyway, whether it originally interoperated or not. In other words, if they want to ignore the spec, they'll do it regardless of my code, and if I want to play in that arena, I'm gonna have to decide whether to accept (or generate!) malformed input or not.

If the positions are reversed, that is, my code is buggy and the gorilla's is right, then the situation surely could exist that I wouldn't know until someone else found it and told me. But then I'd change it. :-) And I'd kick myself for having missed it.

This is one of the reasons that there are interoperability bake-offs. But this also assumes that all players in the game want to "do the right thing". If someone doesn't care about correct implementations, then there are larger problems than whether or not I accept lousy input.

Now suppose I just want to code to the spec, and further that the spec is clear enough so that everyone agrees on just what the protocol actually is. "Liberal acceptance" still applies, in that I still must do input sanity checking and fail gracefully if the protocol doesn't otherwise have a way to say "bad input".

There are always people issues in these situations, and I think *that's* where "you have a recipe for forcing everyone to accept bad input" comes from.

Thank you for your explanation. I still believe in "liberal acceptance", but I better understand what you're saying and why.

What makes a valid timespec?

Posted Jul 7, 2010 9:12 UTC (Wed) by farnz (guest, #17727)

Slightly stronger than that, I'm afraid; if you're liberal about what you accept (as against gracefully erroring out on bad input), you create a world where people don't realise that they're getting it wrong until you point it out to them (because their wrong input is accepted everywhere they test). If their implementation becomes popular before anyone notices, the de-facto standard becomes accepting malformed input, and anyone who points at the real standard is told "but no-one does that - do it properly, not what the standard says".

Hence the more modern idea that you should be conservative in what you send, and error out gracefully in the presence of bad input. Taking SMTP as an example again - if you get bad line endings, gracefully erroring out by responding "500 Invalid line ending detected - use CRLF not LFCR" or "500 Invalid line ending detected - use CRLF instead of LF" is better overall than being liberal in what you'll accept by correcting the bad line endings, and responding "250 OK".
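As a rough sketch of the "error out gracefully" approach, a strict receiver might classify the line ending first and choose its reply from that. Everything here is a toy rather than a real SMTP implementation: the enum, the function names, and the fallback reply text for the CR-only and unterminated cases are invented for illustration; only the LFCR and LF messages and the "250 OK" come from the text above.

```c
#include <stddef.h>

enum line_ending {
    ENDING_CRLF,    /* what the standard asks for */
    ENDING_LFCR,
    ENDING_LF,
    ENDING_CR,
    ENDING_NONE     /* no terminator at all */
};

/* Look only at the last one or two bytes of the received line. */
enum line_ending classify_line_ending(const char *buf, size_t len)
{
    if (len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n')
        return ENDING_CRLF;
    if (len >= 2 && buf[len - 2] == '\n' && buf[len - 1] == '\r')
        return ENDING_LFCR;
    if (len >= 1 && buf[len - 1] == '\n')
        return ENDING_LF;
    if (len >= 1 && buf[len - 1] == '\r')
        return ENDING_CR;
    return ENDING_NONE;
}

/* Strict policy: only CRLF succeeds; everything else gets a 500 reply. */
const char *reply_for(enum line_ending ending)
{
    switch (ending) {
    case ENDING_CRLF:
        return "250 OK";
    case ENDING_LFCR:
        return "500 Invalid line ending detected - use CRLF not LFCR";
    case ENDING_LF:
        return "500 Invalid line ending detected - use CRLF instead of LF";
    default:
        /* Invented wording for the remaining cases. */
        return "500 Invalid line ending detected - use CRLF";
    }
}
```

The sender gets an error it cannot miss, along with a hint about what to fix.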

What makes a valid timespec?

Posted Jul 9, 2010 4:57 UTC (Fri) by dmadsen (guest, #14859)

You're assuming that:

1) They have made a mistake in testing;
2) Everyone accepts the bad input...
3) ... And without any warning messages.
4) Their software becomes popular, and no one notices the error.
5) The programmers of the original software are (a) unwilling to fix their code; and/or (b) arrogant enough to believe their code is right and the standard is "wrong".

Especially in free software, don't you think that *someone* will take it upon themselves to do the right thing and fix the problem code?

I find it difficult to believe that the community would put up with allowing an ongoing error without any attempt to fix it.

Ignoring a standard is pretty bad -- if everyone felt free to do that, there would be no interoperability, or rather, interoperability would be negotiated on a case-by-case basis. But that's why standards are negotiated in the first place, so each case doesn't have to be.

I can understand a large proprietary vendor acting that way and, as a matter of fact, doing it on purpose to deliberately make other (correct!) implementations non-interoperable, and then depending on their market share to hose everyone else. But I see that as a different situation, albeit one that's occurred already.

BTW, ISTR there's an SMTP code that, instead of indicating success, indicates success-with-problems. Maybe I'm wrong...

I'm NOT saying that the tolerant program shouldn't complain -- as you point out, making the problem known is a good thing. I'm just saying that if you don't have to fail, you shouldn't. And I agree with you that if you have to fail, it should be gracefully.

What makes a valid timespec?

Posted Jul 9, 2010 9:56 UTC (Fri) by farnz (guest, #17727)

I'm assuming that the normal testing will be "point my code at someone else's code that implements the specification, and see if it works"; this seems to be standard in the industry, unfortunately. If it works, people will claim it's OK - and there will be a tendency to blame the bit you've substituted in for the fault (think "It works fine with exim, it must be postfix that's the problem").

Programmers rarely appear (in free software or commercial software) to hunt for warning messages; there's a habit of assuming that if it works, it's OK. This includes not picking up in-line warnings in the protocol, or warnings in the log file; you need errors to be "in-your-face" to get the average programmer to notice. While these will eventually get picked up in published source projects, even in an all free software world there will be plenty of unpublished programs that are used internally to an organisation - and it's not improbable that particular published programs will get a bad rep for being unreliable not because they implement the spec wrongly, but because they're stricter than other implementations, and thus break badly written in-house code.

And I chose SMTP and line endings for a reason - many languages have an equivalent of Python's universal newlines, or Perl's chomp that can be used to strip line endings; the obvious way to use chomp on a CRLF protocol on a POSIX system would also handle plain LF (and indeed, the obvious implementation in C would do the same). Given the grade of testing that seems common in published open source (let alone people's unpublished code), I wouldn't be at all surprised if the net effect of "be liberal in what you accept" was to ensure that the published standard and the de-facto standard don't match up - and anyone implementing the standard has to do things like "cope with all four line ending possibilities, even though the standard says CRLF".
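For illustration, the "obvious" C line-stripping routine might look something like the sketch below (the function name is hypothetical). It never checks which terminator arrived, so CRLF, LFCR, bare LF, and bare CR all produce the same result, and a sender using the wrong one never finds out.

```c
#include <stddef.h>

/* Return the line length with any trailing CR or LF bytes removed. */
size_t chomp_length(const char *line, size_t len)
{
    while (len > 0 && (line[len - 1] == '\r' || line[len - 1] == '\n'))
        len--;
    return len;
}
```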

What makes a valid timespec?

Posted Jul 9, 2010 10:14 UTC (Fri) by Wol (guest, #4433)

I'm minded of MS-Mail - the server version ... :-)

The spec says "don't quit unless the client sends an explicit quit command". So what did MS-Mail do when 8-bit was introduced and clients started sending EHLO instead of HELO? "Oh, EHLO is invalid, quitting"! So no 8-bit client could connect to the MS server because the opening negotiations always failed with the MS server dropping the connection!

And afaik they never fixed it :-(

Cheers,
Wol

What makes a valid timespec?

Posted Jul 1, 2010 20:44 UTC (Thu) by clugstj (subscriber, #4020)

If the man page for futex() says it can return EINVAL, then the POSIX thread code MUST be able to handle it.

It is pure arrogance to assume that your code CANNOT provide incorrect inputs to a system call.

What makes a valid timespec?

Posted Jul 2, 2010 9:24 UTC (Fri) by nix (subscriber, #2304)

Whyever not? Ulrich has turned down contributions to allow compilation of glibc with -fstack-protector, so obviously he believes that glibc is flawless and completely bug-free. (He didn't deign to provide any reasons for this rejection, so I am left guessing. The next bugzilla ticket after the contribution was one demonstrating a buffer overrun in the stdio routines tickled by glibc's own testsuite. It remains unfixed.)

What makes a valid timespec?

Posted Jul 3, 2010 18:31 UTC (Sat) by giraffedata (subscriber, #1954)

> If the man page for futex() says it can return EINVAL

There's no man page for the futex system call (or any user guide or specification for any system call). The man page is for the glibc function that is based on that system call.

Consequently, the issue of whether the kernel is conforming to spec never came up in the thread; the discussion was mainly what the best design for the kernel is.

> It is pure arrogance to assume that your code CANNOT provide incorrect inputs to a system call

I don't think that assumption is here. The design decision here is how much code to put into robustness -- making glibc tolerate its own defects. It's reasonable to say that if glibc has a bug, glibc can go into an infinite loop. I'm pretty sure I wouldn't engineer it that way, but it's a philosophy.

What makes a valid timespec?

Posted Jul 3, 2010 21:03 UTC (Sat) by corbet (editor, #1)

It's not glibc's defects which are at issue here - the timeout value is provided by the calling application.

What makes a valid timespec?

Posted Jul 4, 2010 2:39 UTC (Sun) by giraffedata (subscriber, #1954)

I think in this subthread we were talking about a hypothetical glibc bug - one that would justify handling EINVAL even if the glibc author can't see a way that it would ever happen.

As for it not being a glibc defect if the program goes into an infinite loop when the user specifies a negative timeout value, I don't buy it. That's like saying, "it's not my fault I brought a gun into the prison; I was just doing what my brother asked me to. It's not my policy to pass judgement on requests from prisoners I'm visiting."

Accordingly, no one in the thread pointed the finger of blame for the infinite loop at the glibc caller. It was just a question of kernel vs glibc.

What makes a valid timespec?

Posted Jul 13, 2010 9:54 UTC (Tue) by robbe (guest, #16131)

> There's no man page for the futex system call (or any user guide or
> specification for any system call). The man page is for the glibc
> function that is based on that system call.

While true in general, this is false for futex():
http://www.kernel.org/doc/man-pages/online/pages/man2/fut...

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds