LWN.net Logo

On the importance of return codes

By Jake Edge
December 2, 2009

Just days after FreeBSD 8.0 was released, the FreeBSD developers were undoubtedly unhappy to see a "zero day" exploit posted on the Bugtraq mailing list. The exploit is for a local privilege escalation vulnerability in the runtime loader (rtld) that allows unprivileged users to become root. The vulnerability and patch highlight the need for code—particularly security enforcing code—to check the return values of functions that get called.

The exploit essentially creates a broken environment, such that unsetenv() cannot delete variables from that environment. Because unsetenv() is unable to remove variables like LD_PRELOAD from the environment, rtld fails to do so when running a setuid(0) binary such as ping. But, as the patch shows, rtld could have recognized the situation by checking the return value from unsetenv(). By not doing so, a security feature can easily be circumvented.

LD_PRELOAD allows users to specify libraries they want loaded before the executable. This is typically used to load previous versions, debugging aids (like malloc()/free() tracking), and things of that sort. Clearly setuid() binaries should not be linked to arbitrary, user-controlled libraries at runtime. In the case of the exploit, the shared library used simply spawns a shell from the _init() call. That shell has the effective user id of root because the loader kernel has already called setuid() for the ping binary.

It is common for programmers to ignore return values for functions that "can't fail", but that is a dangerous practice. It is worse when it happens in code that runs with privileges. Something similar occurred with the (badly named) "sendmail capabilities bug", which was really a problem with the Linux kernel capabilities implementation. But, had sendmail been more defensive and checked the return code from setuid() when it was dropping privileges—something that "can't fail"—a much bigger problem would have been averted.

If the person writing the system or library call believed that the call can't fail, they would presumably have made it a void function. That's not to say that those programmers—or committees like POSIX—are immune from bugs or bad decisions, but callers should heed their intent. It's a difficult problem, though, as it is sometimes unclear what the program should do if something that can't fail does fail. Worse yet, without some kind of comprehensive fault-injection framework, those error paths are difficult to test. But, at least for privileged code, the problem can't be ignored.

This particular problem has existed in FreeBSD since version 7.0, released in February 2008. A pre-advisory with the patch was released by FreeBSD within a few hours of the Bugtraq posting. A full advisory and update is expected soon. In the meantime, this should serve as something of an object lesson for others; hopefully that will lead to developers scrutinizing existing code for similar issues, while also helping to remind programmers not to make that kind of mistake in any future code they write.


(Log in to post comments)

On the importance of return codes

Posted Dec 3, 2009 3:13 UTC (Thu) by jimparis (subscriber, #38647) [Link]

It seems strange that unsetenv() thought the environment was corrupt and couldn't handle it, while something (getenv()?) was still able to read the bad LD_PRELOAD variable later.

On the importance of return codes

Posted Dec 3, 2009 9:57 UTC (Thu) by hppnq (guest, #14462) [Link]

That's because getenv(), unlike unsetenv(), does not always check for a corrupt environment: it simply loops over the entire environment when the program brings its own. So getenv() skips environ[0] and reads LD_PRELOAD from environ[1], while unsetenv() complains about the corrupted environ[0].

It seems only a relatively recent change to unsetenv() introduced a non-void return value, and that the loader code was not updated at the same time.

On the importance of return codes

Posted Dec 3, 2009 10:27 UTC (Thu) by epa (subscriber, #39769) [Link]

It seems only a relatively recent change to unsetenv() introduced a non-void return value, and that the loader code was not updated at the same time.
I wonder whether linting the whole source tree with some kind of static analyser to warn about ignored return values would have caught this.

On the importance of return codes

Posted Dec 3, 2009 13:12 UTC (Thu) by NAR (subscriber, #1313) [Link]

I guess some static analyser could have warned about this. But I also guess that that warning would have been buried deep under heaps of false alarms. The Linux kernel NULL-pointer vulnerabilities were detected by the Stanford checker (if I remember correctly), it's just nobody cared to wade through all that stuff.

On the importance of return codes

Posted Dec 3, 2009 9:07 UTC (Thu) by michaeljt (subscriber, #39183) [Link]

> Worse yet, without some kind of comprehensive fault-injection framework, those error paths are difficult to test.
I was playing around with the idea of a simple script that could extract parts of a C(++) file and make simple changes based on something similar to Doxygen comments in order to unit test that sort of thing. That is, your script extracts the body code of the functions you want to unit test, glues on alternative header and footer code that you write in the doxygen comments, and replaces function calls where you want to test failure conditions with simple checks against a function parameter that you supply.

E.g.

bool func_to_test()
/** @test_header bool my_test_func(void *malloc_result) */
/** @start_extracting_here */
{
... code ...
/** @replace_next_line_with my_global_pointer = malloc_result */
my_global_pointer = malloc(sizeof_struct);
return (my_global_pointer != NULL);
}
/** @stop_extracting_here */

/** @start_testing_code
* assert(my_test_func((void *)-1));
* assert(!my_test_func(NULL));
*/
/** @end_testing_code */

Perhaps I will get round to finishing it in the next couple of weeks...

When the programmer is forced to handle return codes

Posted Dec 3, 2009 13:19 UTC (Thu) by NAR (subscriber, #1313) [Link]

The manual for the close() system call states that "Not checking the return value of close() is a common but nevertheless serious programming error". Java uses checked exceptions to avoid these problems: if the IOException thrown from e.g. FileOutputStream:close() is not handled, the code doesn't even compile. Unfortunately this doesn't solve the problem - many "not so talented" developers just put an empty try-catch block around to shut the compiler up when at least some error message would be useful...

When the programmer is forced to handle return codes

Posted Dec 4, 2009 11:12 UTC (Fri) by skitching (subscriber, #36856) [Link]

What would you suggest a program do if a close call fails?

When the programmer is forced to handle return codes

Posted Dec 4, 2009 15:14 UTC (Fri) by hppnq (guest, #14462) [Link]

A failing close() may indicate a whole range of problems. While the file descriptor is gone in any case, catching the error and taking appropriate measures is a very good idea in non-trivial cases. What is appropriate depends on the context, because you can close() anything that can be represented by a file descriptor. Not doing anything may result in data loss or corruption even if you have written an otherwise bug-free program, although probably it is more likely that a close() failure will point out some problem in your code.

Even if your program is not able to do something sensible, maybe the user or administrator is, so you should report the error.

When the programmer is forced to handle return codes

Posted Dec 4, 2009 20:05 UTC (Fri) by magnus (subscriber, #34778) [Link]

A failed close is usually (always?) a deferred write error, so handle it the same way as if one of the write calls to the file had failed.

If a close would fail after just reading data that would be much more tricky to handle since that would imply that some earlier read data would be incorrect but I don't think that case should occur?

Either way, a failed close shouldn't just go unnoticed. If you can't do anything else, a perror() is better than nothing.

When the programmer is forced to handle return codes

Posted Dec 4, 2009 20:25 UTC (Fri) by magnus (subscriber, #34778) [Link]

Just checked the man page and noticed that close can also fail with EINTR so a retry loop could be in order.. You learn something new every day...

When the programmer is forced to handle return codes

Posted Dec 4, 2009 22:49 UTC (Fri) by cras (guest, #7000) [Link]

UNIX98 says:

If close() is interrupted by a signal that is to be caught, it will return -1 with errno set to [EINTR] and
the state of fildes is unspecified.

So just looping may not be a good idea either. In any case I'd hope that the only time when close()
could return EINTR is with NFS..

When the programmer is forced to handle return codes

Posted Dec 5, 2009 0:27 UTC (Sat) by nix (subscriber, #2304) [Link]

I suspect that in practice close() can return -EINTR because write() can,
and close() can issue deferred writes. I'd expect a tight loop which
continues until errno != EINTR would work fine: that would drop out on
EBADF as well, which is what you want.

When the programmer is forced to handle return codes

Posted Dec 5, 2009 7:18 UTC (Sat) by cras (guest, #7000) [Link]

Sounds reasonable, yes. But do you think there's a single program that actually is doing the looping,
or even wants to change their code to do this? I've never heard of this being a problem in my code.
And actually I think it's probably better not to do the looping anyway. If you get a signal, it means
someone wants the process to be killed anyway, so might as well get on with it.

When the programmer is forced to handle return codes

Posted Dec 5, 2009 8:00 UTC (Sat) by quotemstr (subscriber, #45331) [Link]

If you get a signal, it means someone wants the process to be killed anyway, so might as well get on with it.
There are many signals, only some of which are meant to kill processes. Consider job control. It's also perfectly legitimate (if a little antiquated) to use signals for reporting and miscellaneous functions. You don't want to leak a file descriptor upon receiving a nonfatal signal.

When the programmer is forced to handle return codes

Posted Dec 5, 2009 21:35 UTC (Sat) by nix (subscriber, #2304) [Link]

I do it in my code because it *has* caused me a problem (notably
getting -EINTR on write()/close() on NFS-served volumes, and
getting -ENOSPC on close()).

(Obviously you actually have helpers that do it. You don't repeat the code
everywhere, that'd be disgusting)

When the programmer is forced to handle return codes

Posted Dec 5, 2009 22:47 UTC (Sat) by cras (guest, #7000) [Link]

So I guess you know what happens in Linux when close() fails with EINTR (with NFS)? It won't close
the fd and it should be retried?

I've gotten rid of alarm()s in (most of) my code and there are no child processes, so I'd think close()s
are pretty safe to do without looping.. And with NFS I first fsync/fdatasync first anyway, so close
probably shouldn't fail anyway. Unless it can fail with EINTR even when it doesn't have to write
anything?.. That would seem almost like a bug.

When the programmer is forced to handle return codes

Posted Dec 6, 2009 0:05 UTC (Sun) by nix (subscriber, #2304) [Link]

So I guess you know what happens in Linux when close() fails with EINTR (with NFS)? It won't close the fd and it should be retried?
The EINTR happens if a signal arrives while pending writes are being flushed. At this point, the FD is not yet closed. (But if it were, you could retry it anyway, and you'd get EBADF and would know that it had been closed last time around.)

Since you have to loop for write()/read() anyway, looping for close() as well is hardly a killer. (And, yes, I would rather that -EINTR would die die die as fast as possible, but unfortunately it is not dead so we have to deal with it.)

When the programmer is forced to handle return codes

Posted Dec 6, 2009 0:29 UTC (Sun) by cras (guest, #7000) [Link]

That also makes me wonder about my fsync/fdatasync calls. I suppose they can also fail with EINTR.
Anyway, as I mentioned above, all signals causing those to my program nowdays should be
external so I don't think I will be doing anything about this problem. (Handling is the same for
EINTR as for EDQUOT/ESPACE.) Interesting to realize it could be a problem, of course.

When the programmer is forced to handle return codes

Posted Dec 6, 2009 11:17 UTC (Sun) by nix (subscriber, #2304) [Link]

POSIX documents that fsync() can return EINTR but does not document it for
fdatasync(). This seems likely to be an oversight to me.

When the programmer is forced to handle return codes

Posted Dec 5, 2009 11:40 UTC (Sat) by hppnq (guest, #14462) [Link]

If the file descriptor is actually gone -- which I think is the case also if close() returns with EINTR -- it may have been reused already in a multi-threaded environment by the time you loop around: you might close the wrong file. In a single-threaded environment, you should expect EBADF immediately after the EINTR, I suppose.

When the programmer is forced to handle return codes

Posted Dec 5, 2009 21:37 UTC (Sat) by nix (subscriber, #2304) [Link]

Yeah, sorry, my attitude about people doing file I/O in many threads at
once is that they deserve everything they get.

I much prefer using processes and explicit IPC to threads.

When the programmer is forced to handle return codes

Posted Dec 6, 2009 11:31 UTC (Sun) by hppnq (guest, #14462) [Link]

I think the point is that close() will always render the file descriptor unusable: if it doesn't you really cannot close() it again safely, because you've bumped into a kernel bug.

When the programmer is forced to handle return codes

Posted Dec 7, 2009 23:02 UTC (Mon) by cventers (subscriber, #31465) [Link]

Multithreaded programs might reallocate the fd before your thread got the opportunity to see an EBADF. If you broke the loop on success as well, you'd be mostly okay, except for the unspecified status of the fd as per UNIX 98...

On the importance of return codes

Posted Dec 3, 2009 13:45 UTC (Thu) by nix (subscriber, #2304) [Link]

The runtime loader hasn't called setuid() for the ping binary: the kernel has. The whole point of the setuid model is that only the kernel's exec() implementation(s) can elevate privileges: potentially-unprivileged code (like the dynamic loader) cannot.

On the importance of return codes

Posted Dec 3, 2009 14:25 UTC (Thu) by jake (editor, #205) [Link]

> The runtime loader hasn't called setuid() for the ping binary:
> the kernel has.

Yes, of course, thanks for pointing that out.

jake

On the importance of return codes

Posted Dec 3, 2009 13:46 UTC (Thu) by cesarb (subscriber, #6266) [Link]

At least part of the problem is that the code runs in a potentially "dirty" state, and has to know every possible thing which has to be "cleaned" from that state (and how to clean it). If it forgets to (or, as was the case here, fails to) remove even one of them, no matter how insignificant it is, it can potentially be exploited.

On the other hand, if the code ran in a "clean" state and only copied things it knew to be safe from the "dirty" state, the risk would be lower (if it forgets or fail to do a copy, the information is simply missing).

In other words, blacklists versus whitelists.

Unfortunately, that cannot be done in this case, since it has no way of knowing which environment variables the called program will need.

On the importance of return codes

Posted Dec 3, 2009 18:25 UTC (Thu) by khc (subscriber, #45209) [Link]

However, it can make a copy of the environment, with LD_PRELOAD filtered out, and use the new copy.

blacklist vs whitelist

Posted Dec 9, 2009 3:56 UTC (Wed) by pjm (subscriber, #2080) [Link]

That would be a blacklist approach. The whole point of cesarb's comment is that one would usually prefer a whitelist approach: create a new environment containing just PATH (with a known safe value), HOME, and a couple of others.

On the importance of return codes

Posted Dec 3, 2009 15:22 UTC (Thu) by kpvangend (guest, #22351) [Link]

Isn't this something that gcc can catch?

On the importance of return codes

Posted Dec 3, 2009 15:41 UTC (Thu) by rvfh (subscriber, #31018) [Link]

You can set attributes to a function for GCC to complain. I think you add __wur at the end of the declaration, like so:
extern int system (__const char *__command) __wur;

On the importance of return codes

Posted Dec 3, 2009 19:16 UTC (Thu) by kpfleming (subscriber, #23250) [Link]

There are some Linux distributions (Ubuntu I know for sure, Debian may have been the source) that by default enable such checking for library calls provided by glibc; compiling code that does not pay attention to return values results in warnings/errors. It's a start, at least.

On the importance of return codes

Posted Dec 4, 2009 0:38 UTC (Fri) by nix (subscriber, #2304) [Link]

You mean

extern int system (__const char *__command)
__attribute__((__warn_unused_result__))

__wur is a glibc-specific hack that should not be used outside of glibc
system headers.

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds