This is why we can't have safe cancellation points

Posted Apr 14, 2016 23:51 UTC (Thu) by wahern (subscriber, #37304)
In reply to: This is why we can't have safe cancellation points by pikhq
Parent article: This is why we can't have safe cancellation points

posix_spawn exists solely because POSIX makes implementing fork optional so that systems without virtual memory can still meet the letter of the standard.

posix_spawn doesn't magically solve thread race problems. Using fork+exec+dup2 can be just as safe as posix_spawn and significantly clearer. For example, descriptors without the FD_CLOEXEC descriptor flag set will still leak into the new process instance unless you explicitly close them. But which is easier: tediously filling out a posix_spawn_file_actions object, or simply calling close? Often the latter.
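As a rough illustration of the fork+dup2+close+exec pattern described above, here is a minimal sketch. The helper name run_with_stdout and its parameters are made up for this example, and it assumes leak_fd is a descriptor the child should not inherit.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): run argv[0] with out_fd as
 * its stdout and with leak_fd closed. Returns the exit status, or -1. */
static int run_with_stdout(char *const argv[], int out_fd, int leak_fd)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: stick to async-signal-safe calls until exec. */
        if (dup2(out_fd, STDOUT_FILENO) == -1)
            _exit(127);
        close(leak_fd);              /* the "simply calling close" step */
        if (out_fd != STDOUT_FILENO)
            close(out_fd);
        execv(argv[0], argv);
        _exit(127);                  /* only reached if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This sketch folds every child-side failure into exit status 127, so the parent cannot tell a failed dup2 from a failed exec.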

Implementations generally implement posix_spawn by using vfork+exec. That does mean that they bypass pthread_atfork handlers. But that's not intrinsically safer: it can cut both ways--maybe one was going to close a descriptor. Anyhow, pthread_atfork handlers are fundamentally broken wrt the original intention, and the specification admits this.

This is why we can't have safe cancellation points

Posted Apr 15, 2016 12:07 UTC (Fri) by oshepherd (guest, #90163)

posix_spawn is not optional in POSIX. Yes, one of the motivating reasons for it is that it permits profiles of POSIX for no-MMU platforms to spawn new processes, but that is not the only motivation. posix_spawn can also be more efficient (because it does not have to go through the completely generic fork path), and it additionally makes error handling much easier. (Have you ever tried reporting execve, dup or close errors back to the parent process? It's not a trivial matter...)

I don't buy the argument that fork+dup+close+exec is easier. Not once you actually handle errors properly or do other things that a Robust Application (TM) should.
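To give a sense of what reporting an execve failure back to the parent involves, here is a minimal sketch of the common close-on-exec pipe technique. The helper name spawn_report is hypothetical. The sketch sets FD_CLOEXEC with fcntl after pipe, which leaves a window in a threaded program where another thread's fork could inherit the pipe. pipe2 with O_CLOEXEC, where available, avoids that window.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 and sets *pid_out if exec succeeded,
 * otherwise returns the errno of the failing call (child already reaped). */
static int spawn_report(char *const argv[], pid_t *pid_out)
{
    assert(argv != NULL && argv[0] != NULL && pid_out != NULL);

    int ep[2];
    if (pipe(ep) == -1)
        return errno;
    if (fcntl(ep[0], F_SETFD, FD_CLOEXEC) == -1 ||
        fcntl(ep[1], F_SETFD, FD_CLOEXEC) == -1) {
        int e = errno;
        close(ep[0]);
        close(ep[1]);
        return e;
    }

    pid_t pid = fork();
    if (pid == -1) {
        int e = errno;
        close(ep[0]);
        close(ep[1]);
        return e;
    }

    if (pid == 0) {
        close(ep[0]);
        execv(argv[0], argv);
        int e = errno;                       /* exec failed: tell the parent */
        ssize_t n = write(ep[1], &e, sizeof e);
        (void)n;
        _exit(127);
    }

    close(ep[1]);
    int child_errno = 0;
    ssize_t n;
    do {
        n = read(ep[0], &child_errno, sizeof child_errno);
    } while (n == -1 && errno == EINTR);
    close(ep[0]);

    if (n == (ssize_t)sizeof child_errno) {
        /* The child wrote an errno value, so exec did not happen. */
        while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
            ;
        return child_errno;
    }

    /* EOF: the pipe was closed by a successful exec (simplification:
     * any other short result is treated the same way). */
    *pid_out = pid;
    return 0;
}
```

Errors from dup2 or close in the child could be reported through the same pipe. The sketch only covers execv.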

This is why we can't have safe cancellation points

Posted Apr 15, 2016 14:04 UTC (Fri) by MrWim (subscriber, #47432)

> posix_spawn is not optional in POSIX.

A minor point: I understood the parent to mean that fork is optional in POSIX, whereas posix_spawn is mandatory.

This is why we can't have safe cancellation points

Posted Apr 15, 2016 15:05 UTC (Fri) by oshepherd (guest, #90163)

Uh, thinko. I meant to say "fork is not optional in POSIX." Indeed, posix_spawn is optional, but fork is not. (That said, if there are any important platforms where posix_spawn doesn't exist, and I'm thinking probably OS X here, it'd be relatively easy for somebody to produce a compatibility shim that implements it on top of fork/execve/pipe/close/dup/etc.)

This is why we can't have safe cancellation points

Posted Apr 21, 2016 5:13 UTC (Thu) by wahern (subscriber, #37304)

You're completely right: posix_spawn is optional (part of the Spawn extension) and fork is not. I even had the standard open but didn't bother confirming what I asserted. Shameful....

Regarding error checking: routines like posix_spawn_file_actions_addclose can fail with ENOMEM, and AFAICT both the glibc and musl implementations allocate memory even on the first add--musl for each individual action, glibc for 8 actions. Because allocation can fail even on Linux with OOM (e.g. policy-based resource limits), regardless of allocation size, correct code needs to check for failure on each individual descriptor action added to the queue. So posix_spawn requires the same number of error checks.
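For comparison, a minimal sketch of the posix_spawn side with each file action checked. The helper name spawn_with_actions and its parameters are hypothetical. Note that these routines return the error number directly rather than setting errno.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: spawn argv[0] with out_fd as stdout and leak_fd
 * closed in the child. Returns 0 on success or an error number. */
static int spawn_with_actions(char *const argv[], int out_fd, int leak_fd,
                              pid_t *pid_out)
{
    assert(argv != NULL && argv[0] != NULL && pid_out != NULL);

    posix_spawn_file_actions_t fa;
    int err = posix_spawn_file_actions_init(&fa);
    if (err != 0)
        return err;

    /* Each add can fail (e.g. ENOMEM), so each one is checked. */
    err = posix_spawn_file_actions_adddup2(&fa, out_fd, STDOUT_FILENO);
    if (err == 0)
        err = posix_spawn_file_actions_addclose(&fa, leak_fd);
    if (err == 0)
        err = posix_spawn(pid_out, argv[0], &fa, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&fa);
    return err;
}
```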

OTOH, even the most pedantic of developers could choose to ignore errors from close() (or posix_close() if and when http://austingroupbugs.net/view.php?id=529 is adopted) in the child process. Apropos this article, you no longer need to worry about close being a thread cancellation point in the child, and blocking all signals is easy, so EINTR won't happen. And EBADF shouldn't happen in correctly written software.[1] Alternatively, you could choose to set FD_CLOEXEC, which doesn't even have an EINTR failure mode. Arguably dup2 could correctly be ignored--I have a hard time imagining a failure condition where dup'ing a descriptor over an already open stdio descriptor could fail, though that does depend on some assumptions and it's not something I would do anyhow.

Point being, explicit fork+exec could in some situations take less code than posix_spawn because you could elide some error checks. And I can't imagine a situation where it could take appreciably more code.

More important, though, is the point that posix_spawn doesn't solve threading race conditions. The only possible plus in this regard is that posix_spawn will correctly block signals during the operation so that, e.g., a signal handler isn't wrongly called in the child but before exec.[2] Conspicuously missing, on the other hand, is the ability to set the umask in the child process. Setting or even querying the umask simply can't be done in a race-free manner in a threaded application unless no other thread relies on the umask, or unless you fork and have the child report the umask back.
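The "fork and report back the umask" approach could look roughly like the following sketch. The helper name query_umask is hypothetical. The child calls umask(0), which changes only the child's own mask, and sends the previous value to the parent over a pipe.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read the process umask without ever modifying it
 * in the calling process. Returns 0 on success, -1 on failure. */
static int query_umask(mode_t *mask_out)
{
    assert(mask_out != NULL);

    int p[2];
    if (pipe(p) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        close(p[0]);
        mode_t m = umask(0);         /* affects only this child process */
        ssize_t n = write(p[1], &m, sizeof m);
        _exit(n == (ssize_t)sizeof m ? 0 : 1);
    }

    close(p[1]);
    mode_t m = 0;
    ssize_t n;
    do {
        n = read(p[0], &m, sizeof m);
    } while (n == -1 && errno == EINTR);
    close(p[0]);

    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;

    if (n != (ssize_t)sizeof m)
        return -1;
    *mask_out = m;
    return 0;
}
```

The value is only a snapshot: another thread may change the umask right after the fork.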

While there's nothing intrinsically wrong with using posix_spawn, it shouldn't be used for the wrong reasons. You still have to carefully consider the important stuff.

[1] EBADF invariably means you have a bug in your application, often a thread race or, in single-threaded non-blocking I/O code, an ordering issue. I refuse to ignore EBADF in my event loop and polling libraries (unlike libevent and similar libraries) despite people complaining to me how annoying it is to propagate it. Such a bug could easily lead to stalled network I/O. I'm convinced it's a very common problem in non-blocking I/O networking daemons, but that it's rare enough that people chalk it up to network hiccups. So I propagate EBADF when manipulating a descriptor event because it's not the library's prerogative to hide such an error, and it can't possibly know whether the error is benign, recoverable, or panic-worthy. Though as with ENOMEM, library state remains consistent after the error so that recovery isn't foreclosed.

[2] pthread_sigmask has no failure mode when used correctly, so it's just two lines of condition-less code when using fork+exec. Though I learned a few years ago over on comp.unix.programmer that one should initialize a sigset_t object with sigemptyset before passing as the _output_ argument to pthread_sigmask and similar routines. Some implementations will logical-OR the signal set, rather than writing over the entire sigset_t object. See also http://pubs.opengroup.org/onlinepubs/9699919799/functions.... I admit this is one case where using posix_spawn has a clear benefit over fork+exec. I just don't think that in the grand scheme of things it amounts to much. Descriptor leakages and umask races, for example, are arguably far-and-above the bigger problem, especially from a security perspective, and posix_spawn provides no benefit and in some cases is more limited.

This is why we can't have safe cancellation points

Posted Apr 25, 2016 14:39 UTC (Mon) by nix (subscriber, #2304)

> Arguably dup2 could correctly be ignored--I have a hard time imagining a failure condition where dup'ing a descriptor over an already open stdio descriptor could fail, though that does depend on some assumptions and it's not something I would do anyhow.

A brief glance at do_dup2() in the kernel (or, for that matter, at a sufficiently recent manpage) reveals that it can fail with -EBUSY if the file descriptor it's being asked to dup over is still being opened, so (just as with -EINTR) a retry loop would be needed for perfect safety in this situation.
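Such a retry loop might be sketched as follows. The wrapper name dup2_retry is hypothetical, and EBUSY from dup2 is Linux-specific behaviour. The loop is unbounded, on the assumption that the racing open completes quickly.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: retry dup2 while it fails with EINTR or EBUSY.
 * Returns newfd on success, or -1 with errno set for any other error. */
static int dup2_retry(int oldfd, int newfd)
{
    int r;
    do {
        r = dup2(oldfd, newfd);
    } while (r == -1 && (errno == EINTR || errno == EBUSY));
    return r;
}
```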