This is why we can't have safe cancellation points

Posted Apr 15, 2016 19:40 UTC (Fri) by nix (subscriber, #2304)
In reply to: This is why we can't have safe cancellation points by luto
Parent article: This is why we can't have safe cancellation points

But this breaks all programs that assume it is enabled by default and then proceed to call pthread_cancel() on their own threads.

Changing longstanding defaults like this constitutes a break of userspace. You need a new -D flag (which, perhaps, sets a new ELF note, or simply triggers the linking in of a new crt1.o which flips the default) to ensure that this only happens to programs that are prepared for it.

This is why we can't have safe cancellation points

Posted Apr 15, 2016 19:49 UTC (Fri) by luto (guest, #39314) [Link] (9 responses)

Agreed. I'm not saying that changing the default is actually wise. But it might be enough of a simplification and a performance benefit to make it worthwhile.

This is why we can't have safe cancellation points

Posted Apr 15, 2016 20:19 UTC (Fri) by nix (subscriber, #2304) [Link] (8 responses)

A performance benefit? What performance cost is there to a single address comparison in a relatively rare path? (And as for complexity cost, well, hell, that sort of backward-compatibility burden is why systems get more complex over time, but that doesn't mean we can cavalierly throw users over the wall. codesearch.debian.net shows quite a lot of users, and yes, many of them are real users. :) )

(Now I'd agree that *asynchronous* cancellation is nearly impossible to program to and has an even smaller use case than synchronous cancellation, but even *it* is useful sometimes, particularly as a transient thing; e.g. when a thread that otherwise is synchronously cancellable is doing a long-running computation that it knows does no syscalls and can be safely unwound from the cleanup handler.)
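
A minimal sketch of that transient use, assuming a pure computation that makes no syscalls, takes no locks and allocates nothing. The names `sum_squares` and `compute_cancellably` are hypothetical; only `pthread_setcanceltype()` and its constants are real POSIX API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical long-running computation: no syscalls, no locks, no malloc. */
static unsigned long sum_squares(const unsigned *v, size_t n)
{
    unsigned long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (unsigned long)v[i] * v[i];
    return acc;
}

/* Enable asynchronous cancellation only for the duration of the computation,
 * then restore whatever cancellation type the thread had before. */
unsigned long compute_cancellably(const unsigned *v, size_t n)
{
    int old_type, ignored;
    unsigned long result;

    assert(v != NULL || n == 0);   /* checked before the async window opens */

    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);
    result = sum_squares(v, n);    /* may be cancelled at any instruction */
    pthread_setcanceltype(old_type, &ignored);

    return result;
}
```

Anything that has to be undone if the thread is cancelled mid-computation would still need a cleanup handler pushed before the window opens.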

This is why we can't have safe cancellation points

Posted Apr 15, 2016 20:34 UTC (Fri) by luto (guest, #39314) [Link] (7 responses)

It prevents syscall inlining. The impact is small but nonzero.

This is why we can't have safe cancellation points

Posted Apr 17, 2016 23:12 UTC (Sun) by nix (subscriber, #2304) [Link] (6 responses)

Hmmm. I see now -- when HJ's syscall inlining does turn up, or anything like it, you don't have one address to compare to any more, you have a great heap of them all across glibc, and there is obviously no way to do any similar comparison (I can think of ridiculously overdesigned ways to do it involving searching a tree of address ranges, but they'd all be *far* too slow and blow the dcache sky-high: just no).

Given that syscall inlining isn't something you can possibly turn on and off at runtime -- the inlining is, after all, into glibc, so you'd need multiple copies of glibc via hwcaps, which seems total overkill for this and would totally negate any saving via massive icache bloat -- you'd not be able to fix this by changing a *default*. You'd need to basically give up on fixing this race, or give up on fixing it this way, or break cancellation completely for everyone (a total non-starter).

Hmm. Too late at night, but I'll think on this. Either I have a niggling germ of a possible idea for a fix for this at the edge of my brain, or I'm just tired and hallucinating. (Or both!)

This is why we can't have safe cancellation points

Posted Apr 18, 2016 0:21 UTC (Mon) by luto (guest, #39314) [Link] (1 responses)

I think you could do it the way the kernel does the exception table -- just make a sorted list of pairs of starts and ends of cancellable regions. You only need to check it when your cancellation signal is delivered, and the data cache impact of *that* is basically irrelevant.
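
A rough sketch of that lookup, assuming a table of half-open `[start, end)` address ranges that is sorted by start address and non-overlapping. The struct and function names are made up for illustration, and how such a table would actually be emitted by the toolchain is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one cancellable code region, [start, end). */
struct cancel_region {
    uintptr_t start;
    uintptr_t end;
};

/* Binary search: returns 1 if pc falls inside any region, else 0.
 * The table must be sorted by start address with no overlaps. */
int pc_in_cancel_region(const struct cancel_region *tab, size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (pc < tab[mid].start)
            hi = mid;            /* pc is below this region */
        else if (pc >= tab[mid].end)
            lo = mid + 1;        /* pc is above this region */
        else
            return 1;            /* start <= pc < end */
    }
    return 0;
}
```

The search would run only from the cancellation signal handler, with the interrupted program counter as `pc`, so its cost stays off the normal syscall path.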

But you could do it by flipping the default if you're willing to accept a branch: just test the cancellable flag and jump out of line if needed. This is no worse than the existing musl thing in which each cancellable syscall needs to test the cancellable flag anyway to see if it needs to cancel even without a signal being sent.
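
The test-and-branch idea might look roughly like this. It is only a sketch: `cancel_enabled`, `read_out_of_line` and `sketch_read` are invented names, the plain `read()` call stands in for an inlined syscall instruction, and `__builtin_expect` and the `noinline` attribute are GCC/Clang extensions.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread flag; off by default, as under a flipped default. */
static _Thread_local int cancel_enabled = 0;

/* Counts slow-path entries so the sketch's behaviour can be observed. */
static int slow_path_calls = 0;

/* Out-of-line path. A real libc would enter the kernel from one known
 * address range here, giving the cancellation signal handler a single
 * range to compare against. */
__attribute__((noinline))
static ssize_t read_out_of_line(int fd, void *buf, size_t len)
{
    slow_path_calls++;
    return read(fd, buf, len);
}

/* Fast path: one predictable branch, then the (stand-in for an) inlined
 * syscall when the thread has not enabled cancellation. */
static inline ssize_t sketch_read(int fd, void *buf, size_t len)
{
    if (__builtin_expect(cancel_enabled != 0, 0))
        return read_out_of_line(fd, buf, len);
    return read(fd, buf, len);
}
```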

This is why we can't have safe cancellation points

Posted Apr 18, 2016 10:28 UTC (Mon) by nix (subscriber, #2304) [Link]

Oh, I assumed you were necessarily taking the cost of the cancellable test anyway (the branch is near-zero cost in the common case, because it obviously has a prediction hint). Were you trying to avoid even that?

This is why we can't have safe cancellation points

Posted Apr 18, 2016 4:20 UTC (Mon) by neilbrown (subscriber, #359) [Link] (3 responses)

> you don't have one address to compare to any more

No, but you probably have one sequence of op-codes to compare.
The comparisons might be a little more complex than "memcmp" but could you not test "is this EIP value within a thunk" by comparing surrounding bytes against the standard thunk at each of the (very few) possible offsets?
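
As a sketch of what such a check could look like: the byte pattern below (`mov $imm32,%eax; syscall`) is only an illustrative stand-in, not glibc's actual sequence, and the mask, offsets and function names are likewise assumptions. The mask is what makes it "a little more complex than memcmp", since the syscall number varies from site to site.

```c
#include <stddef.h>

#define THUNK_LEN 7

/* Illustrative pattern only: mov $imm32,%eax (b8 + 4 bytes) ; syscall (0f 05). */
static const unsigned char thunk_bytes[THUNK_LEN] =
    { 0xb8, 0x00, 0x00, 0x00, 0x00, 0x0f, 0x05 };
/* 0xff = byte must match, 0x00 = wildcard (the syscall number immediate). */
static const unsigned char thunk_mask[THUNK_LEN] =
    { 0xff, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff };
/* Instruction boundaries inside the thunk at which pc may be found. */
static const size_t pc_offsets[] = { 0, 5 };

static int masked_match(const unsigned char *p)
{
    for (size_t i = 0; i < THUNK_LEN; i++)
        if ((p[i] & thunk_mask[i]) != (thunk_bytes[i] & thunk_mask[i]))
            return 0;
    return 1;
}

/* Returns 1 if the byte offset pc within code[0..code_len) sits at one of
 * the allowed offsets inside a thunk-shaped byte sequence, else 0. */
int pc_in_thunk(const unsigned char *code, size_t code_len, size_t pc)
{
    for (size_t i = 0; i < sizeof pc_offsets / sizeof pc_offsets[0]; i++) {
        size_t off = pc_offsets[i];
        size_t start;

        if (pc < off)
            continue;                       /* thunk would start before code */
        start = pc - off;
        if (start > code_len || code_len - start < THUNK_LEN)
            continue;                       /* thunk would run past the end */
        if (masked_match(code + start))
            return 1;
    }
    return 0;
}
```

A byte-pattern test like this can in principle give false positives if the same bytes appear elsewhere, which is part of why it ties the checker to arch-specific details.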

This is why we can't have safe cancellation points

Posted Apr 18, 2016 10:27 UTC (Mon) by nix (subscriber, #2304) [Link] (2 responses)

Good point. And yes, you do have one sequence, it's extremely stereotyped, and it'll always be in the icache (at syscall entry, anyway). (On x86-64 -- on x86-32, this is irrelevant, because there, even the 'inlined' syscalls (INTERNAL_SYSCALL users) are still doing a GOT lookup and an indirect jump to the vDSO syscall entry point.)

If you can get away with scanning for this only when cancellation is actually detected, it seems that the cost would be very low, though the complexity would obviously be higher than a simple address comparison, and it would tie that part of the kernel to these fairly fine and arch-dependent details of glibc's implementation, in a way that would probably not be spotted fast if it broke :(

This is why we can't have safe cancellation points

Posted Apr 18, 2016 11:10 UTC (Mon) by itvirta (guest, #49997) [Link] (1 responses)

> And yes, you do have one sequence, it's extremely stereotyped, and it'll always be in the icache

Stupid question: Does the instruction cache help if you're reading the instruction bytes as data?

This is why we can't have safe cancellation points

Posted Apr 20, 2016 16:56 UTC (Wed) by nix (subscriber, #2304) [Link]

Hm, no, but the L2+ caches are unified on many models (e.g. on all the Intel x86-64 CPUs I have access to, Nehalem and later), and getting stuff from L2 cache is still immensely faster than getting it from RAM, fast enough that you can often consider it free for applications like this.