This is why we can't have safe cancellation points
Posted Apr 14, 2016 7:51 UTC (Thu) by khim (subscriber, #9252) In reply to: This is why we can't have safe cancellation points by mjthayer
Parent article: This is why we can't have safe cancellation points
Posted Apr 14, 2016 19:40 UTC (Thu)
by luto (guest, #39314)
So musl could (and, AFAICT, does) use int $0x80 only for cancellable syscalls.
(FWIW, the actual meat of the patch I wrote was fine, I think. The issue was that building vdso code at all is a giant mess and I broke the build system.)
Posted Apr 14, 2016 21:18 UTC (Thu)
by khim (subscriber, #9252)
Posted Apr 14, 2016 21:28 UTC (Thu)
by luto (guest, #39314)
Perhaps someone should attempt to change the standard to work the other way (default is PTHREAD_CANCEL_DISABLE and PTHREAD_CANCEL_DEFERRED). After all, no sensible program uses cancellation, so why make them pay the price?
Posted Apr 14, 2016 22:30 UTC (Thu)
by khim (subscriber, #9252)
POSIX has had that requirement for the last 20 years or so; I'm afraid it's too late to change it. If someone wants to introduce a drastic, potentially disruptive change to POSIX, then it would be significantly saner to just make them optional in POSIX and remove them from libraries like musl and glibc, don't you think?
Posted Apr 15, 2016 0:04 UTC (Fri)
by neilbrown (subscriber, #359)
This is the part of the story that didn't make much sense to me. Why do you think cancellation is such a bad idea?
Posted Apr 15, 2016 0:11 UTC (Fri)
by luto (guest, #39314)
AFAICT the only way to use it safely is to have cancellation off *except* at very carefully selected points and to turn it on at those points. Every cancellation point then needs to be aware that the thread can go away without unwinding.
ISTM any code that actually does this would be better off using ppoll, etc.
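A rough sketch of the pattern described here: the thread keeps cancellation disabled and enables it only around one chosen blocking call. The names cancellable_read, reader_thread and demo_cancel_in_read are hypothetical and exist only for this illustration; the pthread and POSIX calls are the standard ones.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical helper: read() with cancellation enabled only for this call. */
static ssize_t cancellable_read(int fd, void *buf, size_t len)
{
    int old_state;
    ssize_t n;

    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old_state);
    n = read(fd, buf, len);          /* the only place the thread may be cancelled */
    pthread_setcancelstate(old_state, NULL);
    return n;
}

static void *reader_thread(void *arg)
{
    int fd = *(int *)arg;
    char byte;

    /* Cancellation stays off everywhere except inside cancellable_read(). */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    cancellable_read(fd, &byte, 1);  /* blocks: nothing is ever written */
    return NULL;
}

/* Returns 1 if the thread ended by cancellation, 0 if not, -1 on setup failure. */
int demo_cancel_in_read(void)
{
    int fds[2];
    pthread_t t;
    void *result;

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, reader_thread, &fds[0]) != 0)
        return -1;
    pthread_cancel(t);               /* stays pending until cancellation is enabled */
    pthread_join(t, &result);
    close(fds[0]);
    close(fds[1]);
    return result == PTHREAD_CANCELED;
}
```

Code around the enabled window still has to assume the thread can vanish at that read(), which is the burden the comment is pointing at.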
Posted Apr 15, 2016 0:38 UTC (Fri)
by neilbrown (subscriber, #359)
Surely I can:
Then if I ever get canceled, everything will be cleaned up nicely.
I would need to disable cancellation while manipulating a data structure shared with other threads, but I see cancellation more as being appropriate for largely independent threads.
What specific risks do you see if cancellation is mostly enabled?
Posted Apr 15, 2016 7:21 UTC (Fri)
by khim (subscriber, #9252)
I think the problem is simple inefficiency. Cancellation support is not free - even if it's not used. And even your "simple" scheme includes many steps and couldn't arrive in a random program by accident. Surely if you change a design of your program that much to make it possible to use cancellation you could as well go and create wrapper for pthread_create which will call pthread_setcancelstate(p), too?
Posted Apr 15, 2016 8:00 UTC (Fri)
by neilbrown (subscriber, #359)
Specifically? The solution used by musl costs almost nothing except on x86_32 and the change to make it work well on x86_32 has zero extra performance cost.
> And even your "simple" scheme includes many steps and couldn't arrive in a random program by accident.
I'm failing to parse that... Certainly you wouldn't put any code in any program by accident (I hope) ??
> Surely if you change a design of your program that much to make it possible to use cancellation you could as well go and create wrapper for pthread_create which will call pthread_setcancelstate(p), too?
I fail to see how this would solve anything at all.
Many applications never cancel any threads. They are irrelevant. They need do nothing and they suffer no cost (maybe a couple of instructions per syscall. If you can't afford that, hand-code your systemcalls).
Some applications do find value in the ability to cancel threads. Those threads clearly need to be prepared to be canceled. Being prepared is not zero work, but it is not too onerous.
If the thread is not doing any resource allocation, maybe just computing pi to a few million bits, then it can deliberately request async cancellation and go about its business.
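As a minimal sketch of that case, assuming a thread that holds no resources and makes no system calls: it requests asynchronous cancellation and then just computes. The counter loop stands in for the real computation, and compute_thread and demo_async_cancel are hypothetical names.

```c
#include <pthread.h>

static volatile unsigned long iterations;   /* stand-in for the real computation */

static void *compute_thread(void *arg)
{
    (void)arg;
    /* No resources are held, so being cancelled at an arbitrary instruction is acceptable. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    for (;;)
        iterations++;                /* no system calls, no cancellation points */
    return NULL;                     /* not reached */
}

/* Returns 1 if the thread ended by cancellation, 0 if not, -1 on setup failure. */
int demo_async_cancel(void)
{
    pthread_t t;
    void *result;

    if (pthread_create(&t, NULL, compute_thread, NULL) != 0)
        return -1;
    pthread_cancel(t);
    pthread_join(t, &result);
    return result == PTHREAD_CANCELED;
}
```

With the default deferred type this loop could never be cancelled, since it contains no cancellation point.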
Which makes for nice clean code with the certainty that the cleanup handler will run even if the thread is canceled.
Posted Apr 15, 2016 15:10 UTC (Fri)
by khim (subscriber, #9252)
99.9% of all programs (and I've picked a conservative number) are irrelevant? That's a novel idea to me. When the proportion is this skewed, even these two instructions make no sense: why should 99.9% of all apps suffer at all if this could be avoided? The natural response would be: because I could just take bits and pieces from these 99.9% of apps and use them to build the rare few apps which do use cancellation. But as you've shown, you can't just take random working code from a working library, plug it into a program which uses cancellation, and hope that the end result will work.
ALL code must be carefully designed in such a program. And if ALL code is specifically written for such a program, then the additional burden of adding a couple of pthread_setcancelstate calls here and there wouldn't be large at all! The argument that "hey, I don't know where and how threads are created in this large program" wouldn't fly: if you don't know even that much about your program/library/whatever, then how could you be sure that your control is enough to even attempt to use cancellation of threads? Sure. But if that's called "a little care" then "you also need to call pthread_setcancelstate(PTHREAD_CANCEL_ENABLE) in each thread" wouldn't be a large problem...
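One possible shape for such a wrapper, as a sketch only: every thread is started through a small trampoline that opts in to cancellation before running the real start routine. The names create_cancellable_thread, enable_cancel_then_run, wait_forever and demo_wrapper are hypothetical. Under today's POSIX default the explicit enable is a no-op; it would only matter if the default were flipped to disabled.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct start_info {
    void *(*fn)(void *);
    void *arg;
};

/* Trampoline: opt in to cancellation, then run the caller's start routine. */
static void *enable_cancel_then_run(void *p)
{
    struct start_info info = *(struct start_info *)p;

    free(p);
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    return info.fn(info.arg);
}

/* Hypothetical wrapper with the same calling convention as pthread_create(). */
int create_cancellable_thread(pthread_t *thread, const pthread_attr_t *attr,
                              void *(*fn)(void *), void *arg)
{
    struct start_info *info = malloc(sizeof *info);
    int err;

    if (info == NULL)
        return ENOMEM;
    info->fn = fn;
    info->arg = arg;
    err = pthread_create(thread, attr, enable_cancel_then_run, info);
    if (err != 0)
        free(info);
    return err;
}

static void *wait_forever(void *arg)
{
    (void)arg;
    for (;;)
        pause();                     /* cancellation point */
    return NULL;                     /* not reached */
}

/* Returns 1 if a thread started through the wrapper can be cancelled. */
int demo_wrapper(void)
{
    pthread_t t;
    void *result;

    if (create_cancellable_thread(&t, NULL, wait_forever, NULL) != 0)
        return -1;
    pthread_cancel(t);
    pthread_join(t, &result);
    return result == PTHREAD_CANCELED;
}
```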
Posted Apr 15, 2016 17:17 UTC (Fri)
by nix (subscriber, #2304)
I've had problems with the multithreading in that code, but they were all races associated with mutexes and condition variables. The nature of synchronous cancellation has caused me zero problems.
Posted Apr 15, 2016 19:07 UTC (Fri)
by ballombe (subscriber, #9523)
The fact that something is broken in some corner case does not make it useless in other cases.
Posted Apr 15, 2016 19:16 UTC (Fri)
by luto (guest, #39314)
pthread cancellation is very dangerous, is useful only for specialized cases, and IMO should never have been enabled by default.
If it were simply disabled by default, then this performance issue would be irrelevant.
Posted Apr 15, 2016 19:40 UTC (Fri)
by nix (subscriber, #2304)
Changing longstanding defaults like this constitutes a break of userspace. You need a new -D flag (which, perhaps, sets a new ELF note, or simply triggers the linking in of a new crt1.o which flips the default) to ensure that this only happens to programs that are prepared for it.
Posted Apr 15, 2016 19:49 UTC (Fri)
by luto (guest, #39314)
Posted Apr 15, 2016 20:19 UTC (Fri)
by nix (subscriber, #2304)
(Now I'd agree that *asynchronous* cancellation is nearly impossible to program to and has an even smaller use case than synchronous cancellation, but even *it* is useful sometimes, particularly as a transient thing; e.g. when a thread that otherwise is synchronously cancellable is doing a long-running computation that it knows does no syscalls and can be safely unwound from the cleanup handler.)
Posted Apr 15, 2016 20:34 UTC (Fri)
by luto (guest, #39314)
Posted Apr 17, 2016 23:12 UTC (Sun)
by nix (subscriber, #2304)
Given that syscall inlining isn't something you can possibly turn on and off at runtime -- the inlining is, after all, into glibc, so you'd need multiple copies of glibc via hwcaps, which seems total overkill for this and would totally negate any saving via massive icache bloat -- you'd not be able to fix this by changing a *default*. You'd need to basically give up on fixing this race, or give up on fixing it this way, or break cancellation completely for everyone (a total non-starter).
Hmm. Too late at night, but I'll think on this. Either I have a niggling germ of a possible idea for a fix for this at the edge of my brain, or I'm just tired and hallucinating. (Or both!)
Posted Apr 18, 2016 0:21 UTC (Mon)
by luto (guest, #39314)
But you could do it by flipping the default if you're willing to accept a branch: just test the cancellable flag and jump out of line if needed. This is no worse than the existing musl thing in which each cancellable syscall needs to test the cancellable flag anyway to see if it needs to cancel even without a signal being sent.
Posted Apr 18, 2016 10:28 UTC (Mon)
by nix (subscriber, #2304)
Posted Apr 18, 2016 4:20 UTC (Mon)
by neilbrown (subscriber, #359)
No, but you probably have one sequence of op-codes to compare.
Posted Apr 18, 2016 10:27 UTC (Mon)
by nix (subscriber, #2304)
If you can get away with scanning for this only when cancellation is actually detected, it seems that the cost would be very low, though the complexity would obviously be higher than a simple address comparison, and it would tie that part of the kernel to these fairly fine and arch-dependent details of glibc's implementation, in a way that would probably not be spotted fast if it broke :(
Posted Apr 18, 2016 11:10 UTC (Mon)
by itvirta (guest, #49997)
Stupid question: Does the instruction cache help if you're reading the instruction bytes as data?
Posted Apr 20, 2016 16:56 UTC (Wed)
by nix (subscriber, #2304)
Posted Apr 16, 2016 13:52 UTC (Sat)
by ballombe (subscriber, #9523)
Yes, so? This is a static property of the code.
Posted Apr 19, 2016 14:40 UTC (Tue)
by mjthayer (guest, #39183)
Posted Apr 18, 2016 17:11 UTC (Mon)
by quotemstr (subscriber, #45331)
Uhm. Perhaps I'm misreading something, but AFAICS "the cancelability state and type of any newly created threads, including the thread in which main() was first invoked, shall be PTHREAD_CANCEL_ENABLE and PTHREAD_CANCEL_DEFERRED respectively" means exactly what I wrote: you don't enable cancellation, you just use it. You can disable it, sure, but that's not the default.
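The defaults quoted here can be observed directly. This small sketch relies on pthread_setcancelstate() and pthread_setcanceltype() handing back the previous value: a fresh thread sets the values POSIX names as defaults and records what was in effect before. The struct and function names are hypothetical.

```c
#include <pthread.h>

struct cancel_defaults {
    int state;
    int type;
};

static void *query_defaults(void *arg)
{
    struct cancel_defaults *d = arg;

    /* Setting the documented defaults changes nothing if they already hold;
     * the previous values are stored through the second argument. */
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &d->state);
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &d->type);
    return NULL;
}

/* Returns 1 if a new thread starts as ENABLE + DEFERRED, 0 if not, -1 on failure. */
int new_thread_defaults_match_posix(void)
{
    struct cancel_defaults d = { -1, -1 };
    pthread_t t;

    if (pthread_create(&t, NULL, query_defaults, &d) != 0)
        return -1;
    pthread_join(t, NULL);
    return d.state == PTHREAD_CANCEL_ENABLE && d.type == PTHREAD_CANCEL_DEFERRED;
}
```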
I appreciate that it only gets about a 3 or 4 on Rusty's API Design scale but they provide an extremely light-weight mechanism to protect threads from dying at awkward moments.
The only alternative I can see is for an application to use an ad-hoc signaling mechanism and for threads to only use non-blocking versions of 'accept' and other interfaces that allocate resources.
Apart from wheel-reinvention, that would be an interface that libraries couldn't share.
You and Rich have made it clear that cancellation can be implemented correctly and efficiently. So there isn't really any price to be paid. Let's just do it and move on ???
1/ create a data structure that contains a list of all resources I might hold (file descriptor, byte range locks).
2/ register a cleanup handler which walks that data structure and frees everything.
3/ write simple wrappers for open/accept/whatever which record the results in the data structure
4/ just call those wrappers, never the bare API.
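The four steps above might look roughly like the following. This is a sketch under stated assumptions, not a definitive implementation: it tracks file descriptors only, and it assumes the libc acts on a cancellation request in open() only when no descriptor has been returned, which is the property the parent article is about. All names (held_resources, release_all, tracked_open, demo_tracked_cleanup) are hypothetical, and the semaphore exists only so the demo cancels the thread after both descriptors are recorded.

```c
#include <fcntl.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

#define MAX_TRACKED 16

/* Step 1: a list of every resource the thread might hold. */
struct held_resources {
    int fds[MAX_TRACKED];
    int count;
};

static int fds_released;             /* demo bookkeeping, read after pthread_join() */
static sem_t ready;                  /* demo only: worker has opened its files */

/* Step 2: cleanup handler that walks the list and frees everything. */
static void release_all(void *arg)
{
    struct held_resources *held = arg;

    while (held->count > 0) {
        close(held->fds[--held->count]);
        fds_released++;
    }
}

/* Step 3: wrapper that records the result of open(). */
static int tracked_open(struct held_resources *held, const char *path, int flags)
{
    int fd;

    if (held->count == MAX_TRACKED)
        return -1;
    fd = open(path, flags);          /* cancellation point */
    if (fd >= 0)
        held->fds[held->count++] = fd;
    return fd;
}

static void *worker(void *arg)
{
    struct held_resources held = { .count = 0 };

    (void)arg;
    pthread_cleanup_push(release_all, &held);
    /* Step 4: only the wrappers are called, never the bare API. */
    tracked_open(&held, "/dev/null", O_RDONLY);
    tracked_open(&held, "/dev/null", O_RDONLY);
    sem_post(&ready);
    for (;;)
        pause();                     /* cancellation point */
    pthread_cleanup_pop(1);          /* not reached; pairs with the push */
    return NULL;
}

/* Returns the number of descriptors the cleanup handler closed. */
int demo_tracked_cleanup(void)
{
    pthread_t t;

    fds_released = 0;
    sem_init(&ready, 0, 0);
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&ready);
    pthread_cancel(t);
    pthread_join(t, NULL);
    sem_destroy(&ready);
    return fds_released;
}
```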
What specific risks do you see if cancellation is mostly enabled?
The random program probably never cancels threads so it wouldn't want these steps anyway.
If the thread is allocating resources then it naturally needs to make sure they get de-allocated. Any code already needs to worry about this. Code that can be canceled needs to do maybe 10% more work.
It can disable cancellation over a short allocate/use/deallocate sequence that won't block. Or it can register a cleanup helper and record the allocation in some array or something.
If your allocations follow a strict LIFO discipline you can even
- alloc
- push cleanup handler
- use the allocation
- pop the cleanup handler
The point of deferred cancellation is that this can be done with no locking, no extra system calls. It just needs a little care - like not logging any messages between the allocation and pushing the cleanup handler.
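The LIFO sequence above can be sketched as follows, assuming a heap buffer as the allocation and a blocking read() as the use. The names free_buffer, lifo_worker and demo_lifo are hypothetical. The handler runs exactly once whether the thread is cancelled inside read() or finishes normally.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static int handler_runs;             /* demo bookkeeping, read after pthread_join() */

static void free_buffer(void *buf)
{
    free(buf);
    handler_runs++;
}

static void *lifo_worker(void *arg)
{
    int fd = *(int *)arg;
    char *buf = malloc(64);          /* alloc: malloc() is not a cancellation point */

    if (buf == NULL)
        return NULL;
    /* Nothing that could be a cancellation point (such as logging) goes here. */
    pthread_cleanup_push(free_buffer, buf);   /* push cleanup handler */
    if (read(fd, buf, 64) < 0) {              /* use the allocation; may be cancelled */
        /* error ignored in this sketch */
    }
    pthread_cleanup_pop(1);                   /* pop the handler and run it */
    return NULL;
}

/* cancel_it != 0: cancel the worker while it blocks; otherwise let it finish.
 * Returns how many times the cleanup handler ran. */
int demo_lifo(int cancel_it)
{
    int fds[2];
    pthread_t t;

    handler_runs = 0;
    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, lifo_worker, &fds[0]) != 0)
        return -1;
    if (cancel_it) {
        pthread_cancel(t);
    } else if (write(fds[1], "x", 1) != 1) {
        return -1;
    }
    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return handler_runs;
}
```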
Many applications never cancel any threads. They are irrelevant.
They need do nothing and they suffer no cost (maybe a couple of instructions per syscall. If you can't afford that, hand-code your systemcalls).
It just needs a little care - like not logging any messages between the allocation and pushing the cleanup handler.
The trick is to have the parent allocate and free all resources in advance, and use robust data structures like stacks.
If C++ is broken, do not use it for threads.
The comparisons might be a little more complex than "memcmp" but could you not test "is this EIP value within a thunk" by comparing surrounding bytes against the standard thunk at each of the (very few) possible offsets?
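A user-space sketch of that comparison, under loud assumptions: the byte sequence below is an invented stand-in, not the thunk any real libc emits, and the candidate offsets are simply its instruction boundaries. For each possible offset, the function checks whether a full copy of the thunk sits at (ip - offset). The names sample_thunk, sample_offsets, offset_in_thunk and sample_offset are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative byte sequence only: NOT the thunk of any real libc. */
static const unsigned char sample_thunk[] = {
    0x55,            /* push %ebp      */
    0x89, 0xe5,      /* mov  %esp,%ebp */
    0xcd, 0x80,      /* int  $0x80     */
    0x5d,            /* pop  %ebp      */
    0xc3             /* ret            */
};

/* Instruction boundaries inside the thunk: the few offsets an
 * instruction pointer could have while executing it. */
static const size_t sample_offsets[] = { 0, 1, 3, 5, 6 };

/* Returns the offset of ip inside a thunk found in code[], or -1 if the
 * bytes around ip do not match the thunk at any candidate offset. */
int offset_in_thunk(const unsigned char *code, size_t code_len, size_t ip)
{
    size_t i;

    for (i = 0; i < sizeof sample_offsets / sizeof sample_offsets[0]; i++) {
        size_t off = sample_offsets[i];

        if (ip < off)
            continue;                /* thunk would start before the buffer */
        if (ip - off + sizeof sample_thunk > code_len)
            continue;                /* thunk would run past the buffer */
        if (memcmp(code + (ip - off), sample_thunk, sizeof sample_thunk) == 0)
            return (int)off;
    }
    return -1;
}

/* Two bytes of nop padding, one thunk, one trailing nop. */
static const unsigned char sample_code[] = {
    0x90, 0x90, 0x55, 0x89, 0xe5, 0xcd, 0x80, 0x5d, 0xc3, 0x90
};

int sample_offset(size_t ip)
{
    return offset_in_thunk(sample_code, sizeof sample_code, ip);
}
```

In the sample buffer the thunk starts at index 2, so index 5 is the int $0x80 instruction at offset 3, while indexes 0 and 9 lie outside it.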