Yes, stock Linux real-time scheduling responds very quickly, but it breaks down when multiple CPU-intensive tasks run at the same time. We give an example of that in our paper (generalized to various numbers of simultaneous tasks).
A simpler example (not from the paper) is as follows. Imagine you run the PulseAudio server with real-time scheduling (SCHED_FIFO or SCHED_RR), and an application such as a video conferencing app, also real-time. Both PulseAudio and the video app can be CPU intensive. Stock Linux real-time scheduling will not provide fast response to both apps while both are simultaneously active, because the timeslice/quantum is still very large under SCHED_RR, and SCHED_FIFO has no timeslice at all: a task runs until it blocks, yields, or is preempted by a higher-priority task. Hence one of them can experience a large delay.
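To put rough numbers on that, here is a minimal Python sketch (not from the paper) of the worst-case wait for a task that becomes ready while one other equal-priority real-time task is already running on the same CPU. The function name worst_case_wait_ms and the 250 ms burst are made up for illustration, and the 100 ms default is an assumption based on the usual stock value of the SCHED_RR timeslice; it is a toy model, not a measurement.

```python
def worst_case_wait_ms(policy, other_burst_ms, rr_quantum_ms=100.0):
    """Toy upper bound on how long a newly ready task waits behind ONE
    equal-priority task that is already running on the same CPU.

    policy          -- "SCHED_FIFO" or "SCHED_RR"
    other_burst_ms  -- CPU time the running task still wants (hypothetical)
    rr_quantum_ms   -- assumed SCHED_RR timeslice (100 ms is the usual default)
    """
    if policy == "SCHED_FIFO":
        # No timeslice: the running task keeps the CPU until it blocks
        # or yields, so the newcomer waits for the whole remaining burst.
        return other_burst_ms
    if policy == "SCHED_RR":
        # Equal-priority tasks are rotated only when the quantum expires.
        return min(other_burst_ms, rr_quantum_ms)
    raise ValueError("unknown policy: " + policy)


# A video frame period at 30 fps, for comparison with the waits below.
FRAME_PERIOD_MS = 1000.0 / 30.0

fifo_wait = worst_case_wait_ms("SCHED_FIFO", other_burst_ms=250.0)
rr_wait = worst_case_wait_ms("SCHED_RR", other_burst_ms=250.0)
```

Under these assumptions both waits are several frame periods long, which is the kind of delay I mean.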
As for wakeup, we use the hrtimer facility directly in our scheduler.
Your summary of coop_poll() is correct, except that the application's responsibility is greater than "so long as this doesn't produce unfairness over the long term". coop_poll() returns a time value to the application that indicates when the application should call coop_poll() again. So it isn't just that the kernel provides preferential treatment by waking the application immediately; the application must also yield immediately when other applications' release-times arrive (well, within a slack time).
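The contract can be sketched with a toy user-space model in Python. This is only an illustration of the idea: the class ToyCoopScheduler, its method signature, and the 1 ms slack are hypothetical and are not the real coop_poll() interface, and the toy does not block the caller the way the real call would.

```python
import math


class ToyCoopScheduler:
    """Toy model of the coop_poll() contract (hypothetical interface)."""

    def __init__(self, slack_ms=1.0):
        self.slack_ms = slack_ms      # assumed tolerance around release-times
        self.release_ms = {}          # task name -> its next release-time

    def coop_poll(self, task, my_next_release_ms):
        """Record the caller's next release-time and return the time by
        which the caller must call coop_poll() again (i.e. yield)."""
        self.release_ms[task] = my_next_release_ms
        others = [t for name, t in self.release_ms.items() if name != task]
        if not others:
            return math.inf           # nobody else is waiting for the CPU
        # Yield when the earliest other release-time arrives, plus slack.
        return min(others) + self.slack_ms


sched = ToyCoopScheduler(slack_ms=1.0)
audio_deadline = sched.coop_poll("audio", my_next_release_ms=10.0)
video_deadline = sched.coop_poll("video", my_next_release_ms=33.0)
```

In this toy run the audio task is alone at first, so it gets no yield deadline, while the video task is told to come back by 11.0 ms, which is the audio task's release-time plus the slack.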
In steady state, "cooperative" tasks are never preemptively context switched by the kernel; instead they always rendezvous voluntarily with the kernel scheduler in coop_poll().