@krasic,
Your approach is heading in the right direction, but it is not there yet: it only implements a precise sleep(). It needs one more step.
I'll try to explain it using the audio example.
Audio cards capture and play back sound via DMA (Direct Memory Access). This means the CPU does not poll the card for every sample; it just hands over a buffer, says "work here", and the card reads or writes the memory directly over the bus. When the card is done with the buffer, it interrupts the CPU (IRQ) to request more work (in practice it does so earlier, to prevent underruns).
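To make the "earlier" part concrete, here is a toy Python model (not driver code) of a ring buffer split into periods, where the card raises an IRQ at each period boundary and the driver refills that period. The function name and the instant-refill assumption are hypothetical simplifications; the point is that each IRQ arrives while the other periods are still queued, and that queued data is the headroom the refill has before an underrun.

```python
def simulate_playback(buffer_frames, period_frames, total_frames):
    """Toy model of a card draining a DMA ring buffer with one IRQ per period.

    Returns a list of (frames_played, frames_still_queued) at each IRQ.
    Assumption: the driver refills the consumed period instantly on each IRQ.
    """
    if buffer_frames % period_frames != 0:
        raise ValueError("buffer must hold a whole number of periods")
    queued = buffer_frames  # start with a full buffer
    played = 0
    events = []
    while played + period_frames <= total_frames:
        # The card consumes one period by DMA, without the CPU touching samples.
        queued -= period_frames
        played += period_frames
        # Period boundary: the card raises an IRQ while data is still queued.
        events.append((played, queued))
        # The IRQ handler hands the card one more period.
        queued += period_frames
    return events


print(simulate_playback(1024, 256, 1024))
# [(256, 768), (512, 768), (768, 768), (1024, 768)]
```

With four periods of 256 frames, every IRQ fires while 768 frames are still waiting to be played.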
Now imagine the following situation. The card plays, an IRQ comes in and says "need more", and the driver provides what it has. When the IRQ handler is done, instead of resuming the previously running task, we check whether there is a task that has the device open and is sleeping in poll()/select(). If there is, we arrange for that task to be scheduled immediately. That task is expected to feed more data to the driver.
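Only the userspace half of this can be shown in ordinary code; the proposed change lives in the kernel, which would switch to this task straight from the IRQ return path instead of resuming whatever was running before. Below is a minimal Python sketch of that userspace half, assuming a pipe stands in for the audio device and a thread stands in for the IRQ handler; `feeder_loop` and `fake_card` are hypothetical names.

```python
import os
import select
import threading


def feeder_loop(device_fd, periods_to_feed, feed):
    """Userspace side of the scheme: sleep in poll() until the device wants data.

    device_fd stands in for an opened audio device; here it is a pipe.
    """
    poller = select.poll()
    poller.register(device_fd, select.POLLIN)
    fed = 0
    while fed < periods_to_feed:
        poller.poll()            # the task sleeps here, no busy-waiting
        os.read(device_fd, 1)    # consume one "need more" notification
        feed(fed)                # hand the next period to the driver
        fed += 1
    return fed


def fake_card(notify_fd, irqs):
    """Stand-in for the IRQ handler: each byte written means "need more data"."""
    for _ in range(irqs):
        os.write(notify_fd, b"\x01")


r, w = os.pipe()
fed_periods = []
card = threading.Thread(target=fake_card, args=(w, 4))
card.start()
count = feeder_loop(r, 4, fed_periods.append)
card.join()
os.close(r)
os.close(w)
print(count, fed_periods)  # 4 [0, 1, 2, 3]
```

Today the wakeup in poll() only makes the task runnable; under the proposal the scheduler would also run it right away.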
In that case the latency is derived from the size of the buffered data. One benefit is that the task switch is almost free, since a switch happens on return from the IRQ anyway. A second benefit is that only sleeping tasks are woken up (no busy-looping). The biggest benefit is that this could work without real-time priority.
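The latency here is simply how long the card needs to play the data already queued: latency = buffered frames / sample rate. A short Python sketch; the 48 kHz rate and the buffer sizes are example values, not taken from this discussion.

```python
def buffer_latency_ms(buffered_frames, sample_rate_hz):
    """Time the card needs to play out the data already queued in the buffer."""
    return 1000.0 * buffered_frames / sample_rate_hz


for frames in (64, 256, 1024):
    print(f"{frames:5d} frames @ 48 kHz -> {buffer_latency_ms(frames, 48000):.2f} ms")
```

For example, 256 frames at 48 kHz is about 5.33 ms. Shrinking the buffer lowers the latency, but it also shrinks the time the woken task has to refill it.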
One problem is how concurrent IRQ-bound tasks would operate (e.g. two different devices with two different tasks interrupting each other). Another is that device drivers would need to tell the scheduler when to switch tasks and when not to. The biggest problem is hardware that doesn't generate the needed IRQ, but in that case it is usually the driver that does the polling (using a timer).
You may notice that this whole paradigm is similar to device driver workqueues. In short, the real-time process is used as a userspace driver. Even better, the kernel already provides an API for userspace I/O drivers (UIO), including IRQ handling, so this may be a good starting point for implementing this functionality, since those drivers would need it too.
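For reference, this is roughly what waiting for an IRQ looks like from a UIO userspace driver today. It is a minimal Python sketch, assuming a device node such as /dev/uio0 (the number is hypothetical) opened with os.open(path, os.O_RDWR); the helper names are made up, and the demo replays interrupt counts through a pipe instead of real hardware. Note that UIO only wakes the reader; it does not make the scheduler switch to it right after the IRQ, which is the missing piece discussed above.

```python
import os
import struct

# Per the kernel UIO documentation, read() on /dev/uioX blocks until the next
# interrupt and returns a 32-bit signed integer: the total interrupt count.
IRQ_COUNT = struct.Struct("=i")


def wait_for_irq(uio_fd):
    """Block until the next interrupt and return the cumulative IRQ count."""
    data = os.read(uio_fd, IRQ_COUNT.size)
    (count,) = IRQ_COUNT.unpack(data)
    return count


def reenable_irq(uio_fd):
    """Write 1 to re-enable the IRQ; this only works if the kernel-side
    UIO driver implements irqcontrol."""
    os.write(uio_fd, IRQ_COUNT.pack(1))


def irq_loop(uio_fd, handle, max_events):
    """Call handle(count, missed) after each interrupt; missed counts skipped IRQs."""
    last = None
    for _ in range(max_events):
        count = wait_for_irq(uio_fd)
        missed = 0 if last is None else count - last - 1
        handle(count, missed)
        last = count


# Demo without hardware: a pipe replays three reads with counts 1, 2 and 5.
r, w = os.pipe()
os.write(w, IRQ_COUNT.pack(1) + IRQ_COUNT.pack(2) + IRQ_COUNT.pack(5))
events = []
irq_loop(r, lambda count, missed: events.append((count, missed)), 3)
os.close(r)
os.close(w)
print(events)  # [(1, 0), (2, 0), (5, 2)]
```

On a real device you would typically call reenable_irq() before each blocking read when the driver supports irqcontrol; the loop leaves it out so the demo can run on a pipe.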