RealtimeKit and the audio problem

Posted Jul 5, 2009 18:55 UTC (Sun) by krasic (subscriber, #4782)
Parent article: RealtimeKit and the audio problem

My colleagues and I have been working on a new approach to general-purpose real-time scheduling that may be of interest to the Linux audio community.

I hope some developers here may find it interesting.

We are able to achieve ~1ms range scheduling response even during persistent 100% system load. This represents a significant improvement over Linux's CFS and real-time schedulers (see the paper and slide links below). That's the good news. The bad news is that we do introduce a new system call, and we require real-time applications to be modified to use the interface. In the paper, we describe how the interface is used in our own video player as well as in a modified version of the X.Org X11 server.

How does our approach work? (Briefly.)

Our approach combines fair scheduling and real-time. Application event loops call a new system call, "coop_poll()", to rendezvous with the kernel and to share essential timing information (a "release-time") with the kernel scheduler. The kernel scheduler wakes the task up (i.e. returns from coop_poll) at the application-specified time. On return, coop_poll provides timing information about the rest of the system (the kernel and other cooperative tasks). Then the whole cycle repeats: the thread later calls coop_poll to yield back to the kernel at the appropriate time, and in return the kernel scheduler resumes the thread very quickly when its own release-time arrives. Apart from short timeslices around release-times, the scheduler divides CPU time among tasks according to a fair queuing approach. Tasks that behave well (i.e. use the coop_poll interface as just described) receive very low-latency execution at their release-times, and otherwise an overall fair share of the CPU over time. Our kernel scheduler enforces this, so a task that attempts to abuse the interface will lose low-latency execution, although it may regain it if the abuse is not persistent. All tasks receive a (weighted) fair CPU allocation. There is no need for root or real-time privileges.
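To make the dispatch rule described above concrete, here is a minimal user-space sketch, assuming a simplified model: any task whose release-time has arrived runs first (earliest release-time wins), and otherwise the task with the least weighted CPU time ("vruntime") runs, as a fair-queuing scheduler would choose. The struct sim_task type and pick_next() are hypothetical illustrations, not the authors' kernel code, and the sketch leaves out enforcement and slack handling entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task state for illustration only. */
struct sim_task {
    const char *name;
    uint64_t vruntime;      /* weighted CPU time consumed so far (fair-share key) */
    uint64_t release_time;  /* next release-time passed to coop_poll(); 0 = none */
};

/*
 * Pick the next task to run at time `now`:
 *  1. among tasks whose release-time has arrived, the earliest release wins;
 *  2. if none has arrived, fall back to the least vruntime (fair queuing).
 */
static struct sim_task *pick_next(struct sim_task *tasks, size_t n, uint64_t now)
{
    struct sim_task *best_timely = NULL;
    struct sim_task *best_fair = NULL;

    for (size_t i = 0; i < n; i++) {
        struct sim_task *t = &tasks[i];

        if (t->release_time != 0 && t->release_time <= now &&
            (best_timely == NULL || t->release_time < best_timely->release_time))
            best_timely = t;

        if (best_fair == NULL || t->vruntime < best_fair->vruntime)
            best_fair = t;
    }
    return best_timely != NULL ? best_timely : best_fair;
}
```

For example, with a "video" task released at 1000, an "audio" task released at 900 and a "batch" task with no release-time but the least vruntime, the batch task runs at time 500, and the audio task runs at 950 and again at 1200 (it has the earlier release-time).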

We presented the work earlier this year at the EuroSys 2009 conference:

(slides)
http://eurosys2009.informatik.uni-erlangen.de/fileadmin/D...

(full paper info)
http://portal.acm.org/citation.cfm?id=1519065.1519077

Having taken a very brief look at Pulse, for example, I think it would be reasonable to modify the Pulse server to take advantage of a coop_poll-enabled kernel.

-- Buck

Charles 'Buck' Krasic
Assistant Professor,
Computer Science
University of British Columbia
Vancouver, Canada


RealtimeKit and the audio problem

Posted Jul 6, 2009 9:53 UTC (Mon) by njs (guest, #40338)

I find your comment intriguing but somewhat puzzling. In my tests, I've never had trouble achieving ~1ms scheduling accuracy with stock kernels (at least since the soft real-time patches were first merged years ago), and -rt kernels are substantially better (~10us?). That's using SCHED_FIFO though, of course. Getting precisely timed wakeups out of the kernel *has* historically required somewhat obscure APIs, but this has gotten much easier over time (from /dev/rtc to real-time signals to timerfd to now, finally, hrtimer-enabled poll).
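A minimal sketch of the timerfd approach mentioned above, assuming Linux with glibc: it arms an absolute, periodic CLOCK_MONOTONIC timer and records how late each wakeup is relative to its scheduled deadline. max_wakeup_lateness_ns() is a hypothetical helper name. To reproduce a SCHED_FIFO measurement, the calling thread would also need real-time priority, which this sketch does not request.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Arm an absolute, periodic CLOCK_MONOTONIC timerfd, wait for `n` wakeups in
 * read(), and return the worst lateness (ns) versus the scheduled deadline,
 * or -1 on error. */
static int64_t max_wakeup_lateness_ns(int n, int64_t period_ns)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;

    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);
    int64_t deadline = ts_to_ns(&start) + period_ns;   /* first expiry */

    struct itimerspec its = {
        .it_interval = { .tv_sec = period_ns / NSEC_PER_SEC, .tv_nsec = period_ns % NSEC_PER_SEC },
        .it_value    = { .tv_sec = deadline / NSEC_PER_SEC,  .tv_nsec = deadline % NSEC_PER_SEC },
    };
    if (timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, NULL) < 0) {
        close(fd);
        return -1;
    }

    int64_t worst = 0;
    for (int i = 0; i < n; i++) {
        uint64_t expirations;   /* periods elapsed since the last read() */
        if (read(fd, &expirations, sizeof expirations) != (ssize_t)sizeof expirations) {
            close(fd);
            return -1;
        }
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);

        /* If we overslept, measure against the most recent deadline that passed. */
        deadline += (int64_t)(expirations - 1) * period_ns;
        int64_t late = ts_to_ns(&now) - deadline;
        if (late > worst)
            worst = late;
        deadline += period_ns;
    }
    close(fd);
    return worst;
}
```

Calling max_wakeup_lateness_ns(1000, 1000000) measures one second of 1ms wakeups; the result depends heavily on kernel configuration, scheduling policy and system load, so no particular figure is implied here.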

I can't seem to get your paper, but if the graphs for "RT" in your slides were generated using a real high-precision wakeup mechanism then I'm very surprised. Can you confirm what you used? IIRC for 2.6.25 you still have to use a rather complex signal-based approach.

Still, getting RT-like behavior without SCHED_FIFO is quite excellent. Is the idea that coop_poll() is a poll-like syscall with two additional properties: 1) it uses hrtimers for precise wakeups, 2) when a wakeup does occur, the process gets preferential treatment in waking up *immediately*, so long as this doesn't produce unfairness over the long term?

RealtimeKit and the audio problem

Posted Jul 7, 2009 22:21 UTC (Tue) by krasic (subscriber, #4782)

Hi NJS,

Yes, stock Linux real-time scheduling responds very quickly, but it breaks down when multiple CPU-intensive tasks run at the same time. We give an example of this in our paper (generalized to varying numbers of simultaneous tasks).

A simpler example (not from the paper): imagine you run the PulseAudio server with real-time priority (SCHED_FIFO or SCHED_RR), along with an application such as a video conferencing app, also real-time. Both Pulse and the video app can be CPU-intensive. Stock Linux real-time scheduling will not provide fast response to both apps while both are simultaneously active, because the timeslice/quantum is still very large under SCHED_FIFO and SCHED_RR. Hence one of them can experience a large delay.

As for wakeup, we use the hrtimer facility directly in our scheduler.

Your summary of coop_poll() is correct, except that the application's responsibility goes beyond "so long as this doesn't produce unfairness over the long term". coop_poll() returns a time value to the application that indicates when the application should call coop_poll() again. So it isn't just that the kernel gives preferential treatment by waking the task immediately; the application must also yield promptly when other applications' release-times arrive (within a slack time).
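The application side of this contract can be sketched as an event loop that keeps working until the deadline returned by coop_poll() is near, and only then calls back in. The real coop_poll() signature is not given in this thread, so coop_poll_sim() below is a hypothetical single-threaded simulation: it runs one other periodic task if that task's release-time has arrived, advances a simulated clock, and returns the other task's next release-time as the deadline. All names and the time model are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated kernel with one other cooperative task that is
 * released periodically. All times are in microseconds. */
typedef struct {
    uint64_t now;            /* simulated clock */
    uint64_t other_release;  /* next release-time of the other task */
    uint64_t other_period;   /* how often the other task is released */
    uint64_t other_cost;     /* CPU time the other task uses per release */
} sim_kernel;

/* Stand-in for coop_poll(): let the other task run if its release-time has
 * arrived, sleep until `my_release`, and return the deadline by which the
 * caller should yield again. */
static uint64_t coop_poll_sim(sim_kernel *k, uint64_t my_release)
{
    if (k->now >= k->other_release) {
        k->now += k->other_cost;
        k->other_release += k->other_period;
    }
    if (k->now < my_release)
        k->now = my_release;
    return k->other_release;
}

/* Event loop: run `units` work items of `unit_us` each, yielding only when
 * the next item would overrun the deadline. At least one item always runs
 * after a yield, so the loop makes progress. Returns the number of
 * coop_poll_sim() calls made. */
static int run_cooperatively(sim_kernel *k, int units, uint64_t unit_us)
{
    int calls = 1;
    uint64_t deadline = coop_poll_sim(k, k->now);  /* first rendezvous */
    bool just_polled = true;

    while (units > 0) {
        if (!just_polled && k->now + unit_us > deadline) {
            deadline = coop_poll_sim(k, k->now);  /* yield; resume as soon as possible */
            calls++;
            just_polled = true;
            continue;
        }
        k->now += unit_us;   /* one unit of work */
        units--;
        just_polled = false;
    }
    return calls;
}
```

With the other task released every 1000 microseconds and using 200 of them, 50 work units of 100 microseconds finish after 6 rendezvous (one initial call plus one per release of the other task) instead of a yield after every unit.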

In steady state, "cooperative" tasks are never preemptively context-switched by the kernel; instead they always rendezvous voluntarily with the kernel scheduler in coop_poll().

-- Buck

ps Perhaps this link to our paper will work better: http://www.eecg.toronto.edu/~ashvin/publications/timely-s...

RealtimeKit and the audio problem

Posted Jul 8, 2009 0:07 UTC (Wed) by njs (guest, #40338)

Hi Buck,

Thanks for the clarifications. I really like coop_poll's semantics (not that my opinion matters much). The classic approach to this problem is "you need low latency so we'll give you high priority whoops but not *that* high priority". Saying instead "you keep the same priority (= available CPU time) but we'll let you request where to spend it and give you feedback on what you got" seems to capture what's going on here much better. (Though it won't help much if it turns out that pulseaudio does, in fact, need more than its fair share of the CPU.)

I'm still a bit confused about how classic RT fails, though. IIUC, under SCHED_FIFO there *is* no scheduling quantum -- processes run until they yield. SCHED_RR does have timeslice-based preemption, but I can't see how any app you might care about on the desktop would ever use it. In practice, an app like Pulseaudio or Ekiga or whatever is going to yield very often, after each slice of work. Under high load it will become runnable again very quickly, but it still gives any other RT apps a chance to take over. So the apps create your effective scheduling quantum, not the kernel.
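For reference, this is roughly how a process requests SCHED_FIFO for itself (RealtimeKit, the subject of the parent article, exists to hand this out without giving applications full privileges). A minimal sketch: try_enter_fifo() is a hypothetical helper, and sched_setscheduler() typically fails with EPERM unless the process has CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO. Once it succeeds, the thread runs until it blocks or yields, with no timeslice, which is the behavior described above.

```c
#include <errno.h>
#include <sched.h>

/* Try to switch the calling thread to SCHED_FIFO at priority `prio`.
 * Returns 0 on success, EINVAL for an out-of-range priority, or the errno
 * value from sched_setscheduler() (commonly EPERM). */
static int try_enter_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (prio < sched_get_priority_min(SCHED_FIFO) ||
        prio > sched_get_priority_max(SCHED_FIFO))
        return EINVAL;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == 0)
        return 0;
    return errno;
}
```

A caller that gets EPERM back would fall back to normal scheduling, or ask a broker such as RealtimeKit instead.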

RealtimeKit and the audio problem

Posted Jul 8, 2009 4:26 UTC (Wed) by krasic (subscriber, #4782)

Hi NJS,

This is a good discussion.

I agree that compute intensive, yet time sensitive, apps could "yield often" as you describe. Indeed, coop_poll relies on that too.

A couple of points about such yielding:

1) if that is so, then the difference between SCHED_RR and SCHED_FIFO is moot, as the application would be setting the quantum, as you observe;

2) coop_poll allows such applications to yield more intelligently, by providing the application with direct information about when it should yield.

Point #2 can mean far fewer yields/context switches, hence lower overhead and better responsiveness. In the paper we show that a purely periodic approach (i.e. yielding often, regardless of whether it is actually needed) performs measurably worse: in one experiment, coop_poll's responsiveness is ~1ms versus ~5ms for the periodic approach, while coop_poll also incurs almost 5x fewer context switches. This is part of why I think SCHED_FIFO doesn't cut it. The issue is how to divide time between short, rapid timeslices for time-sensitive events and longer timeslices for longer, less time-sensitive computations. coop_poll can and does do this. With SCHED_FIFO, a similar result could be achieved by dividing the application into threads so that any heavy computation runs only in separate, non-real-time threads, while time-sensitive actions run in a SCHED_FIFO thread. I think having fewer threads is architecturally preferable, but I guess that is a matter of personal taste.
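The difference between the two yielding strategies can be illustrated with a simple count over a time horizon, assuming each yield costs one context switch: a periodic yielder switches once per period whether or not anyone is waiting, while an informed yielder (one that knows, as coop_poll's returned time tells it, when other tasks' release-times fall) switches only at those release-times. Both functions are hypothetical illustrations and do not reproduce the paper's experiment.

```c
#include <stddef.h>
#include <stdint.h>

/* Periodic strategy: yield every `period` time units, needed or not. */
static uint64_t periodic_yields(uint64_t horizon, uint64_t period)
{
    return horizon / period;
}

/* Informed strategy: yield only when another task's release-time falls
 * before `horizon`. `releases` holds distinct release-times in any order. */
static uint64_t informed_yields(uint64_t horizon, const uint64_t *releases, size_t n)
{
    uint64_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (releases[i] < horizon)
            count++;
    return count;
}
```

If another task is released every 5 time units, a loop that yields every unit switches 30 times in a horizon of 30, while the informed loop switches only at the 5 releases.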

Also, I'd point out that the share of CPU, e.g. with Pulse, is not that big an issue. Fair-share schedulers support weighted fair sharing: e.g. in Linux, the nice value is translated into a fair-share weight (both in CFS and in our scheduler). Thus giving a 'nice boost' to Pulse, and to other servers such as the X11 server, is a safe and arguably sensible thing to do.
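To put numbers on the 'nice boost', here is a small sketch using the nice-to-weight table from the Linux CFS scheduler (prio_to_weight in 2.6-era kernel/sched.c; later kernels call it sched_prio_to_weight; check your kernel source). Each nice step changes the weight by roughly 1.25x. Whether the coop_poll scheduler uses the same values is not stated here; this only illustrates CFS, and the helper names are hypothetical.

```c
/* CFS nice-to-weight table, indexed by nice + 20 (nice -20 .. 19). */
static const unsigned int nice_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Weight for a nice value, clamped to the valid range. */
static unsigned int weight_of_nice(int nice)
{
    if (nice < -20)
        nice = -20;
    if (nice > 19)
        nice = 19;
    return nice_to_weight[nice + 20];
}

/* Long-run CPU share of task A when it competes with task B on one CPU. */
static double cpu_share(int nice_a, int nice_b)
{
    double wa = weight_of_nice(nice_a);
    double wb = weight_of_nice(nice_b);

    return wa / (wa + wb);
}
```

For example, cpu_share(-5, 0) is 3121 / (3121 + 1024), just over 0.75: a server reniced to -5 gets about three quarters of a contended CPU against one ordinary task, still bounded rather than unlimited as with SCHED_FIFO.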

-- Buck

RealtimeKit and the audio problem

Posted Jul 20, 2009 16:15 UTC (Mon) by iive (guest, #59638)

@krasic,
Your approach is heading in the right direction, but it is not there yet. It just implements a precise sleep(). It needs one more step.

I'll try to explain it in the given audio context.
Audio cards capture and play back sound via DMA (Direct Memory Access). This means that the CPU does not poll the card for every sample; it just hands over a buffer, says "work here", and the card reads or writes memory directly over the bus. When the card is done with the buffer, it interrupts the CPU (IRQ) to request more work (actually it does so earlier, to prevent underruns).

Now imagine the following situation. The card plays, an IRQ comes in and says "need more", and the driver provides what it has. When the IRQ handler is done, instead of restoring the previously running task, we check whether there is a task that has the device open and is sleeping in poll/select. If there is, we make sure that task is scheduled immediately. The task is then expected to feed more data to the driver.
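The user-space half of this scheme is an ordinary blocking loop: a playback task sleeps in poll() on the device's file descriptor until the driver reports room for more data, then writes the next period. A minimal sketch, assuming a raw file descriptor that signals POLLOUT when there is space (real ALSA applications normally go through alsa-lib rather than a bare fd); wait_and_refill() is a hypothetical helper, and a pipe stands in for the device when trying it out.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Block until `fd` can accept more data (the driver's "need more"), then
 * write one period of samples. Returns bytes written, 0 on timeout, or -1
 * on error. */
static ssize_t wait_and_refill(int fd, const void *period, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int r;

    do {
        r = poll(&pfd, 1, timeout_ms);   /* the task sleeps here */
    } while (r < 0 && errno == EINTR);

    if (r <= 0)
        return r;                        /* 0 = timed out, -1 = error */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;
    return write(fd, period, len);       /* feed the next period */
}
```

Under the proposal above, the scheduling change would be entirely in the kernel (running this task right after the IRQ handler); the application code itself would stay this simple and would not need real-time priority.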

In that case the latency is derived from the buffered data size. One benefit is that the task switch is almost free, since a switch happens at the interrupt anyway. A second benefit is that only sleeping tasks are woken up (a task that runs endlessly gains nothing from this). The biggest benefit is that this could work without real-time priority.
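The relation between buffered data and latency is simple arithmetic: the wait before newly written audio is heard is the amount already queued divided by the playback rate. A sketch, assuming interleaved PCM with a fixed frame size; the helper names are hypothetical.

```c
#include <stdint.h>

/* Latency, in microseconds, of `frames` queued frames at `rate_hz` (rounded down). */
static uint64_t frames_latency_us(uint64_t frames, uint64_t rate_hz)
{
    return frames * 1000000u / rate_hz;
}

/* Same, starting from a byte count of interleaved PCM data. */
static uint64_t bytes_latency_us(uint64_t bytes, unsigned channels,
                                 unsigned bytes_per_sample, uint64_t rate_hz)
{
    uint64_t frame_size = (uint64_t)channels * bytes_per_sample;

    return frames_latency_us(bytes / frame_size, rate_hz);
}
```

For example, 4096 bytes of 16-bit stereo audio is 1024 frames, which at 44100 Hz is about 23.2ms; halving the buffer halves the latency, but doubles how often the task must be woken to refill it.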
One problem is how concurrent IRQ-bound tasks would operate (e.g. two different devices with two different tasks, interrupting each other). Another is that device drivers would have to tell the scheduler when to switch tasks and when not to. The biggest problem is a hardware device that doesn't generate the needed IRQ, but in that case it is usually the driver that does the polling (using a timer).

You may notice that this whole paradigm is similar to device driver workqueues. In short, the real-time process is used as a userspace driver. Even better, the kernel already provides an API for userspace I/O drivers, including IRQ handling, so this may be a good starting point for implementing that functionality, since those drivers would need it too.
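The user-space side of the existing UIO (Userspace I/O) interface mentioned above looks roughly like this: a process blocks in read() on its /dev/uioN device, which returns a 4-byte interrupt count once an interrupt has arrived, and, for UIO drivers that implement irqcontrol, writes a 4-byte value of 1 to re-enable the interrupt. A minimal sketch; the helper names are hypothetical, and opening the actual device (for example /dev/uio0) is left out.

```c
#include <stdint.h>
#include <unistd.h>

/* Block until the next interrupt. UIO's read() returns the total interrupt
 * count as a 32-bit integer; return how many interrupts occurred since the
 * previous call (tracked in *last_count), or -1 on error. */
static int64_t uio_wait_irq(int fd, uint32_t *last_count)
{
    int32_t count;

    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    int64_t events = (int64_t)((uint32_t)count - *last_count);  /* wraps safely */
    *last_count = (uint32_t)count;
    return events;
}

/* Re-enable the interrupt, for UIO drivers that support irqcontrol. */
static int uio_irq_enable(int fd)
{
    int32_t on = 1;

    return write(fd, &on, sizeof on) == (ssize_t)sizeof on ? 0 : -1;
}
```

A return value greater than 1 means interrupts were missed while the process was not running, which is exactly the situation the scheduling change proposed above would try to avoid.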

RealtimeKit and the audio problem

Posted Dec 20, 2010 16:41 UTC (Mon) by lievenmoors (guest, #53544)

It would be interesting to see JACK using this cooperative scheduling scheme...

Now that realtime preemption is becoming mainstream, it might be time to move on to something new :-)

Very interesting paper, btw... Thanks for sharing.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds