What is Real-Time?

Posted Dec 28, 2004 23:15 UTC (Tue) by ccchips (subscriber, #3222)
In reply to: What is Real-Time? by dlang
Parent article: Where Is Real-Time Linux? (TechNewsWorld)

I have an example that's near to my heart, and about audio.

I use Csound in combination with FLTK to experiment with synthesis techniques. Sometimes I use it in console mode, without FLTK. The signals come from a MIDI keyboard.

It is very hard to get Linux to be ultra-responsive to my requests (that is, to give me instantaneous response from the keyboard) without getting "buffer underrun" errors from the software synthesizer. Ideally, I should be able to get the OS to appear *totally transparent* to my actions, instead of having to perform the music as if I were using an old pneumatic pipe organ, where there is a clearly perceptible delay between striking the keyboard and getting the sound from the pipes.

Now, as I understand it, a really fast CPU running MS-DOS should be able to provide the performance snappiness I need, or I should be able to get better response from Linux with the "low-latency patch."

I think the same sorts of issues come up with multi-track recording; in fact, I have noticed that csound, when used as a multi-track recorder from console mode, still gets both input and output buffer underruns. And again, the point here is that the kinds of audio applications I desire are by nature as close to *real-time* as you can get, because there are issues of multiple performers involved.

In short, I feel that audio and video production work really does test the *real-time* capabilities of an OS, and far more effectively than, say, recording a record to a .wav file and then playing it back after converting to MP3, etc.


What is Real-Time?

Posted Dec 29, 2004 0:21 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

yes, if you are reading in, processing, and writing out you can't buffer things much (and buffering, if you can use it, is a WONDERFUL tool to relax the real-time requirements)

try turning on preemption, and take a look at Ingo's real-time hacks. from what I've been seeing on the kernel list he seems to have made huge gains in how well this works.

however, there are a lot of applications that have their buffering set way too high, and even with the best OS timing in the world there will be a delay between the press of the key and the sound coming out that matches the length of these buffers (when you shrink the buffers and start hearing gaps in the sound instead, you know you have hit the limits)

there's also a lot of stuff that you can do if you are building a machine for this type of work to make it more likely to work well. fast drives (SCSI seems to work better than IDE, at least with the default configurations), plenty of CPU and memory (the more headroom you have the less likely you are to run into problems). I've also heard people mention that some sound cards are far better than others for this sort of thing.
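To put rough numbers on the buffer-length point, here is a minimal sketch of the arithmetic: the delay a buffer adds is its length in frames divided by the sample rate. The function name, the parameter names, and the buffer sizes in the loop are illustrative assumptions, not settings taken from Csound or any particular sound driver.

```python
def buffer_latency_ms(frames_per_buffer, sample_rate_hz, buffer_count=1):
    """Return the delay, in milliseconds, added by a chain of audio buffers."""
    return 1000.0 * frames_per_buffer * buffer_count / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical buffer sizes at a 44.1 kHz sample rate.
    for frames in (64, 256, 1024, 4096):
        print(f"{frames:5d} frames -> {buffer_latency_ms(frames, 44100):6.2f} ms")
```

At 44.1 kHz a single 1024-frame buffer already costs roughly 23 ms before the OS adds any scheduling delay of its own, which is why shrinking the buffers is the first thing to try.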

Response to events

Posted Dec 29, 2004 0:51 UTC (Wed) by jd (guest, #26381) [Link]

The responsiveness of the OS is, as you correctly say, a function of latency and the lower the latency the better. Yes, this is "real-time" - the monitor polls for activity every fixed interval, and if an event has occurred, it runs the appropriate code. (In this case, it would monitor the MPU-401 and record every incoming MIDI event, as and when it occurred.)

For something like that, you want extremely small time-slices (just enough to poll, and - if something happens - record a few bytes of data). This is fairly typical of the way a scientific monitoring device would be configured. For example, if you're running a particle detector on the end of a nuclear accelerator, the last thing you want is for the detector to be doing something else at the crucial moment.
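As a rough illustration of the fixed-interval polling described above, here is a small user-space sketch. The names `poll_fixed_interval` and `check_event` are hypothetical, and a real monitor would read the MIDI port rather than call a Python function; the point is only that each wake-up is scheduled from an absolute deadline, so time spent handling one event does not push back every later poll.

```python
import time


def poll_fixed_interval(check_event, interval_s, iterations):
    """Call check_event() once per interval and collect the events it reports.

    check_event is a caller-supplied function (an assumption of this sketch)
    that returns None when nothing happened.
    """
    events = []
    next_deadline = time.monotonic()
    for _ in range(iterations):
        event = check_event()
        if event is not None:
            events.append(event)  # keep per-event work as small as possible
        # Advance by a fixed step so handling time does not accumulate as drift.
        next_deadline += interval_s
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return events
```

How closely the sleeps track the deadlines depends on the scheduler, which is exactly where the latency discussed in this thread shows up.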

For this reason, RTOSes for this kind of work tend to be extremely minimal, have virtually nothing not absolutely essential running on them, and often run on very specialized hardware. In science labs that do highly precise work, it's not unusual to see a dedicated VME machine running a version of VxWorks that's as pared to the bone as physically possible. While that would probably also work very nicely in a studio setting, it's probably a little hefty in the price-tag department for most.

Anyways, I'd suggest in your case that the TimeSys Linux patches may be useful to you. Most of the other real-time kernels only give real-time advantages to specially-written software. Unless you've already done so, I also strongly recommend upgrading to Linux kernel 2.6.9 or 2.6.10, as these tend to be better for real-time and precision work than the 2.4 series.

Thirdly, I suggest rolling your own kernel, rather than using a standard pre-built one. If you do this, pare the kernel to the bone. If it's not on your system, it shouldn't be in your kernel. Every time the kernel has to check if something's in use, or set-up a certain way, you're adding latency to the kernel. Oh, and no modules. Compile everything in. Modules are slower.

Fourthly, don't run software you're not using. If you're booting in text mode (runlevel 3), then go to /etc/rc.d/rc3.d and rip out everything you're not going to use. (You don't need sendmail to run MIDI software, for example.) Also visit /etc/inittab and disable any consoles you don't need. Every console takes memory and CPU resources. If you only need one, then only run one. It won't save a vast amount, but it'll save some.

Finally, if (and only if) you have lots of RAM installed, have your software save your recordings to ramdisk, rather than to physical disk. Whenever the buffers flush on a physical disk, it'll eat your CPU and spit out the bones. The program itself is less important, as once it is loaded, it is loaded. I'm not sure of how Linux currently handles virtual memory, but it might be worth switching off swapspace to prevent the OS from swapping to disk unnecessarily. How well that works, though, depends on whether or not the kernel assumes there's going to be swap present.

Response to events

Posted Dec 29, 2004 2:37 UTC (Wed) by mmarq (guest, #2332) [Link]

"... Oh, and no modules. Compile everything in. Modules are slower."

Isn't that because modules tend to fragment the kernel memory space?

I know the point here is to get every drop of juice one can get, but in normal use, without real-time requirements, comparing a 2.4.22-mm kernel with everything possible compiled as modules against the same kernel with everything compiled in, the difference is not noticeable to the naked eye or with a stopwatch (I didn't run benchmarks)... With 2.6 I haven't done the experiment, but I guess the difference should be even smaller, because 2.6 has finer granularity, i.e., much better preemption and much better multithreading, so memory space fragmentation has a lesser impact.

To support my point, there seem to be good memory defrag patches out there waiting to get into the main tree. My point is that modules themselves aren't a problem for real-time applications; rather, it is the memory fragmentation and overhead (too many buffers and maps) that need a fix.


Response to events

Posted Dec 29, 2004 3:56 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

memory fragmentation is one issue, but another issue is that on a monolithic kernel the compiler can do some slight optimizations on function calls that it can't do if the function is modular (near vs far calls IIRC). not a huge difference by any means, but I've seen posts by people who should know what they're talking about who have said it may make a percent or two of difference in overall speed

Response to events

Posted Dec 29, 2004 18:55 UTC (Wed) by mmarq (guest, #2332) [Link]

" but I've seen posts by people who should know what they're talking about who have said it may make a percent or two of difference in overall speed "

Thanks for the tip, but nevertheless I believe that my point stands; 1% to 2% for a tweaked system is really very small... and I believe that, on top of this, implementing an 'as full as possible' preemptible kernel, defragmenting the kernel memory space, implementing AIO to the bone and improving kernel threading will shrink the gain of compile-tweaked systems to complete insignificance, because it will be a completely different way for a monolithic kernel to behave...

Response to events

Posted Dec 29, 2004 19:46 UTC (Wed) by jd (guest, #26381) [Link]

There's memory fragmentation, but calls into modules are also far calls (which are slower). I'm not 100% certain, but they may also add other layers. For example, the kernel needs to check if the system call you are wanting to make is available (ie: the module is loaded) and, if so, where in memory that system call resides. This means you've a search, followed by at least an indirect (and more likely a double indirect) call to the required function.

I don't know the specifics of whatever optimizations the kernel programmers use, but if something is compiled in, you know it falls inside the kernel object, so you can likely do a near call. Also, since both the names and the locations of all the compiled-in symbols are known at compile-time, it should be much easier to look them up. Indeed, it should be possible to directly embed the addresses in the calls, so that they are all direct. (This would require some serious multi-pass compiling, though.) It would also be possible to re-order functions, so that frequent jumps were short wherever possible. This probably wouldn't apply too often, though.

Response to events

Posted Dec 29, 2004 2:55 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

Switching off swap space would be an extreme, and unnecessary, measure. The proper (and much more convenient) method of doing this is to "lock" the pages your program is using into memory, both the code and the program buffers. The functions you want for this are "mlock(2)" and "munlock(2)". You can use "mlockall(2)" as well, if you need the entire process address space to be locked.
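For readers who want to try the page-locking idea from a script, here is a hedged sketch that calls mlockall(2) through ctypes. The `MCL_CURRENT` and `MCL_FUTURE` values are the ones used on x86 and ARM Linux and are an assumption here (a few architectures define them differently, so check `<sys/mman.h>`); the function name and the optional `libc` parameter are illustrative. A C program would call `mlockall()` directly.

```python
import ctypes
import ctypes.util
import os

# Assumed values: correct for x86 and ARM Linux, different on some other
# architectures. Check <sys/mman.h> on the target system.
MCL_CURRENT = 1  # lock pages that are mapped right now
MCL_FUTURE = 2   # also lock pages mapped later


def lock_all_memory(libc=None):
    """Try to pin all current and future pages of this process in RAM.

    Returns True on success, False if the kernel refused (typically
    because of RLIMIT_MEMLOCK or missing privileges).
    """
    if libc is None:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        print("mlockall failed:", os.strerror(ctypes.get_errno()))
        return False
    return True
```

The call usually needs root or a raised memlock limit, and once it succeeds every later allocation counts against that limit, so it is best done after the recording buffers have been sized.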

This, and writing the buffers to a ramdisk, could be a major help in reducing latency. (Another, of course, would be to make sure the encoder program is small enough to fit within the L1 or L2 cache of your processor, so that cache misses don't cause pipeline stalls.)

Response to events

Posted Dec 29, 2004 4:01 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

I think his point was to eliminate any possibility of the system deciding to write to disk, which can freeze the system for the duration of the write on some IDE based systems.

yes it's extreme, but it may be appropriate if the results of failing to meet your deadline are severe enough (for example, a multi-hour recording session wasted because the recording isn't usable)

Response to events

Posted Dec 29, 2004 6:12 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

If that's the case, you can call sched_setscheduler(2) to set the scheduler type to FIFO, and then set the absolute priority to 99. That way, NOTHING can interrupt the process, unless it does I/O of its own (I'm assuming that the recording would be using non-blocking reads on the input stream) or otherwise blocks. If nothing can preempt the process, then nothing gets paged to disk. Ever.

This probably still isn't enough for REALLY sensitive stuff, but there's the RTAI patches for that sort of work. There's also the caveat that whatever program runs at priority of 99 had better not have an infinite loop, or it will be unkillable.

Having a root shell at priority 99, and running the process at priority 98, would be a good safeguard: the root shell would usually block, and not do anything, since it would just be waiting for I/O. Only if it actually got some (in the form of a "kill -9") would it preempt the real-time process, and if you're killing it anyway, that's a moot point.
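A minimal sketch of the scheduling change described above, using the wrappers Python exposes for sched_setscheduler(2) on Linux. The helper names and the default priority of 98 (leaving 99 free for a rescue shell, as suggested) are assumptions for illustration; the call needs root or an appropriate rtprio limit, and the sketch simply reports failure rather than insisting.

```python
import os


def clamp_priority(priority, policy=os.SCHED_FIFO):
    """Clamp a requested priority to the range the kernel allows for policy."""
    lowest = os.sched_get_priority_min(policy)
    highest = os.sched_get_priority_max(policy)
    return max(lowest, min(highest, priority))


def make_realtime(priority=98, policy=os.SCHED_FIFO):
    """Switch the calling process to a real-time policy; return True on success."""
    param = os.sched_param(clamp_priority(priority, policy))
    try:
        os.sched_setscheduler(0, policy, param)  # pid 0 means "this process"
    except OSError as exc:
        print("could not switch to real-time scheduling:", exc)
        return False
    return True
```

A recording program would call `make_realtime()` once at startup, before entering its capture loop, and should have some way to block or exit so that a bug cannot monopolize the CPU.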

Response to events

Posted Dec 29, 2004 14:44 UTC (Wed) by russelst (guest, #24599) [Link]

There's a handy utility (schedutils) that calls sched_setscheduler() and makes this slightly more user friendly. If you've got multiple processes that are all quasi-RT, you might want to experiment with the RR scheduler option rather than FIFO. If you have lots of disk IO going on concurrently that is not integral with your RT process, you might also want to tune down your pagecache and dentry_cache values as well as use the mlock values specified elsewhere in this thread. Many of the truly ugly/long running/non-pre-emptible kernel bits are in the IO subsystem, so minimizing the impact of non-RT IO is key to success. Of course if your RT process is itself IO intensive, reducing pagecache is counter productive. Ingo's (and others') patches in 2.6.9 and 2.6.10 go a long way to reduce some of these latencies, but there's still much work remaining.

Response to events

Posted Dec 29, 2004 20:55 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

Oh, yes. The Round Robin scheduler is indeed the way to go if you have multiple processes that need near-RealTime priority ("soft" RT, I think?). I was thinking of the case in which you had only one process which really needed that kind of guarantee. FIFO is the best way to go in that case, since FIFO does no timeslicing (unlike RR).
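The FIFO-versus-RR rule of thumb above can be written down in a few lines. This is only a sketch of that rule: `rt_policy_for` and `rr_timeslice_seconds` are hypothetical helper names, and the second one just asks the kernel how long a round-robin timeslice is via sched_rr_get_interval(2).

```python
import os


def rt_policy_for(process_count):
    """Pick FIFO for a single real-time process, RR when several must share."""
    return os.SCHED_FIFO if process_count <= 1 else os.SCHED_RR


def rr_timeslice_seconds(pid=0):
    """Return the round-robin quantum for pid in seconds, or None if unavailable."""
    try:
        return os.sched_rr_get_interval(pid)  # pid 0 means "this process"
    except OSError:
        return None
```

With FIFO there is no such quantum: a process keeps the CPU until it blocks, yields, or is preempted by something of higher priority.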

As for an I/O intensive RT process, I think mlock and output to a ramdisk (if you have enough memory) is the way to go: you minimize the use of the disk, and thus avoid the hangs on I/O. You can then write all the data to disk when the process completes. Again, I'm assuming a situation such as audio or video recording, where there is a definite "end" to the need for the process: once you've made the recording, the data can be written to disk without too much worry about response time, which is critical while the data is being read in.
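The record-now, write-later pattern described above might look something like the following sketch. `MemoryRecorder` is a hypothetical class, not part of any audio tool mentioned in this thread; it keeps every captured chunk in process memory and touches the disk only once, after the time-critical part is over.

```python
import io
import os


class MemoryRecorder:
    """Accumulate captured data in RAM and write it out in one go at the end."""

    def __init__(self):
        self._buffer = io.BytesIO()

    def capture(self, chunk):
        """Append one chunk of recorded bytes; no disk I/O happens here."""
        self._buffer.write(chunk)

    def size(self):
        """Return the number of bytes captured so far."""
        return self._buffer.tell()

    def save(self, path):
        """Write everything to disk once the recording is finished."""
        with open(path, "wb") as out:
            out.write(self._buffer.getvalue())
            out.flush()
            os.fsync(out.fileno())  # make sure the data really reached the disk
```

Combined with locked pages, this keeps the disk out of the picture while the data is coming in; the obvious cost is that the whole recording has to fit in RAM.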

Response to events

Posted Dec 29, 2004 19:27 UTC (Wed) by jd (guest, #26381) [Link]

Yeah, writing to the disk is Bad News, when things are time-sensitive. Even when using SCSI, you're using some CPU cycles to run the SCSI driver to get the data from memory onto the SCSI bus. There's no escaping it, unfortunately. IDE is many times worse, though, as many operations are blocking. Anything that blocks is Really Bad News and should be avoided at all costs.

You've also got to take into account the time the swap handler is using up. It needs to be monitoring swap events constantly to decide if anything needs to be swapped in or out. If swapping in, it needs to decide if something else needs to swap out to free enough space. If swapping out, it needs to collect the data, send it to disk, and record that it is in swap-space and where in swap-space it is.

All of this takes time. If you're wanting raw performance, then this time isn't being usefully spent. It also takes RAM to store all the code for this, which may be more usefully spent storing data. In addition, if you've a diskless system - I believe early TiVos fit in this category - there's really no point in having the overhead of a swap-to-disk routine.

Response to events

Posted Dec 29, 2004 8:57 UTC (Wed) by Wol (guest, #4433) [Link]

A couple of comments ...

Yup you do want the 2.6 kernel. One of the changes (needed for SMP, amongst other things) in 2.6 is that the "Big Kernel Lock" (the BKL) is slowly being eliminated. If this is invoked, it blocks everything except the kernel, so of course it plays havoc with realtime...

Modules ... I believe in the not-too-distant future, everything will be modules. And they won't be unloadable.

Swap. The most recent advice (issued for 2.4, and presumably valid for 2.6) is *don't* have a swap between nought and twice ram. If you can't have at least twice ram, then you're better off with nothing. The reason is that the basic design needs that to function efficiently, and if it hasn't got it then you get a degraded system with loads of error-check code getting hammered.

And all the various realtime/low-latency patches are slowly working their way into the kernel. The biggest problem is that parts of the kernel are still not re-entrant (hence the BKL), and this is taking quite a time to eliminate. Not surprising, as re-entrancy is a tricky problem.

Cheers,
Wol

Response to events

Posted Dec 29, 2004 9:02 UTC (Wed) by Wol (guest, #4433) [Link]

I forgot - IDE and SCSI ...

SCSI is best. The reason is that IDE is sequential - once linux sends a command to the disk controller, it has to wait for the controller to finish before it sends another command - and two disks on the same controller will get in each other's way ...

SCSI - linux can send as many commands, to as many devices, as it cares to. Rather than having to wait for a command to finish, linux can then go off and do something else, and wait for the device to send a response back saying "I've finished, here's the data".

So, for example, not only can linux optimise the order in which it sends requests to the drive, but a scsi drive can then further optimise based on stuff it knows about the drive but linux doesn't.

Cheers,
Wol

Response to events

Posted Dec 29, 2004 9:59 UTC (Wed) by freddyh (subscriber, #21133) [Link]

The delays the IDE module causes are around hundreds of microseconds. That's indeed probably not what you want...

Response to events

Posted Dec 30, 2004 0:53 UTC (Thu) by job (subscriber, #670) [Link]

Doesn't TCQ (NCQ in Seagate-speak) give the same advantage with IDE?

Response to events

Posted Dec 30, 2004 8:59 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

TCQ helps, but is not very usable on standard IDE drives from what I've seen of comments on the kernel list; apparently its implementation is not really standard enough to use.

apparently SATA drives are significantly better.

unfortunately, because of all the horrible IDE drives out there, the default settings result in fairly poor performance. if you hook the same drives up to a really good controller (say the 3ware boards, which claim to be SCSI to the kernel) you end up with significantly better performance, so the problem isn't the IDE drives themselves as much as the driver interface and the defaults necessary to make things work reliably under all conditions.

Response to events

Posted Dec 29, 2004 22:17 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

also, when you roll your own kernel, definitely optimize it for the specific CPU you have in the system. I've seen 30% speed improvements when going from distro kernels to custom compiled kernels, without going to heroic efforts (one thing I always do is compile things in instead of using modules; it's just so much easier to move the kernel around when I don't have to deal with all the files for the modules)

What is Real-Time?

Posted Dec 29, 2004 21:57 UTC (Wed) by njhurst (guest, #6022) [Link]

Actually, pipe organs have a noticeable delay even when electromechanically switched - it takes a small but noticeable time to get the pipe resonating (called chiff). Check out aeolus for a really nice pipe organ synthesizer. (But yes, I agree, there should be less computer delay with linux midi :)

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds