GStreamer: Past, present, and future
Posted Oct 27, 2010 18:07 UTC (Wed) by alankila (guest, #47141)
In reply to: GStreamer: Past, present, and future by elanthis
Parent article: GStreamer: Past, present, and future
I disagree on the need for 64 bits. 32-bit floats already have 23 bits of precision in the mantissa, and plenty of range in the exponent. Given the typical quantization to just 16 bits, it is hard to argue for the need of more intermediate precision.
I agree it's not necessarily the extra code that hurts me (although I do find gstreamer's modules to be pretty gory, and the use of ORC for a trivial volume scaler was astonishing); what I perceive as poor architecture hurts me. Especially the audio pipeline looks pretty strange to me as an audio person. The need to specify conversions explicitly is baffling. How does one even know that in the future some format doesn't get added or removed from a module, thus either requiring the addition of a conversion step, or making the specified conversion unnecessary and potentially even harmful?
I am convinced that a more ideal audio pipeline would automatically convert between buffer types where necessary and possible, and that audio processing would always imply promotion to some general high-fidelity type, be it integer or float (might be compile-time switch) so that at most there is one promotion and one demotion within any reasonable pipeline.
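As a rough sketch of what I mean (plain Python/numpy here, not GStreamer code; promote/demote are names I just made up), a pipeline that promotes once to float32 and demotes once at the end loses nothing, because 16-bit samples fit exactly in float32's significand:

import numpy as np

def promote(pcm16):
    # int16 -> float32; 16 significant bits fit losslessly in float32's 24-bit significand
    return pcm16.astype(np.float32) / 32768.0

def demote(x):
    # float32 -> int16; round and clip exactly once, at the end of the pipeline
    return np.clip(np.rint(x * 32768.0), -32768, 32767).astype(np.int16)

pcm = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
assert np.array_equal(demote(promote(pcm)), pcm)   # the round trip is exact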
Posted Oct 27, 2010 19:23 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (2 responses)
If the decoder produces audio which isn't in a raw format that the encoder can handle (wrong number of channels, sample rate, etc.) then the controller transforms it before passing it to the encoder. Of course, ideally both the encoder and decoder can handle the widest possible formats, because interleaving and resampling is incredibly slow, mostly because it takes up memory bandwidth, not because the CPU is overloaded in doing the conversions. But sometimes you have to resample or change the channels because that's how it's wanted downstream, no matter that the encoder can handle it.
The server design can handle close to 20 unique transcoded streams per CPU on something like a Core2 (averaging 3-4% CPU time per stream)--the server doesn't use threads at all, each process is fully non-blocking with an event loop. (It can also reflect internally, which significantly increases the number of output streams possible.)
Systems which spin on gettimeofday--or rely on some other tight loop with fine-grained timeouts--are retarded, too. There are various ways to optimize clocking by being smart about how you poll and buffer I/O; you can usually readily gauge the relationship between I/O and samples. For example, a single AAC frame will always produce 1024 samples*. So even if the size of a particular frame isn't fixed, you can at least queue up so many frames in a big gulp, knowing how many seconds of audio you have, sleep longer, and then do a spurt of activity, letting the output encoder buffer on its end if necessary. If you need a tightly timed loop to feed a device, it should be in its own process or thread, separate from the other components, so it isn't hindering optimal buffering.
[*AAC can also produce 960 samples per frame, but I've never seen it in practice; in any event it's in the metadata. MP3 encodes 384 or 1152 samples per frame. So if you know the sample rate and the number of samples, you know exactly how many seconds of compressed audio you have.]
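To make that arithmetic concrete (a rough sketch in Python, with the constants taken from the footnote above and the function name invented):

def buffered_seconds(frames_queued, samples_per_frame, sample_rate):
    # duration of queued compressed audio: frames x samples-per-frame / sample rate
    return frames_queued * samples_per_frame / sample_rate

# e.g. 100 AAC frames at 44100 Hz is about 2.32 seconds of audio,
# so the loop can sleep that long before its next spurt of activity
print(buffered_seconds(100, 1024, 44100))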
My pipeline can do double or triple the work that FFmpeg, Vorbis, and others can handle, even though it's passing frames over a socket pair (the backend process decodes protocols, formats, and codecs; but encodes only to a specific codec; the front-end encodes to a particular format and protocol; I did this for simplicity and security). It's a shame because I'm no audiophile, and many of the engineers on those teams are much more knowledgeable about the underlying coding algorithms.
Adding video into the mix does add complexity, but you can be smart about it. All the same optimization possibilities apply; and synchronization between the audio and video streams isn't computationally complex by itself; it's all about being smart about managing I/O. Like I said earlier, pipelines should be separated completely from the player (which might need to drop or add filler to synchronize playback). It wouldn't be a bad idea at all to write a player which only knows how to play back RTSP, and then write a back-end pipeline which produces RTSP channels. That's a useful type of abstraction missing entirely from all the players I've seen. RTSP gives you only rough synchronization, so the back-end can be highly optimized. The client can then handle the fine-grained synchronization. Overall you're optimizing your resources far better than trying to hack everything into one large callback chain.
Posted Oct 27, 2010 20:55 UTC (Wed)
by wahern (subscriber, #37304)
[Link]
Posted Oct 29, 2010 10:36 UTC (Fri)
by dgm (subscriber, #49227)
[Link]
Posted Oct 29, 2010 0:41 UTC (Fri)
by Spudd86 (subscriber, #51683)
[Link] (24 responses)
No, we cannot. Want to watch a DVD? Then you're dealing with 5.1 48KHz audio. DVD-Audio? Could be up to 192KHz. Blu-Ray? 7.1/5.1 and IIRC could be 96KHz. DVD-Video also allows 96KHz stereo.
And that's not even getting into stuff that's slightly less common than things almost everybody does at some point. (OK, DVD-Audio doesn't really come up since there are no software players for it, and pulseaudio currently caps its sample rate at 96KHz so it has something to use as a maximum sample rate.)
Posted Oct 29, 2010 14:28 UTC (Fri)
by nix (subscriber, #2304)
[Link] (20 responses)
What hardware would you play them back on?
Posted Oct 29, 2010 16:04 UTC (Fri)
by paulj (subscriber, #341)
[Link] (11 responses)
Posted Oct 29, 2010 18:34 UTC (Fri)
by alankila (guest, #47141)
[Link] (10 responses)
However, the idea of consumer-level 96 kHz audio (as opposed to 44.1 kHz audio) is pointless. It may sell some specialized, expensive equipment at high markup for people who are into that sort of thing, but there appear to be no practical improvements in the actual sound quality.
Posted Oct 29, 2010 23:32 UTC (Fri)
by dlang (guest, #313)
[Link] (9 responses)
I remember when the same statements were being made about video, how anything over 24Hz refresh rate was a waste of time because we had decades of studies that showed that people couldn't tell the difference.
Well, they found out that they were wrong there: at 24Hz people stopped seeing things as separate pictures and saw motion instead, but there are still benefits to higher refresh rates.
I think the same thing is in play on the audio side.
not everyone will be able to tell the difference, and it may even be that the mythical 'average man' cannot, but that doesn't mean that it's not worthwhile for some people. It also doesn't mean that people who don't report a difference in a test won't see a difference over a longer timeframe of usage (for example, going from 30Hz refresh rates to 80Hz refresh rates appears to decrease eye strain and headaches over long time periods, even for people who can't tell the difference between the two when they sit down in front of them side by side).
Posted Oct 30, 2010 0:12 UTC (Sat)
by jspaleta (subscriber, #50639)
[Link] (1 responses)
Video framing, on the other hand, is relatively new... unless you count thumb-powered flipbooks and pen-and-paper animations.
-jef
Posted Oct 30, 2010 15:01 UTC (Sat)
by corbet (editor, #1)
[Link]
The other thing that nobody has pointed out: if you're sampling at 44KHz, you need a pretty severe low-pass filter if you want to let a 20KHz signal through. That will cause significant audio distortion at the upper end of the frequency range; there's no way to avoid it. A higher sampling rate lets you move the poles up much higher, where you don't mess with stuff in the audio range.
That said, I'm not such an audiophile that I'm not entirely happy with CD-quality audio.
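A quick way to see the engineering tradeoff (a sketch assuming scipy is available; the numbers are just the Kaiser-window design estimate, not any particular DAC): the filter for 44.1 kHz has to fall from the top of the audio band to Nyquist in about 2 kHz, while at 96 kHz it gets roughly 28 kHz of transition band, so the same stopband attenuation needs a far shorter filter:

from scipy.signal import kaiserord

def antialias_taps(fs, passband=20000.0, atten_db=96.0):
    # transition band runs from the passband edge up to Nyquist,
    # normalized to the Nyquist frequency as kaiserord expects
    width = (fs / 2 - passband) / (fs / 2)
    numtaps, _beta = kaiserord(atten_db, width)
    return numtaps

print(antialias_taps(44100))   # on the order of 130 taps for a ~2 kHz transition band
print(antialias_taps(96000))   # on the order of 20 taps for a ~28 kHz transition band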
Posted Oct 30, 2010 14:42 UTC (Sat)
by alankila (guest, #47141)
[Link] (4 responses)
Your specific example "20 kHz signal playing with 44 kHz samples, and played at 96 kHz samples" is a particularly poor example. I assume you meant a pure tone signal? Such a tone can be represented by any sampling with a sampling rate > 40 kHz. So, 44 kHz and 96 kHz are equally good with respect to representing that signal. If there is any difference at all favoring the 96 kHz system, it arises from relatively worse engineering in the 44 kHz system -- poorer handling of frequencies around 20 kHz, perhaps -- and not from any intrinsic difference between the representations of the two signals themselves.
Many people seem to think---and I am not implying you are one---that the way digital signals are converted to analog output waveforms occurs as if linear interpolation between sample points were used. From this reasoning, it looks as if higher sampling rates were better, because the linearly interpolated version of 96 kHz signal would look considerably closer to the "original analog waveform" than its 44 kHz sampling interpolated the same way. But that's not how it works. Digital systems are not interpolated by fitting line segments, but by fitting sin waveforms through the sample points. So in both cases, the original 20 kHz sin() could be equally well reconstructed.
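To illustrate (a numpy sketch of Whittaker-Shannon reconstruction, truncated to a finite number of samples, so it's an approximation rather than a proof): sinc-interpolating the 44.1 kHz samples of a 20 kHz tone lands essentially back on the original waveform:

import numpy as np

f, fs = 20000.0, 44100.0
n = np.arange(-2000, 2000)                  # sample indices around the region of interest
samples = np.sin(2 * np.pi * f * n / fs)    # the 44.1 kHz sampling of a 20 kHz tone

t = np.linspace(-0.0005, 0.0005, 1001)      # a dense "analog" time grid, in seconds
# reconstruction: a shifted sinc kernel through every sample point
recon = np.sum(samples * np.sinc(fs * t[:, None] - n), axis=1)

# small residual, due only to truncating the infinite sum of sincs
print(np.max(np.abs(recon - np.sin(2 * np.pi * f * t))))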
Posted Oct 30, 2010 15:04 UTC (Sat)
by corbet (editor, #1)
[Link] (3 responses)
I knew all those signal processing classes would come in useful eventually...
Posted Oct 31, 2010 11:27 UTC (Sun)
by alankila (guest, #47141)
[Link] (2 responses)
Posted Nov 2, 2010 4:02 UTC (Tue)
by Spudd86 (subscriber, #51683)
[Link] (1 responses)
Posted Nov 6, 2010 10:55 UTC (Sat)
by alankila (guest, #47141)
[Link]
Posted Nov 3, 2010 2:42 UTC (Wed)
by cmccabe (guest, #60281)
[Link] (1 responses)
Some people can hear it, some people can't. Unfortunately, the "can't" people designed the Red Book audio format, apparently. I forget the exact frequency at which it became inaudible.
P.S. A lot of people have hearing damage because they listen to music at a volume which is too loud. You need earplugs at most concerts to avoid this.
Posted Nov 3, 2010 21:03 UTC (Wed)
by paulj (subscriber, #341)
[Link]
Posted Oct 29, 2010 19:20 UTC (Fri)
by nicooo (guest, #69134)
[Link] (1 responses)
Posted Oct 31, 2010 13:10 UTC (Sun)
by nix (subscriber, #2304)
[Link]
Mice might need it too, for their supersonic squeaks of delight.
Perhaps... Douglas Adams was right?
Posted Oct 29, 2010 23:12 UTC (Fri)
by dlang (guest, #313)
[Link] (5 responses)
It's that using more samples to represent the data makes the resulting audio cleaner.
Remember that you aren't recording the frequency, you are recording the amplitude at specific points in time; the more samples you have, the cleaner the result.
Posted Oct 29, 2010 23:13 UTC (Fri)
by dlang (guest, #313)
[Link] (4 responses)
Posted Oct 30, 2010 0:09 UTC (Sat)
by gmaxwell (guest, #30048)
[Link] (3 responses)
Given unlimited precision samples, a signal which has no energy above the system Nyquist is _perfectly_ re-constructable, not just "good".
If the signal does have energy above the nyquist then it's not "no hope": the system is under-determined and there are a number of possible reconstructions.
Of course, we don't sample with infinite precision, but increasing the sampling rate is a fairly poor way of increasing the SNR for lower frequencies if that's your goal. For example, a 1-bit precision 3MHz process can give as much SNR in the 0-20kHz range as a 20-bit 48kHz process, but it takes about 3x the bitrate to do so.
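Back-of-the-envelope version of that claim (a Python sketch using the standard textbook formula for an ideal noise-shaping modulator; real delta-sigma converters fall somewhat short of it, and the modulator order here is my assumption):

import math

def inband_snr_db(bits, fs, bandwidth=20000.0, order=0):
    # ideal in-band SNR of an oversampled quantizer;
    # order = 0 is plain oversampling, order > 0 assumes ideal noise shaping
    osr = fs / (2 * bandwidth)
    snr = 6.02 * bits + 1.76
    if order == 0:
        snr += 10 * math.log10(osr)
    else:
        snr += (2 * order + 1) * 10 * math.log10(osr) \
               - 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
    return snr

print(inband_snr_db(20, 48000))              # roughly 123 dB for 20 bits at 48 kHz
print(inband_snr_db(1, 3_000_000, order=4))  # roughly 146 dB for 1 bit at 3 MHz with 4th-order shaping
# and 3 Mbit/s is about 3x the 960 kbit/s of the 20-bit 48 kHz stream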
24-bit converters with >110dB SNR are readily and cheaply available. These systems can represent audio as loud as 'dangerously loud' with the total noise still dwarfed by the thermal noise in your ear and the room around you. It's effectively infinite precision. Heck, given reasonable assumptions (that you don't need enough dynamic range to span everything from hearing-damaging levels down to the faintest discernible sounds), well-mastered CDDA is nearly so too.
There has been extensive study of frequency extension into the ultrasonic, and none of the studies I've seen which weren't obviously flawed could support that hypothesis. If this perception exists, it is so weak as to be unmeasurable even in ideal settings (much less your common listening environment, which is awash in reflections, distortions, and background noise). There also is no real physiological basis to argue for the existence of significant ultrasonic perception. Heck, if you're posting here you're probably old enough that hearing is mostly insignificant even at 18kHz (HF extension falls off dramatically in the early twenties for pretty much everyone), much less higher.
But hey if you want to _believe_ I've got some dandy homeopathics to sell you.
Posted Oct 30, 2010 0:36 UTC (Sat)
by dlang (guest, #313)
[Link] (1 responses)
I disagree with this statement. Something can be reproduced, but not necessarily _perfectly_.
Also, any time you have more than one frequency involved, they are going to mix in your sensor, and so you are going to have energy above this frequency.
Sampling faster may not be the most efficient way to get better SNR, but it's actually much easier to sample faster than to sample with more precision.
Using your example, setting something up to sample 1 bit @ 3MHz may be far cheaper than setting up something to sample 20 bits @ 48KHz. In addition, the low-precision bitstream may end up being more amenable to compression than the high-precision bitstream. With something as extreme as the 1-bit example, simple run-length encoding will probably gain you much more than a 3x compression ratio. That's not to say that a more sophisticated, lossy compression algorithm couldn't do better with the 20-bit samples, but again, which is simpler?
I am in no way saying that people hear in the ultrasonic directly. However, I am saying that some people listening to a 15KHz sine wave vs a 15KHz square wave will be able to hear a difference.
Posted Oct 30, 2010 14:21 UTC (Sat)
by alankila (guest, #47141)
[Link]
This may be confusing two ways to look at it: as a mathematical issue, or as an engineering problem. Mathematically the discrete representation and the analog waveform are interchangeable: you can get from one to the other. The quality of the conversion between the two can be made as arbitrarily high as you desire -- typically design targets are set beyond the assumed limits of human perception.
>also, any time you have more than one frequency involved, they are going to mix in your sensor, and so you are going to have energy above this frequency.
Intermodulation distortion can generate extra tones, and depending on how strong the effect is, they may even matter. Such nonlinearities do not need more than one frequency, though.
This is normally an undesirable artifact, and our ADCs/DACs have evolved to a point where they are essentially perfect with respect to this problem. In any case, from the viewpoint of a digital system, artifacts that occurred in the analog realm are part of the signal, and are processed perfectly once captured.
> I am in no way saying that people hear in the ultrasonic directly, However I am saying that some people listening to a 15KHz sine wave vs a 15KHz square wave will be able to hear a difference.
The amusing thing is that a 44.1 kHz representation of a 15 kHz square wave will look identical to a 15 kHz sin wave, because none of the pulse's harmonics are within the passband of the system. Do you happen to have a reference where a system such as this was tested with test subjects so that it would be possible to understand how such a test was conducted?
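This one is easy to check numerically (a numpy sketch, building the square wave from its Fourier series and keeping only the harmonics a 44.1 kHz system can carry; an illustration, not a listening test):

import numpy as np

f0, fs = 15000.0, 44100.0
t = np.arange(0, 0.01, 1 / 192000.0)    # view the result on a dense time grid

def bandlimited_square(f0, t, cutoff):
    # Fourier series of a square wave (odd harmonics, amplitude 4/(pi*k)),
    # truncated to the harmonics below the cutoff frequency
    x = np.zeros_like(t)
    k = 1
    while k * f0 < cutoff:
        x += (4 / np.pi) * np.sin(2 * np.pi * k * f0 * t) / k
        k += 2
    return x

square = bandlimited_square(f0, t, fs / 2)       # only the 15 kHz fundamental fits below 22.05 kHz
sine = (4 / np.pi) * np.sin(2 * np.pi * f0 * t)
print(np.max(np.abs(square - sine)))             # 0.0: within the passband they are the same signal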
Posted Oct 30, 2010 16:27 UTC (Sat)
by magnus (subscriber, #34778)
[Link]
In practice though, audio signals will have some information (harmonics etc.) at higher frequencies, and no filters (not even digital ones) can be perfectly brick-wall shaped, so some aliasing will occur, plus you will have some attenuation below the Nyquist frequency. Sampling at 96 kHz might (if well designed) give you a lot more headroom for these effects.
I have no experience with 96 kHz audio so I don't know if this is actually audible or just theory+marketing.
Since human hearing is non-linear, it's also possible that people can pick up harmonics at higher frequencies even if they can't hear beeps at those frequencies. The only way to know is double-blind testing, I guess...
Posted Oct 29, 2010 18:14 UTC (Fri)
by alankila (guest, #47141)
[Link] (2 responses)
I was talking about performance. Nobody expects a mobile phone to spit out a 7.1 stream, ac3 or not, or whatever. I believe my point was that I wanted to argue for a simplified internal pipeline in GStreamer, where special-case formats could be removed and replaced with more general ones. Your 7.1 192 kHz streams could be just 8 32-bit floating point channels for the purposes of the discussion, but I predict that you'd have severe difficulties transmitting those channels to an amplifier.
See? This is not a point that is really worth discussing.
Posted Nov 2, 2010 3:57 UTC (Tue)
by Spudd86 (subscriber, #51683)
[Link] (1 responses)
You CAN'T just say 'all audio is 16bit@44.1KHz' because it simply is not the case: 48KHz audio exists, as does 24-bit audio. Some people buy expensive sound cards to get these sorts of things, and you want to tell them they can't have it?
All I was objecting to was the first bit.
Getting to the rest of your post:
Of COURSE nobody expects their mobile phone to spit out 24bit 192KHz 7.1 channel audio, but some people DO expect it from their desktops. GStreamer is used in a very wide variety of places, and some of them need things your phone doesn't, some of them need things you don't ever need, but that's not a reason for GStreamer to not support them.
Certainly 32-bit float is the most sample resolution you'll ever need (more, in fact) in a storage format... but GStreamer is sometimes used in a processing pipeline, so it MAY at some point have a use for doubles... probably not, though.
ORC is a perfectly reasonable thing to use for a simple volume scaler, especially on something like a mobile phone where CPU time might be at a premium.
I think part of the redesign was to make format negotiation better and more automatic; however, avoiding conversions is always a good idea. (Large chunks of pulseaudio code are dedicated to doing as few conversions as possible, because of phones and embedded stuff. Even on a desktop, rate conversions add error every time you do one, since the bandlimiter isn't perfect and so introduces aliasing and noise every time it's run; good ones don't introduce much, but they are expensive to compute even on a desktop.)
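One way to see the accumulating error (a sketch assuming scipy; the resampler, the test signal, and the number of round trips are arbitrary choices of mine, not anything pulseaudio or GStreamer actually does):

import numpy as np
from scipy.signal import resample_poly

rng = np.random.default_rng(0)
x = rng.standard_normal(44100)             # one second of full-band white noise at 44.1 kHz

y = x.copy()
for _ in range(5):                         # five 44.1 kHz <-> 48 kHz round trips
    y = resample_poly(y, 160, 147)         # 44100 * 160/147 = 48000
    y = resample_poly(y, 147, 160)         # and back to 44100

err = (y - x)[2000:-2000]                  # skip the filter transients at the edges
ref = x[2000:-2000]
# finite SNR in dB: ripple, transition-band loss and aliasing from the
# imperfect bandlimiter, accumulating a little with every conversion
print(10 * np.log10(np.sum(ref**2) / np.sum(err**2)))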
Posted Nov 6, 2010 11:08 UTC (Sat)
by alankila (guest, #47141)
[Link]
I do not have a principal objection to using a different sampling rate or number of channels. It's just that there are useful gains to be had from limiting the number of sample formats. As an example, processing 16-bit integer audio with the volume plugin will currently cause quantization, because the volume plugin does not do dithering.
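For instance (a minimal numpy sketch of the idea, not the actual volume element's code; TPDF dither at plus/minus one LSB is the usual textbook choice):

import numpy as np

rng = np.random.default_rng(0)
pcm = (8000 * np.sin(2 * np.pi * 1000 * np.arange(44100) / 44100)).astype(np.int16)
gain = 0.5

# scaling and truncating straight back to int16 quantizes the result
plain = (pcm * gain).astype(np.int16)

# adding triangular (TPDF) dither before rounding decorrelates the
# quantization error from the signal instead of leaving it as distortion
dither = rng.uniform(-0.5, 0.5, pcm.shape) + rng.uniform(-0.5, 0.5, pcm.shape)
dithered = np.clip(np.rint(pcm * gain + dither), -32768, 32767).astype(np.int16)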
And when I said 44.1 kHz and 16 bits, I was talking about the mobile context; I admit Android flashed through my mind. Did you know that it does not even support any other output format at all? For a mobile device, it is an entirely reasonable output format, and given its other constraints it should be extremely well supported, because it's simply the most important input and output format. As we learnt in this thread, the N900 people made a ridiculous mistake in selecting audio hardware that apparently uses a native sample rate of 48 kHz, because that forces them to resample the vast majority of the world's music. It is possible to do, but doesn't really strike me as an especially smart thing to have done.
GStreamer: Past, present, and future
For all of our experience with audio, there was a small subset of us who were driven absolutely nuts by the weird high-pitched chirper things that the Japanese seem to like to put into doorways for whatever reason. Everybody else wondered what we were griping about. Some people hear higher than others.
GStreamer: Past, present, and future
Sinc waveforms, actually (sin(θ)/θ) :)
GStreamer: Past, present, and future
There is lots of misinformation on this subject out there.
GStreamer: Past, present, and future
> Given unlimited precision samples, a signal which has no energy above the system Nyquist is _perfectly_ re-constructable, not just "good".
Theoretically, you don't only need unlimited precision on each sample, you also need to have an infinite number of samples, from time -∞ to +∞, to perfectly reconstruct the original signal.