GStreamer: Past, present, and future

By Jake Edge
October 26, 2010

Longtime GStreamer hacker Wim Taymans opened the first-ever GStreamer conference with a look at where the multimedia framework came from, where it stands, and where it will be going in the future. The framework is a bit over 11 years old and Taymans has been working on it for ten of those years, as conference organizer Christian Schaller noted in his introduction. From a simple project that was started by Eric Walthinsen on an airplane flight, GStreamer has grown into a very capable framework that is heading toward its 1.0 release—promised by Taymans by the end of 2011.

Starting off with the "one slide about what GStreamer is", Taymans described the framework as a library for making multimedia applications. The core of the framework, which provides the plugin system for inputs, codecs, network devices, and so on, is the interesting part to him. The actual implementations of the plugins are contained in separate plugin libraries with a core-provided "pipeline that allows you to connect them together".

Some history

When GStreamer was started, the state of Linux multimedia was "very poor". XAnim was the utility for playing multimedia formats on Linux, but it was fairly painful to use. Besides GStreamer, various other multimedia projects (e.g. VLC, Ogle, MPlayer, FFmpeg, etc.) started in the 1999/2000 timeframe, which was something of an indication of where things were. The competitors were well advanced as QuickTime had appeared in 1991 and DirectShow in 1996. Linux was "way behind", Taymans said.

GStreamer's architecture came out of an Oregon Graduate Institute research project with some ideas from DirectShow (but not the bad parts) when the project was started in 1999. Originally, GStreamer was not necessarily targeted at multimedia, he said.

The use cases for GStreamer are quite varied, with music players topping the list. Those were "one of the first things that actually worked" using GStreamer. Now there are also video players (which are moving into web browsers), streaming servers, audio and video editors, and transcoding applications. One of the more recent uses for GStreamer, which was "unpredicted from my point of view", is for voice-over-IP (VoIP) and both the Empathy messaging application and Tandberg video conferencing application are using it.

After the plane flight, Walthinsen released version 0.0.1 in June 1999. By July 2002, 0.4.0 was released with GNOME support, though it was "very rough". In February 2003, 0.6.0 was released as the first version where audio worked well. After a major redesign to support multi-threading, 0.10.0 was released in December 2005. That is still the most recent major version, though there have been 30 minor releases, and 0.10.31 is coming soon. 0.10.x has been working very well, he said, which raises the question about when there will be a 1.0.

To try to get a sense for the size of the community and how it is growing, Taymans collected some statistics. There are more than 30 core developers in the project along with more than 200 contributors for a codebase that is roughly 205K lines of code. He also showed various graphs of the commits per month for the project and pointed a spike around the time of the redesign for 0.10. There was also a trough at the point of the Git conversion. As expected, the trend of the number of commits per month rises over the life of the project.

In order to confirm a suspicion that he had, Taymans made the same graph for just the core, without the plugins, and found that commits per month has trailed off over the last year or so. The project has not been doing much in the way of new things in the core recently and this is reflected in the commit rate. He quoted Andy Wingo as an explanation for that: "We are in 'a state of decadence'".

When looking at a graph in the number of lines of code, you can see different growth rates between the core and plugins as well. The core trend line is a flat, linear growth rate. In contrast, the trend line for the plugins shows exponential growth. This reflects the growing number of plugins, many of which are also adding new features, while the core just gets incremental improvements and features.

The current state

Taymans then spent some time describing the features of GStreamer. It is fully multi-threaded now; that code is stable and works well. The advanced trick mode playback is also a high point, and it allows easy seeking within audio and video streams. The video editing support is coming along, while the RTP and streaming support are "top notch". The plugins are extensive and well-tested because they are out there and being used by lots of people. GStreamer is used by GNOME's Totem video player, which puts it in more hands. "Being in GNOME helps", he said.

The framework has advanced auto-plugging features that allow for dynamic pipeline changes to support a wide variety of application types. It is also very "binding friendly" as it has bindings for most languages that a developer might want to use. Developers will also find that it is "very debuggable".

There are many good points with the 0.10 codebase, and he is very happy with it, which is one of the reasons it has taken so long to get to a 1.0 release. The design of 0.10 was quite extensible, and allowed many more features to be added to it. Structures were padded so that additional elements could be added for new features, without breaking the API or ABI. For example, the state changes and clocks handling code was rewritten during the 0.10 lifetime. The developers were also able to add new features like navigation, quality of service, stepping, and buffering in 0.10.

Another thing that GStreamer did well was to add higher-level objects. GStreamer itself is fairly low-level, but for someone who just wants to play a file, there are a set of higher-level constructs to make that easy—like playbin2, for playing video and audio content, and tagreadbin to extract media metadata. The base classes that were implemented for 0.10, including those that have been added over the last five years, are also a highlight of the framework. Those classes handle things like sinks, sources, transforms, decoders, encoders, and so on.

There are also a number of bad points in the current GStreamer. The current negotiation of formats, codecs, and various other variable properties is too slow. The initial idea was to have a easy and comprehensible way to ask an object what it can do. That query will return the capabilities of the object, as well as the capabilities of everything that it is connected to, so the framework spends a lot of time generating a huge list of capabilities. Those capabilities are expressed in too verbose of a format in Taymans's opinion. Reducing the verbosity and rethinking the negotiation API would result in major performance gains.

The "biggest mistake of all" in GStreamer is that there is no extensible buffer metadata. Buffers are passed between the GStreamer elements, and there is no way to attach new information, like pointers to multiple video planes or information to handle non-standard strides, to those buffers. There also need to be generic ways to map the buffer data to support GPUs and DSPs, especially in embedded hardware. It is very difficult to handle that with GStreamer currently and is important for embedded performance.

While dynamic pipeline modifications work in the current code, "the moment you try it, you will suffer the curse of new segments", Taymans said. Those can cause the application to lose its timing and synchronization, and it is not easy to influence the timing of a stream, so it is difficult for an application to recover from. The original idea was that applications would create objects that encapsulated dynamic modifications, but that turned out not to be the case. There are also a handful of minor problems with 0.10, including an accumulation of deprecated APIs, running out of padding in some structures, and it becoming harder to add new features without breaking the API/ABI.

A look to the future

To address those problems, Taymans laid out the plans for GStreamer for the next year or so. In the short term, there will be a focus on speeding up the core, while still continuing to improve the plugins. There are more applications trying to do realtime manipulation of GStreamer pipelines, so it is important to make the core faster to support them. Reducing overhead by removing locks in shared data structures will be one of the ways used to increase performance.

In the medium term, over the next 2 or 3 months, Taymans will be collecting requirements for the next major version. The project will be looking at how to fix the problems that have been identified, so if anyone "has other problems that need fixing, please tell me". There will also be some experimentation in Git branches for features like adding extensible buffer metadata.

Starting in January, there will be a 0.11 branch and code will be merged there. Porting plugins to 0.11 will then start, with an eye toward having them all done by the end of 2011. Once the plugins have started being ported, applications will be as well. Then there will a 1.0 release near the end of 2011, not 2010 as was announced in the past. "This time we'll do it, promise". Taymans concluded his talk with a joking promise that "world domination" would then be the result of a GStreamer 1.0 release.

Index entries for this article
Conference	GStreamer Conference/2010

GStreamer: Past, present, and future

Posted Oct 26, 2010 23:18 UTC (Tue) by alankila (guest, #47141) [Link] (47 responses)

I'm not too impressed with gstreamer. It has difficulties to play vorbis here on ubuntu 10.10 due to some completely unknowable and unfathomable reason. It may well be pulseaudio's fault, though.

More to the point, I saw that something simple like volume control on the audio pipeline contains something like 30 kB of code because it handles every possible format like 8, 16, 24, 32 bit integer audio and 32-bit and 64-bit audio. Additionally I think there was some dynamic runtime code generation involved. So much of that 200k LOC might actually go away if the design was simplified somewhat.

Also when trying to hook a simple LADSPA plugin of mine into pipeline using gstreamer I encountered various random issues, such as the need to use audioconvert (which ought to be injected automatically where necessary, lest people have to pepper their pipelines full of that) and queue (some kind of buffer, many things don't work without explicit buffering for some reason).

Even worse, since gstreamer can't support stereo ladspa plugins in any way, one is forced to implement these manually by deinterleaving (= splitting) a stream into separate pipes, then putting them through the plugin using the appropriate sprinking of audioconvert / queue, and then assembling it back with interleave. Of course, the result is not usable to pulseaudiosink because the channel assignment is lost along the way with no way to define it on command line...

In short, I think this media framework is not quite the success story it is portrayed to be, IME.

GStreamer: Past, present, and future

Posted Oct 26, 2010 23:40 UTC (Tue) by gmaxwell (guest, #30048) [Link] (41 responses)

"It has difficulties to play vorbis here on ubuntu 10.10 due to some completely unknowable and unfathomable reason."

It may be less unknowable if you'd describe your problem in detail perhaps open a ticket in the Ubuntu bug tracker. :)

I've never encountered a tool that allowed you to use LADSPA plugins without a lot of technical fuss. Gstreamer is certainly not perfect but its strengths and weaknesses should be evaluated in comparison to the alternatives.

GStreamer: Past, present, and future

Posted Oct 27, 2010 8:42 UTC (Wed) by marcH (subscriber, #57642) [Link]

> Gstreamer is certainly not perfect but its strengths and weaknesses should be evaluated in comparison to the alternatives.

I second that. Even a non-technical, "who is using what?" comparison would be an interesting start.

GStreamer: Past, present, and future

Posted Oct 27, 2010 11:55 UTC (Wed) by alankila (guest, #47141) [Link] (2 responses)

I got this intel_hda audio driver, with pulseaudio generating 1000s of warnings per second about something or other. It generally doesn't say what the warning is, only that the ratelimiter suppressed thousands of warnings. Great design of ratelimiter!

If I try to watch a video, the sound dies with some pulseaudio-related connection error and then nothing works until I pretty much restart the desktop environment. I guess pulse has a habit of crashing. When it restarts, if it restarts, it sometimes selects the wrong outputs so I notice this because I have to switch output from analog headphone jack back to digital jack. I do this every few days.

On my java app, if I use the default ondemand cpu governor, sound is very choppy, but switching to performance helps (incidentally, this also mostly fixes my video watching problems). I guess ondemand scheduler doesn't work out that the CPU is actually pretty loaded, or maybe the pulseaudio time-based scheduling screws up thanks to variable CPU speed.

So I got numerous crazy issues and I haven't really even tried to make progress pinpointing the causes. I am expecting the fundamental problem is with the intel_hda driver, which has got a ton of issues because it's really an umbrella driver that supports a lot of different hardware, some better and some worse.

I guess my real complaint is just that this shaking pile of cards made out of alsa, pulse and gstreamer doesn't really manage to hide the issues at the bottom of the stack. (And neither can it. Wrappers can't fix problems as a general rule.) However, I could justifiably complain that the pile of junk on top of shaky foundation does make what problems there are somewhat worse.

-ENOTGST

Posted Oct 27, 2010 13:37 UTC (Wed) by wingo (guest, #26929) [Link] (1 responses)

GStreamer needs a clock to synchronize between audio and video. That clock is usually derived from the audio sink. In this case, something is wrong with your audio sink -- probably intel_hda related -- and so things go pear-shaped.

This problem is not in GStreamer, and is not related to the audio format.

-ENOTGST

Posted Oct 29, 2010 0:31 UTC (Fri) by Spudd86 (subscriber, #51683) [Link]

Or likely even pulseaudio, it's probably alsa

GStreamer: Past, present, and future

Posted Oct 27, 2010 12:15 UTC (Wed) by alankila (guest, #47141) [Link] (36 responses)

> Gstreamer is certainly not perfect but its strengths and weaknesses should be evaluated in comparison to the alternatives.

Meant to respond to this point as well. One of the criticism I would offer about gstreamer is that it is making simple things far more complicated than they have to be. The number of audio formats it supports is a prime example. It is as if there was a belief that converting between two formats was so impossibly costly that you can't possibly afford it, which is bit of crazy given the low data rate of audio and the extremely fast CPUs available everywhere, including mobile devices. So instead of supporting every imaginable format and providing algorithms that work with every format you can name, how about nominating one privileged format and promoting audio from source and demoting it (with appropriate dithering) when entering a sink?

A good compromise audio format might be 32-bit integer (with unity = 1 << 24) or 32-bit float, both would work quite well, and my personal preference would be for floating point audio because it is simpler to work with. The design does not really make it easy to pass high-quality audio through the pipeline, for instance the volume control does not support dithering and for this reason one should always convert audio to some high-quality intermediate format after demuxing. On the other hand, the audioconvert module does do dithering and even defaults to a reasonable algorithm, so if you know what you are doing it is possible to get good result. However, it would be better if the good result was given by default, especially as the cpu cost to do so is low and architecture would be significantly simplified.

GStreamer: Past, present, and future

Posted Oct 27, 2010 17:06 UTC (Wed) by elanthis (guest, #6227) [Link] (35 responses)

" the extremely fast CPUs available everywhere, including mobile devices."

You're slightly over-estimating the power of many mobile CPUs. And seriously underestimating the amount of audio data that some apps need to work with (and hence the amount of effort necessary for conversion).

If GStreamer were to work for every possible use case it needs to work for, you'd pretty much be forced to work with 64-bit floating point audio everywhere. That is ass-slow on many modern CPUs, especially many mobile CPUs that have no FPU and even a lot of desktop CPUs still in common use.

The extra code isn't hurting you, but it's helping a lot of other people. I'm all for cutting out bloat, but "necessary code" never counts as bloat.

GStreamer: Past, present, and future

Posted Oct 27, 2010 17:10 UTC (Wed) by gmaxwell (guest, #30048) [Link] (5 responses)

> You're slightly over-estimating the power of many mobile CPUs.

More than slightly. Pulseaudio's mixing/filtering stuff is using tens of percent of CPU on N900. It's more computationally expensive than the whole vorbis decoder. It's problematic (e.g. for webm playback in a browser).

> you'd pretty much be forced to work with 64-bit floating point

I'm pretty sure thats BS. Outside of some numerically unstable filtering internals there is no reason to have that much precision in any general audio pipeline.

But even 32-bit float is more or less a non-starter in some existing environments.

GStreamer: Past, present, and future

Posted Oct 27, 2010 17:45 UTC (Wed) by alankila (guest, #47141) [Link] (2 responses)

Maybe your pulseaudio is resampling, or maybe it spends 90 % of time in gettimeofday(). It'd be a good idea to acquire an oprofile trace to figure out where it is hurting so much. Also, be sure to fix CPU speed to some known frequency, otherwise percentual usage values are a very poor metric for measuring relative performance.

I know that mobile systems may require the use of integer arithmetics, although I am hoping that floating point capability will get added to every CPU in time. Software emulation of floating point exists, but in practice software float is too wasteful to be acceptable. I did some quick testing on a HTC Hero and got the result that software emulated float took about 5x the time of similar integer code.

My practical experience suggests that the sort of things gstreamer needs to do (dithering, scaling, mixing, copying) take insignificant time compared to any other work that is also ongoing. That would also include decoding any codec.

GStreamer: Past, present, and future

Posted Oct 28, 2010 20:22 UTC (Thu) by oak (guest, #2786) [Link] (1 responses)

When playing just music on N900, the dynamic frequency scaling scales the CPU speed down to 250MHz (easiest to see from /proc/cpuinfo Bogomips value changes). Pulseaudio needs to do on N900 more complex/heavier stuff than normally on desktop (sound volume increase, speaker protection...).

Oprofile tells that about third of CPU goes to pulseaudio internal workings, rest is sound data manipulation which is accelerated with NEON SIMD instructions (as you can see by objdump'ing the related libraries code).

N900 uses TI Omap3 (ARM v7) i.e. it has HW floating point support. Sound HW is AFAIK 48kHz natively.

GStreamer: Past, present, and future

Posted Oct 29, 2010 0:35 UTC (Fri) by Spudd86 (subscriber, #51683) [Link]

Then pulse doing resampling seems pretty likely since so much audio is 44.1KHz

GStreamer: Past, present, and future

Posted Oct 27, 2010 17:49 UTC (Wed) by baldridgeec (guest, #55283) [Link] (1 responses)

Ever done live mixdowns of 16-channel audio with separate effects pipelines running per channel? Some people do real audio work with Linux, and it is quite capable if set up properly.

The issues you're talking about are somewhat annoying for desktop audio, but that's not the only use case that has to be considered. Pulseaudio has been getting better - I like that it knows enough to get out of the way of JACK-enabled applications now.

GStreamer: Past, present, and future

Posted Oct 27, 2010 18:09 UTC (Wed) by drag (guest, #31333) [Link]

Another example is if your using it for something like VoIP.

If you want to be able to do voicemail, conferences, transfer phones, put people on hold, work with multiple protocols, hook into a T1 or POTS and all that then your VoIP system is going to require a backend that can handle manipulating audio and transcoding between different formats.

Sure most of the audio formats used in VoIP are uncomplicated, to say the least, but if your handling a call center with a 100 phones with the multiple voice bridges and all that stuff then it adds up pretty quick.

Then another issue is the sound cards itself. Sounds cards only support certain audio formats and your going to have to be able to support a multitude if your going have a efficient way of outputting to the real world.

GStreamer: Past, present, and future

Posted Oct 27, 2010 18:07 UTC (Wed) by alankila (guest, #47141) [Link] (28 responses)

We can make the reasonable assumption that a media system is dealing with 16-bit 44.1 kHz stereo audio.

I disagree on the need for 64 bits. 32-bit floats already have 23 bits of precision in the mantissa, and plenty of range in the exponent. Given the typical quantization to just 16 bits, it is hard to argue for the need of more intermediate precision.

I agree it's not necessarily the extra code that hurts me (although I do find gstreamer's modules to be pretty gory, and the use of ORC for a trivial volume scaler was astonishing); what I perceive as poor architecture hurts me. Especially the audio pipeline looks pretty strange to me as an audio person. The need to specify conversions explicitly is baffling. How does one even know that in the future some format doesn't get added or removed from a module, thus either requiring the addition of a conversion step, or making the specified conversion unnecessary and potentially even harmful?

I am convinced that a more ideal audio pipeline would automatically convert between buffer types where necessary and possible, and that audio processing would always imply promotion to some general high-fidelity type, be it integer or float (might be compile-time switch) so that at most there is one promotion and one demotion within any reasonable pipeline.

GStreamer: Past, present, and future

Posted Oct 27, 2010 19:23 UTC (Wed) by wahern (subscriber, #37304) [Link] (2 responses)

That's how my audio pipeline works. I have a system written from scratch--i.e. no FFmpeg--that pulls Internet radio (Windows Media, Shoutcast, Flash, or 3GPP; over HTTP or RTSP; in MP3 or AAC) and transcodes the codec and format to a requested format (same combinations as before), resampling and possibly splicing in a third stream.

If the decoder produces audio which isn't in a raw format that the encoder can handle (wrong number of channels, sample rate, etc) than the controller transforms it before passing to the encoder. Of course, ideally both the encoder and decoder can handle the widest possible formats, because interleaving and resampling is incredibly slow, mostly because it takes up memory bandwidth, not because the CPU is overloaded in doing the conversions. But sometimes you have to resample or change the channels because that's how its wanted downstream, no matter that the encoder can handle it.

The server design can handle close to 20 unique transcoded streams per CPU on something like a Core2 (averaging 3-4% CPU time per stream)--the server doesn't use threads at all, each process is fully non-blocking with an event loop. (It can also reflect internally, which significantly increases the number of output streams possible.)

Systems which spin on gettimeofday--or rely on some other tight loop with fine grained timeouts--are retarded, too. There are various way to optimize clocking by being smart about how you poll and buffer I/O; you can usually readily gauge the relationship between I/O and samples. For example, a single AAC frame will always produce 1024 samples*. So even if the size of a particular frame isn't fixed, you can at least queue up so many frames in a big gulp, knowing how many seconds of audio you have, sleep longer, and then do a spurt of activity, letting the output encoder buffer on its end if necessary. If you need a tight timed loop to feed to a device, it should be in its own process or thread, separate from the other components, so it isn't hindering optimal buffering.

[*AAC can also produce 960 samples per frame, but I've never seen it in practice, but in any event its in the meta-data; MP3 encodes 384 or 1152 samples per frame; so If you know the sample rate and number of samples you know exactly how many seconds of compressed audio you have.]

My pipeline can do double or triple the work that FFmpeg, Vorbis, and others can handle, even though it's passing frames over a socket pair (the backend process decodes protocols, formats, and codecs; but encodes only to a specific codec; the front-end encodes to a particular format and protocol; I did this for simplicity and security). It's a shame because I'm no audiophile, and many of the engineers on those teams are much more knowledgeable about the underlying coding algorithms.

Adding video into the mix does add complexity, but you can be smart about it. All the same optimization possibilities apply; and synchronization between the audio and video streams isn't computationally complex by itself; it's all about being smart about managing I/O. Like I said earlier, pipelines should be separated completely from the player (which might need to drop or add filler to synchronize playback). It wouldn't be a bad idea at all to write a player which only knows how to playback RTSP, and then write a back-end pipeline which produces RTSP channels. That's a useful type of abstraction missing entirely from all the players I've seen. RTSP gives you only rough synchronization, so the back-end can be highly optimized. The client can then handle the find-grain synchronization. Overall you're optimizing your resources far better than trying to hack everything into one large callback chain.

GStreamer: Past, present, and future

Posted Oct 27, 2010 20:55 UTC (Wed) by wahern (subscriber, #37304) [Link]

Oops. I meant VLC, not Vorbis.

GStreamer: Past, present, and future

Posted Oct 29, 2010 10:36 UTC (Fri) by dgm (subscriber, #49227) [Link]

Is that in an open source project?

GStreamer: Past, present, and future

Posted Oct 29, 2010 0:41 UTC (Fri) by Spudd86 (subscriber, #51683) [Link] (24 responses)

"We can make the reasonable assumption that a media system is dealing with 16-bit 44.1 kHz stereo audio."

No, we cannot. Want to watch a DVD? Then you're dealing with 5.1 48KHz audio. DVD-Audio? Could be up to 192KHz. Blu-Ray? 7.1/5.1 and IIRC could be 96KHz. DVD-Video also allows 96KHz stereo.

And that's not even getting into stuff that's slightly less common than things almost everybody does at some point. (OK DVD-Audio doesn't really come up since there's no software players for it and pulseaudio currently caps it's sample rate at 96KHz so it has something to use as a maximum sample rate)

GStreamer: Past, present, and future

Posted Oct 29, 2010 14:28 UTC (Fri) by nix (subscriber, #2304) [Link] (20 responses)

A foolish question, perhaps, but what are all these far-past-48kHz audio samples targetted at? Not even dogs can hear that high. Bats, perhaps?

What hardware would you play them back on?

GStreamer: Past, present, and future

Posted Oct 29, 2010 16:04 UTC (Fri) by paulj (subscriber, #341) [Link] (11 responses)

I've wondered that too. Then recently I saw Monty from Xiph explain it in http://xiph.org/video/vid1.shtml - from about 11min in. Basically, it's cause you don't want any freqs > nyquist frequency for your sample rate to remain in the signal, or it'll cause aliasing. The ultra-high sample rates basically give you more margin for your low-pass filter, making them easier/cheaper to build.

GStreamer: Past, present, and future

Posted Oct 29, 2010 18:34 UTC (Fri) by alankila (guest, #47141) [Link] (10 responses)

While that is important to folks that do sampling in the analog domain, once you have actually captured the signal, digital techniques can easily do the rest and represent artifact-free 44.1 kHz audio with frequency response cut at 20 kHz. There are other reasons to prefer high sampling rates during processing, such as the reduction of artifacts due to bilinear transform and having free spectrum available for spectral expansion before aliasing occurs due to nonlinear effects. Not all applications need those things, though.

However, the idea of consumer-level 96 kHz audio (as opposed to 44.1 kHz audio) is pointless. It may sell some specialized, expensive equipment at high markup for people who are into that sort of thing, but there appear to be no practical improvements in the actual sound quality.

GStreamer: Past, present, and future

Posted Oct 29, 2010 23:32 UTC (Fri) by dlang (guest, #313) [Link] (9 responses)

I really question the 'common knowledge' and 'studies show' statements that say that people can't tell the difference between a 20KHz signal playing with 44KHz samples, and played at 96KHz samples.

I remember when the same statements were being made about video, how anything over 24Hz refresh rate was a waste of time because we had decades of studies that showed that people couldn't tell the difference.

Well, they found out that they were wrong there, at 24Hz people stopped seeing things as separate pictures and saw things as motion instead, but there are still benefits to higher refresh rates.

I think the same thing is in play on the audio side.

not everyone will be able to tell the difference, and it may even be that the mythical 'average man' cannot, but that doesn't mean that it's not worthwhile for some people. It also doesn't mean that people who don't report a difference in a test won't see a difference over a longer timeframe of useage (for example, going from 30Hz refresh rates to 80Hz refresh rates appears to decrease eye strain and headaches for people over long time periods, even for people who can't tell the difference between the two when they sit down in front of the two side by side.

GStreamer: Past, present, and future

Posted Oct 30, 2010 0:12 UTC (Sat) by jspaleta (subscriber, #50639) [Link] (1 responses)

I think the existence of inaudible dog whistles is serious blow against your hypothesis. We've had a much longer experience with audio frequencies near the edge of human perception than you would perhaps realize at first blush. Much of that history pre-dates any attempt at digital sampling. If 99.9% of people can't perceive dog whistles at 22 Khz, they aren't going to hear it played on their Alpine speakers in their car either.

Video framing on the other hand is relatively quite new...unless you count thumb powered flipbooks pen and paper animations.

-jef

GStreamer: Past, present, and future

Posted Oct 30, 2010 15:01 UTC (Sat) by corbet (editor, #1) [Link]

For all of our experience with audio, there was a small subset of us who were driven absolutely nuts by the weird high-pitched chirper things that the Japanese seem to like to put into doorways for whatever reason. Everybody else wondered what we were griping about. Some people hear higher than others.

The other thing that nobody has pointed out: if you're sampling at 44KHz, you need a pretty severe low-pass filter if you want to let a 20KHz signal through. That will cause significant audio distortion at the upper end of the frequency range, there's no way to avoid it. A higher sampling rate lets you move the poles up much higher where you don't mess with stuff in the audio range.

That said, I'm not such an audiophile that I'm not entirely happy with CD-quality audio.

GStreamer: Past, present, and future

Posted Oct 30, 2010 14:42 UTC (Sat) by alankila (guest, #47141) [Link] (4 responses)

Let's just say that I remain skeptical.

Your specific example "20 kHz signal playing with 44 kHz samples, and played at 96 kHz samples" is a particularly poorly example. I assume you meant a pure tone signal? Such a tone can be represented by any sampling with a sampling rate > 40 kHz. So, 44 kHz and 96 kHz are equally good with respect to representing that signal. If there is any difference at all favoring the 96 kHz system, it arises from relatively worse engineering involved with the 44 kHz system -- poorer quality of handling of frequencies around 20 kHz, perhaps -- and not from any intrinsic difference between the representations of the two signals themselves.

Many people seem to think---and I am not implying you are one---that the way digital signals are converted to analog output waveforms occurs as if linear interpolation between sample points were used. From this reasoning, it looks as if higher sampling rates were better, because the linearly interpolated version of 96 kHz signal would look considerably closer to the "original analog waveform" than its 44 kHz sampling interpolated the same way. But that's not how it works. Digital systems are not interpolated by fitting line segments, but by fitting sin waveforms through the sample points. So in both cases, the original 20 kHz sin() could be equally well reconstructed.

GStreamer: Past, present, and future

Posted Oct 30, 2010 15:04 UTC (Sat) by corbet (editor, #1) [Link] (3 responses)

Sinc waveforms, actually (sin(θ)/θ) :)

I knew all those signal processing classes would come in useful eventually...

GStreamer: Past, present, and future

Posted Oct 31, 2010 11:27 UTC (Sun) by alankila (guest, #47141) [Link] (2 responses)

It is true that the resampling is typically done with convolving the signal with sinc, but the effect of this convolving is as if the interpolation had occurred with sin waveforms fit through the sampled data points.

GStreamer: Past, present, and future

Posted Nov 2, 2010 4:02 UTC (Tue) by Spudd86 (subscriber, #51683) [Link] (1 responses)

Err, generally not sinc, it's usually windowed so as to have better PSNR

GStreamer: Past, present, and future

Posted Nov 6, 2010 10:55 UTC (Sat) by alankila (guest, #47141) [Link]

True, true.

GStreamer: Past, present, and future

Posted Nov 3, 2010 2:42 UTC (Wed) by cmccabe (guest, #60281) [Link] (1 responses)

I built an RC oscillator, chained it with an op-amp, and used it to drive a speaker. Then I cranked it up to the 20 kHz range. So I can tell you that I can hear above 22 kHz. We did "double-blind tests" where someone else was turning the sound on and off. I could always tell.

Some people can hear it, some people can't. Unfortunately, the "can't" people designed the Red Book audio format, apparently. I forget the exact frequency at which it became inaudible.

P.S. A lot of people have hearing damage because they listen to music at a volume which is too loud. You need earplugs at most concerts to avoid this.

GStreamer: Past, present, and future

Posted Nov 3, 2010 21:03 UTC (Wed) by paulj (subscriber, #341) [Link]

Gah, yeah.. And even at the cinema - least I've suffered through uncomfortably loud movies at Cineworld in the UK a few times, and block my ears with fingers and/or shoulder.

GStreamer: Past, present, and future

Posted Oct 29, 2010 19:20 UTC (Fri) by nicooo (guest, #69134) [Link] (1 responses)

A bottlenose dolphin would need 300 kHz samples

GStreamer: Past, present, and future

Posted Oct 31, 2010 13:10 UTC (Sun) by nix (subscriber, #2304) [Link]

Ah! So the dolphins have been manipulating our video format work!

Mice might need it too, for their supersonic squeaks of delight.

Perhaps... Douglas Adams was right?

GStreamer: Past, present, and future

Posted Oct 29, 2010 23:12 UTC (Fri) by dlang (guest, #313) [Link] (5 responses)

it's not that the audio frequencies are > 48KHz (the audio frequencies are almost certainly below 20KHz)

It's that using more samples to represent the data makes the resulting audio cleaner.

remember that you aren't recording the frequency, you are recording the amplitude at specific periods. the more samples you have, the cleaner the result.

GStreamer: Past, present, and future

Posted Oct 29, 2010 23:13 UTC (Fri) by dlang (guest, #313) [Link] (4 responses)

by the way, the nyquist limit isn't the limit for where things sound good, it's the limit beyond which there is no hope of getting anything resembling the correct result.

GStreamer: Past, present, and future

Posted Oct 30, 2010 0:09 UTC (Sat) by gmaxwell (guest, #30048) [Link] (3 responses)

There is lots of misinformation on this subject out there.

Given unlimited precision samples a signal which has no energy above the the system nyquist is _perfectly_ re-constructable, not just "good".

If the signal does have energy above the nyquist then it's not "no hope": the system is under-determined and there are a number of possible reconstructions.

Of course, we don't sample with infinite precision but increasing the sampling rate is a fairly poor way of increasing the SNR for lower frequencies if thats your coal. For example, a 1 bit precision 3MHz process can give as much SNR in the 0-20kHz range as a 20 bit 48khz process but it takes about 3x the bitrate to do so.

24bit converters with >110dB SNR are readily and cheaply available. These systems can represent audio as loud as 'dangerously loud' with the total noise still dwarfed by the thermal noise in your ear and the room around you. It's effectively infinite precision. Heck, given reasonable assumptions (that you don't need enough dynamic range to cover hearing damage to the faintest discernible sounds) well mastered CDDA is nearly so too.

There has been extensive study of frequency extension into the ultrasonic, and none of the studies I've seen which weren't obviously flawed could support that hypothesis. If this perception exists it is so weak as to be unmeasurable even in ideal settings (much less your common listening environment which is awash in reflections, distortions, and background noise). There also is no real physiological basis to argue for the existence of significant ultrasonic perception Heck, if you're posting here you're probably old enough that hearing is mostly insignificant even at 18kHz (HF extension falls off dramatically the early twenties for pretty much everyone) much less higher.

But hey if you want to _believe_ I've got some dandy homeopathics to sell you.

GStreamer: Past, present, and future

Posted Oct 30, 2010 0:36 UTC (Sat) by dlang (guest, #313) [Link] (1 responses)

Quote: Given unlimited precision samples a signal which has no energy above the the system nyquist is _perfectly_ re-constructable, not just "good".

I disagree with this statement. something can be reproduced, but not neccessarily _perfectly_

also, any time you have more than one frequency involved, they are going to mix in your sensor, and so you are going to have energy above this frequency.

sampling faster may not be the most efficient way to get better SNR, but it's actually much easier to sample faster than to sample with more precision.

using your example, setting something up to sample 1 bit @ 3MHz may be far cheaper than setting up something to sample 20 bits @ 48KHz. In addition, the low-precision bitstream may end up being more amenible to compression than the high precision bitstream. with something as extreme as the 1bit example, simple run-length encoding probably will gain you much more than a 3x compression ratio. That's not to say that a more sophisticated , lossy, compression algorithm couldn't do better with the 20 bit samples, but again, which is simpler?

I am in no way saying that people hear in the ultrasonic directly, However I am saying that some people listening to a 15KHz sine wave vs a 15KHz square wave will be able to hear a difference.

GStreamer: Past, present, and future

Posted Oct 30, 2010 14:21 UTC (Sat) by alankila (guest, #47141) [Link]

> I disagree with this statement. something can be reproduced, but not neccessarily _perfectly_

This may be confusing two ways to look at it: as mathematical issue, or as engineering problem. Mathematically the discrete representation and the analog waveform are interchangeable: you can get from one to the other. The quality of the conversion between the two can be made as arbitrarily high as you desire -- typically design targets are set beyond assumed limits of human perception.

>also, any time you have more than one frequency involved, they are going to mix in your sensor, and so you are going to have energy above this frequency.

Intermodulation distortion can generate extra tones, and depending on how strong the effect is, they may even matter. Such nonlinearities do not need more than one frequency, though.

This is normally an undesirable artifact, and our ADC/DACs have evolved to a point where they are essentially perfect with respect to this problem. In any case, from viewpoint of a digital system, artifacts that occurred in the analog realm are part of the signal, and are processed perfectly once captured.

> I am in no way saying that people hear in the ultrasonic directly, However I am saying that some people listening to a 15KHz sine wave vs a 15KHz square wave will be able to hear a difference.

The amusing thing is that a 44.1 kHz representation of a 15 kHz square wave will look identical to a 15 kHz sin wave, because none of the pulse's harmonics are within the passband of the system. Do you happen to have a reference where a system such as this was tested with test subjects so that it would be possible to understand how such a test was conducted?

GStreamer: Past, present, and future

Posted Oct 30, 2010 16:27 UTC (Sat) by magnus (subscriber, #34778) [Link]

Given unlimited precision samples a signal which has no energy above the the system nyquist is _perfectly_ re-constructable, not just "good".

Theoretically, you don't only need unlimited precision on each sample, you also need to have an infinite number of samples, from time -∞ to +∞, to perfectly reconstruct the original signal.

In practice though, audio signals will have some information (harmonics etc) at higher frequencies and no filters (not even digital ones) can be perfectly brick-wall shaped, so some aliasing will occur plus you will have some attenuation below the Nyqvist frequency. Sampling at 96 kHz might (if well designed) give you a lot more headroom for these effects.

I have no experience with 96 kHz audio so I don't know if this is actually audible or just theory+marketing.

Since human hearing is non-linear it's also possible that people can pick up harmonics at higher frequencies even if they can't hear beeps at these frequencies. The only way to know is double blind-testing I guess...

GStreamer: Past, present, and future

Posted Oct 29, 2010 18:14 UTC (Fri) by alankila (guest, #47141) [Link] (2 responses)

Sorry, but we can. Almost all the media in the world is in this format, your examples being special cases or practically irrelevant. In particular, your example of a DVD audio is special case, as it is actually in AC3 codec and it is desirable to pass it through the system as-is, because Dolby wants money for encoder implementations. The 96/192/5.1/7.1 are not really even relevant to the point I was making.

I was talking about performance. Nobody expects a mobile phone to spit out a 7.1 stream, ac3 or not, or whatever. I believe my point was that I wanted to argue for the case of simplified internal pipeline of gstreamer, where special case formats could be removed and replaced with more general ones. Your 7.1 192 kHz streams could be just 8 32-bit floating point channels for the purposes of the discussion, but I predict that you'd have severe difficulties transmitting those channels to amplifier.

See? This is not a point that is really worth discussing.

GStreamer: Past, present, and future

Posted Nov 2, 2010 3:57 UTC (Tue) by Spudd86 (subscriber, #51683) [Link] (1 responses)

Nope, you're still implying we can just pick one format and use it, we can't VOIP apps frequently use lower rates and fewer bits per sample. Some people WILL want their system to do 7.1@96KHz and there's no real reason to stop them. Once you do all the other stuff that GStreamer and pulseaudio have to do ANYWAY just to support common use cases you might as well support all the other stuff.

You CAN'T just say 'all audio is 16bit@44.1KHz' because it simply is not the case, 48KHz audio exists, as does 24 bit audio, some people by expensive sound cards to get these sorts of things, and you want to tell them they can't have it?

All I was objecting too is the first bit.

Getting to the rest of your post:

Of COURSE nobody expects their mobile phone to spit out 24bit 192KHz 7.1 channel audio, but some people DO expect it from their desktops, GStreamer is used it a very wide variety of places, and some of them need things your phone doesn't, some of them need things you don't ever need, but that's not a reason for GStreamer to not support them.

Certainly 32 bit float is the most (more in fact) sample resolution you'll ever need in a storage format... but GStreamer is sometimes used in a processing pipeline so it MAY at some point have a use for doubles... probably not though.

ORC is a perfectly reasonable thing to use for a simple volume scaler, especially on something like a mobile phone where CPU time might be at a premium.

I think part of the redesign was to make the format negotiation better and more automatic, however, avoiding conversions is always a good idea (hey large chunks of pulseaudio code are dedicated to doing as few conversions as possible, because of phones and embedded stuff, and even on a desktop rate conversions add error every time you do one since the bandlimiter isn't perfect so it introduces aliasing and noise every time it's run, good ones don't introduce much, but they are expensive to compute even on a desktop)

GStreamer: Past, present, and future

Posted Nov 6, 2010 11:08 UTC (Sat) by alankila (guest, #47141) [Link]

Perhaps I should clarify that when I mean sample format, I mean the particular way to encode the value of a sample. What I was proposing was the use of a single format when processing, for implementation simplicity and guaranteed high-quality output.

I do not have a principal objection to using a different sampling rate or number of channels. It's just that there are useful gains to be had from limiting the number of sample formats. As an example, processing 16-bit integer audio with the volume plugin will currently cause quantization, because the volume plugin does not do dithering.

And when I said that 44.1 kHz and 16 bits, I was talking about mobile context, I admit android flashed through my mind. Did you know that it does not even support any other output format at all? For a mobile device, it is an entirely reasonable output format, and given its other constraints it should be extremely well supported because it's simply the most important input and output format. As we learnt in this thread, N900 people made a ridiculous mistake with selecting audio hardware that apparently uses native sample rate of 48 kHz because that will force them to do resampling for vast majority of world's music. It is possible to do, but doesn't really strike me as especially smart thing to have done.

GStreamer: Past, present, and future

Posted Oct 27, 2010 1:36 UTC (Wed) by drag (guest, #31333) [Link]

> In short, I think this media framework is not quite the success story it is portrayed to be, IME.

I never played around with gstreamer directly much. The only thing I've done is to take the pulseaudio monitor and feed it into Icecast... just to see how hard it would be to have my desktop have internet audio.

That was nothing complicated, just a shell script effectively. I was happy with that.

As far as desktop goes I have a lot more experience and it's replaced VLC for me. I've always used 2 media players: Mplayer and something else. Between Totem and Mplayer I can pretty much play whatever I run across, if it's possible to be played at all on Linux. What one cannot do the other can usually do.

GStreamer: Past, present, and future

Posted Oct 27, 2010 13:20 UTC (Wed) by jackb (guest, #41909) [Link] (3 responses)

"In short, I think this media framework is not quite the success story it is portrayed to be, IME."

None of them are.

I haven't found one single framework that will successfully play every video file I have on my hard drive.

It takes a combination of an mplayer, xine and totem (gstreamer) to watch them all.

GStreamer: Past, present, and future

Posted Oct 27, 2010 21:12 UTC (Wed) by wahern (subscriber, #37304) [Link] (2 responses)

It's because none of the frameworks have the proper design. Most of them shoe-horn plugins into a particular producer-consumer model on the premise that it makes them easier to write, but ultimately it just results in balkanization of efforts.

The very low-level codec implementations--LAME, mpg123, FAAC, FAAD, etc--all share almost identical APIs, even though there was zero cooperation. Given the evident success of that API scheme, why do all these other frameworks depart from that precedent? They try to bake in all sorts of bells and whistles long before the best API for doing so becomes evident, and the end result is crappy performance and nightmarish interfaces.

FFmpeg comes the closest to a good API, and it lies at the heart of many "frameworks", but it has several defects and shortcomings, such as enforcing a threaded pull scheme, and not providing a simple tagged data format which would aid in timing and synchronization. (For my projects I repurpose RTP for this purpose, because IMO it's more valuable to define an interface at the data level than at the function level.)

GStreamer: Past, present, and future

Posted Oct 29, 2010 13:19 UTC (Fri) by wookey (guest, #5501) [Link] (1 responses)

Right. I have never understood why each different media player supports a different subset of stuff. As a naive geek who knows very little about multimedia it seems to me that once I have libx264 and libogg and libquicktime installed then every media player I have should be able to support those formats. But clearly that's not the case and there must be something else going on. Do I understand from what you say that VLC, mplayer totem etc don't actually use the same codec libraries but each implement their own? But if that's true what _does_ use these libraries (I see them on my system).

There seem to be complex interactions between players, lower-level media frameworks and individual codec libraries that I clearly don't understand. Can someone explain (or point to docs that explain)?

GStreamer: Past, present, and future

Posted Oct 29, 2010 22:21 UTC (Fri) by Uraeus (guest, #33755) [Link]

I assume that when you say stuff you mean media files. Well the reason is that a media file is a collection of things. For instance most media frameworks and players do their own demuxers (as using library versions makes things like trick modes hard to do) and the demuxer is more often than the decoder the one which has to battle with weird files. The second differentation factor is crash policy. The more broken files your player tries to play, the easier it is for said player to encounter something that makes it crash. This is a security risk. So as a player developer one are made to make a decision on how strict to be with more strict meaning less crashes but also less files being playable.

GStreamer: Past, present, and future

Posted Oct 27, 2010 0:01 UTC (Wed) by JohnLenz (guest, #42089) [Link] (5 responses)

He also showed various graphs of the commits per month...

Anybody have a link to the slides so we can see the graphs the article is talking about?

GStreamer: Past, present, and future

Posted Oct 27, 2010 8:59 UTC (Wed) by wtay (guest, #55923) [Link] (4 responses)

I uploaded the slides here: http://people.freedesktop.org/~wtay/gstreamer-conf-2010.pdf

GStreamer: Past, present, and future

Posted Nov 4, 2010 7:58 UTC (Thu) by frazier (guest, #3060) [Link] (2 responses)

I really need to get you guys a vector build of the GStreamer logo. I gave one to Erik years ago (about a decade ago now!) and it apparently disappeared in the mist of time. I'll probably have to recreate it, but provided I can figure out what the core font is, it won't be that difficult.

-Brock

GStreamer: Past, present, and future

Posted Nov 4, 2010 23:01 UTC (Thu) by bazzargh (guest, #56379) [Link]

Optima Bold Italic with a hand-tweaked "g". Or something very like that.

GStreamer: Past, present, and future

Posted Nov 5, 2010 8:17 UTC (Fri) by tpm (subscriber, #56271) [Link]

There's already an SVG version of the logo at http://gstreamer.freedesktop.org/artwork/

GStreamer: Past, present, and future

Posted Nov 5, 2010 8:09 UTC (Fri) by tpm (subscriber, #56271) [Link]

The video of the keynote is up now on the UbiCast GStreamer Conference Video Portal: http://gstconf.ubicast.tv/categories/conferences/

See http://gstreamer.freedesktop.org/wiki/GStreamerConference... for slides and other links.