PipeWire: A year in review & a look ahead (Collabora blog)

[Posted March 8, 2022 by corbet]

The Collabora blog looks at recent developments in the PipeWire media system and looks forward to what is yet to come:

Now in 2022, we are looking to the future. We already have designs to improve WirePlumber and experiment with new things. On the short-term horizon, we have plans to rework some parts of WirePlumber in order to make its configuration more user-friendly and the scripts easier to work with. We are also planning to revisit the policy logic and try to go a step beyond what PulseAudio has ever offered. In addition, we are looking forward to experimenting with complex cameras to improve how PipeWire and libcamera work together for an optimal user experience.

Posted Mar 8, 2022 19:17 UTC (Tue) by schessman (subscriber, #82966) [Link] (13 responses)

Pipewire works for me, replacing jackd and pulseaudio which do not get along very well with each other. My artix systems now let me run Jamulus and simultaneously handle audio from whatever browser is running without any major issues. Latency is slightly higher with pipewire than jackd (5ms instead of 2.5) but I can accept that on a 12 year old iMac.

Posted Mar 9, 2022 8:47 UTC (Wed) by ovitters (guest, #27950) [Link] (12 responses)

> My artix systems

I didn't recognize the name, thought it might be some kind of audio hardware or something. After Googling I saw it's a distribution. One that prides itself on their homepage to run "real software". At least it's clear what the focus is I guess.

Posted Mar 9, 2022 12:22 UTC (Wed) by Wol (subscriber, #4433) [Link] (11 responses)

Depends what they mean by "real software", but I do often get the impression that people who develop software seem to think that computers should only be used for developing software - they forget that without lusers there isn't actually any point in having a computer (unless it's a toy).

Cheers,
Wol

Posted Mar 9, 2022 21:45 UTC (Wed) by nix (subscriber, #2304) [Link] (10 responses)

"Real" appears to mean "we like it", judging by their claim on their homepage to use "real init systems (i.e. not systemd)". I didn't realise so many Linux distros were running using imaginary init systems! I'm impressed that systemd can do so much (and cause so much controversy) while being entirely imaginary.

I wonder if Lennart actually exists? I'd ask, but of course I can't trust the reply: it might just be another of these quite remarkably detailed hallucinations everything seems to be so rife with. If PulseAudio is actually nonexistent too (which if Lennart is imaginary it surely must be) then I must be hallucinating all the sound coming from my computers as well. This is getting more and more disturbing... is it ghosts?

(because silly claims deserve a silly response)

Posted Mar 10, 2022 4:18 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (9 responses)

Now, now, this sort of "real programmer" silliness has a very long and venerable history. A few decades ago, I bet these people would be complaining about Pascal. It's all the same, in the end: Is abstraction good or bad? Abstraction is good, because it enables you to ignore low-level details. Abstraction is bad, because it enables you to ignore low-level details.

IMHO the only sensible conclusion is that different problem spaces have different requirements, even problem spaces that superficially appear to be very similar. Getting too worked up about the "correct" level of abstraction for any given problem space is unlikely to be a productive attitude in practice. The people who use "real" init systems have found a solution that works for their use case, and so have the people who use systemd. It just so happens that those use cases are different from each other in various ways.

Posted Mar 10, 2022 15:50 UTC (Thu) by nix (subscriber, #2304) [Link]

Ah, I didn't think of parsing "real" as "not much abstraction, close to the metal". I wonder if these people insist on using dietlibc rather than musl or god forbid glibc because of the amount of abstraction in there. (I mean, the libio layer under glibc's stdio clearly makes glibc completely dreadful and it must be avoided.)

Posted Mar 10, 2022 20:41 UTC (Thu) by Wol (subscriber, #4433) [Link] (7 responses)

> It's all the same, in the end: Is abstraction good or bad? Abstraction is good, because it enables you to ignore low-level details. Abstraction is bad, because it enables you to ignore low-level details.

Cue my obligatory database rant. "Make things as simple as possible, but no simpler". Abstraction is BAD when it hides low level details and forces the luser to re-invent the wheel.

Pick is a key-value store where the value is an n-dimensional cube. From my experience with Oracle, I think Oracle is a key-value store where the value is a one-dimensional list. By hiding - abstracting - this fact, Oracle (and I suspect all other Relational databases) forces the user to re-invent the wheel to get back the key-value functionality that is, in fact, a pretty decent abstraction of the real world. After all, most real world data comes as a set of objects, with a list of attributes. (Key = object identifier, value = list of object attributes.)
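To make the two shapes concrete, here is a minimal Python sketch. It is an illustration only: plain dicts and lists stand in for both models, and every name in it (`nested_store`, `flat_lines`, the invoice key) is hypothetical rather than taken from Pick or Oracle.

```python
# Illustrative only: plain Python structures standing in for the two models.
# None of these names come from Pick or Oracle.

# Key -> nested value: one object, its attributes, and its multi-valued
# attributes all kept together under a single key.
nested_store = {
    "INV001": {
        "customer": "ACME",
        "lines": [  # a multi-valued attribute (a second dimension)
            {"item": "bolt", "qty": 10},
            {"item": "nut", "qty": 20},
        ],
    },
}

# Key -> flat list: every value is a scalar, so the repeating line items
# have to live under keys of their own in a second mapping.
flat_invoices = {"INV001": ["ACME"]}
flat_lines = {
    ("INV001", 1): ["bolt", 10],
    ("INV001", 2): ["nut", 20],
}


def lines_from_nested(key):
    """One lookup returns the object together with its line items."""
    return [(line["item"], line["qty"]) for line in nested_store[key]["lines"]]


def lines_from_flat(key):
    """The same answer needs a scan (or an index) over the second mapping."""
    rows = sorted((k[1], v) for k, v in flat_lines.items() if k[0] == key)
    return [(item, qty) for _, (item, qty) in rows]
```

Both functions return the same list of line items; the sketch shows only the difference in shape and says nothing about how either real system performs.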

So when I claim Pick is so much faster than Relational, it's because Relational has two abstraction layers that SHOULD, but CAN'T, be abstracted out of existence :-)

Cheers,
Wol

Posted Mar 10, 2022 23:09 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (6 responses)

> From my experience with Oracle, I think Oracle is a key-value store where the value is a one-dimensional list.

In any SQL-interfaced RDBMS,

select * from some_table where primarykeycolumn = some_value

leads you to a single row (one-dimensional list), but the entries in that list are, in several of the leading SQL-interfaced RDBMSes (including Oracle, SQL Server, PostgreSQL, and DB2), allowed to be of a user-defined type which can perfectly well be an array of an object-like type.

(The databases I work with do not use that capability.)
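As a rough illustration of that lookup, here is a sketch using Python's standard `sqlite3` module. The table layout and the `name` and `tags` columns are made up for the example, and because SQLite has no array type, a JSON string stands in for the user-defined array types mentioned above; it is not how Oracle, SQL Server, PostgreSQL, or DB2 implement them.

```python
import json
import sqlite3

# Hypothetical table and columns. SQLite has no array type, so a JSON
# string stands in for a user-defined array-valued column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table ("
    "primarykeycolumn INTEGER PRIMARY KEY, name TEXT, tags TEXT)"
)
conn.execute(
    "INSERT INTO some_table VALUES (?, ?, ?)",
    (1, "widget", json.dumps(["red", "small"])),
)


def fetch_row(key):
    """Primary-key lookup: at most one row comes back, as a flat tuple."""
    cur = conn.execute(
        "SELECT * FROM some_table WHERE primarykeycolumn = ?", (key,)
    )
    return cur.fetchone()


row = fetch_row(1)
tags = json.loads(row[2])  # the one cell that itself holds a list
```

The row is a one-dimensional tuple, and the third cell decodes to a list of its own.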

Posted Mar 11, 2022 0:48 UTC (Fri) by Wol (subscriber, #4433) [Link] (5 responses)

And how is that physically implemented in the datastore? Is it a pointer to blob? How is it stored on disk? What impact does it have on performance? From what you've said, IT DOESN'T GIVE ME A CLUE.

The point I'm trying to make is that with Pick, the abstraction presented to the user (key -> n-dimensional-array) closely matches the actual physical implementation. I can reason, from my user application, through the database, to the OS and predict performance pretty accurately.

The whole point of relational is that the interface presented to the user is so badly abstracted away from the physical implementation, that trying to predict what the real hardware impact will be, is actually forbidden by the definition of relational.

THAT is why the abstraction is bad, because it hides implementation and says "don't worry about the wizard behind the curtain". I know that by optimising my application to the database abstraction, the database for the most part can "get out of the way" and pretty much just passes data through between application and OS. (Yes I know Relational puts a security layer in there which also doesn't help performance, but I'm pretty sure that's implemented in Pick's SQL layer, but running in RAM I don't think it has that much impact compared to retrieving from disk.)

(In other words, it's maths. The problem has a solution, but finding it may take longer than the life of the Universe - you don't know. Pick on the other hand is Science. It lets me predict the future - when my query will complete.)

Cheers,
Wol

Posted Mar 17, 2022 15:49 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

> And how is that physically implemented in the datastore? Is it a pointer to blob? How is it stored on disk? What impact does it have on performance? From what you've said, IT DOESN'T GIVE ME A CLUE.

This is not exposed to the user and may very well vary from db version to db version, from table to table, from row to row or from time to time depending on how the data is accessed, with the RDBMS changing the layout in the background to improve performance of the most-common-right-now query (most RDBMS's don't do this, but they *could*: that they don't do this sort of thing very much is one of my big problems with them, they're too stupid and too static and they could be much cleverer).

> The point I'm trying to make is that with Pick, the abstraction presented to the user (key -> n-dimensional-array) closely matches the actual physical implementation

This is a terrible idea. I mean it's *obviously* a terrible idea because it means both that the physical implementation *cannot change* and that if you want to change your set of queries you probably have to redesign how the data is stored and quite possibly make a bunch of your other queries worse in the process. This sort of stuff really should be done for you by the machine. We have abstractions above the physical storage *everywhere else* and nobody thinks this is anything but excellent, why is it suddenly a bad idea for databases?

> (In other words, it's maths. The problem has a solution, but finding it may take longer than the life of the Universe - you don't know. Pick on the other hand is Science. It lets me predict the future - when my query will complete.)

Um, query planners don't *run* for longer than the life of the universe: usually they throttle rapidly, but sometimes they keep going in parallel with the query in hope of finding a better query and speeding things up, or speeding up the query the next time it runs. You're proposing something obviously inflexible in place of something with abstraction layers that permit either the machine or dammit the human to do *better*, simply because they don't have to worry about the fine details of where data is stored any more, or because they can dynamically adjust to the state of the system from moment to moment. This is at least possible in theory with RDBMSes: with a system in which everything is nailed down to specific physical representations, this is totally impossible.

Posted Mar 17, 2022 18:25 UTC (Thu) by Wol (subscriber, #4433) [Link] (3 responses)

> > And how is that physically implemented in the datastore? Is it a pointer to blob? How is it stored on disk? What impact does it have on performance? From what you've said, IT DOESN'T GIVE ME A CLUE.

> This is not exposed to the user and may very well vary from db version to db version, from table to table, from row to row or from time to time depending on how the data is accessed, with the RDBMS changing the layout in the background to improve performance of the most-common-right-now query (most RDBMS's don't do this, but they *could*: that they don't do this sort of thing very much is one of my big problems with them, they're too stupid and too static and they could be much cleverer).

In other words, as I've said before, Relational is pure maths. It can't be Science because it sets out to forbid any form of experimental proof.

> > The point I'm trying to make is that with Pick, the abstraction presented to the user (key -> n-dimensional-array) closely matches the actual physical implementation

> This is a terrible idea. I mean it's *obviously* a terrible idea because it means both that the physical implementation *cannot change* and that if you want to change your set of queries you probably have to redesign how the data is stored and quite possibly make a bunch of your other queries worse in the process. This sort of stuff really should be done for you by the machine. We have abstractions above the physical storage *everywhere else* and nobody thinks this is anything but excellent, why is it suddenly a bad idea for databases?

Because in your world view it's "Turtles all the way down"? You're actually denying the objective reality of existence, you do realise that? Of course it's a bad idea, in Maths, to admit to the presence of reality, but reality would beg to differ.

As for why is it a bad idea for databases, every abstraction increases the cost to reality. And the Relational abstraction increases it massively. Actually, I'd argue that my abstraction is objectively better than yours. I create a *single* layer above my physical reality, that hides the key and restricts N to 1, and I have your First Normal Form abstraction.

The cost of enforcing First Normal Form and all that falls equally on you and me (and Pick has a SQL layer that does exactly that), but because your fundamental abstraction is objectively a proper sub-set of mine, my abstraction just has to be superior ...

> > (In other words, it's maths. The problem has a solution, but finding it may take longer than the life of the Universe - you don't know. Pick on the other hand is Science. It lets me predict the future - when my query will complete.)

> Um, query planners don't *run* for longer than the life of the universe: usually they throttle rapidly, but sometimes they keep going in parallel with the query in hope of finding a better query and speeding things up, or speeding up the query the next time it runs. You're proposing something obviously inflexible in place of something with abstraction layers that permit either the machine or dammit the human to do *better*,

You are SERIOUSLY underselling human ingenuity here ... you've been smoking the relational psycho-juice here. Information theory says you're flat out wrong here. Sorry.

> simply because they don't have to worry about the fine details of where data is stored any more, or because they can dynamically adjust to the state of the system from moment to moment. This is at least possible in theory with RDBMSes: with a system in which everything is nailed down to specific physical representations, this is totally impossible.

But it's Maths - how are you supposed to prove that your query won't run for longer than the life of the Universe? You have no grounding in reality to enable you to make that claim.

You said it's possible IN THEORY to adjust everything with an RDBMS. Is it that in *reality* it doesn't happen because they just can't make it work?

You've said it's possible IN THEORY to dynamically adjust system state. Is it that in *reality* doing that is just too expensive?

Let's jump to my favourite invoice example. Let's say we've got an invoice, with two addresses on it (invoice and delivery), and ten line items. Any sensible Pick design will store the invoice data in one table, the addresses in a second, and the line items in a third (actually, quite possibly not, the addresses are supposed to be immutable, but I'll let that slide ...). I'll also assume, for the sake of argument, that each individual item fits in a 4K disk block.

The customer service rep types in the invoice number into her application, Pick will ask the OS for THIRTEEN 4K blocks, and I will guarantee that (within a +5% error margin) the customer service rep will have her entire invoice. (I'm ignoring the fact the disk blocks may be cached, or there may be multiple items in a single block - Relational could benefit equally.)
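The thirteen comes straight from the counts above: one invoice item, two addresses, ten line items, each assumed to fill exactly one 4K block. A small Python sketch of that arithmetic, with hypothetical function names and ignoring caching and index reads as the example does:

```python
BLOCK_SIZE = 4096  # the 4K disk block assumed in the example


def blocks_needed(invoice_headers, addresses, line_items):
    """Count block reads when every item occupies exactly one block."""
    return invoice_headers + addresses + line_items


def bytes_read(blocks):
    """Total bytes transferred for the given number of block reads."""
    return blocks * BLOCK_SIZE


total = blocks_needed(invoice_headers=1, addresses=2, line_items=10)
print(total, bytes_read(total))  # prints: 13 53248
```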

Please explain to me how it's possible to improve on that!

(And don't magically invoke relational indices - they have a cost - I'm assuming a cold database so you have to retrieve the index...)

I'll go further ... I will claim that for ANY SENSIBLE Pick data design and any SENSIBLE query the same logic will work - there is absolutely no way possible to retrieve the information with less effort. In fact, it holds for any nonsensical query too - Pick will retrieve it with the minimum effort possible.

If data is causally related in the real world, Pick will hold it physically related in the database world. I think you'll find that where Relational actually succeeds in doing that, it uses a Pick-style layer "behind the curtain" to fool you - it must do - it can't store n-dimensions in a single row. (Or actually, now relational allows you to store arrays in cells, it exposes it to you, it just makes it a damn nightmare to manage.)

Relational, on the other hand, in trying to make ALL queries equally easy, cripples the sensible queries to try and make the nonsensical ones easier.

Cheers,
Wol

Posted Mar 18, 2022 8:05 UTC (Fri) by Wol (subscriber, #4433) [Link]

> > This is a terrible idea. I mean it's *obviously* a terrible idea because it means both that the physical implementation *cannot change* and that if you want to change your set of queries you probably have to redesign how the data is stored and quite possibly make a bunch of your other queries worse in the process. This sort of stuff really should be done for you by the machine. We have abstractions above the physical storage *everywhere else* and nobody thinks this is anything but excellent, why is it suddenly a bad idea for databases?

> Because in your world view it's "Turtles all the way down"? You're actually denying the objective reality of existence, you do realise that? Of course it's a bad idea, in Maths, to admit to the presence of reality, but reality would beg to differ.

> As for why is it a bad idea for databases, every abstraction increases the cost to reality. And the Relational abstraction increases it massively. Actually, I'd argue that my abstraction is objectively better than yours. I create a *single* layer above my physical reality, that hides the key and restricts N to 1, and I have your First Normal Form abstraction.

Apologies for replying to myself, but this is simple Physics. Second Law of Thermodynamics and all that. They've recently redefined "entropy" as "loss of information". (Somebody came up with a weird situation where two different scenarios could not be differentiated using the old definition :-)

So, in converting the Pick abstraction to the Relational Abstraction, I've thrown away information and increased entropy. Ergo, Pick is more powerful and efficient than Relational - the laws of Physics say it MUST be.

Cheers,
Wol

Posted Apr 8, 2022 10:40 UTC (Fri) by nix (subscriber, #2304) [Link] (1 responses)

> In other words, as I've said before, Relational is pure maths. It can't be Science because it sets out to forbid any form of experimental proof.

This argument could be used to argue against JIT compilers and any form of optimizer. Are you really saying that machine optimization is always bad and that humans should do everything by hand, and using arguments as crankish as claiming that wanting machines to optimize for you is "denying physical reality"? (hint: it's really not). Because that's an... interesting argument. I suspect you'd be almost alone making it.

(Obviously it is possible for the RDBMS to tell you what its optimizers and query planners are doing, and to influence their decisions. This is usually essential because they're so bad -- if they were better, this would be as unnecessary, or at least rarely used, as a specific lever to tweak compiler optimizations at particular lines of code in other languages.)

Posted Apr 8, 2022 15:28 UTC (Fri) by Wol (subscriber, #4433) [Link]

> > In other words, as I've said before, Relational is pure maths. It can't be Science because it sets out to forbid any form of experimental proof.

> This argument could be used to argue against JIT compilers and any form of optimizer. Are you really saying that machine optimization is always bad and that humans should do everything by hand, and using arguments as crankish as claiming that wanting machines to optimize for you is "denying physical reality"? (hint: it's really not). Because that's an... interesting argument. I suspect you'd be almost alone making it.

It's the difference between PURE maths, and APPLIED maths. A jit compiler seeks to minimise the size of the object code. It seeks to minimise the number of loops, removing them if possible.

More to the point, the jit people can presumably MODIFY THE INTERPRETER to remove complexity from the system!

> (Obviously it is possible for the RDBMS to tell you what its optimizers and query planners are doing, and to influence their decisions. This is usually essential because they're so bad -- if they were better, this would be as unnecessary, or at least rarely used, as a specific lever to tweak compiler optimizations at particular lines of code in other languages.)

And this is where Pickies just scream in frustration, because once you UNDERSTAND the Pick model, it's blindingly obvious why (a) Pick doesn't have an optimiser, and (b) the query plan is "do as you're told, it's impossible to do any better".

farnz just posted that - if we've both got our data in 4NF, I've got ONE file, he has MULTIPLE tables, yet he can retrieve his data *from* *disk* (from those multiple tables) faster than I can retrieve just the one record from my file. Seriously?

The difference between me and farnz seems to be I know EXACTLY how my data is stored on disk, and the DBMS just looks after it for me. farnz relies on database smarts, and from your own comments above those smarts ain't that smart!

The difference between Relational and Pick is that with relational, the user and the analyst disagree over what data is. The analyst takes what the user gives them, completely re-organises it into something that bears no resemblance whatsoever to the original, then plonks it into the database, for the programmer the other side to take it out again and try to re-assemble the original. In farnz's last post, he pointed me at PostgreSQL's documentation for how data is stored in rows and tables. At no time has he EVER tried to enlighten me about how his 4NF data is actually stored on disk (until that last post where apparently it's plastered across however many 1NF tables it needs). His definition is presumably MEANT to stop the programmer the other side seeing the emperor has no clothes ...

Pick, on the other hand, agrees with the user what data is. "It's what you give me". The DICTionary DEscribes the data (and yes, a proper relational analysis is time well spent, to ensure an accurate DEscription). A single row should be what the USER understands as an entity, and then if the attributes are properly defined, breaking that down is dead easy. The resulting database schema is then also dead easy to understand.

Let's assume you want to get all the information a Pick database holds about me. Properly designed, you have a PERSONS table - I'll key it on NI number, and if you've done a proper relational analysis on the attributes it'll be broken down cleanly into 1NF sub-tables for you to do what you like with.

LIST PERSONS AB123456Z field[ field[ ...]]

The query asks the database, the query engine says "ah, filename, key" and gets that ONE record, and then uses the dictionary to return whatever data the user wanted. How many table accesses is that in Relational 4NF? Sounds like a lot more than one from what farnz said. And therefore a lot more than one request to disk.
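The "filename, key, then dictionary" sequence can be sketched in Python. This is a toy model, not Pick's implementation: the file is a dict, the dictionary is a mapping from field name to attribute position, and the record contents and field names (`NAME`, `TOWNS`, `BORN`) are invented for the example.

```python
# Hypothetical data: a "file" maps a key to one record, stored as a list of
# attribute values; a "dictionary" maps field names to attribute positions.
PERSONS = {
    "AB123456Z": ["A. Person", ["London", "Leeds"], "1970"],
}
PERSONS_DICT = {"NAME": 0, "TOWNS": 1, "BORN": 2}

reads = 0  # number of record fetches performed so far


def read_record(file, key):
    """Fetch one record by key; counted as a single access."""
    global reads
    reads += 1
    return file[key]


def list_fields(file, dictionary, key, *fields):
    """Rough analogue of LIST <file> <key> <field>...: one read, then
    the dictionary picks the requested attributes out of the record."""
    record = read_record(file, key)
    return {name: record[dictionary[name]] for name in fields}


result = list_fields(PERSONS, PERSONS_DICT, "AB123456Z", "NAME", "TOWNS")
```

In this model the whole request costs one record fetch, however many fields are asked for; whether a given relational layout needs more depends on its schema and storage, which the sketch does not model.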

The layout of your BASIC file structure is key. Until you move away from physically storing your data as 1NF (and yes, with all this "array in a cell" stuff relational has done that) you can't even approach Pick's power. When you denormalise away from 1NF, are you physically changing the layout (of your data) on disk, or are you just putting a logical layer there to make sure the application programmer can't screw up? Because if it's the latter, YOU'RE STUFFED.

Think of Pick as an "object/hierarchical/relational" database. Hierarchical databases are fast because they drill down. Object databases are compact because they don't store unnecessary data, and are also fast because they store lots of attributes together that a normalised database wouldn't. And because Pick's blob is a sparse matrix, you can normalise it without breaking its "blobness".

And how do you query your relational database? Pick's query language is fully capable of handling "arrays within arrays" making it nice and simple (okay, it doesn't like more than two or three dimensions, mostly because it blows the human mind and it's tricky to formulate a query in an easy-to-understand language if you can't get your brain round the problem in English!). SQL is still stuck in 1NF as far as I can tell.

This was my blinding flash about entropy. And why I bang on about things being "as simple as possible BUT NO SIMPLER". The relational analyst simplifies the data, but in doing so he converts INFORMATION (things like order) into DATA. Well, as far as Pick's concerned that's METAdata, which Pick doesn't store. Which is why Pick uses less disk space.

And then all this data needs to be converted back into information by your SQL query, which is why it's so horribly complicated with joins and whatnot. And why so much INFORMATION needs to be encoded in said SQL, which Pick just handles for you. Seriously, how much data do you have where order is "random but preserved"? Most? Pick looks after all that for you. Relational forces you to explicitly handle it in your queries and/or user code. The database appears simpler (tables and rows), but the *system* is much more complex - "premature optimisation" and all that - the database has been optimised at the expense of the application.
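A small Python sketch of the "order becomes data" point, with made-up names and data: flattening an ordered list into 1NF-style rows needs an explicit position column, and rebuilding the list needs a sort on that column (the job an ORDER BY would do in SQL).

```python
# Hypothetical example: one invoice with an ordered list of line items.
nested = {"INV001": ["bolt", "nut", "washer"]}


def to_rows(store):
    """Flatten to 1NF-style rows; order has to become an explicit column."""
    return [
        (key, pos, item)
        for key, items in store.items()
        for pos, item in enumerate(items, start=1)
    ]


def from_rows(rows):
    """Rebuild the nested form; the sort plays the role of an ORDER BY."""
    rebuilt = {}
    for key, _pos, item in sorted(rows):
        rebuilt.setdefault(key, []).append(item)
    return rebuilt


rows = to_rows(nested)
shuffled = list(reversed(rows))  # rows carry no inherent order of their own
```

`from_rows(shuffled)` recovers the original only because the position was stored as a column and sorted on.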

Cheers,
Wol

Posted Mar 10, 2022 0:17 UTC (Thu) by pabs (subscriber, #43278) [Link] (1 responses)

Anyone know how I would figure out why PipeWire has such choppy audio under load, while PulseAudio does not? When there is no load PipeWire audio is quite smooth though. The scenario is an old desktop with analog headphones.

Posted Mar 10, 2022 2:45 UTC (Thu) by liam (guest, #84133) [Link]

https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/...