I'm not convinced by any of the current audio solutions, though not (by and large) for the reasons given. Rather, I'm not convinced because all the solutions (that I've seen, at least) are very special-purpose, making them optimal for the users they're developed for (good) but sub-optimal for everyone else (bad).
Unnecessary communications overhead or context-switching overhead will surely kill any kind of daisy-chaining mechanism for audio processing, because every hand-off adds latency, although it shouldn't impact programs that are complete in and of themselves. Now, of course, this raises the question of when these overheads are unnecessary. I'd argue that this is something that can only be answered by experimentation, not by politics.
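To put rough numbers on the daisy-chaining worry, here is a back-of-the-envelope sketch. It assumes a simple additive model (each stage's processing time plus a fixed hand-off cost per link), and every figure in it is made up for illustration; real overheads would have to be measured, which is rather the point.

```python
def chain_latency_ms(stage_ms, hop_overhead_ms):
    """Total latency of a serial chain under an assumed additive model:
    the processing time of every stage, plus one hand-off overhead
    for each link between adjacent stages."""
    hops = max(len(stage_ms) - 1, 0)
    return sum(stage_ms) + hops * hop_overhead_ms


# Hypothetical numbers, for illustration only (not measurements).
stages = [2.0, 1.5, 3.0]   # per-stage processing time, in ms
hop_overhead = 0.5         # assumed IPC/context-switch cost per hand-off, in ms

# One self-contained program doing all the work: no hand-offs at all.
standalone = chain_latency_ms([sum(stages)], hop_overhead)

# The same work split across three daisy-chained programs: two hand-offs.
chained = chain_latency_ms(stages, hop_overhead)

print(f"standalone: {standalone} ms, chained: {chained} ms")
```

Under this model the chained version pays for two extra hand-offs and the standalone one pays for none. Whether that extra cost is "unnecessary" depends on the real per-hop figure, which only an experiment can supply.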
OS-specific layers are going to complicate things for application writers, who are not going to want one audio module for Windows, another for Linux and a third for all the *BSDs. Abstraction layers to hide the specifics just add to the footprint, to the latency and to the number of places bugs can hide. Certain models are also extremely hard to hide by abstraction. Again, though, what is acceptable can really only be seen by trying the experiment and observing.
As far as I can tell, the underlying problem is that there is no good specification of what "acceptable" even is. There needs to be some solid standard for the "best", "median", "mean" and "worst" cases against which solutions can be measured. The cases don't necessarily refer only to the percentage of CPU resources available; they could also refer to the driver architecture used, the hardware bus bandwidth available, the number of audio cards present, the amount of pre-processing required, the complexity of the API, the portability of the API, etc.
If the minimum criteria were set in stone, for all the different issues, then there could be no argument over whether a solution was acceptable or not. If it meets the criteria, it's acceptable. If it doesn't, it isn't. It won't completely eliminate the politics (nothing can do that, I fear) but politics will always be loudest where terms are defined the least. Everyone will always try to get their interpretation to be the "one true interpretation" and that's a Bad Idea. It's hard to put a number to things like complexity and portability, but it's surely still easier than the usual wading up to the armpits in virtual blood.
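As a minimal sketch of what "meets the criteria or doesn't" could look like for just one of those issues (latency), assuming someone had agreed on limits for the best, median, mean and worst cases: the limits and samples below are invented for illustration, and the function names are mine, not part of any existing audio API.

```python
import statistics


def summarize(samples_ms):
    """Reduce a list of measured latencies (ms) to the four cases."""
    return {
        "best": min(samples_ms),
        "median": statistics.median(samples_ms),
        "mean": statistics.mean(samples_ms),
        "worst": max(samples_ms),
    }


def is_acceptable(summary, limits_ms):
    """A solution passes only if every case is within its agreed limit."""
    return all(summary[case] <= limit for case, limit in limits_ms.items())


# Hypothetical limits; the real ones would have to be agreed on up front.
LIMITS_MS = {"best": 5.0, "median": 10.0, "mean": 10.0, "worst": 20.0}

# Made-up measurements for two imaginary solutions.
steady = [4.0, 6.0, 8.0, 10.0, 12.0]
spiky = steady + [40.0]   # same as above, plus one bad outlier

print(is_acceptable(summarize(steady), LIMITS_MS))
print(is_acceptable(summarize(spiky), LIMITS_MS))
```

The steady set passes and the spiky one fails on its worst case alone, with no room for interpretation. Fuzzier criteria such as complexity or portability would need some agreed scoring before they could be checked this way.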