Point optimization?

Posted Mar 5, 2012 0:37 UTC (Mon) by i3839 (guest, #31386)
In reply to: Point optmization? by khim
Parent article: Speeding up D-Bus

You're pulling in 1/3 of glibc because you're compiling it statically.
Glibc's bloat is bad enough, but don't blame it on "Hello, World!"
At least Glibc gets used and shared by most programs; dbus isn't.
And Glibc's bloat is never an excuse for others to be bad too.



Point optimization?

Posted Mar 5, 2012 4:31 UTC (Mon) by pflugstad (subscriber, #224)

Frankly, the size of the library and executable is down in the noise as well, and isn't really my point.

IPC and context switching are blazing fast in Linux already: any improvement in IPC is at the sub-millisecond level and is not noticeable by humans. And given how Dbus seems to be used, it seems to operate mostly on the human scale of things.
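The claim that IPC round trips sit well below a millisecond can be checked rather than asserted. A rough sketch, assuming a POSIX system (it relies on `os.fork`): a parent and a forked child bounce one byte back and forth over a pair of pipes, and the average round trip is reported. The function name `pipe_round_trip_us` is illustrative only. Python's interpreter overhead is included, so treat the result as an upper bound on the raw pipe-plus-wakeup cost; on an SMP machine the two processes may also run on different cores, so this measures wakeup latency rather than a strict context switch.

```python
import os
import time


def pipe_round_trip_us(iterations: int = 10_000) -> float:
    """Average round trip, in microseconds, of a 1-byte ping-pong between a
    parent and a forked child over two pipes. POSIX only (uses os.fork)."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: echo each byte back until the parent closes its write end.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            while True:
                byte = os.read(to_child_r, 1)
                if not byte:
                    break
                os.write(to_parent_w, byte)
        finally:
            os._exit(0)  # never fall back into the parent's code path

    # Parent: keep only the pipe ends it actually uses.
    os.close(to_child_r)
    os.close(to_parent_w)
    start = time.perf_counter()
    for _ in range(iterations):
        os.write(to_child_w, b"x")
        os.read(to_parent_r, 1)
    elapsed = time.perf_counter() - start

    os.close(to_child_w)  # child reads EOF and exits
    os.waitpid(pid, 0)
    os.close(to_parent_r)
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"average round trip: {pipe_round_trip_us():.1f} us")
```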

So I'm really at a loss to figure out what they are trying to fix here. If they're seeing some really slow response times for some events, then I find it extremely hard to believe that improving the IPC by 1.8x (or honestly, even 10x) is going to make any _noticeable_ difference at all.

So: are they really looking at the right thing? Is IPC truly the bottleneck and the thing causing whatever perceived slowness they are trying to fix? Or, are they parsing XML at runtime, which is notoriously slow? Or something like that?

Point optimization?

Posted Mar 6, 2012 6:25 UTC (Tue) by i3839 (guest, #31386)

I know library size isn't your point, but it was my point:

A huge library size for simple functionality is a clear sign of badly written or badly designed code, with all the downsides that come with that: inefficient, unnecessarily complex code that is hard to debug and hard to optimise properly.

Context switching may be very fast, until you thrash all your caches on every switch; then it all goes down the drain.

So the main reason improving the IPC works is probably that less of dbus's crappy code gets run. If dbus does the multicasting, it sends messages one by one. The processes receiving them were most likely idle, while dbus-daemon, the bloated pig it is, probably jumped around its own code enough to eat up its timeslice. So the receiving process gets scheduled and does its thing, and only then does dbus-daemon get a new timeslice and can send the message to the next process waiting for it (SMP should help a lot, though). This cycle repeats until all processes have received the message. If dbus-daemon were mean and lean, this extra ping-ponging wouldn't be very noticeable and wouldn't happen as much. By pushing the multicasting into the kernel, this particular problem is avoided.
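The ping-pong argument above can be made concrete with a toy timing model. This is a sketch of the hypothesis, not a measurement of dbus-daemon: it assumes a single CPU, that each woken receiver preempts the daemon immediately, and fixed per-step costs. `daemon_cost` (the daemon's work per send) and `receiver_cost` (each receiver's handling time) are hypothetical parameters in arbitrary time units. Each function returns the time at which every receiver starts running.

```python
def serial_fanout(n: int, daemon_cost: float, receiver_cost: float) -> list[float]:
    """Userspace fan-out: the daemon sends one copy at a time, and each woken
    receiver runs before the daemon gets the CPU back for the next send."""
    t = 0.0
    starts = []
    for _ in range(n):
        t += daemon_cost      # daemon does its work and sends one copy
        starts.append(t)      # the woken receiver preempts the daemon...
        t += receiver_cost    # ...and runs before the daemon resumes
    return starts


def kernel_multicast(n: int, daemon_cost: float, receiver_cost: float) -> list[float]:
    """In-kernel multicast: the daemon's work happens once, one send queues the
    message for all n receivers, which then simply run back to back."""
    t = daemon_cost
    starts = []
    for _ in range(n):
        starts.append(t)
        t += receiver_cost
    return starts
```

With `n=10`, `daemon_cost=20` and `receiver_cost=5`, the last receiver starts at 245 time units with serial fan-out versus 65 with a single multicast. The gap is exactly `(n - 1) * daemon_cost`, so under this model a leaner daemon and in-kernel multicast shrink the same term, which is the point being argued.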

Sending a short message to multiple processes should be very fast, and we agree that isn't what makes dbus so slow. What makes it slow is all the other things it does for no good reason, but what exactly all that is, I don't know. It could be a bug too.

Point optimization?

Posted Mar 6, 2012 7:41 UTC (Tue) by khim (subscriber, #9252)

> A huge library size for simple functionality is a clear sign of badly written or badly designed code, with all the downsides that come with that: inefficient, unnecessarily complex code that is hard to debug and hard to optimise properly.

Or perhaps it's just the code needed for the task at hand. How can you distinguish these two cases?

> Sending a short message to multiple processes should be very fast, and we agree that isn't what makes dbus so slow. What makes it slow is all the other things it does for no good reason, but what exactly all that is, I don't know.

Let's summarize the discussion:
1. You have no idea about the dbus design.
2. You have no idea about the task dbus is trying to solve.
3. Yet “you know for sure” it's a bloated pig that thrashes caches, and this is why it's slow.

Real feat of solid engineering thought! Not.

Compare libpthread from GLibC and bionic, for example. GLibC's is about three or four times larger, yet in a lot of cases it's 10-100 times faster (I'm not joking).

Sometimes you need a lot of code because the task you are trying to solve requires a lot of code. Sometimes it's just legacy. To say that bloat indeed affects performance you need benchmarks, not handwaving.

> If dbus-daemon were mean and lean, this extra ping-ponging wouldn't be very noticeable and wouldn't happen as much.

And this is what I'm talking about: why are you so sure dbus-daemon thrashes everything in the CPU if its size is much smaller than the CPU cache? Do you have any evidence that your outrageous theories have any relation to what happens in practice? Or are you just writing rubbish because you can?
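Comparing dbus-daemon's size with the CPU cache is easy to do on a given Linux machine. A sketch, assuming the sysfs cache layout under `/sys/devices/system/cpu/cpu0/cache/index*/` (files `level`, `type`, `size`) and assuming the daemon lives at `/usr/bin/dbus-daemon`, a path that varies by distribution. The on-disk file size is only a crude upper bound on code footprint: it says nothing about heap data or how much of the code is actually hot, which is what decides cache behaviour.

```python
import os
from pathlib import Path

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_cache_size(text: str) -> int:
    """Convert a sysfs cache size string such as '32K' or '8192K' to bytes."""
    text = text.strip()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def cpu0_cache_sizes(root: str = "/sys/devices/system/cpu/cpu0/cache") -> dict:
    """Map (level, type) -> size in bytes for each cache listed in sysfs.
    Returns an empty dict if the directory does not exist."""
    base = Path(root)
    if not base.is_dir():
        return {}
    sizes = {}
    for index in sorted(base.glob("index*")):
        level = int((index / "level").read_text())
        ctype = (index / "type").read_text().strip()  # Data, Instruction, Unified
        sizes[(level, ctype)] = parse_cache_size((index / "size").read_text())
    return sizes


if __name__ == "__main__":
    binary = "/usr/bin/dbus-daemon"  # assumed path; adjust for your distro
    print("dbus-daemon file size:", os.path.getsize(binary), "bytes")
    for (level, ctype), size in sorted(cpu0_cache_sizes().items()):
        print(f"L{level} {ctype}: {size} bytes")
```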

Point optimization?

Posted Mar 5, 2012 7:11 UTC (Mon) by khim (subscriber, #9252)

> At least Glibc gets used and shared by most programs; dbus isn't.

Actually dbus is already used by a lot of programs and I suspect in the future it'll be more fundamental, not less:
$ ldd /usr/bin/* | grep libc.so.6 | wc
   1181    4724   59026
$ ldd /usr/bin/* | grep libdbus-1.so.3 | wc
    196     784   11760
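The `grep | wc` pipeline counts matching lines, which approximates the number of binaries linking each library because a binary normally lists a given soname once. A sketch of the same count done per binary, assuming the usual multi-file `ldd` output where each file appears as a `path:` header followed by indented dependency lines. The helper name `dependents` is hypothetical. Note that the ldd(1) man page warns that ldd may execute parts of the program, so only run it on trusted binaries.

```python
import glob
import subprocess


def dependents(ldd_output: str, soname: str) -> set[str]:
    """Return the binaries whose ldd listing names `soname` as a dependency."""
    result = set()
    current = None
    for raw in ldd_output.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if not line[0].isspace() and line.endswith(":"):
            current = line[:-1]  # header line such as "/usr/bin/foo:"
        elif current is not None and line.split()[0] == soname:
            # dependency line such as "\tlibc.so.6 => /lib/libc.so.6 (0x...)"
            result.add(current)
    return result


if __name__ == "__main__":
    binaries = sorted(glob.glob("/usr/bin/*"))
    out = subprocess.run(["ldd", *binaries], capture_output=True, text=True).stdout
    for name in ("libc.so.6", "libdbus-1.so.3"):
        print(name, len(dependents(out, name)))
```

Matching the first token exactly, rather than grepping the whole line, avoids counting a binary twice or matching the soname inside an unrelated path.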

From this point of view its size is not excessive, and before calling something a “bloated pig” it's always a good idea to compare it with a “lean and mean” alternative. In GLibC's case, its size is justified not by the fact that it's the most popular Linux libc, but by the fact that it's the only one with efficient threading, adequate i18n, etc. Oh, and a significant part of GLibC is backward-compatibility functions: today's GLibC is still compatible with all the programs compiled a decade ago (and earlier). This may be an insignificant advantage for you, but for me it's important. What's your alternative to dbus, and how does it compare to dbus feature-wise?

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds