
All hail the speed demons (O'Reillynet)

Here's an O'Reillynet article on the efforts to speed up Linux desktop performance. "What I find so interesting about Waldo's, Federico's and Michael's work is that they are playing with something of a black-art. Performance optimisation is something that not only requires an expansive knowledge of how software is built and represented in memory, but also how to optimise code and the way code is interpreted."


Somewhat uncritical article

Posted Nov 2, 2005 19:35 UTC (Wed) by cantsin (guest, #4420) [Link] (39 responses)

Of course, it is a daunting task to optimize bloated GUI/desktop architectures instead of designing them bloat-free from the beginning.

Real-life optimization work

Posted Nov 2, 2005 19:56 UTC (Wed) by elanthis (guest, #6227) [Link] (38 responses)

And of course, most of the optimizations being done are low-level micro-optimizations to code, not huge, massive code redesign efforts. It's not a problem of bloat at all; it's a problem that a coder of any moderately large application or framework would run into, be it GNOME, KDE, the Linux kernel, glibc, X, etc.

I'm getting really sick of this "bloat" mantra. Bloat would imply that there is excessive and useless code. That isn't the case, at least so far as most of Federico's work has shown. It's simply a series of the kinds of optimizations that just aren't possible to predict until you have a stable codebase that is in use by a wide array of real-world applications.

No programmer could predict the necessity of many of those optimizations during the initial development of the software. Anyone *trying* to predict those kinds of low-level optimizations during initial development is an idiot; all they will do is waste tons of time writing ineffective optimizations and make their code unmaintainable down the road, which makes the efficacious optimizations quite difficult to implement later.

People need to quit with the "bloat" bullshit. When you see a music player that has a built-in file manager and RSS reader, by all means, shout "bloat!" When you see someone micro-tuning an incredibly complex (by real-world necessity) text layout engine, shouting "bloat" just displays your ignorance.

Real-life optimization work

Posted Nov 2, 2005 20:05 UTC (Wed) by RMetz (guest, #27939) [Link] (4 responses)

"When you see a music player that has a built-in file manager and RSS reader, by all means, shout "bloat!""

Hey now, one man's bloat is another man's essential feature. ; )

Real-life optimization work

Posted Nov 2, 2005 20:39 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] (2 responses)

Even if the file manager and RSS reader are made available as a shared library, and the same code is used by all applications that want to do file management or RSS functions?

Real-life optimization work

Posted Nov 2, 2005 22:21 UTC (Wed) by nix (subscriber, #2304) [Link]

e.g. amaroK, which *does* use KDE's HTML renderer and can probably be coerced into using its RSS reader fairly easily. (But it cannot yet send mail, so its evolution is incomplete. It should learn from GNU Hello.)

Real-life optimization work

Posted Nov 3, 2005 1:52 UTC (Thu) by RMetz (guest, #27939) [Link]

It was a joke, a play on an idiomatic saying. Hence the winking smiley.

Real-life optimization work

Posted Nov 3, 2005 2:58 UTC (Thu) by piman (guest, #8957) [Link]

Amusingly, the growing popularity of podcasts means most music players will probably grow RSS readers within the next year.

Real-life optimization work

Posted Nov 2, 2005 20:58 UTC (Wed) by cdmiller (guest, #2813) [Link] (27 responses)

Bloat Mantra?

I used to run X on Linux on a 386 with 8MB of RAM, and it ran faster and was much smaller than it is today on my Athlon, P4, whatever.

MS Word 4 on Windows 3.1 on a 100MHz Pentium with 16 MB RAM starts up and runs faster than Open Office or MS Word XP on modern 2GHz processors.

WordPerfect 3 fit on a single 360KB floppy diskette, or 2 diskettes if you wanted to use the spell checker.

There was a word processor with support for 20 languages for the Apple II, running in 64KB of RAM.

I would venture to say that, in terms of the percentage of resources used, older software was in general far less bloated than what we have today, Free or closed source.

Real-life optimization work

Posted Nov 2, 2005 22:05 UTC (Wed) by jwb (guest, #15467) [Link] (21 responses)

Linux ran like a greased pig on my old Pentium 60 with 16MB RAM, too, but you must surely agree that it does a whole lot more these days. Back in 1996, all your X11 fonts looked like junk, and none of the software properly supported UTF-8 and other Unicode encodings, right-to-left text, vertical text, antialiased glyphs and shapes, or any of a thousand other features that are now standard.

If Netscape 1.1 and XFree86 3.3 worked great for you, then by all means, continue using them. But it isn't bloat when software does more.

Now, as an addendum, I'd *love* to know why the clock in GNOME needs 10MB of memory. That *is* bloat.

Real-life optimization work

Posted Nov 2, 2005 22:54 UTC (Wed) by cdmiller (guest, #2813) [Link] (10 responses)

Point well taken. You don't see me running Wordperfect 3 from the floppy on my Kaypro lunchbox these days, and the 386 is a motherboard in a box somewhere in the basement.

While I dislike what I perceive as the "bloat" in today's software, I'm certainly not confident I could do better. No insult(s) intended to any developers or products. It's just my observation of what old software running on limited resources looks like compared to today's stuff on modern, readily available hardware. My AfterStep says it has 20MB resident; my first Linux and X computer had 8MB of RAM and ran fvwm.

Anyhow, kudos to the folks taking on the profiling and optimization tasks.

Real-life optimization work

Posted Nov 3, 2005 0:06 UTC (Thu) by smoogen (subscriber, #97) [Link] (5 responses)

Having seen cross-platform projects, I have found that a lot of the "bloat" people see comes from the hardware, the requirements, and the time available to produce the software. A simple C program compiled on an 8-bit computer would grow 4x-16x when compiled on a 16-bit computer and 64x-128x on a 32-bit one, and that's before anything like TrueType fonts or UTF comes in. We had a 1MB browser that, when we wanted UTF-8, at first became a 32MB monstrosity, because it had to deal with a whole bunch of rules: left-to-right, right-to-left, and top-to-bottom text, plus certain checks that are language-specific. We had supported 20 languages before, as long as they were western European; doing any of the eastern languages added tons of complexity.

We spent 3 months getting it down to a reasonable size (as this was in the days of Pentium 60s), but basically ended up shipping the large product because the 'optimizations' kept making it look like crap.

Heck, if you want to speed up and smallify Linux, dictate that the world use only ASCII and C as K&R meant it to be.

Real-life optimization work

Posted Nov 3, 2005 10:03 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

Indeed. A lot of this is increased alignment constraints, but in binaries as opposed to in memory a pile is caused directly by increased address sizes. e.g.:
-rwxr-xr-x  1 nix users 1165752 Nov  3 09:55 32/libcrypto.so.0.9.7
-rwxr-xr-x  1 nix users 1398112 Nov  3 09:55 64/libcrypto.so.0.9.7
That's two stripped UltraSPARC binaries, both built with -mcpu=ultrasparc (thus using almost identical instructions), one built with -m32 and one with -m64 with a biarch GCC. The major differences are thus alignment of data (25Kb size difference), code (20Kb size difference)... and relocations (100Kb difference: the 64-bit relocation sections are twice the size, because they're basically big tables of addresses and all the addresses have doubled in size).
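
To make that last point concrete: the relocation records themselves are twice as wide on a 64-bit target. A minimal sketch (assuming a Linux <elf.h>; SPARC uses RELA-style relocations):

#include <elf.h>
#include <stdio.h>

int main(void)
{
    /* each entry in a .rela section is one of these records */
    printf("Elf32_Rela: %zu bytes\n", sizeof(Elf32_Rela));   /* 12 */
    printf("Elf64_Rela: %zu bytes\n", sizeof(Elf64_Rela));   /* 24 */
    return 0;
}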

Real-life optimization work

Posted Nov 3, 2005 19:55 UTC (Thu) by mcm (guest, #31917) [Link] (1 responses)

I guess the relocations could be compressed, as they can probably be represented as 32-bit offsets to a 64-bit base.

Real-life optimization work

Posted Nov 4, 2005 13:22 UTC (Fri) by nix (subscriber, #2304) [Link]

Indeed they could be compressed, but I think you might need a new relocation type for 64+32 base+offset... (I'm not sure and don't have the specs here).
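
For what it's worth, the idea amounts to something like the following layout (purely illustrative; as noted, the ELF spec would need a new relocation type to encode it):

#include <stdint.h>

/* one 64-bit base plus 32-bit deltas instead of full 64-bit addresses,
   roughly halving a big table of nearby addresses */
struct packed_reloc_table {
    uint64_t base;        /* common starting address */
    uint32_t offset[];    /* target i = base + offset[i] */
};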

Real-life optimization work

Posted Nov 5, 2005 3:11 UTC (Sat) by vonbrand (subscriber, #4458) [Link] (1 responses)

Sorry, but comparing the size of the binaries is useless. Use size(1) for that. Also, from what I understand, on SPARC 64-bit binaries are much larger due to larger constants (pointers, integers, ...) all over the place.

Besides, what is the point? To get anything running on an 8-bit machine was a challenge; lots of things you take for granted today weren't even the stuff of wet dreams then. You also have to remember that today the expensive part of the mix is people, not machines. Sure, one could develop mean and lean applications doing most of what today's software does. With enough care, you could even figure out how to include just the features people really use, and shave off quite a bit more. But the development would be a whole lot more expensive, just for letting a few MiB of RAM lie around unused for a change.

Real-life optimization work

Posted Nov 7, 2005 0:18 UTC (Mon) by nix (subscriber, #2304) [Link]

I size(1)d them, of course; I just didn't want to spray the result all over the comments page.

And, yes, I'd agree that normally shaving bytes off things isn't worth it: even so, I spent this weekend shaving a few bytes off one data structure in one program and reducing the number of instances of that data structure --- and reducing the program's peak memory consumption from many gigabytes to a few hundred MB.
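
A rough illustration of the kind of change involved (a hypothetical structure, not the program in question): reordering members and shrinking fields cuts the per-instance size, which adds up fast when millions of instances are live at once.

#include <stdint.h>

struct node_before {        /* 24 bytes on a typical 64-bit ABI */
    uint8_t  flags;         /* 1 byte, then 7 bytes of padding */
    uint64_t id;
    uint32_t refcount;      /* plus 4 bytes of tail padding */
};

struct node_after {         /* 16 bytes: same data, less padding */
    uint64_t id;
    uint32_t refcount;
    uint8_t  flags;
};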

But microoptimizations without major results, or pervasive ones, are indeed generally not worth it.

Real-life optimization work

Posted Nov 3, 2005 3:17 UTC (Thu) by piman (guest, #8957) [Link] (3 responses)

The biggest tradeoff is not features but development time. If developers had twice the time to write twice the features, software would probably be faster and leaner. Instead, people want twice the features in half the time. Back in 2000 the chant was "good fonts!", not "good and fast fonts!" and now we're doing the other half of the work.

There is also something to be said for the average quality of programmers in 1980 versus the average quality today. There are more good programmers, but it's more common to have one good and five bad programmers writing your product than two good programmers. Or sometimes just five bad programmers. On the other hand, we have a lot more software.

Real-life optimization work

Posted Nov 4, 2005 6:47 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (2 responses)

There is also a shift in the nature of the programming task.

In 1989-1991 I wrote a personal calendaring application with the best programming tools available to me at the time: 6809 assembler. From scratch. (OK, I had Unix-like system calls, but no library functions, not even math with integers larger than 16 bits.)

The application contained many of the usual personal calendar features and some unusual ones: alarm notifications, recurring events, a categorization and prioritization scheme, expiration dates, interactive editing, printable sorted deadline lists, colored text, curses-like interface, etc. The particular combination of features was highly productive for me, and unfortunately a) I've never seen anyone else write a similar application, b) the source code is on an obsolete hard drive, and c) without it, I can't seem to organize my life to get the time to rewrite it.

One thing that happens when you manually type in 1300 assembler instructions is that you don't waste them. There was nothing in that code that didn't need to be there. I entered each instruction by hand, using no assembler macros, only function calls. Features were carefully designed to balance functional benefits against fairly painful coding cost--when 10% of your program is consumed by the functions that manipulate dates and intervals, you think twice before adding superfluous features, and you also find ways to *add* functionality by *removing* code.

This calendaring application binary was about 3K. The smallest i386 binary I can get for the source code "int main(){return 0;}" is more than double that size, but it does less (now *that* is bloat ;-). Oddly enough, at the time I thought 3K was a huge investment in memory, since it would be resident in RAM all the time.

If I cloned the old program line by line, but transliterated into C, it'd probably become 10 times larger (recall it became twice as large just by being replaced with a program that returns a constant integer). The i386 requires four bytes for memory addresses instead of two, many of the x86 instructions are longer than the 6809 equivalents, and C compilers don't usually find ways to exploit instructions that are designed for people who are writing date formatting functions by hand in assembler.

If I designed an equivalent program using the tools I'd normally use for binary software development today (C, curses, etc), it'd be 100 times larger. My program contains constant strings for terminal manipulation--this would be replaced with the whole curses/termcap/terminfo/etc infrastructure. If I used malloc() instead of my own memory management library and ANSI C string functions instead of my own string management library, the memory overhead on each event would double. localtime() and mktime() are considerably larger than my date manipulation library--my library didn't have to support time zones, for one thing. A lot of data that was stored in packed bit structures would end up being spread out over bytes, ints, or even text strings in a "modern" design.
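
A rough sketch of that last point (hypothetical fields, not the original 6809 record layout): the same event data in a hand-packed bitfield record versus a "modern" struct of plain ints.

struct event_packed {              /* 4 bytes per event */
    unsigned minutes_of_day : 11;  /* 0..1439 */
    unsigned priority       : 3;
    unsigned recurring      : 1;
    unsigned expired        : 1;
    unsigned day_of_year    : 9;   /* 1..366 */
    unsigned category       : 7;
};

struct event_modern {              /* 24 bytes of plain ints, before any
                                      strings or heap allocations */
    int minutes_of_day;
    int priority;
    int recurring;
    int expired;
    int day_of_year;
    int category;
};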

On the other hand there is one saving--I won't need several hundred bytes of integer math library since modern CPU's come with these functions *built right into the hardware*. ;-)

If I designed an equivalent program in a scripting language, its source code might be somewhat smaller, but it will probably use more RAM at runtime than was available in the entire machine that used to run the application as a daemon--a bloat factor of over 200 (with a GUI, over 1000). It would also take me a single weekend, not three years, to write it.

But would the program do anything more? No. It would be the same little program, it would just be sitting on top of a mountain of accreted infrastructure.

Real-life optimization work

Posted Nov 6, 2005 0:58 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (1 responses)

Are you /sure/ it wouldn't do more?

You see, it's so easy to write a Unicode-enabled, locale-sensitive program that you might easily do so by accident. Your new program might, without you really intending it, support a lot of extra things that a lot of people (maybe even you) would find useful. Things which weren't so much missing from the original as simply never considered. Remember also that the OS support functions are much more powerful and robust than their equivalents on your 6809. Depending on the APIs used your "save file" routine may magically support saving a compressed file, over the network, with automatic versioning...

Real-life optimization work

Posted Nov 8, 2005 8:21 UTC (Tue) by piman (guest, #8957) [Link]

You forgot to mention bloated things like file permissions and multiple terminals. :)

Also, Unix code (meaning all those things the grandparent eschewed, like malloc and localtime) written in 1989-1991 would take a couple days to port to a modern GNU/Linux distribution. And probably only a few days to port to whatever comes 15 years from now.

So would it do more? Yeah. To start with, it would run in the first place. And without that ability, source code of any size is worthless.

Real-life optimization work

Posted Nov 2, 2005 23:31 UTC (Wed) by stef70 (guest, #14813) [Link] (6 responses)

Apart from the fact that in the 'good old times' applications had far fewer features, I also believe that they were not as fast as we usually remember.

For example, my first experience with UNIX was on some Sun X stations with 486 processors. I was really impressed by the speed of those 'beasts' and by their graphics capabilities. Everything was fast on those machines.

I was upgraded to a newer Sun using a SPARC processor.
After a few years, I had to work on the 486 again.

The configuration (hardware+software) was exactly the same as before, but everything was slow. I could clearly see the windows redrawing themselves.
That obviously had not bothered me a few years back.

About the clock applet: you should not trust the memory usage reported by the Linux kernel. In a desktop environment like Gnome, most of the memory is shared between applications and libraries. A more accurate way to evaluate the memory footprint of the clock applet is to subtract its SHARED memory from its RESIDENT memory. On my system (amd64) that gives 1.6MB.

Even 1.6MB is quite a lot for a simple clock.

A quick look in the memory map shows that about half of it is used by the clock applet itself (HEAP+STACK). The rest is used by the non-readonly segments (and so non-shared) of the shared libraries.

I think that the problem is the large number of shared libraries linked with each Gnome application.
My clock applet is using 84 shared libraries.
Each shared lib requires at least one page (4KB) of non-shared memory for its non-constant global data. That's a minimum of 4K * 84 = 336K.
In practice, you should at least double or triple that number, since some libraries use more than 4KB of non-constant global data.
The sad part is that most of the libraries are probably never used by the clock applet, so that memory is allocated for nothing.
For example, does the clock really need an XML parser? libxml2 and libexpat are using 36KB+12KB of non-shared memory.
And what are we to think of libgpg, libcrypt & libk5crypto in a clock applet?
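
For anyone who wants to reproduce the RESIDENT-minus-SHARED estimate, here is a minimal sketch that reads the page counts from /proc/<pid>/statm (where available, /proc/<pid>/smaps gives a much finer per-mapping breakdown):

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char path[64];
    unsigned long size, resident, shared;
    long page = sysconf(_SC_PAGESIZE);
    FILE *f;

    /* statm reports: size resident shared text lib data dt (in pages) */
    snprintf(path, sizeof(path), "/proc/%s/statm",
             argc > 1 ? argv[1] : "self");
    f = fopen(path, "r");
    if (!f || fscanf(f, "%lu %lu %lu", &size, &resident, &shared) != 3) {
        perror(path);
        return 1;
    }
    fclose(f);
    printf("~private resident memory: %lu KB\n",
           (resident - shared) * page / 1024);
    return 0;
}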



Real-life optimization work

Posted Nov 3, 2005 0:41 UTC (Thu) by jwb (guest, #15467) [Link]

To be perfectly fair, the GNOME calendar can read your appointments and whatnot out of Evolution's database, so that may explain the presence of S/MIME libraries and so forth. However, I can think of way, way more efficient methods of implementing that functionality, mainly involving a daemon (which evolution already has in surplus) and interprocess communication.

Real-life optimization work

Posted Nov 3, 2005 10:08 UTC (Thu) by nix (subscriber, #2304) [Link]

> A quick look in the memory map shows that about half of it is used by the clock applet itself (HEAP+STACK). The rest is used by the non-readonly segments (and so non-shared) of the shared libraries.
/proc/*/smaps is useful, isn't it?

Real-life optimization work

Posted Nov 3, 2005 19:05 UTC (Thu) by dann (guest, #11621) [Link] (3 responses)

The crypto libraries are brought in because gnome-vfs is linked to them.
libgnomeui links to gnome-vfs, so any GNOME application that links to libgnomeui will be linked to the crypto libraries.
It would be better if gnome-vfs dlopened the crypto libraries on demand when they are used; that would avoid linking all the GNOME applications to the crypto libraries (and probably avoid loading them from disk on startup, as they are probably not used).
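
Roughly the pattern being suggested, sketched with made-up names (not the actual gnome-vfs code; link with -ldl): resolve a crypto entry point only the first time an SSL-using method is actually hit, so applications that never touch it never map the crypto libraries at all.

#include <dlfcn.h>
#include <stddef.h>

typedef int (*ssl_init_fn)(void);

static int lazy_ssl_init(void)
{
    static ssl_init_fn init;    /* resolved once, on first use */

    if (!init) {
        void *handle = dlopen("libssl.so", RTLD_LAZY);
        if (!handle)
            return -1;
        init = (ssl_init_fn) dlsym(handle, "SSL_library_init");
        if (!init)
            return -1;
    }
    return init();
}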

Real-life optimization work

Posted Nov 4, 2005 13:26 UTC (Fri) by nix (subscriber, #2304) [Link] (2 responses)

Shared libraries are paged in, not `loaded from disk'; the overhead of using extra shared libraries on a prelinked system is very low indeed. (dlopen()ing is rather a lot more expensive, as you can't prelink dlopen()ed libraries.)

Real-life optimization work

Posted Nov 4, 2005 16:29 UTC (Fri) by dann (guest, #11621) [Link] (1 responses)

"Paging in" does not make a big difference for small libraries during a cold startup, at least the symbol table and the _init need to be read from the
disk. Extra disk seeks are expensive.

Real-life optimization work

Posted Nov 4, 2005 18:31 UTC (Fri) by oak (guest, #2786) [Link]

Only if your mass storage is slow at seeking.

This is not the case if, instead of a hard disk, you use Flash memory, for example, as is done on many embedded devices.

Real-life optimization work

Posted Nov 3, 2005 9:16 UTC (Thu) by rossburton (subscriber, #7254) [Link] (2 responses)

Ah, the classic "foo takes 20M, it's evil!" argument.

10M of virtual memory, most of which is shared. That's GTK+, Pango, GConf, Bonobo, for a start, and often the Evolution calendar libraries being loaded to display your appointments and tasks in the calendar. Heap wise, the clock uses a meg, and the executable code itself is 72K.

pmap is your friend. Banish the ignorance and see how memory is actually being used! I found an interesting bug in Evolution Data Server which resulted in vastly inflated "ps" memory counts: threads were not being destroyed correctly, and for every thread (read: contact search) 8M was added to the VM size. Of this 8M only 4 bytes were actually used (it's the thread stack, and the thread didn't return anything), but it's easy to get "ps" sizes in the hundreds of megabytes this way. One-line patch later, bug fixed.
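
The kind of one-line fix described is typically just detaching (or joining) the worker thread, since a joinable thread's stack stays mapped until someone reaps it. A sketch of the pattern (hypothetical names, not the actual EDS patch):

#include <pthread.h>

static void *search_worker(void *arg)
{
    /* ... perform the contact search ... */
    (void) arg;
    return NULL;
}

static int start_search(void *ctx)
{
    pthread_t tid;
    int err = pthread_create(&tid, NULL, search_worker, ctx);
    if (err)
        return err;
    /* without this, every finished thread leaves its ~8MB stack mapped */
    return pthread_detach(tid);
}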

Real-life optimization work

Posted Nov 3, 2005 18:44 UTC (Thu) by dann (guest, #11621) [Link] (1 responses)

Well, it would be nice if the calendar and appointment functionality were loaded on demand. If one does not use Evolution, then there's little point in loading all those libraries; it just slows down the startup.

About pmap: it would be great if the Linux pmap printed more details about the maps, like the Solaris pmap -x does:

Address Kbytes Resident Shared Private Permissions Mapped File
00010000 1688 1616 1616 - read/exec emacs
001C4000 4904 4816 1208 3608 read/write/exec emacs
...

This way you see more exactly how memory is used.

Real-life optimization work

Posted Nov 4, 2005 13:27 UTC (Fri) by nix (subscriber, #2304) [Link]

That is coming, now that the kernel exports that sort of info (as of 2.6.14).

Real-life optimization work

Posted Nov 2, 2005 22:41 UTC (Wed) by azhrei_fje (guest, #26148) [Link] (4 responses)

Perhaps. But you're comparing software that didn't/couldn't do what we wanted it to, with software that has more features than mere mortals are likely to ever use.

Given that, I will say that I think startup times are pretty bad. But I don't think it's bloat. I think that startup times are linked to building the GUI, and I think much of that interface is built without any parallelism. Just in playing around with a Java program of mine, I found that a lot can be built either in a separate thread or when the program goes idle waiting for user input. In both cases, the *apparent* response time was very much improved.
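
The same "build it when idle" idea expressed in GTK+/GLib terms (a sketch only, not the poster's Java code): show the main window immediately and construct the secondary UI from an idle callback, so the perceived startup time drops.

#include <gtk/gtk.h>

static gboolean build_secondary_ui(gpointer data)
{
    GtkWidget *window = GTK_WIDGET(data);
    /* ... create preference dialogs, side panes, etc. here ... */
    (void) window;
    return FALSE;                  /* run once, then remove the source */
}

int main(int argc, char **argv)
{
    GtkWidget *window;

    gtk_init(&argc, &argv);
    window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    g_signal_connect(window, "destroy", G_CALLBACK(gtk_main_quit), NULL);
    gtk_widget_show_all(window);   /* something visible right away */
    g_idle_add(build_secondary_ui, window);
    gtk_main();
    return 0;
}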

Real-life optimization work

Posted Nov 4, 2005 7:25 UTC (Fri) by drag (guest, #31333) [Link] (3 responses)

In GNOME, much of the startup time of programs doesn't have anything to do with binary sizes, or how a program is written, or the libraries it's linked to, or anything like that.

What it is, is that the program is looking around on your hard drive for various configuration files and whatnot, polling files here and there. So a large part of the startup time comes when the program is pretty much ready to run, but is waiting on disk I/O.

With Windows you have the registry, where all this stuff is stored, which I suppose is mostly in memory most of the time anyway. It's a much quicker interface than the Linux-style configuration files and directories stored in various places around your filesystem.

That's the trouble with optimizing code. You could spend all day twiddling bits and rearranging this or that and save maybe half a second off a 7-second load time, whereas you could spend the time rethinking how the configuration files work and save 5 seconds off the load time.

Linux itself also has numerous small things that have been developed and added to the kernel to greatly improve memory performance and whatnot, but nobody uses them because they are unaware of them, and when they are aware they often don't want to bother, because it's a hassle to write Linux-specific code when other systems like the BSDs aren't nearly as sophisticated desktop-wise.

Or something like that. I am not a programmer though. But I found this interesting:
http://stream.fluendo.com/archive/6uadec/Robert_Love_-_Op...

Real-life optimization work

Posted Nov 6, 2005 8:08 UTC (Sun) by zblaxell (subscriber, #26385) [Link] (1 responses)

The Windows registry is organized into a vaguely tree-like recursive structure, demand-paged and cached in RAM.

The Linux filesystem is organized into a vaguely tree-like recursive structure, demand-paged and cached in RAM.

Performance-wise there isn't much difference unless you're using a braindead filesystem. The frequently accessed and recently modified stuff will be in RAM, and everything else won't.

It would be better to tweak the demand-paging of the executables. Reading 4K at a time according to quasi-random execution paths is stupid when it's faster to read 500K of data from disk than it is to read 4K, seek 492K ahead, and read 4K.
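
One existing knob for that is simply asking the kernel for one large sequential read up front instead of letting page faults pull the file in 4K at a time in execution order. A hedged sketch using Linux's readahead(2) (a hypothetical helper; posix_fadvise(POSIX_FADV_WILLNEED) is the portable variant):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

static int prefetch_file(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    /* one big sequential request instead of many scattered 4K faults */
    int rc = (int) readahead(fd, 0, (size_t) st.st_size);
    close(fd);
    return rc;
}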

Real-life optimization work

Posted Nov 9, 2005 10:16 UTC (Wed) by njhurst (guest, #6022) [Link]

I think that the problem is that each application looks in 10 different places for 10 different files.

Real-life optimization work

Posted Nov 8, 2005 8:17 UTC (Tue) by piman (guest, #8957) [Link]

> You could spend all day twiddling bits and rearranging this or that and save maybe half a second off a 7-second load time, whereas you could spend the time rethinking how the configuration files work and save 5 seconds off the load time.

I'll believe that when I see a profile.

(Which, in my opinion, was rather the point of this article. Stop complaining and start profiling. That's how we'll get rid of "bloat".)

Real-life optimization work

Posted Nov 2, 2005 22:03 UTC (Wed) by cantsin (guest, #4420) [Link]

> I'm getting really sick of this "bloat" mantra.
Then explain to me why a trivial Gnome taskbar clock applet eats up as much RAM as a whole instance of vim, or the difference in code size between gnome-terminal and xterm, the duplication of functionality in different libraries, or, in general, the performance and resource usage of Gnome + Nautilus vs. XFCE + rox, or, on other fronts, the resource usage of oowriter vs. abiword or oocalc vs. gnumeric.

Your example of a music player with a built-in file manager and RSS reader is, IMHO, not an example of bloat, if the application is made up of single, user-configurable components. (Which is why Emacs is not bloated.) An example of bloat is software that doesn't offer very much functionality, but still eats up lots of RAM and CPU time.

Real-life optimization work

Posted Nov 3, 2005 4:28 UTC (Thu) by marduk (subscriber, #3831) [Link] (3 responses)

I too have gotten pretty tired of the whole "bloat" thing. The word has been abused so much as to have lost its meaning.

Also, why does the term "bloat" seem to be used almost exclusively for software? When automobiles are enhanced with power-everything, air bags, and keyless entry, nobody calls that bloat. When the mono AM transistor radio became the 5.1 digital satellite radio system, no one cried "bloat". But put something on a Linux desktop other than a clock, an xterm and a text editor, and you can't blink before someone screams bloody bloat.

The thing is: I want to use more than a clock, xterm and text editor. For those that don't: that's your choice. No one's forcing you to do otherwise. For those who think that today's solutions have too much bloat: show us your own solution if you have a better one. Maybe you'll convince me and I'll make the switch as well.

Real-life optimization work

Posted Nov 3, 2005 16:51 UTC (Thu) by beoba (guest, #16942) [Link] (2 responses)

The car doesn't start more slowly because of power airbags.

With software, adding features is often a tradeoff, and because of that, different people have different ideas of what position is optimal for their case.

Real-life optimization work

Posted Nov 4, 2005 16:04 UTC (Fri) by hppnq (guest, #14462) [Link] (1 responses)

The car doesn't start more slowly because of power airbags.

Of course it does, Newton proved that about 350 years ago. ;-)

The trick with optimization is knowing *what* to optimize. Most of the complaining about bloated and slow software is meaningless nonsense, it's like complaining that the tires of the average truck are so much bigger than my own car's -- and MY car runs fine, you know.

Real-life optimization work

Posted Nov 6, 2005 9:45 UTC (Sun) by zblaxell (subscriber, #26385) [Link]

Actually the car with airbags has probably also lost weight in other places thanks to materials and construction optimizations, and maybe has a more efficient and/or powerful engine. It most likely actually starts *more* quickly now. In my case, it actually does--my current car has airbags and starts in well under a second, whereas my previous car had no airbags and could not leave the safety of park without idling for at least three seconds *after* the starter motor finished getting the engine to run on its own power (and some days that took a while). The new one is bigger and heavier than the old, but consumes half the fuel (another thing naive Newtonian analysis suggests shouldn't happen). There's a lot of complexity in the new system, but it does something useful that the simpler system couldn't do, and the cost is reasonable given the benefits.

Some people are alive today because of airbags--I doubt they would consider them bloat. OTOH, some people do complain about airbags, although they don't use the word "bloat" to describe it ("dangerous" and "explosive" come to mind).

Sometimes software gets better when it gets bigger. A 6809 machine running a 1.87MHz processor doesn't have a complicated buffer cache subsystem, because it would be slower to read and cache data than to reread it from disk every time it is requested and copy it into a cache; however, a P4 machine running a 3.0GHz processor turns a complicated buffer cache subsystem into huge average-case improvements in I/O subsystem performance. Very few users would consider disk cache to be bloat on a modern machine (maybe people who write bootloaders that must share 512 total bytes of binary with a partition table, or embedded systems which use alternatives to cache, like execute-in-place, or high-end database systems which regard *everything* between the backend and the disk as bloat).

Often, bigger software is just bloated. Program A that performs a task in 1000 steps in a loop is generally slower than Program B that performs the same task in only 500 steps, all else being equal. No amount of spiffy new hardware can change the fact that program A is still twice as slow as program B on the same machine. The complaints about bloat start when program A fails to demonstrate useful improvements over program B, especially when program A is new and typically doesn't do everything that program B does.

And why shouldn't people complain when, all else being equal, someone proposes replacing their existing working software with something slower, larger, and more broken? Even entirely new software is bloated if its runtime cost far exceeds reasonable technical requirements for the problem it solves. It's one thing to say "it has a lot of new capabilities", but the people who are complaining care more about things they're doing now, than things they could do at some future time.

The text editor I used in 1991 (vi) is the same text editor I use in 2005, but its performance relative to CPU clock rate has been largely unchanged during that time (with the convenient side-effect that it is now 200 times faster than it was 14 years ago, or put another way, I can start editing a 200MB file today in the same time I used to need to load a 1MB file).

Today I expect a text editor to fit in well under 1MB of RAM (not including the file being edited, of course), support all the editing operations vi does, and go from "zero RAM usage" to "editing a 50MB text file" in three seconds or less. It's possible to double the startup speed of vi by removing the recovery feature, so I'm already tolerating nearly 50% overhead in that standard. Anything slower would be bloated--no matter what fonts or rendering capabilities it has. It's certainly possible to achieve this performance with a Unicode-capable, locale-aware text editor--the fact that nobody seems to have managed to do it yet just means that all known attempts so far have been bloated monsters. To the people who are creating these monsters: don't deny this. Your code, or some code you have chosen to depend on, *is* bloated. Please, keep trying until you get it right. It *is* possible.

OTOH, bloat is often tolerable, although still nothing to be proud of. I have a gigabyte of RAM in this laptop because it's easier and cheaper than trying to make a bunch of huge, multi-modular, multi-layered applications smaller.

All hail the speed demons (O'Reillynet)

Posted Nov 2, 2005 20:04 UTC (Wed) by alq666 (guest, #11220) [Link] (7 responses)

I must strongly disagree with the comparison of optimization to a black art. This type of mindset is one of the reasons why a lot of software is really not so good, open-source or closed-source. Optimization requires rigour, tools, and a good deal of analysis before any modification is made to the code.

For instance, stripping .comment out of the binaries strikes me as a shot in the dark; I'm not sure whether mapping an extra 400k in memory is such a huge performance hit...

All hail the speed demons (O'Reillynet)

Posted Nov 2, 2005 21:35 UTC (Wed) by darthmdh (guest, #8032) [Link] (1 responses)

For instance, stripping .comment out of the binaries strikes me as a shot in the dark; I'm not sure whether mapping an extra 400k in memory is such a huge performance hit...

When you have 20-odd shared libraries linked in with every application, that 400k soon becomes 8MB. Even if it was, all-up, 400k - what happens if that 400k of useless cruft then pushes the required memory over a page boundary? And the new required page, under memory pressure, requires paging something else out to disk?

You may call this nit-picking, but there is a variety of applications where there is substantial memory pressure (think so-called embedded devices like mobile phones, PDAs and network appliances) where 8MB can be 25% (or more) of your RAM.

All hail the speed demons (O'Reillynet)

Posted Nov 2, 2005 22:59 UTC (Wed) by proski (subscriber, #104) [Link]

My interpretation of "Saved 400k over all OO.o libraries by stripping the .comment sections" is that it was a total number for all the OO.o libraries together. I don't think most of those libraries are used at the same time, and it's hard to imagine what it would take to occupy 8M with .comment sections, considering that most applications (even Mozilla) use far fewer libraries than OO.o.

All hail the speed demons (O'Reillynet)

Posted Nov 3, 2005 1:30 UTC (Thu) by clugstj (subscriber, #4020) [Link] (4 responses)

The .comment section is never loaded into memory so the only savings in this case is disk storage space.

All hail the speed demons (O'Reillynet)

Posted Nov 3, 2005 12:20 UTC (Thu) by ekj (guest, #1524) [Link] (3 responses)

Not quite.

First, you save disk space since the binaries get smaller.

Secondly, your program starts quicker since there is less data to read (even if the sections are not loaded, they're still read; even if they're not read, there's an extra seek to skip them; and even then the VFS might decide to do readahead anyway and thus physically read and transfer to RAM parts of the file which your application never touches).

OK, so it's probably not major. But it wouldn't surprise me if the speedup was quite measurable.

All hail the speed demons (O'Reillynet)

Posted Nov 4, 2005 13:28 UTC (Fri) by nix (subscriber, #2304) [Link] (2 responses)

Sections which are not loaded are not read except if they happen to be in the same page as the loaded sections. You can ignore them except for their disk space consumption.

All hail the speed demons (O'Reillynet)

Posted Nov 7, 2005 15:35 UTC (Mon) by lypanov (guest, #8858) [Link] (1 responses)

Umm, you realize that disks are laid out in fairly large tracks rather than 4KB sections, right?

All hail the speed demons (O'Reillynet)

Posted Nov 7, 2005 23:37 UTC (Mon) by nix (subscriber, #2304) [Link]

Yes. Even so, that large chunk containing a LOADed section would be read *whether or not the other parts of it happen to be LOADed*, so, again, the worst that .comment does is to reduce the packing efficiency of LOADed sections. This is hardly a killer --- given that we have no effective tools to improve locality of reference in shared libraries anyway, we're wasting far more disk accesses on unnecessary paging due to poor packing of accessed functions.

glibc

Posted Nov 3, 2005 4:44 UTC (Thu) by chant (guest, #20286) [Link] (3 responses)

Some small part of this may be glibc bloat.

A 10 instruction AMD64 assembly program to
xor a register to 0
increment that register
exit when that register is 0 again

assembled to 731 bytes (not stripped of symbols/tables/etc).

When linked with gcc -static -o <progname> assembly.o
becomes
614,748 bytes.

That is incredible.
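
For comparison, here is a rough sketch of a freestanding program that bypasses glibc's startup code entirely (built with something like gcc -nostdlib -static); it links to a few hundred bytes because nothing from libc.a is pulled in. Illustrative only, and AMD64-specific:

/* no libc: provide _start ourselves and exit via the raw syscall */
void _start(void)
{
    /* exit(0): AMD64 syscall number 60 in rax, status in rdi */
    __asm__ volatile ("syscall" : : "a" (60), "D" (0));
    for (;;)
        ;                       /* not reached */
}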

glibc

Posted Nov 3, 2005 11:44 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

Here's the start of what happens with an essentially empty program, simplifying where things start to explode.

An object file like

int main (void) { return 0; }
obviously pulls in nothing. But it gets linked with crtn.o, and then this happens (every object file except for crtn.o herein is in libc.a; things which do not lead to a size explosion omitted for clarity):
crtn.o:
 __libc_start_main
  libc-start.o:
  __cxa_atexit
   cxa_atexit.o:
    malloc
     malloc.o:
      fprintf, abort...
       [pulls in stdio, which pulls in libio, which pulls in i18n code, &c]
(There are other paths inside malloc() which also pull in stdio code.)

The situation is basically unchanged since Zack Weinberg posted about this in 1999, except that changes in glibc since then mean that his solution won't quite work (you need to redefine __cxa_atexit()...).

Fixing it is difficult, and since everyone in the glibc team hates wasting time on static linking-related stuff that doesn't affect the common case of dynamically linked programs, it's not likely to happen soon.

glibc

Posted Nov 3, 2005 11:45 UTC (Thu) by nix (subscriber, #2304) [Link]

Oh, and of course none of this is true of dynamically linked programs and pretty much none of it is paged into memory for the vast majority of programs (whether statically or dynamically linked), so this explains no actual bloat at all.

glibc

Posted Nov 4, 2005 18:28 UTC (Fri) by oak (guest, #2786) [Link]

And note that Glibc cannot even produce really static binaries...
Name resolving and security stuff are always loaded dynamically.

However, it's silly to build static binaries with glibc; you should use a C library that's "designed" for that.
For example, uClibc. :-)

Re: All hail the speed demons (O'Reillynet)

Posted Nov 3, 2005 7:29 UTC (Thu) by gvy (guest, #11981) [Link]

I don't. :-/

All hail the grammar checkers

Posted Nov 3, 2005 14:02 UTC (Thu) by gravious (guest, #7662) [Link] (5 responses)

first sentence: hero's
_groan_
why bother going on?

All hail the grammar checkers

Posted Nov 4, 2005 1:40 UTC (Fri) by TwoTimeGrime (guest, #11688) [Link] (4 responses)

No kidding. Hero's what?

All hail the grammar checkers

Posted Nov 7, 2005 15:36 UTC (Mon) by lypanov (guest, #8858) [Link] (3 responses)

pathetic. grow up

All hail the grammar checkers

Posted Nov 7, 2005 17:40 UTC (Mon) by TwoTimeGrime (guest, #11688) [Link] (2 responses)

You may not value your time, but don't disparage me because you feel that Slashdot-quality journalism is acceptable on LWN. If someone has something to say, then I expect their sentences to be clearly formed so that their ideas can be properly communicated and understood. If the writer can't even take the time to communicate clearly, then why should I bother to read the article? I have better things to do than try to figure out what the author was really trying to say. A confusing grammatical error in the first sentence and an irrelevant detour about developers with the "coolest names" doesn't help me understand what he's saying about Linux application performance.

What's even more disappointing is that O'Reilly is a publisher that I have felt has always published quality books. It's a shame that their editors did such a poor job on this article before it was published.

All hail the grammar checkers

Posted Nov 8, 2005 22:32 UTC (Tue) by chromatic (guest, #26207) [Link] (1 responses)

It's a weblog, not an article. We the editors don't edit those.

All hail the grammar checkers

Posted Nov 9, 2005 2:43 UTC (Wed) by TwoTimeGrime (guest, #11688) [Link]

Thanks for the tip. It wasn't clear that it was a weblog. Knowing that now and looking at the page again, the only thing that gives an indication that it might be a weblog is some breadcrumb navigation right above the author's photo. Everything else has O'Reilly branding that makes it look like regular editorial content. The fact that it's a weblog is not conspicuous.

If by "we the editors" you mean that you work there, you might want to pass this comment on to someone there. I will see if there's a feedback address on the web page and email them as well.

All hail the speed demons (O'Reillynet)

Posted Nov 4, 2005 10:58 UTC (Fri) by NAR (subscriber, #1313) [Link]

I've just checked: it took 62 seconds after I typed

oowriter 6k_long_file.sxw

to get to a point where I could move the cursor in Writer. And as a side effect, most of my other processes were swapped out - and this is a 1.6GHz processor with 512 MB RAM. At least it motivates me to write code so I can avoid writing implementation proposal documentation...

Bye, NAR

All hail the speed demons (O'Reillynet)

Posted Nov 4, 2005 17:16 UTC (Fri) by dps (guest, #5725) [Link]

I think bloat *is* a problem, and it applies to my code too... when I estimate 300--500 lines of code and the actual code is more like 1500 lines, that indicates a problem to me.

That said, being big is not necessarily bad... my current choice of CGI library is big because it was designed to allow you to decompress zip archives and feed them to CGI programs uncompressed, without being vulnerable to zips of death consuming all available disc space. This feature now works.

The infrastructure for this is overkill for some applications, but using it made sense since I had to have it anyway. The ability to replay a request when using gdb is an addictive side benefit. Unfortunately this library is not generally available, so don't bother asking for a copy.

I reduced some string functions from 30% of a profile to under 5% by making them process 2 characters at a time. This was hard work and those functions are a lot bigger and more complex than the natural implementation.
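
As a hedged illustration of the "two characters at a time" idea (not the poster's actual code; real versions also have to worry about alignment and usually read whole machine words for bigger wins):

#include <stddef.h>

/* strlen-style scan that tests two bytes per loop iteration */
size_t strlen_by_two(const char *s)
{
    const char *p = s;

    for (;;) {
        if (p[0] == '\0')
            return (size_t)(p - s);
        if (p[1] == '\0')
            return (size_t)(p - s) + 1;
        p += 2;
    }
}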


Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds