
Our bloat problem


Posted Aug 6, 2005 22:46 UTC (Sat) by nix (subscriber, #2304)
In reply to: Our bloat problem by dmantione
Parent article: Our bloat problem

I'm sorry, but much of this post is just plain wrong.

String operations become more expensive. Processing a UTF-8 string is usually n times more expensive than processing an ASCII string, because there is no longer a match between byte numbers and character numbers.
Actually, only some operations (random access, basically) become more expensive. Random access inside strings is rare, and apps that do it a lot (like Emacs and vim) have had to face this problem for years.
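To make that concrete, here's a minimal sketch (names mine) of why sequential UTF-8 processing is no dearer than ASCII, while random access degenerates into a scan:

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). A sequential pass like this
// costs about the same as it would for ASCII; only *random* access
// to "the k-th character" turns into a linear scan, because byte
// index and character index no longer coincide.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)   // not a continuation byte
            ++count;
    return count;
}
```

Editors like Emacs and vim avoid the scan by caching byte offsets of known character positions (marks, line starts) rather than recomputing from the start of the buffer.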

We can't make the world speak English, no matter how much we might like to. Half the world's population speaks languages that require Unicode.

o Conversion tables need to be in memory. Most applications load their own conversion tables, which means they are n times in memory. These tables are also loaded on startup, decreasing application startup times.
I think you meant `increasing' there. :) But yes, this is a potential problem. It would be nice if there were some Unicode daemon and corresponding library (that talked to the daemon) which could cache such things... but then again, the extra context switches to talk to it might just slow things right back down again.
* Java, Mono, Python, Perl, TCL - These programming languages require runtime environments. The runtime environment needs to be loaded, can be slow itself and, most importantly, can use quite a bit of memory. It becomes especially bad if multiple runtime environments get loaded on one desktop. Script languages can be good for scripts, but are bad for the desktop. The popularity of Java and Mono is probably a bad thing regarding the bloat on our machines.
Actually, Python, Perl and Tcl have very small runtime environments (especially by comparison with the rampaging monster which is Sun's JRE). The problem with these languages is that their data representations, by explicit design decision, trade off size for speed. With the ever-widening gulf between L1 cache and RAM speeds, maybe some of these tradeoffs need to be revisited.
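A rough illustration of that size-for-speed trade-off (the struct layout and field names here are invented for illustration, not any particular interpreter's actual object header):

```cpp
#include <cstddef>

// Hypothetical sketch of why scripting-language data is bigger:
// a dynamic language typically boxes every value with bookkeeping
// fields so that any value can be handled uniformly and quickly,
// while C stores just the raw machine word.
struct BoxedInt {
    long        refcount;   // lifetime management
    const void* type_tag;   // runtime type information
    long        payload;    // the actual integer
};

// On a 64-bit machine that's 24 bytes (plus heap overhead) per
// integer, versus 8 for a raw long -- and three times the cache
// lines touched when iterating over a million of them.
```

That 3x footprint is precisely what hurts once the working set no longer fits in L1/L2 cache.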
* C++ - Even the de facto standard C++ is sometimes a problem. Especially if templates are being used C++ compilers output large amounts of code behind a programmers back. This can cause huge libraries and executables.
Now that's just wrong. The use of templates in C++ only leads to huge code sizes if you don't know what you're doing: and you can write crap code in any language.
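To make the "know what you're doing" point concrete, here's a sketch of the standard remedy (sometimes called the thin-template or hoisting idiom): move the type-independent work into a single non-template function, so each instantiation compiles to almost nothing. Names are mine; the memcmp comparison assumes trivially comparable types with no padding.

```cpp
#include <cstddef>
#include <cstring>

// Type-independent core: compiled exactly once, whatever T is.
void* find_bytes(void* first, void* last, const void* value,
                 std::size_t size) {
    for (char* p = static_cast<char*>(first);
         p != static_cast<char*>(last); p += size)
        if (std::memcmp(p, value, size) == 0)
            return p;
    return last;
}

// Thin template shim: each instantiation is a few instructions that
// the compiler can inline away, instead of a full copy of the loop
// per element type.
template <typename T>
T* find(T* first, T* last, const T& value) {
    return static_cast<T*>(find_bytes(first, last, &value, sizeof(T)));
}
```

Write the naive version instead -- the whole loop in the template body -- and you pay for one copy of the code per element type you ever search.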

The size problem with C++ at present is the (un-prelinkable) relocations, two per virtual method table entry... these can end up *huge*, even more so in apps like OOo which can't be effectively prelinked because (for reasons beyond my comprehension) they dlopen() everything, so most of it goes unprelinked.

There was a paper recently on reducing OOo memory consumption which suggested radical changes to the C++ ABI. Sorry, but that isn't going to fly :)

* Shared libraries - Many programmers are under the impression that use of shared libraries is free. WRONG. They need to be loaded, resolved and even if you only use part of them a large amount of code is executed within them before you know it.
Wrong. Most shared libraries contain no, or very few, constructors, and so no code is executed within them until you call a function in them. (Now many libraries do a lot of initialization work when you call that function, but that'd also be true if the library were statically linked...)
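What "constructors" means here, concretely: an ELF initializer that the loader runs when the object is mapped. A sketch using the GCC/Clang attribute syntax (a global C++ object with a non-trivial constructor compiles down to the same mechanism):

```cpp
// An ELF constructor: the dynamic loader runs this when the DSO is
// loaded (or, if linked into the executable, before main()). A
// library with no such initializers executes no code at all at load
// time -- its pages just sit there until you call into them.
static int init_count = 0;

__attribute__((constructor))
static void on_load() {
    ++init_count;
}

int get_init_count() { return init_count; }
```

Most well-behaved libraries avoid these precisely because they are paid for at load time by every program linking the library, needed or not.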

The dynamic loader has to do more work the more libraries are loaded, but ld-linux.so has been optimized really rather hard. :)

Oh, and it doesn't have to load and resolve things immediately: read Ulrich Drepper's paper on DSO loading. PLT relocations (i.e. the vast majority, that correspond to callable functions, rather than those which correspond to data) are normally processed lazily, incurring a CPU time and memory hit only when the function is (first) called.
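The lazy behaviour is easy to observe with dlopen(): with RTLD_LAZY the loader maps the library and processes its data relocations, but defers function binding. A sketch, assuming a glibc system where libm.so.6 is available (on older glibc you'd link with -ldl):

```cpp
#include <dlfcn.h>

// With RTLD_LAZY, dlopen() resolves no function symbols up front;
// each is bound on first use. The dlsym() below forces exactly one
// resolution -- libm's other symbols stay unbound, costing nothing.
double cos_via_dlopen(double x) {
    void* handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle)
        return -2.0;                        // sentinel: open failed
    using cos_fn = double (*)(double);
    auto f = reinterpret_cast<cos_fn>(dlsym(handle, "cos"));
    double result = f ? f(x) : -2.0;        // sentinel: symbol missing
    dlclose(handle);
    return result;
}
```

Ordinary dynamically linked executables get the same effect automatically: calls go through the PLT, and each PLT slot is patched by the loader the first time the corresponding function is called.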

o Libc is just a C runtime library but has unfortunately grown to several megabytes.
That's because it also implements all of POSIX that isn't implemented by the kernel, and passes the rest through to the kernel. Oh, and it also supports every app built against it since glibc2 was released, which means that old interfaces must be retained (even the bugs in them must be retained!)

This necessarily costs memory, but most of it won't be paged in (and thus won't cost anything) unless you run apps that need it.

You do rather seem to have forgotten that binaries and shared libraries are demand-paged!
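Demand paging in miniature: mmap() only establishes the mapping; file pages are read in on first access, which is exactly how the loader maps binaries and DSOs. A Linux-specific sketch (it maps the running executable via /proc/self/exe):

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map our own executable the way ld-linux.so maps a binary or DSO.
// mmap() performs no file I/O here: it just sets up page tables,
// and each page is read from disk only when first touched. Pages
// that are never touched never occupy RAM.
const unsigned char* map_self(std::size_t* len) {
    int fd = open("/proc/self/exe", O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return nullptr; }
    *len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, *len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  // the mapping survives the close()
    return p == MAP_FAILED ? nullptr
                           : static_cast<const unsigned char*>(p);
}
```

So a multi-megabyte libc on disk is not a multi-megabyte libc in memory: only the pages your apps actually execute or read ever get faulted in, and clean pages can be dropped again under memory pressure.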



