User: Password:
|
|
Subscribe / Log in / New account

Optimizing Linker Load Times

Optimizing Linker Load Times

Posted Jul 27, 2006 2:05 UTC (Thu) by martinfick (subscriber, #4455)
Parent article: Optimizing Linker Load Times

Maybe someone with a clue could explain to me why library symbol lookup is
not a perfect candidate for a binary tree? Afterall, lookup times are
what need to be optimized, not inserts (they only happen during compile
right?) Who cares if libraries take longer to compile but runtimes are
substantially sped up? Is it simply a matter of not breaking the ELF
standard?


(Log in to post comments)

Optimizing Linker Load Times

Posted Jul 27, 2006 3:14 UTC (Thu) by jreiser (subscriber, #11027) [Link]

I dare you to produce the code, and the measurements. :-)

Optimizing Linker Load Times

Posted Jul 27, 2006 3:49 UTC (Thu) by bluefoxicy (guest, #25366) [Link]

I actually proposed filling hash buckets with Patricia Tries, but never wrote and tested it. I like to put my ideas out there but keep in mind I'm not an expert on this stuff by far ;) (guys feel free to point out reasons this won't work)

A Patricia trie would basically let you find a bucket; locate the first character of the symbol; and then from there the string only goes as far as ALL symbols in that bucket go with a common prefix. Then it would fork towards each different symbol.

In other words you'd have this:

apple
applecore
appliance
brick

appl--e--core
    |-ianc
brick

The first is a list, the second is a Patricia Trie. The difference is that to find 'appliance' you have to compare 'apple', 'apple', 'appliance\0' with a list (if it's in that exact order); whereas with the Patricia Trie you would compare 'appl\0', 'e', 'iance\0'. 'brick' would be 'a' 'a' 'a' 'brick\0' vs 'a' 'brick\0'.

The size of the storage would be smaller and the data locality would thus be tighter. You would avoid problems with similar prefixes (hello C++ crap) and the trie could be implemented to give these guarantees fairly easily as long as they were kept small (i.e. big tries might have a pointer forward to the next segment a few pages ahead; so I said to stick them in hash buckets to keep them small and reduce the number of forks).

The data would be stored at each branch of the trie. So 'appl' would resolve, it would just resolve to "there is no value here"; and 'e' would store the dynsym entry for 'apple'; etc.

This could be done without breaking the ELF standard by using a different section and letting the dynamic linker use it instead of the typical sections.

The problem isn't so simple though. This would help with the case of "there's 3 symbols in every bucket that start with some 400 character string and we're wasting tons of cycles doing millions of character comparisons on load;" but overall it's not going to become instant look-up. You still have to walk through all loaded libraries; this is what -Bdirect linking takes care of for you.

In short, the reasons we haven't used binary tries are:

  • They're not optimal. Patricia Tries may not even be optimal, and they look (to me) to be a pretty big improvement over a flat linked list of strings!
  • They'd need a new section, which means more space usage on disk (no more memory usage though; we wouldn't use the original section with the linked list, so it'd never be faulted into memory)
  • Nobody has come up with a mechanism here, written it, tested it, and proposed it.
  • I'm not sure the speed boost here. Reading into Drepper's paper I'd guess ELIMINATING redundant character comparisons would offer at least 20%; but he only gives a guess, not a measurement, and doesn't say how this relates to overall dynamic link time. So hey, we could be talking about 5%, 10%, or 30% or 60% or who knows.

Optimizing Linker Load Times

Posted Jul 27, 2006 10:04 UTC (Thu) by nix (subscriber, #2304) [Link]

Binary trees generally have awful cache locality. Note that most of the speedups from the DT_GNU_HASH patch came from considering the impact of code on the dcache: even things like starting hash chains with a tiny bitmap showing which elements were in use was useful to avoid the cache misses involved in scanning the rest of the chain.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds