
Shared libraries

Shared libraries

Posted Nov 26, 2025 7:02 UTC (Wed) by mb (subscriber, #50428)
In reply to: Shared libraries by collinfunk
Parent article: APT Rust requirement raises questions

Yes, I know.
But the question was: Is this due to static linking?



Shared libraries

Posted Nov 26, 2025 11:21 UTC (Wed) by bluca (subscriber, #118303) [Link] (29 responses)

Yes, having huge binaries like Rust has means there is a large cost in loading them, as opposed to smaller, more efficient binaries using shared libraries that are already loaded in memory anyway. This is a well-known pitfall that for some reason a lot of people seem to have suddenly forgotten about.

Shared libraries

Posted Nov 26, 2025 17:08 UTC (Wed) by mb (subscriber, #50428) [Link] (27 responses)

We are talking about this:

>$ time for i in $(seq 10000); do /lib/cargo/bin/coreutils/true; done

Does this have a loading into memory cost of 1 or 10000?
I would be *very* surprised if this would read the binary from disk 10000 times.
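
One rough way to check this is to time many spawns of the same binary and compare the first run against later ones. A minimal sketch, assuming Python 3 is available; `time_spawns` is a hypothetical helper name, and the interpreter's own fork/exec overhead is included in every sample, so only differences between two binaries measured the same way mean anything:

```python
import subprocess
import sys
import time


def time_spawns(argv, runs=100):
    """Return mean wall-clock seconds per spawn of argv over `runs` runs."""
    start = time.perf_counter()
    for _ in range(runs):
        # check=True raises if the command exits non-zero.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Pass the binary to measure as arguments; the fallback command is
    # only a placeholder so the script runs anywhere.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"{time_spawns(argv) * 1e3:.3f} ms per spawn")
```

Running it once for each `true` implementation, on the same machine and under the same load, gives comparable per-spawn figures.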

Also please note that gnu-true doesn't use any shared library except for libc. Which is exactly the same in Rust.
Saying that gnu-true is faster than Rust-true due to dynamic linking is clear nonsense, because there is no difference w.r.t. dynamic linking between them.

If Rust-true is slower than gnu-true due to its bigger size, then this has nothing to do with dynamic or static linking.

Such small binaries are typically larger than their C counterpart, because the Rust std library is typically not rebuilt together with the program and therefore contains lots of unused code.

Shared libraries

Posted Nov 26, 2025 18:46 UTC (Wed) by bluca (subscriber, #118303) [Link] (26 responses)

> I would be *very* surprised if this would read the binary from disk 10000 times.

Depends. Do you have enough memory available and is the system otherwise idle? Or is it near capacity with no room to spare and higher priority processes running and saturating whatever cache is available?
It's the difference between real production systems and synthetic benchmarks.

> Also please note that gnu-true doesn't use any shared library except for libc. Which is exactly the same in Rust.

It is not, because in the Rust case you have the many-tentacled monster that is the Rust stdlib and whatever the cat, er, cargo dragged in that morning. For coreutils, most of the code is in glibc, so it's all already mapped; in the uutils case you load most of it from scratch every time.

Shared libraries

Posted Nov 26, 2025 19:41 UTC (Wed) by farnz (subscriber, #17727) [Link] (25 responses)

Linux loads all ELF objects (binaries and libraries alike) on demand. If you run it in a loop, and it doesn't have enough memory to load the parts used for true and keep it in page cache, then that's the same whether it's dynamically linked or statically linked.

In both cases, all the binaries involved are simply mmaped into place, and normal demand paging takes care of reading them as needed.
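
The same mechanism can be seen from userspace. A minimal sketch, assuming Python 3 and a throwaway sparse file standing in for a large binary; `touch_one_page` is a hypothetical helper name. The whole file is mapped, but only the page containing the byte that is actually read has to be faulted in:

```python
import mmap
import os
import tempfile


def touch_one_page(path, offset):
    """Map the whole file read-only, then read a single byte at `offset`."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; no data is read at this point.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Indexing triggers a page fault for the page holding `offset`.
            return m[offset]


if __name__ == "__main__":
    size = 64 * 1024 * 1024    # stand-in for a large binary
    where = 40 * 1024 * 1024   # arbitrary offset inside it
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(size)
        tmp.seek(where)
        tmp.write(b"\x2a")
    try:
        print(touch_one_page(tmp.name, where))
    finally:
        os.unlink(tmp.name)
```

The kernel may read ahead a few neighbouring pages, but the cost scales with what is touched, not with the size of the file.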

Shared libraries

Posted Nov 26, 2025 20:46 UTC (Wed) by bluca (subscriber, #118303) [Link] (24 responses)

> In both cases, all the binaries involved are simply mmaped into place, and normal demand paging takes care of reading them as needed.

The point is, once again, that glibc and libssl and other core system libraries will always already be loaded on any given system. Your 100MB chunky rust binary won't.

Shared libraries

Posted Nov 26, 2025 20:51 UTC (Wed) by farnz (subscriber, #17727) [Link] (23 responses)

Not necessarily - any page that includes a relocation is not shared. Add in that any shared page of any core system library that's not actively in use can get dropped from the page cache (since the kernel knows it can reload it on demand), and you could well find (and indeed, I see on my laptop) that glibc pages get paged in when you start a dynamically linked binary because they're not in cache already.

Shared libraries

Posted Nov 26, 2025 21:15 UTC (Wed) by bluca (subscriber, #118303) [Link] (22 responses)

And? It's theoretically possible sure, but it's not a very interesting argument in the real world

Shared libraries

Posted Nov 26, 2025 21:17 UTC (Wed) by farnz (subscriber, #17727) [Link] (21 responses)

Well, you'd have to ask this commenter why they thought the Rust binary would be read in 10000 times, but the C binary would not.

Shared libraries

Posted Nov 26, 2025 21:54 UTC (Wed) by bluca (subscriber, #118303) [Link] (20 responses)

Because there is a ~0% chance that your 100MB Rust binary is identical to any other binary/library already loaded. The chance of glibc being already loaded is ~100%. Of course, if one wants to be pointlessly pedantic, they can set up pointless artificial experiments triggering the opposite, but it doesn't really matter; spherical chickens in a vacuum are fun, but nothing more than that.

Shared libraries

Posted Nov 27, 2025 0:44 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> Because there are ~0% chances your 100MB rust binary is identical to any other binary/library already loaded.

Unless it needs an nss module. Or it needs to be relocated.

Shared libraries

Posted Nov 27, 2025 9:06 UTC (Thu) by farnz (subscriber, #17727) [Link] (18 responses)

Firstly, as has already been said, I don't load the whole 100 MB at a time - the kernel demand pages the bits I need.

Secondly, and confirmed experimentally, significant chunks of glibc contain relocations, at least under Debian/aarch64. I find that with an eMMC, demand paging a 100 MB statically linked binary is faster than handling the relocations needed to dynamically link a library.

And note that this tradeoff would have been very different in the days when a 5400 RPM HDD was "fast". But technology has changed - and the speed with which I can demand page a binary is much higher than the speed with which my embedded CPU can process the relocations and associated page faults, along with also having to demand page in the unshared pages that it needs to do that.

Shared libraries

Posted Nov 27, 2025 10:21 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (2 responses)

This sounds like a machine purpose built to win this benchmark honestly. Is it a device that one can buy? How many of this item exist? Should we optimise for this?

Shared libraries

Posted Nov 27, 2025 10:28 UTC (Thu) by farnz (subscriber, #17727) [Link]

It's an embedded Linux system - cheap parts throughout.

But, FWIW, this also applied to big servers at a hyperscaler - the cost of processing relocations outweighed the cost of paging in a bit more from the application binary from an SSD. Systems with HDDs only benefited speed-wise from dynamic linking, systems with SSDs benefited from static linking.

Shared libraries

Posted Nov 27, 2025 12:14 UTC (Thu) by malmedal (subscriber, #56172) [Link]

> Is it a device that one can buy?

Don't know which device farnz is talking about, but the description matches things like NanoKVM.

Shared libraries

Posted Nov 27, 2025 12:56 UTC (Thu) by bluca (subscriber, #118303) [Link] (14 responses)

It's the exact opposite. With an eMMC, going to the disk means you are dead in the water. There's no universe in which, using a slow eMMC (e.g. something half-duplex, single-channel), it's better to go to disk than to do memory operations. I have no idea what kind of weird device you have where loading stuff from disk is faster than loading stuff from RAM.

Shared libraries

Posted Nov 27, 2025 13:04 UTC (Thu) by farnz (subscriber, #17727) [Link] (13 responses)

This is measured on an Alliance eMMC device. Taking the same C program, and statically linking it, makes it faster to load than dynamically linking it, on the same processor.

I have no idea why you think that loading from eMMC (which is needed in the dynamically linked case, too, because the relocations have already been unshared, and the original data has to be reloaded), then doing the relocations, then doing more loading, is faster than just loading.

Shared libraries

Posted Nov 27, 2025 17:11 UTC (Thu) by bluca (subscriber, #118303) [Link] (12 responses)

Then... don't unload it? "Doctor, it hurts when I do this"

Shared libraries

Posted Nov 28, 2025 5:28 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

If you don't keep touching an executable (and multicall binaries are bad, mmkay?), then it'll likely be paged out by something. A multicall static binary is more likely to stay fully in cache, because it keeps getting touched.

Shared libraries

Posted Nov 28, 2025 10:25 UTC (Fri) by bluca (subscriber, #118303) [Link] (10 responses)

You _really_ need to go out of your way to get glibc and other core system libraries paged out on a system that is actually getting used. Of course, if it's all sitting idle doing nothing, then hardly anything matters.

Shared libraries

Posted Nov 28, 2025 10:43 UTC (Fri) by farnz (subscriber, #17727) [Link] (5 responses)

You really, really don't if you're RAM constrained (welcome to embedded!).

The kernel is not operating at the level of entire files; it's operating at the page level. Pages that contain relocations are only shared up until the relocation is overwritten by the dynamic linker; pages where the only data used at runtime is relocation metadata are only used up until all the relocations have been handled, at which point they become unused pages.

You can thus have 90% of glibc in RAM, but the critical parts for process startup are not, and you have to do small I/Os to get the missing pages.
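
On Linux, the per-page split can be observed through `/proc/<pid>/smaps`. A minimal sketch, assuming Python 3 and the usual smaps field names (`Shared_Clean`, `Private_Dirty`, and so on); `summarize_smaps` is a hypothetical helper, not an existing tool. It sums resident kilobytes per mapped file, so pages of libc that are still shared show up separately from those the process has privately modified:

```python
import os
import re
from collections import defaultdict

# Mapping header: "start-end perms offset dev inode   pathname"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")
FIELD = re.compile(
    r"^(Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):\s+(\d+) kB$"
)


def summarize_smaps(text):
    """Sum shared/private resident kB per mapped path from smaps text."""
    totals = defaultdict(lambda: defaultdict(int))
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1) or "[anon]"
            continue
        field = FIELD.match(line)
        if field and current is not None:
            totals[current][field.group(1)] += int(field.group(2))
    return {path: dict(fields) for path, fields in totals.items()}


if __name__ == "__main__":
    smaps = "/proc/self/smaps"  # Linux-only; absent elsewhere
    if os.path.exists(smaps):
        with open(smaps) as f:
            summary = summarize_smaps(f.read())
        for path, fields in summary.items():
            if "libc" in path:
                print(path, fields)
```

Pages that are not resident at all appear in none of these counters, which is the "90% in RAM, but not the startup parts" situation described above.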

Shared libraries

Posted Nov 28, 2025 11:01 UTC (Fri) by bluca (subscriber, #118303) [Link] (4 responses)

That can only happen if you somehow have a system that is both incredibly busy and incredibly idle, so that resources are fully saturated and things get aggressively evicted from the page cache, but somehow the system is completely static otherwise. I.e., you need to go out of your way to artificially create such a situation. On a normal, busy Linux system with socket-activated services, timer-activated services, D-Bus-activated services, etc., there are pretty much always processes being started and stopped. If new processes are started so _rarely_ that you don't even have glibc in the page cache, then obviously starting processes is not a bottleneck and it doesn't matter one way or the other.

I'm not really sure why you are trying to conjure up such a contrived and unlikely example just to prove a point? It's not really working

Shared libraries

Posted Nov 28, 2025 11:08 UTC (Fri) by farnz (subscriber, #17727) [Link] (3 responses)

No - it happens when the system is continuously busy, but new process starting is on the "once every 15 minutes" scale, not "every few seconds".

And I do have glibc in the page cache - I just don't have the stuff that's only loaded on process starting in cache.

But I get it, my experience and yours don't match, so you're going to tell me I'm wrong, rather than address a real situation.

Shared libraries

Posted Nov 28, 2025 13:40 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (2 responses)

I think that even if it does happen on some specially built devices with some very specific workloads, it would be counterproductive to optimise for that and make the common case slower for everyone else in the world instead.

Shared libraries

Posted Nov 29, 2025 20:53 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

But we still need to support obsolete hardware!

FWIW, static binaries help with responsiveness for most users. I encourage everyone here to try the fully static distro that I mentioned. It really feels more snappy.

Shared libraries

Posted Nov 29, 2025 22:08 UTC (Sat) by LtWorf (subscriber, #124958) [Link]

It's not "obsolete", it's just custom built. And it does work just fine with shared libraries for most workloads except whatever weird thing farnz is doing with it.

Feeling more snappy at 2 minutes after boot doesn't necessarily mean it's more snappy 5 days later.

Shared libraries

Posted Nov 30, 2025 1:28 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> You _really_ need to go out of your way to get glibc and other core system libraries paged out

Nope. glibc is loaded at a random location, and when it needs to be linked into an executable, the OS needs to do relocations to resolve the addresses. This information can be easily paged out, especially for rarely used binaries.

glibc by itself is not too large, but when you add other libraries like libstdc++, libz, libsystemd, and others it starts adding up. This is compounded by libraries that use NSS plugins, because dlopen() requires new relocations each time (AFAIR?).

Shared libraries

Posted Dec 1, 2025 11:10 UTC (Mon) by paulj (subscriber, #341) [Link] (1 responses)

Does this imply that the static binaries you advocate for are a lot less secure because, other than perhaps the initial base address where the binary is loaded, all the other addresses are a known entity? Does it not significantly reduce security by largely negating ASLR? (So far as ASLR provides security - I'm aware there is the odd bit of dissent on the merits.)

Shared libraries

Posted Dec 1, 2025 18:22 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Not with PIC (position-independent code). The kernel itself does relocations when the binary is loaded.

Shared libraries

Posted Dec 1, 2025 13:16 UTC (Mon) by malmedal (subscriber, #56172) [Link]

> OS needs to do relocations

It would be good to properly measure how big this effect is. I did a very quick test running chrome under perf and immediately killing it when it finished starting up. On my machine with a hot cache ld-linux used 7% of cpu-time. This does not prove there is a problem, but it is indicative that it is worth investigating properly.
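
A cheaper first measurement than perf is glibc's own `LD_DEBUG=statistics`, which makes the dynamic loader report relocation counts and startup time on stderr. This is a different technique from the perf run described above. A minimal sketch, assuming Python 3 and a glibc system; `parse_ld_stats` and `ld_stats` are hypothetical helper names, and the exact output layout is an assumption that may differ between glibc versions:

```python
import os
import re
import subprocess

# Assumed line shape: "   <pid>:\t   <name>: <number> ..."
STAT = re.compile(r"^\s*\d+:\s+(.+?):\s+(\d+)\b")


def parse_ld_stats(stderr_text):
    """Extract {statistic name: integer value} from LD_DEBUG=statistics output."""
    stats = {}
    for line in stderr_text.splitlines():
        match = STAT.match(line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def ld_stats(argv):
    """Run argv with LD_DEBUG=statistics and return the parsed numbers."""
    env = dict(os.environ, LD_DEBUG="statistics")
    proc = subprocess.run(
        argv,
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return parse_ld_stats(proc.stderr)


if __name__ == "__main__":
    import sys

    # Pass the program to inspect as arguments, e.g. a dynamically linked tool.
    if len(sys.argv) > 1:
        for name, value in ld_stats(sys.argv[1:]).items():
            print(f"{value:>12}  {name}")
```

The numbers cover only the dynamic loader's work at startup; lazy symbol binding and any later `dlopen()` calls made by the program would need separate measurement.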

Shared libraries

Posted Nov 26, 2025 18:25 UTC (Wed) by Wol (subscriber, #4433) [Link]

That's assuming you load it!

Get the compiler/linker to segment object code into 4K blocks or whatever the figure is, map the file into ram, and only load those bits that are used.

Cheers,
Wol


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds