
Shared libraries

Shared libraries

Posted Nov 25, 2025 11:22 UTC (Tue) by LtWorf (subscriber, #124958)
In reply to: Shared libraries by mb
Parent article: APT Rust requirement raises questions

Loading time and memory usage are larger with static linking if we remember that 1 process per machine is hardly a common use case.



Shared libraries

Posted Nov 25, 2025 17:33 UTC (Tue) by ssokolow (guest, #94568) [Link]

You'll probably want to give "Do your installed programs share dynamic libraries?" a read if you haven't already.

Shared libraries

Posted Nov 25, 2025 18:01 UTC (Tue) by mb (subscriber, #50428) [Link] (39 responses)

>Loading time [..] are larger with static linking

Why would that be the case?

I remember the days twenty years ago when we had slow CPUs and used prelink workarounds to reduce the dynamic linking overhead at runtime and cut the startup time noticeably.
Has the dynamic linking runtime overhead been reduced to almost zero since then? Yes, I know it has been reduced to some degree, so that it's not really noticeable anymore. But is it faster than static linking now? How can that be?

Shared libraries

Posted Nov 25, 2025 18:37 UTC (Tue) by joib (subscriber, #8541) [Link] (38 responses)

Perhaps faster in the sense it's more likely the pages from the shared library are already in memory and thus don't need to be paged in from disk? Particularly for some widely used library like libc; for the long tail of libraries used by a single or just a few applications less so.

Shared libraries

Posted Nov 25, 2025 19:16 UTC (Tue) by mb (subscriber, #50428) [Link] (36 responses)

dylibs are much bigger because they have to include everything and not only the bits that the application needs.
And they are scattered around in the file system which causes lookups and seeks.
I doubt that this can be faster on average.
Maybe it's faster for small applications where everything but the main binary is already in the page cache.
But I doubt it's true in a general sense.

Are there actual numbers from real life examples that show that Rust startup times are slower due to static linking?
Are uutils slower than gnu coreutils, just because they are statically linked?

And libc is dynamically linked to Rust programs.

In practice, neither dynamic nor static linking is likely to be slow these days, given extremely fast CPUs and SSDs.

Shared libraries

Posted Nov 26, 2025 4:35 UTC (Wed) by collinfunk (subscriber, #169873) [Link] (35 responses)

Actually that is a great example. The way that Ubuntu builds them at least, uutils has much slower start up time than GNU coreutils.

Here is an example:

$ podman run --rm -it ubuntu:25.10
$ time for i in $(seq 10000); do /lib/cargo/bin/coreutils/true; done
real 0m21.966s
$ time for i in $(seq 10000); do gnutrue; done
real 0m7.110s

This is quite important since these commands are executed very frequently. I believe they are looking into improving it.
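
A shell loop like the one above reports only a single total. As a rough sketch of how the same comparison could be repeated with per-run samples, the following Python harness spawns a command many times and reports the median and minimum. The function names `time_startups` and `summarize` are made up for illustration, and the Python interpreter is used only as a stand-in command; substitute the two `true` binaries to reproduce the comparison. The measured time includes the cost of Python's own process spawning, which is the same for both commands.

```python
import statistics
import subprocess
import sys
import time


def time_startups(argv, runs=200):
    """Return per-run wall-clock times (seconds) for spawning argv `runs` times."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Median and minimum are less sensitive to scheduler noise than the mean."""
    return {"median": statistics.median(samples), "min": min(samples)}


if __name__ == "__main__":
    # Stand-in command; replace with the binary you want to measure.
    cmd = [sys.executable, "-c", "pass"]
    print(cmd, summarize(time_startups(cmd, runs=20)))
```

Because every run after the first hits a warm page cache, a harness like this measures exec and startup work rather than disk reads, which is the distinction the replies below argue about.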

Shared libraries

Posted Nov 26, 2025 5:42 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

See this comment[1] where coreutils is "just as slow" when using a multi-call binary. The claim is that it is due to the monolithic library needing to load far more (usually unused) dynamic dependencies.

[1] https://lwn.net/Articles/1043239/

Shared libraries

Posted Nov 26, 2025 7:02 UTC (Wed) by mb (subscriber, #50428) [Link] (30 responses)

Yes, I know.
But the question was: Is this due to static linking?

Shared libraries

Posted Nov 26, 2025 11:21 UTC (Wed) by bluca (subscriber, #118303) [Link] (29 responses)

Yes, having huge binaries like Rust has means there is a large cost in loading them, as opposed to more efficient and smaller binaries using shared libraries that are already loaded in memory anyway. This is a well-known pitfall that, for some reason, a lot of people seem to have suddenly forgotten about.

Shared libraries

Posted Nov 26, 2025 17:08 UTC (Wed) by mb (subscriber, #50428) [Link] (27 responses)

We are talking about this:

>$ time for i in $(seq 10000); do /lib/cargo/bin/coreutils/true; done

Does this incur the cost of loading into memory once, or 10000 times?
I would be *very* surprised if this would read the binary from disk 10000 times.

Also please note that gnu-true doesn't use any shared library except for libc. Which is exactly the same in Rust.
Saying that gnu-true is faster than Rust-true due to dynamic linking is clear nonsense, because there is no difference w.r.t. dynamic linking between them.

If Rust-true is slower than gnu-true due to its bigger size, then this has nothing to do with dynamic or static linking.

Such small binaries are typically larger than their C counterparts, because the Rust std library is typically not rebuilt together with the program and therefore contains lots of unused code.

Shared libraries

Posted Nov 26, 2025 18:46 UTC (Wed) by bluca (subscriber, #118303) [Link] (26 responses)

> I would be *very* surprised if this would read the binary from disk 10000 times.

Depends. Do you have enough memory available and is the system otherwise idle? Or is it near capacity with no room to spare and higher priority processes running and saturating whatever cache is available?
It's the difference between real production systems and synthetic benchmarks.

> Also please note that gnu-true doesn't use any shared library except for libc. Which is exactly the same in Rust.

It is not, because in the rust case you have the many-tentacled monster that is the rust stdlib and whatever the cat, er, cargo dragged in that morning. So for coreutils most of the stuff really is in glibc, so it's all already mapped; in the uutils case you load most of it from scratch every time.

Shared libraries

Posted Nov 26, 2025 19:41 UTC (Wed) by farnz (subscriber, #17727) [Link] (25 responses)

Linux loads all ELF objects (binaries and libraries alike) on demand. If you run it in a loop, and it doesn't have enough memory to load the parts used for true and keep it in page cache, then that's the same whether it's dynamically linked or statically linked.

In both cases, all the binaries involved are simply mmaped into place, and normal demand paging takes care of reading them as needed.
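
A minimal sketch of the mechanism described here, using an ordinary data file rather than an ELF object: the file is mapped read-only and only a few pages are touched. The helper names `make_demo_file` and `touch_pages` are hypothetical. The script does not measure residency; it only illustrates that mapping a file and reading a handful of pages does not require reading the whole file up front.

```python
import mmap
import os
import tempfile


def make_demo_file(pages):
    """Create a temporary file whose page N starts with the byte value N % 256."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for n in range(pages):
            f.write(bytes([n % 256]) + b"\0" * (mmap.PAGESIZE - 1))
    return path


def touch_pages(path, page_indexes):
    """Map a file read-only and read the first byte of each requested page.

    Only the pages that are actually read have to be brought in by the
    kernel; the rest of the mapping costs address space, not I/O.
    """
    page = mmap.PAGESIZE
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return [m[i * page] for i in page_indexes]


if __name__ == "__main__":
    path = make_demo_file(1024)  # 1024 pages; only three are read below
    try:
        print(touch_pages(path, [0, 7, 1023]))
    finally:
        os.remove(path)
```

The page indexes must lie inside the file, otherwise the indexing raises `IndexError`.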

Shared libraries

Posted Nov 26, 2025 20:46 UTC (Wed) by bluca (subscriber, #118303) [Link] (24 responses)

> In both cases, all the binaries involved are simply mmaped into place, and normal demand paging takes care of reading them as needed.

The point is, once again, that glibc and libssl and other core system libraries will always already be loaded on any given system. Your 100MB chunky rust binary won't.

Shared libraries

Posted Nov 26, 2025 20:51 UTC (Wed) by farnz (subscriber, #17727) [Link] (23 responses)

Not necessarily - any page that includes a relocation is not shared. Add in that any shared page of any core system library that's not actively in use can get dropped from the page cache (since the kernel knows it can reload it on demand), and you could well find (and indeed, I see on my laptop) that glibc pages get paged in when you start a dynamically linked binary because they're not in cache already.

Shared libraries

Posted Nov 26, 2025 21:15 UTC (Wed) by bluca (subscriber, #118303) [Link] (22 responses)

And? It's theoretically possible, sure, but it's not a very interesting argument in the real world.

Shared libraries

Posted Nov 26, 2025 21:17 UTC (Wed) by farnz (subscriber, #17727) [Link] (21 responses)

Well, you'd have to ask this commenter why they thought the Rust binary would be read in 10000 times, but the C binary would not.

Shared libraries

Posted Nov 26, 2025 21:54 UTC (Wed) by bluca (subscriber, #118303) [Link] (20 responses)

Because there are ~0% chances your 100MB rust binary is identical to any other binary/library already loaded. The chances of glibc being already loaded are ~100%. Of course, if one wants to be pointlessly pedantic, one can set up pointless artificial experiments triggering the opposite, but it doesn't really matter; spherical chickens in a vacuum are fun, but nothing more than that.

Shared libraries

Posted Nov 27, 2025 0:44 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> Because there are ~0% chances your 100MB rust binary is identical to any other binary/library already loaded.

Unless it needs an nss module. Or it needs to be relocated.

Shared libraries

Posted Nov 27, 2025 9:06 UTC (Thu) by farnz (subscriber, #17727) [Link] (18 responses)

Firstly, as has already been said, I don't load the whole 100 MB at a time - the kernel demand pages the bits I need.

Secondly, and confirmed experimentally, significant chunks of glibc contain relocations, at least under Debian/aarch64. I find that with an eMMC, demand paging a 100 MB statically linked binary is faster than handling the relocations needed to dynamically link a library.

And note that this tradeoff would have been very different in the days when a 5400 RPM HDD was "fast". But technology has changed - and the speed with which I can demand page a binary is much higher than the speed with which my embedded CPU can process the relocations and associated page faults, along with also having to demand page in the unshared pages that it needs to do that.

Shared libraries

Posted Nov 27, 2025 10:21 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (2 responses)

This sounds like a machine purpose built to win this benchmark honestly. Is it a device that one can buy? How many of this item exist? Should we optimise for this?

Shared libraries

Posted Nov 27, 2025 10:28 UTC (Thu) by farnz (subscriber, #17727) [Link]

It's an embedded Linux system - cheap parts throughout.

But, FWIW, this also applied to big servers at a hyperscaler - the cost of processing relocations outweighed the cost of paging in a bit more from the application binary from an SSD. Systems with HDDs only benefited speed-wise from dynamic linking, while systems with SSDs benefited from static linking.

Shared libraries

Posted Nov 27, 2025 12:14 UTC (Thu) by malmedal (subscriber, #56172) [Link]

> Is it a device that one can buy?

Don't know which device farnz is talking about, but the description matches things like NanoKVM.

Shared libraries

Posted Nov 27, 2025 12:56 UTC (Thu) by bluca (subscriber, #118303) [Link] (14 responses)

It's the exact opposite. With an eMMC, going to the disk means you are dead in the water. There's no universe in which, using a slow eMMC (eg something half-duplex single-channel), it's better to go to disk than to do memory operations. I have no idea what kind of weird device you have where loading stuff from disk is faster than loading stuff from RAM.

Shared libraries

Posted Nov 27, 2025 13:04 UTC (Thu) by farnz (subscriber, #17727) [Link] (13 responses)

This is measured on an Alliance eMMC device. Taking the same C program, and statically linking it, makes it faster to load than dynamically linking it, on the same processor.

I have no idea why you think that loading from eMMC (which is needed in the dynamically linked case, too, because the relocations have already been unshared, and the original data has to be reloaded), then doing the relocations, then doing more loading, is faster than just loading.

Shared libraries

Posted Nov 27, 2025 17:11 UTC (Thu) by bluca (subscriber, #118303) [Link] (12 responses)

Then... don't unload it? "Doctor, it hurts when I do this"

Shared libraries

Posted Nov 28, 2025 5:28 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

If you don't keep touching an executable (and multicore binaries are bad, mmkay?), then it'll likely be paged out by something. A multicall static binary is more likely to stay fully in cache, because it keeps getting touched.

Shared libraries

Posted Nov 28, 2025 10:25 UTC (Fri) by bluca (subscriber, #118303) [Link] (10 responses)

You _really_ need to go out of your way to get glibc and other core system libraries paged out, on a system that is actually getting used. Of course, if it's all sitting idle doing nothing, then hardly anything matters.

Shared libraries

Posted Nov 28, 2025 10:43 UTC (Fri) by farnz (subscriber, #17727) [Link] (5 responses)

You really, really don't if you're RAM constrained (welcome to embedded!).

The kernel is not operating at the level of entire files; it's operating at the page level. Pages that contain relocations are only shared up until the relocation is overwritten by the dynamic linker; pages where the only data used at runtime is relocation metadata are only used up until all the relocations have been handled, at which point they become unused pages.

You can thus have 90% of glibc in RAM, but the critical parts for process startup are not, and you have to do small I/Os to get the missing pages.

Shared libraries

Posted Nov 28, 2025 11:01 UTC (Fri) by bluca (subscriber, #118303) [Link] (4 responses)

That can only happen if you somehow have a system that is both incredibly busy and incredibly idle, so that resources are fully saturated and things get aggressively evicted from the page cache, but somehow the system is completely static otherwise. I.e., you need to go out of your way to artificially create such a situation. On a normal, busy Linux system with socket-activated services, timer-activated services, dbus-activated services, etc., there are pretty much always processes being started and stopped. If new processes are started so _rarely_ that you don't even have glibc in the page cache, then obviously starting processes is not a bottleneck and it doesn't matter one way or the other anyway.

I'm not really sure why you are trying to conjure up such a contrived and unlikely example just to prove a point? It's not really working

Shared libraries

Posted Nov 28, 2025 11:08 UTC (Fri) by farnz (subscriber, #17727) [Link] (3 responses)

No - it happens when the system is continuously busy, but new process starting is on the "once every 15 minutes" scale, not "every few seconds".

And I do have glibc in the page cache - I just don't have the stuff that's only loaded on process starting in cache.

But I get it, my experience and yours don't match, so you're going to tell me I'm wrong, rather than address a real situation.

Shared libraries

Posted Nov 28, 2025 13:40 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (2 responses)

I think that even if it does happen on some specially built devices with some very specific workloads, it would be counterproductive to optimise for that and make the common case slower for everyone else in the world instead.

Shared libraries

Posted Nov 29, 2025 20:53 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

But we still need to support obsolete hardware!

FWIW, static binaries help with responsiveness for most users. I encourage everyone here to try the fully static distro that I mentioned. It really feels more snappy.

Shared libraries

Posted Nov 29, 2025 22:08 UTC (Sat) by LtWorf (subscriber, #124958) [Link]

It's not "obsolete", it's just custom built. And it does work just fine with shared libraries for most workloads except whatever weird thing farnz is doing with it.

Feeling more snappy at 2 minutes after boot doesn't necessarily mean it's more snappy 5 days later.

Shared libraries

Posted Nov 30, 2025 1:28 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> You _really_ need to go out of your way to get glibc and other core system libraries paged out

Nope. glibc is loaded at a random location, and when it needs to be linked into an executable, the OS needs to do relocations to resolve the addresses. This information can be easily paged out, especially for rarely used binaries.

glibc by itself is not too large, but when you add other libraries like libstdc++, libz, libsystemd, and others it starts adding up. This is compounded by libraries that use NSS plugins, because dlopen() requires new relocations each time (AFAIR?).

Shared libraries

Posted Dec 1, 2025 11:10 UTC (Mon) by paulj (subscriber, #341) [Link] (1 responses)

Does this imply the static binaries you advocate for are a lot less secure, because - other than perhaps the initial base address of where it is loaded - all the other addresses are a known entity? Does it not significantly reduce security, by largely negating ASLR? (So far as ASLR provides security - I'm aware there is the odd bit of dissent on the merits).

Shared libraries

Posted Dec 1, 2025 18:22 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Not with the PIC (Position-Independent Code). The kernel itself does relocations when the binary is loaded.

Shared libraries

Posted Dec 1, 2025 13:16 UTC (Mon) by malmedal (subscriber, #56172) [Link]

> OS needs to do relocations

It would be good to properly measure how big this effect is. I did a very quick test running chrome under perf and immediately killing it when it finished starting up. On my machine with a hot cache ld-linux used 7% of cpu-time. This does not prove there is a problem, but it is indicative that it is worth investigating properly.
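
As a complement to the perf measurement above (this is a different technique, not the one used in the comment), glibc's dynamic loader can report its own startup and relocation timings when `LD_DEBUG=statistics` is set. The sketch below runs a command with that variable and returns whatever the loader wrote to stderr. The function name `loader_statistics` is made up, the interpreter is only a stand-in command, and the whole approach assumes a glibc system; other libcs and statically linked programs typically print nothing.

```python
import os
import subprocess
import sys


def loader_statistics(argv):
    """Run argv with LD_DEBUG=statistics and return the loader's stderr report.

    Assumption: the system uses glibc's dynamic loader, which honours
    LD_DEBUG. Anything else the program writes to stderr is included too.
    """
    env = dict(os.environ, LD_DEBUG="statistics")
    result = subprocess.run(argv, env=env, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True,
                            errors="replace", check=False)
    return result.stderr


if __name__ == "__main__":
    # Stand-in command; replace with the dynamically linked binary of interest.
    report = loader_statistics([sys.executable, "-c", "pass"])
    for line in report.splitlines():
        print(line)
```

The report is per process, so it shows loader work on a hot cache; it says nothing about pages that had to be read back from storage.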

Shared libraries

Posted Nov 26, 2025 18:25 UTC (Wed) by Wol (subscriber, #4433) [Link]

That's assuming you load it!

Get the compiler/linker to segment object code into 4K blocks or whatever the figure is, map the file into ram, and only load those bits that are used.

Cheers,
Wol

Shared libraries

Posted Nov 26, 2025 8:34 UTC (Wed) by joib (subscriber, #8541) [Link]

true and false are probably the ones that show the largest difference in startup cost since they are so trivial, but OTOH they are both shell builtins so the actual /usr/bin/{true,false} are probably almost never used.

For more complex binaries I suppose it comes down to more open()'s, and more but smaller memory mappings for dynamic linking vs fewer bigger memory mappings for the static linking case.

Shared libraries

Posted Nov 26, 2025 8:37 UTC (Wed) by taladar (subscriber, #68407) [Link] (1 responses)

This can't be due to load times of the binary since that would only apply to the first of your 10000 iterations unless you want to make the claim that the filesystem cache discards it every time.

Shared libraries

Posted Nov 26, 2025 13:57 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

As I pointed out in a sibling comment, it seems to be that a multi-call `true` loads up libraries needed for (say) `sha256sum` even if they're not used because even coreutils behaves this way with a multi-call binary.

Shared libraries

Posted Nov 25, 2025 19:32 UTC (Tue) by bluca (subscriber, #118303) [Link]

Precisely this - most likely you are only loading your process executable in memory, all its dependencies are already paged in. It's not something you notice on an x86 desktop with fast storage, but it's definitely something you notice on a resource-strapped ARM machine with a crappy eMMC. And even on x86, it's definitely something you notice when the resources you are using are resources you can't give to paying customers (eg: a virtualization host). At that point, running a bunch of statically linked massive binaries and all the extra memory they take starts actually costing you money.

Shared libraries

Posted Nov 25, 2025 18:10 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> 1 process per machine is hardly a common use case.

I would argue that "few processes per machine" is one of the most common use-cases, because BusyBox systems are like that. And they probably outnumber all other Linuxes except Android.

Shared libraries

Posted Nov 25, 2025 21:52 UTC (Tue) by Wol (subscriber, #4433) [Link] (3 responses)

Given that the majority of computers are single-user, with said user running a single app full-screen, I'd say IN PRACTICE pretty much all computers are only running one app at a time from the user's PoV.

Cheers,
Wol

Shared libraries

Posted Nov 26, 2025 23:16 UTC (Wed) by LtWorf (subscriber, #124958) [Link] (2 responses)

Uh?

Processes in Linux keep running even if they don't own the window that has focus.

I don't understand what you're trying to say here.

Shared libraries

Posted Nov 27, 2025 8:03 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

Speaking on behalf of the users - "What's a process?"

All the *user* cares about is the *single* application which is currently running.

I've lost the context, but iirc it was something about how computers that only run one program at a time are very much a minority. From our PoV as computer guys looking at all the stuff going on in the background that may be true, but from the user's PoV pretty much *every* computer is *single use*.

It's glaringly obvious as soon as you realise, when you're talking about processes, that your typical computer user won't have a clue what you're talking about.

Cheers,
Wol

Shared libraries

Posted Nov 27, 2025 8:59 UTC (Thu) by LtWorf (subscriber, #124958) [Link]

They might not know what a process is, but they will know if their music stops when they aren't looking at the player, or if the chat notifications stop working.

Anyway, we all know here what a process is. Why do we care if some people who aren't participating in the discussion aren't aware of that? How is that relevant?

Multi-user vs. single-user

Posted Nov 28, 2025 10:01 UTC (Fri) by geert (subscriber, #98403) [Link]

It depends on the use case.
E.g. DEC Ultrix did not have shared libraries, but it did have demand paging and sharing of binaries. On a typical multi-user system at that time, all users ran sh, vi, matlab, and mosaic, which thus could be shared.
On a modern single-user system, the user runs a few large applications, which cannot share much, unless they use the same shared libraries.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds