A look at dynamic linking

By Daroc Alden
February 13, 2024

The dynamic linker is a critical component of modern Linux systems, being responsible for setting up the address space of most processes. While statically linked binaries have become more popular over time as the tradeoffs that originally led to dynamic linking become less relevant, dynamic linking is still the default. This article looks at what steps the dynamic linker takes to prepare a program for execution.

Invoking the linker

When the Linux kernel is asked to execute a program, it looks at the start of the file to determine what kind of program it is. The kernel then consults both its built-in rules for shell scripts and native binaries and the binfmt_misc settings (a feature of the kernel allowing users to register custom program interpreters) to determine how to handle the program. A file beginning with "#!" is identified as the input of the interpreter named on the rest of the line. This faculty is what lets the kernel appear to execute shell scripts directly; in reality, it executes an interpreter and passes the script as an argument. A file starting with "\x7fELF", on the other hand, is recognized as an ELF file. The kernel first looks to see whether the file contains a PT_INTERP element in the program header. When present, this element indicates that the program is dynamically linked.
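The magic-number check can be sketched in a few lines of Python. This is only an illustration of the idea, not the kernel's code: the function name classify_executable and its return strings are invented for this example, and binfmt_misc is ignored entirely.

```python
def classify_executable(header: bytes) -> str:
    """Classify a file by its leading bytes, loosely mirroring the built-in checks."""
    if header.startswith(b"#!"):
        # The rest of the first line names the interpreter (and any argument).
        first_line = header[2:].split(b"\n", 1)[0].strip()
        return "script:" + first_line.decode("utf-8", errors="replace")
    if header.startswith(b"\x7fELF"):
        return "elf"
    return "unknown"


print(classify_executable(b"#!/bin/sh\necho hi\n"))
print(classify_executable(b"\x7fELF\x02\x01\x01"))
```

Passing the first few hundred bytes of a real file (read in binary mode) works the same way.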

The PT_INTERP element also specifies which dynamic linker the program expects to be linked by — also called the interpreter, confusingly. On Linux, the dynamic linker is usually stored at an architecture-specific path such as /lib64/ld-linux-x86-64.so.2. It is not allowed to itself be dynamically linked, to avoid an infinite regress. Once the dynamic linker has been found, the kernel sets up an initial address space for it in the same fashion as any other statically linked executable and gives it an open file descriptor pointing to the program to be executed.
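To make the PT_INTERP lookup concrete, here is a minimal sketch that walks the program headers of an ELF image held in memory and returns the interpreter path. The field offsets follow the standard ELF32 and ELF64 layouts; the function name find_interpreter is an invention of this example, and error handling is deliberately thin.

```python
import struct

PT_INTERP = 3  # program-header type that carries the interpreter path


def find_interpreter(image: bytes):
    """Return the PT_INTERP path of an ELF image as a string, or None if absent."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = image[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if image[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is_64:
        e_phoff, = struct.unpack_from(endian + "Q", image, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", image, 0x36)
    else:
        e_phoff, = struct.unpack_from(endian + "I", image, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", image, 0x2A)
    for index in range(e_phnum):
        off = e_phoff + index * e_phentsize
        p_type, = struct.unpack_from(endian + "I", image, off)
        if p_type != PT_INTERP:
            continue
        if is_64:
            p_offset, = struct.unpack_from(endian + "Q", image, off + 8)
            p_filesz, = struct.unpack_from(endian + "Q", image, off + 32)
        else:
            p_offset, = struct.unpack_from(endian + "I", image, off + 4)
            p_filesz, = struct.unpack_from(endian + "I", image, off + 16)
        raw = image[p_offset:p_offset + p_filesz]
        # The path is stored as a NUL-terminated string.
        return raw.split(b"\0", 1)[0].decode()
    return None
```

On a Linux system, calling find_interpreter() on the bytes of a dynamically linked binary should return a path like the one mentioned above; a binary without a PT_INTERP element yields None.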

Relocation

The dynamic linker's job is to arrange for the process's address space to contain both the main executable itself, and all of the libraries upon which it depends. For modern executables, that almost certainly means loading position-independent code, which is designed to be run from a non-fixed base address. Many parts of the code do still need to know where the different sections of memory are located, however, so position-independent executables contain a list of "relocations": specific places in the binary that the dynamic linker needs to patch with the actual address of various components in memory. In most programs, the majority of relocations are patches to the Global Offset Table (GOT), which contains pointers to global variables or loaded sections and link-time constants.
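As a rough illustration of what patching a relocation means, the sketch below models the simplest kind, a base-relative relocation, where the slot receives the load address plus a constant addend (the same arithmetic as R_X86_64_RELATIVE). The image, the offsets, and the function name are made up for the example; real relocation tables carry a type and often a symbol index as well.

```python
import struct


def apply_relative_relocations(image: bytearray, load_base: int, relocs):
    """Patch each (offset, addend) slot with load_base + addend as a 64-bit value."""
    for offset, addend in relocs:
        struct.pack_into("<Q", image, offset, load_base + addend)
    return image


# A toy 32-byte "GOT" that was linked as if loaded at address 0.
got = bytearray(32)
relocs = [(8, 0x1000), (16, 0x2008)]      # hypothetical (slot offset, addend) pairs
load_base = 0x7F0000000000                # address chosen at load time

apply_relative_relocations(got, load_base, relocs)
print(hex(struct.unpack_from("<Q", got, 8)[0]))
print(hex(struct.unpack_from("<Q", got, 16)[0]))
```

The code that uses these slots never changes; only the table entries differ between runs, which is what lets the same image work at any base address.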

Giving the dynamic linker the flexibility to load different components at different addresses serves several purposes. The oldest form of position-independent code was on architectures where memory addresses were accessed relative to a base register, allowing for multiple copies of a single program to reside in memory at different addresses. The invention of dynamic address translation obviated that motivation. Modern toolchains use this flexibility for address-space layout randomization (ASLR), by allowing the dynamic linker to add an element of randomness to the chosen locations. Statically linked programs with position-independent code, including the dynamic linker itself, are loaded at a random address chosen by the kernel. Therefore, the first major task of the dynamic linker is to read its own program header and apply relocations to itself. This process includes patching the subsequent code with the location of global structures and functions — prior to applying these relocations, the linker is careful not to call any functions from other compilation units or access global variables.

Once the dynamic linker has applied relocations to itself, it can then perform OS-specific setup by calling the functions for its particular platform. On Linux, it calls brk() to set up the process's data segment, including allocating some space there for its own state, and telling malloc() to place allocations in the newly allocated space (which is separate from the program's eventual heap). At this point, malloc() refers to a temporary implementation, which cannot even free memory, that is used while the dynamic linker is identifying and applying relocations to necessary dependencies.

The first step in identifying dependencies is to set up the link map, the structure that records information for use by dlinfo(). Once that is done, the dynamic linker finds where the kernel mapped the vDSO, a shared object that includes code that can service some system calls without needing to switch to the kernel, a technique that is most often used for the gettimeofday() system call. The dynamic linker places the vDSO in the link map, so that the same code that handles linking other dependencies can handle shared objects that call into the vDSO.
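One way to see the vDSO from user space on Linux is to look for the "[vdso]" entry in /proc/self/maps. The sketch below parses text in that format; the helper name find_mapping and the sample addresses are invented for the example, and this is not how the dynamic linker itself locates the vDSO.

```python
def find_mapping(maps_text, name):
    """Return (start, end) of the first mapping whose name field equals name, else None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5] == name:
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end
    return None


# Sample text in the /proc/<pid>/maps format; the addresses are made up.
SAMPLE_MAPS = """\
555555554000-555555556000 r--p 00000000 08:01 1234 /usr/bin/cat
7ffff7fc1000-7ffff7fc3000 r-xp 00000000 00:00 0 [vdso]
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
"""

print(find_mapping(SAMPLE_MAPS, "[vdso]"))
```

On a Linux machine, passing the contents of /proc/self/maps instead of SAMPLE_MAPS should show where the kernel placed the vDSO for the current process.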

By this point, it may seem as though the stage is set to actually read and link the shared objects upon which the program depends, but there is one more step to complete. Users can override functions at run time by specifying shared objects in the LD_PRELOAD environment variable. Overriding functions in this way can be used for many purposes, such as to debug an application, to use an alternate allocator such as the Boehm-Demers-Weiser conservative garbage collector or jemalloc, or to fake the time and date for the application using libfaketime, for example. The dynamic linker resolves preloaded libraries first, so that when it is linking later libraries it can direct them to the overridden function definitions in one pass.
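The effect of resolving preloaded objects first can be illustrated with a toy symbol lookup. This assumes a simplified model in which the first object in the search order that defines a symbol wins; the object names and symbol tables are hypothetical, and symbol versioning and visibility are ignored.

```python
def resolve(symbol, search_order):
    """Return (object_name, definition) from the first object that defines symbol."""
    for object_name, symbols in search_order:
        if symbol in symbols:
            return object_name, symbols[symbol]
    raise LookupError(f"undefined symbol: {symbol}")


# Hypothetical objects; each maps symbol names to stand-in definitions.
libc = ("libc.so.6", {"time": "libc time()", "malloc": "libc malloc()"})
faketime = ("libfaketime.so", {"time": "fake time()"})
program = ("a.out", {"main": "program main()"})

without_preload = [program, libc]
with_preload = [program, faketime, libc]  # preloaded object is searched before libc

print(resolve("time", without_preload))
print(resolve("time", with_preload))
print(resolve("malloc", with_preload))
```

Only the symbols that the preloaded object defines are overridden; everything else still resolves to the usual library.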

Now the dynamic linker has everything that it needs to finish arranging the address space of the program. Starting with the program being loaded, the linker looks for DT_NEEDED declarations in the program's header, which indicate that the program depends on another shared object. The linker searches for these DT_NEEDED declarations recursively, building a list of all shared objects that will be required by any of the program's transitive dependencies. DT_NEEDED entries can include absolute paths, but they can also include relative paths that are resolved by consulting the directories in the LD_LIBRARY_PATH environment variable, or a default set of directories. The dynamic linker then traverses this list of dependencies backward, so that a shared object's dependencies will be loaded before it is.
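The list-building and backward traversal can be sketched as follows, assuming the dependency list is built breadth-first and each object appears once. The dictionary standing in for DT_NEEDED entries and the library names are hypothetical. Note that simply reversing a breadth-first list does not yield a dependencies-first order for every possible graph; this sketch does not attempt the extra sorting a real linker would need.

```python
from collections import deque


def load_order(root, needed):
    """List root and its transitive dependencies, breadth-first, without duplicates."""
    order = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dependency in needed.get(current, []):
            if dependency not in order:
                order.append(dependency)
                queue.append(dependency)
    return order


def processing_order(root, needed):
    """Walk the dependency list backward, so later-discovered objects come first."""
    return list(reversed(load_order(root, needed)))


# Stand-in for the DT_NEEDED entries of each object.
needed = {
    "app": ["libfoo.so", "libc.so.6"],
    "libfoo.so": ["libc.so.6"],
    "libc.so.6": [],
}

print(load_order("app", needed))
print(processing_order("app", needed))
```

In this small graph the backward walk handles libc.so.6 first, then libfoo.so, then the program, so each object's dependencies are in place before it is processed.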

For each shared object, the linker opens the resolved file, loads it into a newly allocated (and, with ASLR, random) location in the address space, and then performs the set of relocations listed in the shared object's header. Then it adds the shared object to the link map.

The exception is the dynamic linker itself. Programs are allowed to depend on the dynamic linker as a library, to supply functions such as dlinfo(), but the linker would be broken if it applied relocations to itself in the middle of the loop, so it excludes itself from the dependency list. As described above, it has already applied relocations to itself. But the linker occasionally makes use of functions that can be overridden by preloaded libraries. Therefore, once all of the program's dependencies have been slotted into the address space, the linker applies relocations to itself one final time, so that now it will refer to the overridden version of any functions it uses. Now the malloc() implementation that the dynamic linker uses switches from the simplified implementation used so far to the generic implementation used by the rest of the program.

With all the shared libraries loaded, the dynamic linker has completed most of its work. Before jumping to the main program, however, it sets up thread-local storage (TLS) and performs any initialization required by the C library. Then it restores the state of the program's arguments and environment that the kernel supplied and jumps to the entry point of the program.

User code

One might expect that the duties of the dynamic linker end where the main program begins, but this is not so. Not only must it service requests to dlopen() new shared objects that are required at run time, but there is one more part of the dynamic linker that runs during the lifetime of the loaded program: updating the Procedure Linkage Table (PLT).
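Run-time loading can be exercised from Python, whose ctypes module is a thin wrapper over dlopen() and dlsym() on POSIX systems. The sketch below assumes a Linux (or similar) system where passing None requests a handle covering the already-loaded program and its libraries; it is only meant to show the dynamic linker servicing a lookup after startup.

```python
import ctypes
import os

# CDLL(None) corresponds to dlopen(NULL, ...): a handle for the running program
# and the shared objects already loaded into it.
handle = ctypes.CDLL(None)

# Attribute access looks the symbol up at run time, much as dlsym() would.
pid = handle.getpid()

print(pid == os.getpid())
```

Passing a library name instead of None asks the dynamic linker to find, map, and relocate a new shared object while the program is running.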

While programs could simply use relocations to directly patch CALL instructions in the text of the program, "text relocations" like this cause two performance problems. Firstly, since the number of relocations required would depend on the number of calls to the given function (which may be large), the initial application of those relocations to a shared object can be slow. Secondly, since text relocations involve dirtying the pages of memory containing a program's executable code, different processes running the same program can no longer share the same underlying memory, increasing the memory usage of the program. These performance problems mean that text relocations are largely frowned on by the maintainers of dynamic linkers.

Modern compilers and linkers work around this problem by creating a PLT: a special standalone section that contains an indirection for each externally defined function. Calls to these functions from within the program are compiled to a call to the PLT, which then contains a jump to the real location of the external function. The PLT could also directly use text relocations, but most architectures instead use indirect jumps through function pointers stored in a second GOT (separated from the program's normal GOT in its own section called ".got.plt"). Separating out the function pointers is useful on architectures that have no way to compactly encode a direct jump to an arbitrary part of the address space, but it is also useful for one final performance trick: lazy linking.

Unlike relocations that point to data, which need to be resolved before the program starts running because the dynamic linker cannot know when they will be accessed, the relocations in the PLT's GOT don't need to be applied immediately. The dynamic linker initially fills the PLT's GOT with the address of a function from the dynamic linker itself. This function consults the link map to determine where the external symbol in question is located, and then rewrites the corresponding entry of the PLT's GOT. Linking is only performed for external functions that are actually called, and only once the program has started up. For most programs, this shortens startup, because no lookup work is spent on functions that are never called.
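The lazy-binding mechanism can be modeled in a few lines. In this sketch, a dictionary plays the role of the PLT's GOT, each slot starts out pointing at a resolver stub, and the stub overwrites its own slot on first use. The class LazyGot, its methods, and the counter are all inventions of this example; the real mechanism works on machine addresses rather than Python callables.

```python
class LazyGot:
    """Toy model of a PLT GOT whose slots are resolved on first call."""

    def __init__(self, link_map):
        self.link_map = link_map   # symbol name -> real function
        self.slots = {}            # symbol name -> current jump target
        self.resolutions = 0       # how many times the resolver ran

    def add_import(self, name):
        """Initially point the slot at a resolver stub instead of the real function."""
        def stub(*args):
            self.resolutions += 1
            target = self.link_map[name]   # consult the link map
            self.slots[name] = target      # rewrite the slot for future calls
            return target(*args)
        self.slots[name] = stub

    def call(self, name, *args):
        """Model of a PLT entry: an indirect jump through the slot."""
        return self.slots[name](*args)


got = LazyGot({"strlen": len, "abs": abs})
got.add_import("strlen")
got.add_import("abs")          # imported but never called, so never resolved

first = got.call("strlen", "hello")
second = got.call("strlen", "hi")
print(first, second, got.resolutions)
```

The second call goes straight to the real function, and the unused import costs nothing beyond its initial slot.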

Lazy linking can be turned off using the LD_BIND_NOW environment variable, or by compiling the program with the "-z now" linker option. As of the new glibc release, version 2.39, glibc's dynamic linker also supports rewriting elements of the PLT to use direct jumps instead of indirect jumps, on systems where that would have a performance benefit. The article covering the announcement claimed that the introduction of PLT rewriting was security-motivated, but a commenter pointed out that the existence of the former two methods suggests that PLT rewriting is mostly useful as a performance-tuning setting. There is one security benefit of disabling lazy linking, though: it permits "Relocation Read-Only" (RELRO), a security mitigation where the dynamic linker remaps the GOT (and the PLT's GOT) as read-only once they have been filled out, preventing an attacker from overwriting them to gain control of the process's control flow.

The dynamic linker is largely invisible to most programs, but it performs a critical role in establishing the address space of every process. Most programmers will never need to interact with the exact process by which it sets up a program to be run. However, the dynamic linker's apparent stability belies a surprising amount of complexity required to ensure that programs can be prepared for execution efficiently.

A look at dynamic linking

Posted Feb 13, 2024 16:04 UTC (Tue) by abatters (✭ supporter ✭, #6932) [Link] (3 responses)

Does the linker handle indirect functions?

https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Common-Func...

A look at dynamic linking

Posted Feb 14, 2024 12:50 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

This too is largely the dynamic linker's job, yes: the linker proper just translates the ifuncs GCC emits into STT_GNU_IFUNC-type symbols in the final ELF object.

As an aside, there are fairly harsh restrictions on what ifuncs can do -- notably, since they're called in the middle of dynamic symbol resolution, they can't call anything else which might go through the dynamic linker, which more or less means static functions only.

A look at dynamic linking

Posted Feb 17, 2024 8:33 UTC (Sat) by fw (subscriber, #26023) [Link] (1 responses)

Some targets require relocation processing for static functions and global data access, too. The way that this is supposed to work is that objects make their dependencies explicit using DT_NEEDED. This way, the dynamic linker can compute a relocation order that processes the object containing the IFUNC before the object that invokes it.

In practice, this does not always work with symbol interposition, which is not reflected in the DT_NEEDED dependencies:

Bug 20188 - libpthread IFUNC resolver for vfork can lead to crash

A look at dynamic linking

Posted Feb 21, 2024 12:19 UTC (Wed) by nix (subscriber, #2304) [Link]

Ah, thanks for that: I was sure there were awful dependency-related subtleties involved but had entirely forgotten what they were! (IIRC, in the early days IFUNCs were just executed whenever, with no particular guarantees about what if anything else had been relocated. Surprisingly, even this was enough to do a lot of useful things with them.)

A look at dynamic linking

Posted Feb 13, 2024 18:08 UTC (Tue) by jengelh (guest, #33263) [Link] (3 responses)

>[PT_INTERP element in the program header.] When present, this element indicates that the program is dynamically linked.

On the contrary. You can have a PT_INTERP defined and still be statically linked, for the most prominent definitions of "static"—free of extra libraries and relocations. The interpreter could do whatever it pleases, but for brevity, I'll just let it loop:

/tmp$ cat x.s
_start:
        jmp _start
.globl _start
/tmp$ gcc -c x.s; ld -o loader x.o; ld -shared -o loader.so x.o; ld -dynamic-linker /tmp/loader -o a.out x.o /tmp/loader.so; objcopy -R .dynamic a.out b.out

(GNU ld requires at least one ET_DYN file, else it will not copy the -dynamic-linker argument into a.out. Thus we're getting rid of the DT_NEEDED:loader.so entry again with objcopy.) So here we have a static program (b.out) with an interpreter. glibc, musl and file all are readily confused about the file, too:

# /usr/bin/file b.out
b.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /tmp/loader, not stripped
# ldd b.out
        not a dynamic executable
# ld.so --list b.out
b.out: error while loading shared libraries: b.out: cannot open shared object file (?!)
# /var/lib/machines/alpine/lib/ld-musl-x86_64.so.1 --list b.out
        /tmp/loader (0x7f5d806b7000)

Well, musl got it right.

A look at dynamic linking

Posted Feb 13, 2024 19:25 UTC (Tue) by WolfWings (subscriber, #56790) [Link] (1 responses)

BTW, the way you structured several lines as 'one enormous one-liner' really blows out the formatting of the site due to your comment.

There's never any reason to have a bunch of commands on one line linked with ; like that, just use extra lines. :)

A look at dynamic linking

Posted Feb 13, 2024 20:22 UTC (Tue) by jengelh (guest, #33263) [Link]

Yeah I only noticed it afterwards. There's a simple fix that could be applied to the LWN CSS globally: `pre { white-space: pre-wrap; }`.

A look at dynamic linking

Posted Feb 15, 2024 8:51 UTC (Thu) by andi8086 (subscriber, #153876) [Link]

Thank you, that's exactly why I pay for LWN :D To learn deep technical stuff...

I used to do tons with LD_PRELOAD

Posted Feb 13, 2024 20:53 UTC (Tue) by davecb (subscriber, #1574) [Link] (1 responses)

In a previous life, interposing things between caller and recipient was a favourite task.
We used it to create programs like a low-overhead strace, a performance measurement tool that used unmodified binaries, and, as a demo, a Linux library profiler, https://github.com/davecb/libprof

I used to do tons with LD_PRELOAD

Posted Feb 21, 2024 12:22 UTC (Wed) by nix (subscriber, #2304) [Link]

Nowadays, if you want to do something like that and also explore exciting underused parts of ld.so, the thing to try is LD_AUDIT libraries. These are loaded in a new dlmopen namespace, so they can rely on other shared libraries and the like, making it a lot easier to write the things than it is to write an LD_PRELOADed profiler.

(In practice this excellent feature is rarely used, alas.)

Lazy dynamic linking problem

Posted Feb 14, 2024 4:01 UTC (Wed) by jreiser (subscriber, #11027) [Link]

Lazy dynamic linking defers binding to any particular external procedure until the first actual use. This avoids work for resolving references to unused symbols, but can also plant a time bomb if there is no definition for a symbol that is used eventually. So a program might fail unexpectedly when it enters a new phase of execution that uses previously-unused symbols. This can be particularly troublesome if the dependent shared libraries are not careful enough to implement an ABI that supports software evolution through differing delivered versions. BIND_NOW can be useful during development and maintenance, for the purpose of finding such dangling references.

A look at dynamic linking

Posted Feb 14, 2024 9:29 UTC (Wed) by cyperpunks (subscriber, #39406) [Link]

Seems like the details of rpath/runpath were skipped in the description?

A look at dynamic linking

Posted Feb 14, 2024 23:42 UTC (Wed) by interalia (subscriber, #26615) [Link]

Thanks for this article. Knew a decent amount of this but it's always nice to get a well written description with some of the finer details!

A look at dynamic linking

Posted Feb 15, 2024 16:07 UTC (Thu) by jcpunk (subscriber, #95796) [Link]

This is the kind of technical content I love to see at LWN!

A look at dynamic linking

Posted Feb 19, 2024 15:14 UTC (Mon) by mechanicker (subscriber, #166579) [Link] (1 responses)

Remembering the time when I used `dldump` to generate a new executable with most relocations done to speed up start time of a popular CAD software that would load a lot of dynamic libraries.
This was inspired by the Emacs build system and their portable dumper mechanism.

A look at dynamic linking

Posted Feb 22, 2024 9:56 UTC (Thu) by Vorpal (guest, #136011) [Link]

I hadn't heard of that. Reading about it, it sounds similar to the old prelink utility. Maybe it even is what prelink used internally?