
The ABI status of ELF hash tables

By Jonathan Corbet
August 19, 2022
It is fair to say that some projects are rather more concerned about preserving ABI compatibility than others; the GNU C Library (glibc) project stands out even among those that put a lot of effort into preserving interface stability. So it may be a bit surprising that a recent glibc change is being blamed for breaking a number of applications, most of which are proprietary games. There is, it seems, a class of glibc changes that can break applications, but which are not deemed to be ABI changes.

When the dynamic linker starts a program, it must resolve all of the symbol references into shared libraries (including glibc). That can involve looking up thousands of symbols in long lists. Since this process must complete before an application can actually start running, it needs to happen quickly. Nobody likes a long delay between starting nethack and facing off against that first kobold, after all. So it is not surprising that some effort has gone into optimizing symbol lookup.

When the ELF file for a shared object is created by the linker, one of the sections stored therein contains a hash table for the symbols in that file. This hash table can be used to speed the lookup process and get the application underway. For many years, the System V standard for the format of this table has been DT_HASH; that format is supported by the toolchains on Linux. In 2006, though, the DT_GNU_HASH format was added as well; it includes a number of improvements intended to get nethack players into their dungeons even more quickly, including a better hash algorithm and a Bloom filter to short-circuit the search for missing symbols. This format is not well documented, but this 2017 blog post gives an introduction.
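As a rough illustration of the difference between the two formats, here is a minimal Python sketch of the two hash functions and of a Bloom-filter check like the one DT_GNU_HASH uses. The function names are invented for this example, and the filter parameters (64-bit words and a `shift2` value chosen by the linker) follow the layout as it is commonly described for 64-bit objects; treat those details as assumptions rather than a specification.

```python
def sysv_hash(name: str) -> int:
    """Hash used by DT_HASH tables, as given in the System V ABI."""
    h = 0
    for byte in name.encode():
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def gnu_hash(name: str) -> int:
    """Hash used by DT_GNU_HASH tables: h = h * 33 + c, starting from 5381."""
    h = 5381
    for byte in name.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def bloom_add(bloom_words, shift2, h, word_bits=64):
    """Record hash h in the filter by setting its two bits in one word."""
    index = (h // word_bits) % len(bloom_words)
    bloom_words[index] |= (1 << (h % word_bits)) | (1 << ((h >> shift2) % word_bits))


def bloom_may_contain(bloom_words, shift2, h, word_bits=64):
    """Return False only when a symbol with hash h is definitely absent."""
    word = bloom_words[(h // word_bits) % len(bloom_words)]
    mask = (1 << (h % word_bits)) | (1 << ((h >> shift2) % word_bits))
    return (word & mask) == mask
```

With these definitions, `gnu_hash("printf")` evaluates to `0x156b2bb8` and `sysv_hash("exit")` to `0x6cf04`. A lookup for a symbol that the object does not define usually fails at the `bloom_may_contain()` step, without touching the hash buckets at all.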

Since the hash table lives in its own ELF section, there is nothing preventing an ELF file from having more than one of them. Linkers on Linux systems can be told to create one format or the other — or to create both, each in its own section. Until recently, glibc has been built (by default) with a linker option explicitly requesting that both formats be created. That changed, though, with the glibc 2.36 release at the beginning of August; it contained a simple patch from Florian Weimer causing only the DT_GNU_HASH format to be generated.
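One way to see which formats a given library carries is to look for the DT_HASH and DT_GNU_HASH entries in its dynamic section. The following Python sketch does that for a little-endian ELF64 image; the `hash_styles()` name is made up for this example, and the code handles only that one ELF flavor with no error recovery.

```python
import struct

DT_NULL, DT_HASH, DT_GNU_HASH = 0, 4, 0x6FFFFEF5
PT_DYNAMIC = 2
TAG_NAMES = {DT_HASH: "DT_HASH", DT_GNU_HASH: "DT_GNU_HASH"}


def hash_styles(image: bytes) -> set:
    """Return the hash-table tags present in a little-endian ELF64 image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    # ELF64 header fields: e_phoff at byte 32, e_phentsize and e_phnum at 54.
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    found = set()
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", image, base)
        if p_type != PT_DYNAMIC:
            continue
        (p_offset,) = struct.unpack_from("<Q", image, base + 8)
        (p_filesz,) = struct.unpack_from("<Q", image, base + 32)
        # Each Elf64_Dyn entry is a signed 64-bit tag plus a 64-bit value.
        for off in range(p_offset, p_offset + p_filesz, 16):
            d_tag, _d_val = struct.unpack_from("<qQ", image, off)
            if d_tag == DT_NULL:
                break
            if d_tag in TAG_NAMES:
                found.add(TAG_NAMES[d_tag])
    return found
```

Calling `hash_styles(open(path, "rb").read())` on a copy of libc.so.6 built with both formats should report both names; on a build like the default glibc 2.36 one described here, only DT_GNU_HASH would be expected.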

At this point, glibc 2.36 has been installed onto a number of systems running the faster-moving Linux distributions, and few people have noticed the change; the DT_HASH format has not been used for anything on those systems in many years, and the only consequence of its removal is regaining a small amount of disk space. Game players, though, were not so lucky; naturally, the problem relates to a piece of proprietary software that cannot be easily changed.

The Easy Anti-Cheat (EAC) system is a proprietary tool from Epic Games intended to prevent game players from cheating by way of modifications to the game executable itself. As one might expect, the actual heuristics used are not documented anywhere and the code is secret, but one of the techniques involved appears to be looking at the symbols in the game executable and ensuring that they match the expected values. To do this, EAC goes rooting through the DT_HASH tables; if an expected table isn't present, EAC proudly proclaims that it has caught a cheater and does not allow the game to run.
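Since EAC's code is secret, nothing below describes what it actually does. As a generic illustration of what walking a DT_HASH table involves, this Python sketch models the bucket and chain arrays in memory, using plain lists in place of the on-disk section. The `build_sysv_table()` and `lookup()` names are hypothetical. It shows why a consumer that understands only this format comes up empty when the table is missing.

```python
def sysv_hash(name: str) -> int:
    """Hash used by DT_HASH tables, as given in the System V ABI."""
    h = 0
    for byte in name.encode():
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def build_sysv_table(symbols, nbucket):
    """Build (buckets, chains) for a symbol list whose entry 0 is reserved."""
    buckets = [0] * nbucket
    chains = [0] * len(symbols)
    for index in range(1, len(symbols)):
        b = sysv_hash(symbols[index]) % nbucket
        # Push this symbol onto the front of its bucket's chain.
        chains[index] = buckets[b]
        buckets[b] = index
    return buckets, chains


def lookup(symbols, table, name):
    """Return the symbol index for name, or None if it cannot be found."""
    if table is None:
        # No DT_HASH table: a DT_HASH-only consumer has nothing to search.
        return None
    buckets, chains = table
    index = buckets[sysv_hash(name) % len(buckets)]
    while index != 0:  # index 0 terminates every chain
        if symbols[index] == name:
            return index
        index = chains[index]
    return None
```

A tool written this way works as long as the table exists and fails for every symbol once it is gone, even though the same information is still available through DT_GNU_HASH.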

Gamers, it seems, find this behavior disappointing. Arkadiusz Hiler, for example, blogged:

I think this whole situation shows why creating native games for Linux is challenging. It’s hard to blame developers for targeting Windows and relying on Wine + friends. It’s just much more stable and much less likely to break and stay broken.

Hiler reported the problem in the glibc bug tracker; the discussion later moved to the project's mailing list as well. The report stated that the change "breaks SysV ABI compatibility", suggesting that it should be reverted. The responses from the project were not entirely sympathetic to that cause, though. Weimer answered that: "Any tool that performs symbol lookups needs to support DT_GNU_HASH these days". Adhemerval Zanella said: "I am not sure this characterizes as an ABI break since the symbol lookup information would be indeed provided (albeit in a different format)".

Carlos O'Donell agreed that this change was not an ABI break:

Software that is an ELF consumer on Linux has had 16 years to be updated to handle the switch from DT_HASH to DT_GNU_HASH (OS-specific).

While I'm sympathetic to application developers and their backwards compatibility requirements, this specific case is about an ELF consumer and such a consumer needs to track upstream Linux ELF developments.

He went on to say that characteristics of the generated ELF file are not part of the glibc ABI, even if changes there break applications, and suggested that the bug should be closed as "won't fix". He also requested that the EAC developers provide reasons for why DT_HASH should be retained by default. As of this writing, the bug remains open.

Blaming the EAC developers for not keeping up with Linux ELF hash-table formats might not be entirely fair. The DT_HASH format is mandated by the System V ABI specification, the DT_GNU_HASH format is undocumented, and there has been no deprecation campaign to get users to move on. Chances are those developers are as surprised as anybody and haven't just been ignoring the "switch to DT_GNU_HASH" entry languishing in their issue tracker for the last decade or so. Regardless of blame, though, something needs to be done to solve this problem and save gamers from the prospect of having to get some actual work done.

If EAC were free software, of course, chances are there would already be a patch circulating to deal with the problem. As it is, only its owner can deal with this problem directly. Meanwhile, though, there is another workaround available: distributors can easily patch the glibc build to restore the DT_HASH section and make the problem go away for now. Doing that and giving EAC (along with a few other programs) some time to move to DT_GNU_HASH seems like the best solution from just about any point of view.



The ABI status of ELF hash tables

Posted Aug 19, 2022 14:34 UTC (Fri) by josh (subscriber, #17465) [Link] (15 responses)

If EAC were Free Software (and still for some mystifying reason existed at all), someone could have observed before now that it used DT_HASH. Since it isn't, there was neither an issue tracker to file the bug in nor a straightforward means for someone to notice in order to file one.

The ABI status of ELF hash tables

Posted Aug 19, 2022 14:40 UTC (Fri) by immibis (subscriber, #105511) [Link] (14 responses)

These kinds of anti-cheat systems fundamentally cannot be free software; rather, a free-software version of the game could not come with a useful anti-cheat system at all.

The ABI status of ELF hash tables

Posted Aug 19, 2022 17:13 UTC (Fri) by flussence (guest, #85566) [Link] (13 responses)

Doing anticheat with free software isn't impossible, it just needs to be thought of differently; you could use the flight computer quorum model where each client does sanity checking and reporting on the input received from others and the server offlines anything that consistently runs out of spec.

The ABI status of ELF hash tables

Posted Aug 19, 2022 17:35 UTC (Fri) by tux3 (subscriber, #101245) [Link] (5 responses)

The problem is not just manipulated output from a client. If the server is not sufficiently fine-grained with the info it sends, some cheats can reveal people hiding behind walls, disperse fog of war, or reveal other such byte patterns in memory that are not normally rendered.

Anticheats are in the business of preventing you from controlling the code that runs on your machine, so that you can't decide to run code that would give you an unfair advantage.

Which, short of imposing remote attestation hardware DRMs, tends to be strongly at odds with a free software system.

The ABI status of ELF hash tables

Posted Aug 20, 2022 13:31 UTC (Sat) by gray_-_wolf (subscriber, #131074) [Link]

> If the server is not sufficiently fine-grained with the info it sends, some cheats can reveal people hiding behind walls, disperse fog of war, or

Just to add to this, sometimes being fine-grained enough is simply not possible due to latency requirements. Sometimes in very fast-paced games the client needs to have more information than the user (player) should see in order to be able to quickly react to inputs.

The ABI status of ELF hash tables

Posted Aug 21, 2022 2:57 UTC (Sun) by bartoc (guest, #124262) [Link] (3 responses)

You could try and detect that the player was seeing more info than they should behaviorally, but then you could start detecting "star sense" as cheating, which is not great.

Ultimately an open source anti-cheat probably could catch a lot of people; anti-cheat by its nature tends not to catch the kind of people that can decipher anti-cheat source code. It's more to catch people using known cheats, I think.

The ABI status of ELF hash tables

Posted Aug 21, 2022 20:33 UTC (Sun) by jkingweb (subscriber, #113039) [Link] (2 responses)

> You could try and detect that the player was seeing more info than they should behaviorally, but then you could start detecting "star sense" as cheating, which is not great.

What is "star sense"?

The ABI status of ELF hash tables

Posted Aug 21, 2022 21:53 UTC (Sun) by bartoc (guest, #124262) [Link] (1 responses)

Sometimes (probably more often) called "game sense", it's the ability to use very limited information to predict what your opponent is doing. Basically, it means knowing the game and the metagame so well that you can tell what your opponent is doing just by thinking "what would I do" and evaluating which of the actions they could take would be worst for you.

In any event really good game sense can result in you taking actions to respond to stuff before you can see it. You see this in Starcraft a lot, it can look like someone is using maphacks but in reality they are just really experienced.

The ABI status of ELF hash tables

Posted Oct 26, 2022 17:54 UTC (Wed) by sammythesnake (guest, #17693) [Link]

So, "intuitive minimax" then?

https://en.m.wikipedia.org/wiki/Minimax

To be fair, I use this kind of thing (along with other reasonable understandings of less deterministic but still somewhat predictable behaviour - something more akin to a Bayesian model, but again more intuitive than rigorous) all the time while doing such boringly quotidian things as driving - it's not at all rare that I twig somebody's planning to execute a turn a fair time before they remember that indicators are a thing, for example (assuming they ever actually do - BMW/Audi/etc. stereotypes notwithstanding) It's literally saved my life on a few occasions...

The ABI status of ELF hash tables

Posted Aug 19, 2022 17:45 UTC (Fri) by tshow (subscriber, #6411) [Link] (5 responses)

The flight computer quorum model doesn't have to deal with the possibility that any or all of the systems may have malicious intent. Consider an esports context where an entire team's computers (and thus, half the computers in the game) could be colluding.

It's a far thornier problem.

That said, the companies that make anti-cheat software haven't exactly covered themselves in glory. A lot of what's in the field looks suspiciously like what would happen if you got the enthusiastic intern to write a rootkit.

The ABI status of ELF hash tables

Posted Aug 19, 2022 20:39 UTC (Fri) by camhusmj38 (subscriber, #99234) [Link] (1 responses)

I’m suspicious of anything that requires me to run a driver in kernel mode just to play a game.

The ABI status of ELF hash tables

Posted Sep 2, 2022 13:45 UTC (Fri) by Darkstar (guest, #28767) [Link]

> I’m suspicious of anything that requires me to run a driver in kernel mode just to play a game.

You probably wouldn't play any competitive First-Person Shooter games anyway, I assume?

Because if you did, and you knew that everyone else could be cheating on you without you knowing, I guess you would be more open to having a kernel-based anti-cheat driver.

The ABI status of ELF hash tables

Posted Aug 21, 2022 9:30 UTC (Sun) by jengelh (guest, #33263) [Link] (1 responses)

>Consider an esports context where an entire team's computers (and thus, half the computers in the game) could be colluding.

Any 2-team n-vs-n player setup reduces to a 1-vs-1 problem, because the other team could *also* be colluding.
Playground piece of wisdom: If you don't like the rules or how they are applied by the other player(s) and/or the umpire (if there is one), stop participating.

The ABI status of ELF hash tables

Posted Aug 22, 2022 7:12 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

The rules prohibit such collusion, by assumption (or else it would not be brought up in a discussion of cheating in the first place). So your suggestion boils down to "don't play if you don't like other people cheating." Which is terrible for obvious reasons.

The ABI status of ELF hash tables

Posted Aug 22, 2022 19:58 UTC (Mon) by flussence (guest, #85566) [Link]

> Consider an esports context where an entire team's computers (and thus, half the computers in the game) could be colluding.

If it's an entirely remote event then the problem is interesting, but at a live event I'd imagine a client-server setup where one team's clients are raising red flags with the server and the others are silent would quickly become embarrassing to the cheaters.

FWIW, even proprietary anti-cheat software hasn't stopped people cheating all the way to finals on stage and then getting caught there.

The ABI status of ELF hash tables

Posted Aug 19, 2022 17:50 UTC (Fri) by excors (subscriber, #95769) [Link]

That won't help with even the most basic cheats like aimbots (where the player's inputs follow all the normal rules and are processed perfectly legitimately, but they don't come directly from the player's hand) or wall hacks (where players can see through walls, detecting enemies who ought to be hard to see).

You need to either lock down the client hardware so thoroughly that you can be certain it's only running the software provided by the game developer (as with consoles), or engage in an endless arms race of updating detection for cheats which are then updated to circumvent that detection. Game developers will never completely win, but in practice they usually do a good enough job to stop the game being made unplayable by a flood of cheaters. They'd find it much harder if they freely shared their detection algorithms with the cheat developers.

(You can also limit yourself to developing games where players don't gain an unfair advantage by cheating, but a lot of people enjoy competitive PvP action games and it'd be a shame to rule them out.)

The ABI status of ELF hash tables

Posted Aug 19, 2022 15:21 UTC (Fri) by fratti (guest, #105722) [Link] (4 responses)

I'm surprised EAC runs on Linux at all, last I heard that was a blocker for many games on the Steam Deck. Though I'm glad game developers are still engaging in the Sisyphean task of trying to establish trust on an untrusted client. The hours spent playing cat-and-mouse with zit-adorned teenagers who are telling a game server that they can in fact fly through the air seem well invested.

On a more serious note, is there any reason why glibc shouldn't keep DT_HASH around? kilobytes in disk space are hardly a reason to break an application that users are running, even if said application is not one the glibc maintainers like.

> [...] something needs to be done to solve this problem and save gamers from the prospect of having to get some actual work done.

While it may seem silly, there is a large industry of online content creators for whom playing video games is an integral part of work. Besides, we must be wary of glibc maintainers. First they came for Flash Player's use of memcpy, but I did not speak for I was not trying to watch the youtubes. Next, they came for gamers, and I did not speak for I was not a gamer. Once they come for people who spend too much time reading e-mails, there will be no one left to speak for us.

The ABI status of ELF hash tables

Posted Aug 19, 2022 20:55 UTC (Fri) by etra0 (guest, #160378) [Link] (3 responses)

> On a more serious note, is there any reason why glibc shouldn't keep DT_HASH around? kilobytes in disk space are hardly a reason to break an application that users are running, even if said application is not one the glibc maintainers like.

So far the only two reasons I've read are
* People should have been aware of a *new* existing technology, despite it not being documented.
* You save 1% of disk space, "(...) which is considerable for an unused feature." [1]

I'm surprised how anyone would consider 1% of disk space considerable in modern day and age.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=29456#c9

The ABI status of ELF hash tables

Posted Aug 20, 2022 13:33 UTC (Sat) by mid-kid (guest, #160386) [Link] (1 responses)

Saving space with the most core library of them all still makes sense in 2022 as it's used in containers (where this matters a bunch), and (even if more seldomly) embedded systems.

The ABI status of ELF hash tables

Posted Sep 1, 2022 19:43 UTC (Thu) by Vipketsh (guest, #134480) [Link]

Surely you jest. You are also complaining to all those new fangled static-link only languages for wasting your disk space, right ?

On my raspberry, which still has the DT_HASH entry, the section is less than 16kb. So, the glibc developers decided that compatibility is just not worth 16kb, in an otherwise 2mb big binary. It's not like the developers need to put in work just to keep backwards compatibility -- it's just an option to an external program for crying out loud!

It's very elitist how the glibc developers are handling this case.

The ABI status of ELF hash tables

Posted Aug 25, 2022 4:47 UTC (Thu) by milesrout (subscriber, #126894) [Link]

The bigger that my disk is, the bigger 1% is. Of course 1% disk space is still appreciable in the modern age.

The ABI status of ELF hash tables

Posted Aug 19, 2022 15:31 UTC (Fri) by mb (subscriber, #50428) [Link] (17 responses)

> Carlos O'Donell agreed that this change was not an ABI break:
> Software that is an ELF consumer on Linux has had 16 years to be updated to handle the switch from DT_HASH to DT_GNU_HASH (OS-specific).

What if the software is not being maintained anymore? Nobody is going to fix it.

Of course this is an ABI break.
Lots of software is not going to be changed, if some random OS developer drops support for some feature.

If glibc doesn't fix this, the only solution for users will be to grab old outdated versions of glibc and throw it into the game bin.

The game should have shipped with all required .so libraries in the first place, though. Including libc.

The ABI status of ELF hash tables

Posted Aug 19, 2022 16:31 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (10 responses)

On platforms which are not Linux, it's generally considered incorrect to ship your own libc, because on such platforms, libc is usually the official syscall API (so if your libc is outdated, then your app may break with no warning, and it will be considered entirely your own fault). Even on Linux, I'd be skeptical of shipping my own libc.

The ABI status of ELF hash tables

Posted Aug 19, 2022 17:40 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (8 responses)

Any statically linked program, which includes any Go program, is essentially shipping its own libc.

The ABI status of ELF hash tables

Posted Aug 19, 2022 19:26 UTC (Fri) by int19h (guest, #159020) [Link] (4 responses)

Yes, which is why Go binaries were broken on macOS and FreeBSD for a long time, because it directly used a private API (syscalls) instead of libc. Here's an example:

https://github.com/golang/go/issues/16606

The only OS on which you can statically link libc and expect your binaries to work in future releases is Linux.

The ABI status of ELF hash tables

Posted Aug 19, 2022 20:14 UTC (Fri) by fw (subscriber, #26023) [Link] (2 responses)

Even the last part about Linux isn't totally true. The x86-64 kernel interface for getcpu and gettimeofday has already changed once for many users, breaking old glibc versions before 2.15 (and presumably statically linked binaries).

The ABI status of ELF hash tables

Posted Aug 19, 2022 21:47 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link] (1 responses)

Do you have a source for this? The kernel ABI is considered sacrosanct to the point that obsolete system calls from the early days of Linux are still around. I'd be surprised if they allowed a system call interface change that broke something as important as glibc.

The ABI status of ELF hash tables

Posted Aug 20, 2022 6:40 UTC (Sat) by izbyshev (guest, #107996) [Link]

The GP most likely refers to CONFIG_LEGACY_VSYSCALL_NONE enabled in Debian Buster[1] and some other distros (and even WSL kernels on Windows[2]), which is indeed an ABI-breaking change (e.g. gettimeofday() crashes on older glibc, breaking use cases like running containers with older distros).

But it's important to understand that it's not the upstream kernel that broke the ABI (in the default configuration) in this case. On the contrary, they introduced a new mode to improve security while preserving (most of) ABI compatibility[3].

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=852620
[2] https://github.com/microsoft/WSL/issues/4694
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

The ABI status of ELF hash tables

Posted Aug 19, 2022 23:29 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

And were even somewhat broken on Linux if they used setuid - the system call doesn't (and isn't required to) follow the POSIX semantics, and it's up to libc to deal with that (https://github.com/golang/go/issues/1435)

statically-linked Go programs and libc

Posted Aug 19, 2022 20:14 UTC (Fri) by dkg (subscriber, #55359) [Link] (1 responses)

@pbonzini wrote:
> Any statically linked program, which includes any Go program, is essentially shipping its own libc.

Are you sure about this? gosop is a Go program, and all the Go bits are statically-linked, but it still dynamically loads libc

0 dkg@alice:~$ ldd $(which gosop)
	linux-vdso.so.1 (0x00007ffe414fd000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdedda00000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fdeddcc5000)
0 dkg@alice:~$ 

statically-linked Go programs and libc

Posted Aug 19, 2022 20:36 UTC (Fri) by dtlin (subscriber, #36537) [Link]

Depends on how it is built. cgo will use libc, but if a binary is built with CGO_ENABLED=0, then
$ ldd /usr/bin/tailscale
	not a dynamic executable

The ABI status of ELF hash tables

Posted Aug 20, 2022 23:30 UTC (Sat) by LtWorf (subscriber, #124958) [Link]

You never ran ldd on a go program uh?

They do link libc

The ABI status of ELF hash tables

Posted Aug 21, 2022 3:02 UTC (Sun) by bartoc (guest, #124262) [Link]

The problem isn't really the syscall ABI, a bigger problem is nss plugins. Have fun not being able to resolve hostnames properly because your nss plugins won't load with an old version of glibc. You can static link them, but who knows if you'll get everything you need, or in the right configuration.

Plugins in general are what really screw up applocal deployment on linux, you just have to be aware of them, and glibc loads plugins on almost all linux systems.

The ABI status of ELF hash tables

Posted Aug 19, 2022 16:49 UTC (Fri) by mokki (subscriber, #33200) [Link] (4 responses)

There is no reason to go back to older glibc versions.

The correct solution is that the game or steam has a wrapper that creates a copy of glibc (and other libraries the game scans) library and adds the missing DT_HASH section from the modern version. If that does not already exist in elf tools, it could be added.
And of course the wrapper should detect if glibc changes and do the above again when needed to keep in sync with security and OS updates.

The ABI status of ELF hash tables

Posted Aug 19, 2022 17:00 UTC (Fri) by mb (subscriber, #50428) [Link] (2 responses)

>The correct solution is that the game or steam has a wrapper that creates a copy of glibc (and other libraries the game scans) library and adds the missing DT_HASH section from the modern version.

That might work in this case.
But it won't make the breakage go away until such a solution exists.

And it would require support from the sw vendor. That breaks as soon as the system goes out of support. I would not count on the vendor to still be around, if glibc decides to introduce the next breakage 16 years from now.

Real software has been broken.
Therefore, the only real solution is to revert the breaking change.
It's that simple.

The ABI status of ELF hash tables

Posted Aug 20, 2022 19:32 UTC (Sat) by mokki (subscriber, #33200) [Link] (1 responses)

The game does not need to be changed or supported. The wrapper can be provided by steam or anyone else packaging the old game for Linux.

The ABI status of ELF hash tables

Posted Aug 20, 2022 20:27 UTC (Sat) by mb (subscriber, #50428) [Link]

>The game does not need to be changed or supported. The wrapper can be provided by steam or anyone else packaging the old game for Linux.

Yes. The wrapper can be provided by the game or platform provider.
That's the definition of support.

It is not Ok to break something and then expect somebody else to provide a workaround.

The ABI status of ELF hash tables

Posted Aug 22, 2022 21:54 UTC (Mon) by bartoc (guest, #124262) [Link]

This kinda stuff works if ALL libraries that depend on libc (which basically means all libraries) are bundled with the game. Unfortunately, that is not a good option, since you really want to use (at least) the graphics stack from the host system. Additionally, you probably do want libraries like openssl to update out from under you, either from something like the steam runtime, a flatpak runtime, or the distribution itself. Guess what: libraries like openssl and gnutls load plugins, and those plugins assume that they are being loaded into a process with a glibc at least as new as the one the openssl/gnutls was built for.

Basically, it's really bad to bundle shared libraries if they could possibly be dependencies of anything you want to load from the base system. If you really want to bundle such libraries, you should probably consider reangling their symbols, but that's not a good option for something like glibc as it would make things like threads work in really bizarre ways.

The ABI status of ELF hash tables

Posted Sep 16, 2022 8:25 UTC (Fri) by daenzer (subscriber, #7050) [Link]

> The game should have shipped with all required .so libraries in the first place, though. Including libc.

I don't think I've ever seen a game ship its own copy of libc, but I have seen lots of them ship their own copies of libstdc++ (and various other libraries).

The result was invariably that the game stopped working sooner or later, because its copies of those libraries were too old for other things (usually GPU driver components) mapped into the process. Removing the games' copies of the libraries made them work again.

libc would be even more likely to hit this, since basically everything links against it, it never changes SONAME, and it often bumps the symbol version of existing ABI symbols.

The ABI status of ELF hash tables

Posted Aug 19, 2022 18:25 UTC (Fri) by cgutman (subscriber, #110037) [Link] (1 responses)

> Adhemerval Zanella said: "I am not sure this characterizes as an ABI break since the symbol lookup information would be indeed provided (albeit in a different format)"

This line of thinking is completely crazy. Having a replacement available has absolutely nothing to do with whether something is an ABI break or not.

By this reasoning, you could argue that removing memcpy() is not an ABI break because memmove() can do the same thing already and it's been available for ages too.

> Software that is an ELF consumer on Linux has had 16 years to be updated to handle the switch from DT_HASH to DT_GNU_HASH (OS-specific).

and software has had decades to switch from using strcat()/strcpy()/gets() to something more resistant to buffer overflows like strncat()/strncpy()/fgets(), but nobody is talking about removing those APIs from glibc. Why? Because it breaks the ABI!

The ABI status of ELF hash tables

Posted Oct 26, 2022 18:36 UTC (Wed) by sammythesnake (guest, #17693) [Link]

I don't have any skin in this game, but if I were to try to justify classifying this as "not an ABI change", I think I'd say that the format of the symbol hash is an implementation detail not intended for use outside of the linker.

If the format isn't documented[0] in the documentation for developers using the library, that would be consistent with that position.

It's a weaker guarantee than the one provided by the kernel, though (using the Hyrum's Law[1] model of what constitutes an ABI) but it's a valid position to take if the hash format *is* an implementation detail.

[0] I have very little interest in reading the libc documentation, so maybe it is. The *newer* format is apparently not well documented, though, and the old format predates glibc's use of it, so it being documented for that reason doesn't necessarily torpedo this line of reasoning - it being listed in the glibc documentation in language other than as an implementation detail would, though...

[1] https://www.hyrumslaw.com/

The ABI status of ELF hash tables

Posted Aug 19, 2022 18:32 UTC (Fri) by ndesaulniers (subscriber, #110768) [Link] (3 responses)

> When the dynamic linker starts a program, it must resolve all of the symbol references into shared libraries (including glibc).

> Since this process must complete before an application can actually start running, it needs to happen quickly.

I don't think that's generally true; it relies on the executable being linked with `-z now`, otherwise the symbols are resolved lazily (i.e. the whole dance with the GOT and PLT). I suspect the implicit default for this flag is configurable in GNU BFD though, and at this point I suspect most distros would set that as the default. This adds a layer of protection to prevent tampering with the GOT and PLT at runtime.

When symbols are resolved lazily, not "all" symbols references need to be resolved, and the process doesn't need to "complete" before the process starts running.
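Whether a given object asks for eager binding is recorded in its dynamic section. As a small sketch, the Python function below takes already-decoded (tag, value) pairs and checks the three standard markers. The `binds_now()` name and the list-of-pairs input are assumptions made for illustration; the tag and flag values are the standard ELF constants.

```python
DT_BIND_NOW = 24         # legacy tag: presence alone requests eager binding
DT_FLAGS = 30
DF_BIND_NOW = 0x8        # bit inside the DT_FLAGS value
DT_FLAGS_1 = 0x6FFFFFFB
DF_1_NOW = 0x1           # bit inside the DT_FLAGS_1 value


def binds_now(dynamic_entries):
    """Return True if any dynamic entry requests non-lazy symbol binding."""
    for tag, value in dynamic_entries:
        if tag == DT_BIND_NOW:
            return True
        if tag == DT_FLAGS and value & DF_BIND_NOW:
            return True
        if tag == DT_FLAGS_1 and value & DF_1_NOW:
            return True
    return False
```

An object for which this returns False leaves its function symbols to be resolved on first call, unless the environment forces eager binding some other way.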

The ABI status of ELF hash tables

Posted Aug 19, 2022 19:31 UTC (Fri) by ndesaulniers (subscriber, #110768) [Link]

I suspect the EAC developers just implemented scanning the simpler and documented symbol table section.

That said, if glibc can break abi like this, I wonder if we can drop --hash-style=both from VDSO's in the kernel and just support DT_GNU_HASH?

(Fun fact: I once had a bug where one of these sections (DT_GNU_HASH) wasn't being produced; I could only repro on builds from the server and not locally. I had to use `dd` to slice the vdso out of a running process on device using a build from the server. Ultimately, kbuild wasn't hermetic, and the server's linker was missing support for --hash-style=both. https://stackoverflow.com/a/54797221).

The ABI status of ELF hash tables

Posted Aug 19, 2022 22:56 UTC (Fri) by jreiser (subscriber, #11027) [Link] (1 responses)

> -z now ... adds a layer of protection to prevent tampering with the GOT and PLT at runtime.

There is no protection against tampering unless (in addition) various parts of the PLT (ProgramLinkageTable) are put into a .relro section and then into a PT_LOAD segment that lacks PROT_WRITE.

Part of the original impetus for -z now was to prevent spending hours and hours during the first phase of a computation, only to have the process abort because some symbol relating only to the second phase could not be resolved due to mismatch of shared libraries or other configuration problems.

-z now can interfere with explicit dlclose(); ...; dlopen() for managing a multi-phase program in a constrained address space, or user interaction that explores optional (but known-in-advance) "heavy weight" subsystems.

The ABI status of ELF hash tables

Posted Aug 23, 2022 1:49 UTC (Tue) by bartoc (guest, #124262) [Link]

The "norm" should probably be to use -z defs -z lazy --no-allow-shlib-undefined. That is "report a link-time error if any symbols are undefined, but don't tell the loader to resolve all symbols at startup, and also report undefined symbols that are in shared libraries I linked to". For a lot of applications using "-z now" is all right, but for a lot of libraries (and some apps) -z lazy is better because the set of used symbols depends highly on the path execution takes. Think of a library like OpenImageIO or VTK that has a ton of format parsing libraries linked to it, most of which won't be used by a given application.

The ABI status of ELF hash tables

Posted Aug 19, 2022 18:38 UTC (Fri) by shentino (guest, #76459) [Link] (1 responses)

Not GNU's fault if they didn't comply with the ABI in the first place.

Using undocumented dependencies that aren't part of the ABI or API is by definition invoking undefined behavior and voiding the warranty.

And Microsoft using undocumented APIs to give itself an unfair competitive edge is something they've already been shamed for, and rightly so.

If proprietary games are the ones breaking because they cheated and tried to bypass the warranty provided by the ABI then it serves them right for breaking the rules.

Unfortunately, commercial interests will probably spin this to denigrate open source software and unlevel the playing field back in their own favor with proprietary crap that hides dishonest coding practices.

The ABI status of ELF hash tables

Posted Aug 19, 2022 20:37 UTC (Fri) by camhusmj38 (subscriber, #99234) [Link]

But they did comply with the ABI - the new hash algorithm is not documented and is not the one mandated by the System V ABI. This is definitely an ABI break because a program that worked with an earlier version stopped working because the file format changed. The reason we have an ABI is so that existing programs continue to work without needing recompilation.
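
For readers who have not seen the two algorithms side by side, here is a minimal sketch of both hash functions. The System V one follows the function given in the ABI specification, truncated to 32 bits; the GNU one is the widely described multiply-by-33 hash seeded with 5381. The function names `sysv_hash` and `gnu_hash` are just labels for this example, and this covers only the string hash, not the table layout or the Bloom filter.

```python
def sysv_hash(name: str) -> int:
    """Symbol hash used by DT_HASH tables, kept to 32 bits."""
    h = 0
    for byte in name.encode():
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF  # the top four bits are always cleared
    return h


def gnu_hash(name: str) -> int:
    """Symbol hash used by DT_GNU_HASH tables: h = h * 33 + c, starting at 5381."""
    h = 5381
    for byte in name.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


print(hex(sysv_hash("exit")), hex(gnu_hash("exit")))
```

The two functions give unrelated values for the same name, so a tool that only knows how to walk a DT_HASH table cannot simply reuse its lookup code on a DT_GNU_HASH table.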

The ABI status of ELF hash tables

Posted Aug 19, 2022 19:24 UTC (Fri) by rwmj (subscriber, #5474) [Link] (2 responses)

I think the thing which confuses me most is why glibc is generating this section and not the linker (ie. binutils)?

The ABI status of ELF hash tables

Posted Aug 19, 2022 19:24 UTC (Fri) by rwmj (subscriber, #5474) [Link] (1 responses)

.. and to continue that thought, why the linker wouldn't have a -Wl,--generate-old-style-hash flag or similar.

The ABI status of ELF hash tables

Posted Aug 19, 2022 19:29 UTC (Fri) by corbet (editor, #1) [Link]

The linker is generating the hash tables; the glibc change was, at its core, a change to a linker option. That's why this problem would be a relatively easy thing for distributors to address in the short term; they can just put the --hash-style=both option back.
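
To see which tables a given object ended up with, one can look for the corresponding tags in its dynamic section. The following is a sketch under the assumption that the dynamic section of a little-endian ELF64 object has already been read into a bytes object; `hash_styles` and `pack_entries` are hypothetical names for this example, while the tag values are those from `<elf.h>`.

```python
import struct

# Dynamic-section tag values as defined in <elf.h>.
DT_NULL = 0
DT_HASH = 4
DT_GNU_HASH = 0x6FFFFEF5


def hash_styles(dynamic: bytes) -> set:
    """Return the hash-table tags present in a little-endian ELF64 dynamic section."""
    found = set()
    # Each Elf64_Dyn entry is a signed 64-bit tag followed by a 64-bit value.
    for d_tag, _d_val in struct.iter_unpack("<qQ", dynamic):
        if d_tag == DT_NULL:  # terminator entry
            break
        if d_tag == DT_HASH:
            found.add("DT_HASH")
        elif d_tag == DT_GNU_HASH:
            found.add("DT_GNU_HASH")
    return found


def pack_entries(*entries):
    """Hypothetical helper: pack (tag, value) pairs plus a DT_NULL terminator."""
    return b"".join(struct.pack("<qQ", t, v) for t, v in entries + ((DT_NULL, 0),))


# Made-up addresses; only the tags matter for this check.
both = pack_entries((DT_HASH, 0x1000), (DT_GNU_HASH, 0x2000))
gnu_only = pack_entries((DT_GNU_HASH, 0x2000))
print(sorted(hash_styles(both)), sorted(hash_styles(gnu_only)))
```

An object linked with --hash-style=both would be expected to look like the first case, and one linked with --hash-style=gnu like the second.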

The ABI status of ELF hash tables

Posted Aug 19, 2022 20:34 UTC (Fri) by marcH (subscriber, #57642) [Link] (8 responses)

> Blaming the EAC developers for not keeping up with Linux ELF hash-table formats might not be entirely fair. The DT_HASH format is mandated by the System V ABI specification, the DT_GNU_HASH format is undocumented, and there has been no deprecation campaign to get users to move on

No surprise so many Linux products and so much software running on Linux try to avoid glibc. GPLv3 is the legal icing on the cake.

The ABI status of ELF hash tables

Posted Aug 19, 2022 21:13 UTC (Fri) by fratti (guest, #105722) [Link]

glibc is LGPLv2.1 though?

The ABI status of ELF hash tables

Posted Aug 19, 2022 23:29 UTC (Fri) by lkundrak (subscriber, #43452) [Link] (6 responses)

> so much software running on Linux try to avoid glibc

never heard of any

The ABI status of ELF hash tables

Posted Aug 20, 2022 7:20 UTC (Sat) by marcH (subscriber, #57642) [Link] (5 responses)

Go was already mentioned, Rust, Android...

(I stand corrected about the GPLv3)

The ABI status of ELF hash tables

Posted Aug 20, 2022 10:42 UTC (Sat) by pbonzini (subscriber, #60935) [Link] (2 responses)

Is it surprising that languages other than C avoid the C runtime library?

The main users of a C runtime library other than glibc are Alpine and Android; pretty much all other Linux distributions are "GNU/Linux" and use glibc.

The ABI status of ELF hash tables

Posted Aug 21, 2022 13:15 UTC (Sun) by jhoblitt (subscriber, #77733) [Link]

There is a package to add glibc to Alpine, as there is a surprising amount of software which is broken by musl. I don't use Alpine as an OCI base layer as often as I used to because of the occasional random problem caused by software that depends on glibc.

For better or worse, glibc is defining a de facto ABI for "Linux".

The ABI status of ELF hash tables

Posted Aug 21, 2022 13:21 UTC (Sun) by marcH (subscriber, #57642) [Link]

> Is it surprising that languages other than C avoid the C runtime library?

Yes and no https://gankra.github.io/blah/c-isnt-a-language/

> pretty much all other Linux distributions are "GNU/Linux" and use glibc.

If you look beyond "interactive" users and at the number of instances of Linux kernels running, I suspect Linux "distributions" are still niche. Still waiting for the "Year of the Linux Desktop" and breakages like this one don't exactly help.

> Rust, of course, uses glibc.

... on Linux "distributions".

The ABI status of ELF hash tables

Posted Aug 20, 2022 11:51 UTC (Sat) by khim (subscriber, #9252) [Link]

Rust, of course, uses glibc.

Go avoids it, but not because it thinks it's particularly bad, but because it avoids all dynamic libraries by default, as mentioned above it does the same on macOS.

Android, sure, that thing tries to purge all GPL/LGPL-related parts from userspace (although not very successfully so far), again, not a technical issue.

I just find it funny that you say software running on Linux try to avoid glibc as reaction to something GLibC developers did and when asked to clarify can only bring examples which are either incorrect or license-based.

The ABI status of ELF hash tables

Posted Aug 23, 2022 1:38 UTC (Tue) by bartoc (guest, #124262) [Link]

Go kinda has to avoid glibc because of how its coroutines work: fundamentally it has to be able to turn what would be syscalls in glibc into "submit async work then start running a new coroutine". This is basically why I really don't love Go's concurrency approach; I would rather just eat the size of the kernel stack and use a real thread.

You probably need something like io_uring to actually save syscalls with this approach, and I don't think go uses it.

The ABI status of ELF hash tables

Posted Aug 20, 2022 0:23 UTC (Sat) by ken (subscriber, #625) [Link] (2 responses)

I do not understand how somebody thinks it is OK to remove it when the specification says that it is MANDATORY for both executables and shared objects.

The ABI status of ELF hash tables

Posted Aug 20, 2022 11:54 UTC (Sat) by khim (subscriber, #9252) [Link] (1 responses)

Indeed. Something like that definitely requires changes to the specification first. And then a few years of wait time until people adopt it.

Expecting that various third-party tools would support obscure features not present in the official specification is just crazy.

What's really surprising me is the fact that GLibC, in general, has pretty good and extensive documentation, but that particular change is not well documented.

The ABI status of ELF hash tables

Posted Aug 22, 2022 8:40 UTC (Mon) by cortana (subscriber, #24596) [Link]

Hmm

I've always found glibc's documentation patchy at best. Lots of it is documented nicely in the manual, but there are chunks (pthreads, getaddrinfo, the _l variants of the ctype/string functions) that are missing entirely!

Not surprised.

Posted Aug 20, 2022 1:55 UTC (Sat) by Subsentient (subscriber, #142918) [Link]

This isn't surprising to me. The glibc people have had a reputation for being giant dicks at least since 2010 as far as my memory serves.

If it's mandated by SysV ABI, it needs to be put back. The fact it's mandated by SysV ABI probably explains why EAC thought they could rely on it being there, even more so than the GNU version. It's not hurting anything to leave it there, so why remove it at all if it's going to break specs and cause anger like this?

Then again, I find some of the developers for projects like glibc just like making inexplicable executive decisions that everyone else who uses their code just has to bend over and take.

The ABI status of ELF hash tables

Posted Aug 20, 2022 3:46 UTC (Sat) by WolfWings (subscriber, #56790) [Link] (6 responses)

This is not the only 'breaking' change GNU LibC did in recent updates. The 2.34 change to the initialization routines in a "it's not complete but we're pushing it to production" step of merging most historically separate libraries into a single one means anything compiled on a recent distro can't run on older distro's anymore.

And since it's a change to how __libc_start_main behaves in a way that explicitly breaks being run on older linkers (because of course they had to make it feed a NULL where a valid value used to be) you can't just patch around it at compile time either.

Also as noted, the SysV ABI standard requires DT_HASH so they're out of compliance by NOT including it. Supporting not adding it is one thing, but just yeeting it by default definitely is not a good play.

The ABI status of ELF hash tables

Posted Aug 20, 2022 10:44 UTC (Sat) by pbonzini (subscriber, #60935) [Link]

> anything compiled on a recent distro can't run on older distro's anymore

That has always been true if you used a symbol that had a redefinition in newer versions of glibc. It just happened rarely.

The ABI status of ELF hash tables

Posted Aug 20, 2022 12:03 UTC (Sat) by khim (subscriber, #9252) [Link] (4 responses)

> anything compiled on a recent distro can't run on older distro's anymore.

This was never considered a breaking change and shouldn't be considered a breaking change. All other types of libc have the exact same issue and that was never promised by anyone.

If you want to run binary on older system you have to use older libc. It's as simple as that.

In Linux world that's provided by RedHat for RHEL, but not by other distributions, while on macOS and Windows old versions of libc are included in standard developer's package, but that's not a glibc fault.

The ABI status of ELF hash tables

Posted Aug 21, 2022 6:52 UTC (Sun) by comex (subscriber, #71521) [Link]

At least on macOS, there is no shipping of old versions of libc. Instead, there's a compiler flag to explicitly declare the minimum OS version you want the binary you're compiling to run on. Each function declared in OS header files includes a minimum OS version, and attempting to use an API that's too new is a compilation error. In situations like the mentioned one, where the ABI for existing APIs changes (such that newer OSes can run old binaries but not the reverse), the header files, or sometimes the compiler or linker itself, will pick the old or new ABI depending on the same declared minimum OS version.

That said, Xcode does eventually drop support for targeting old OS versions; the current lowest minimum-OS option is 10.9 from 2013.

The ABI status of ELF hash tables

Posted Aug 27, 2022 21:36 UTC (Sat) by WolfWings (subscriber, #56790) [Link]

To clarify, this is the first time I'm aware of that such a change landed that can't simply be aliased around by tacking an older version of a given function into your codebase, or even in most cases simply aliased to the old version.

They've never done a breaking change at the startup libc functions before.

The ABI status of ELF hash tables

Posted Aug 28, 2022 21:04 UTC (Sun) by HenrikH (subscriber, #31152) [Link] (1 responses)

>If you want to run binary on older system you have to use older libc. It's as simple as that.
Or you compile the binary for the old symbol with the new glibc, since glibc contains versioned symbols including all the old ones you can tell the linker exactly which version of the function that your binary wants to use. So while it is a real pain it can be done.

The ABI status of ELF hash tables

Posted Aug 29, 2022 8:56 UTC (Mon) by Wol (subscriber, #4433) [Link]

Can you do that if it's a binary you don't have source for?

Eg I've got a libc5 binary I would love to run, but it was a commercial product ...

Cheers,
Wol

The ABI status of ELF hash tables

Posted Aug 20, 2022 9:28 UTC (Sat) by drago01 (subscriber, #50715) [Link] (5 responses)

This change is an ABI break by every definition of an ABI break. Claiming otherwise is just silly.

The ABI status of ELF hash tables

Posted Aug 20, 2022 23:41 UTC (Sat) by josh (subscriber, #17465) [Link] (1 responses)

I'm normally all for "any interface people rely on is ABI", as the Linux kernel does. But there's a limit to that. Suppose something hashed the assembly code of `memcpy` and broke if it changed? Suppose something hashed ld-linux.so (the dynamic linker) and broke if it changed? Would you consider those "ABI"?

The ABI status of ELF hash tables

Posted Aug 21, 2022 5:44 UTC (Sun) by drago01 (subscriber, #50715) [Link]

No, but in that case it is even documented.

The ABI status of ELF hash tables

Posted Aug 22, 2022 14:40 UTC (Mon) by larkey (guest, #104463) [Link] (2 responses)

Yep, I often find myself on the GNU side of things (despite still being heart broken over Solaris) but this?! DT_HASH is

* standard
* universally available
* never officially deprecated (blog posts don't count)
* documented
* cross platform

The GNU alternative is

* GNU/Linux specific
* undocumented
* non-standard

Simply removing it is just absurd and the reason stated is also just sad. If I care about a *very* small foot print I wouldn't be using the GNU tooling anyway, likely :-)

I also find it appalling that they say that they are somehow "upstream ELF development" and further that everyone dealing with this needs to follow all their blogs? This is just hubris.

The ABI status of ELF hash tables

Posted Aug 23, 2022 15:12 UTC (Tue) by JanC_ (guest, #34940) [Link]

DT_HASH wasn’t removed though, it’s just no longer included by default if you use the upstream build options.

The GNU/glibc people should have documented this better though (both the new table format as well as the changes to the defaults). Based on that, operating systems (distros) could then choose to maintain compatibility with SysV or not.

The ABI status of ELF hash tables

Posted Sep 6, 2022 8:52 UTC (Tue) by nix (subscriber, #2304) [Link]

> I also find it appalling that they say that they are somehow "upstream ELF development"

Looking at it practically... they are, and have been for well over a decade at this point. Perhaps 15--17 years: i.e., since about as long ago as the introduction of DT_GNU_HASH!

The ABI status of ELF hash tables

Posted Aug 20, 2022 13:33 UTC (Sat) by scientes (guest, #83068) [Link] (5 responses)

For all the whining and complaining in this comment section, nobody seems aware that on Linux and other ELF platforms symbol names are global, while on Windows the symbols are namespaced to the name of the dll they bind to.

And the global namespace even applies to languages with mangled symbol names.

The ABI status of ELF hash tables

Posted Aug 21, 2022 4:54 UTC (Sun) by k8to (guest, #15413) [Link]

Maybe I'm really slow, but I'm not sure how this is particularly relevant.

I don't see the internal details of the structures used by the dynamic linker as much of an ABI myself. Yeah, the old format is mandated by SysV, but I don't care if my linux distribution is SysV compliant. I don't see anything useful provided by the SysV specification at this point. Needless change is wasteful but that exact point in history doesn't seem interesting to me. I also think it's right to chide those responsible for moving us to an undocumented format. It's not something many people will need to know, but a transparent system is a fixable system.

But all that aside, I don't see how symbol namespacing really means something one way or another in this case.

The ABI status of ELF hash tables

Posted Aug 21, 2022 11:14 UTC (Sun) by jengelh (guest, #33263) [Link] (3 responses)

>on Windows the symbols are namespaced to the name of the dll they bind to, [on Linux and other ELF platforms symbol names are global]

That sounds just like RTLD_LOCAL on ELF systems. Moreover, I don't see ld.so using RTLD_GLOBAL during the initial program start, that seems to be a dlopen-exclusive. So shouldn't symbols be appropriately namespaced? (But indeed I remember symbols sharing the global scope, dunno what's going on.)

The ABI status of ELF hash tables

Posted Aug 22, 2022 17:00 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

It is not the same. RTLD_LOCAL is not usable in open-ended "I load plugins via dlopen" mechanisms. The problem is when you have:

pluginA.so -> liblang.so
pluginB.so -> liblang.so

Loading pluginA with RTLD_LOCAL makes liblang.so enter the same state. When pluginB is loaded, it fails because it is not "allowed" to access liblang.so because it is "local" to pluginA. It's probably more useful when you know plugins are standalone or you're doing some restricted delay-load mechanism.

The ABI status of ELF hash tables

Posted Aug 23, 2022 2:55 UTC (Tue) by bartoc (guest, #124262) [Link] (1 responses)

What Windows does is a lot closer to dlmopen than it is to RTLD_LOCAL. On Windows, if you load (or link against) two dlls that both link against the same dll themselves, then they can very well use different versions of that dll.

You can actually have stuff like multiple different C runtimes in the same process (although it's super likely to result in sadness and chaos as soon as one touches the other). Still, if you are very careful it is possible. They do need to be named differently though.

Not to mention if you load C++ libraries with RTLD_LOCAL they usually can't be unloaded anyway, _AND_ some (but not all) of their symbols will actually be used to resolve references in future libraries. RTLD_LOCAL can also be promoted to RTLD_GLOBAL if the library is a dependency of an RTLD_GLOBAL library.

dlmopen provides something much more similar to windows DLLs, but you can only have 16 namespaces and according to the manual (typos from there):
> As at glibc 2.24, specifying the RTLD_GLOBAL flag when calling dlmopen() generates an error. Furthermore, specifying RTLD_GLOBAL when calling dlopen() results in a program crash (SIGSEGV) if the call is made from any object loaded in a namespace other than the initial namespace.

So it seems like it's a little useless. Someone should fix this stuff, and perhaps allow some kind of namespacing for stuff used via the linker, without dlsym. I guess if you really want that you can just use wine's loader.

The ABI status of ELF hash tables

Posted Aug 23, 2022 11:47 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

> Not to mention if you load C++ libraries with RTLD_LOCAL they usually can't be unloaded anyway

Pretty much anything with global statics is nigh unloadable already. macOS just no-ops unloading anything that mentions `thread_local` as well. I don't see much value in being able to unload libraries. Windows is a bit different because there are APIs to query libraries without actually loading them (but is not that useful to anything caring about cross-platform support).

The ABI status of ELF hash tables

Posted Aug 21, 2022 10:38 UTC (Sun) by willy (subscriber, #9762) [Link] (3 responses)

Nobody seems to have pointed out the obvious solution here -- make it in the EAC developers interests to fix the bug.

If the linker preferentially uses DT_GNU_HASH and ignores DT_HASH, and EAC only checks DT_HASH, then create cheats for popular games that include a faked DT_HASH section which pretends everything is OK.

The ABI status of ELF hash tables

Posted Aug 21, 2022 13:23 UTC (Sun) by k8to (guest, #15413) [Link]

I love it.

The ABI status of ELF hash tables

Posted Aug 21, 2022 16:41 UTC (Sun) by fw (subscriber, #26023) [Link] (1 responses)

glibc already prefers DT_GNU_HASH over DT_HASH when an object has both. I assume it's been this way since the introduction of DT_GNU_HASH.

I haven't looked at the tool. It's not just consistency checks those DT_HASH lookups could be used for. Maybe they use DT_HASH lookups to find the original definition of dlsym after interposing it.

The ABI status of ELF hash tables

Posted Aug 21, 2022 23:52 UTC (Sun) by k8to (guest, #15413) [Link]

Either way, the Easy Anti-Cheat developers just admitted that this component of their tool is easily misled, which kind of invalidates the utility of having that functionality, especially now that it's public knowledge.

The ABI status of ELF hash tables

Posted Aug 22, 2022 0:19 UTC (Mon) by maskray (subscriber, #112608) [Link] (1 responses)

> Blaming the EAC developers for not keeping up with Linux ELF hash-table formats might not be entirely fair.

Probably. But DT_HASH has mostly disappeared from Fedora for 16+ years, and from several other distributions for a similarly long time. libc.so.6 has contained DT_HASH for a long time, but it is just a rare exception.

> The DT_HASH format is mandated by the System V ABI specification, the DT_GNU_HASH format is undocumented, and there has been no deprecation campaign to get users to move on. Chances are those developers are as surprised as anybody and haven't just been ignoring the "switch to DT_GNU_HASH" entry languishing in their issue tracker for the last decade or so. Regardless of blame, though, something needs to be done to solve this problem and save gamers from the prospect of having to get some actual work done.

I think it is fair to say that DT_GNU_HASH is de facto deprecated.
DT_GNU_HASH not being in the generic ABI is not an issue, since in general a processor supplement ABI or an operating system ABI can replace a generic ABI feature, and we should not read too much into the generic ABI wording.

I have more to say on https://maskray.me/blog/2022-08-21-glibc-and-dt-gnu-hash

The ABI status of ELF hash tables

Posted Aug 23, 2022 15:24 UTC (Tue) by JanC_ (guest, #34940) [Link]

I think you mean DT_HASH (not DT_GNU_HASH) is de facto deprecated.

Keeping it available might be useful for some general purpose distros that want compatibility with very old binaries, but seems useless for e.g. most embedded systems or cloud images.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds