
Ratiu: A tale of two toolchains and glibc

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 3:46 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)
In reply to: Ratiu: A tale of two toolchains and glibc by Paf
Parent article: Ratiu: A tale of two toolchains and glibc

I personally hate glibc with a passion. It's a huge pile of ...legacy... that is full of bad decisions and is needlessly complicated.

Most of the complexity of glibc is entirely artificial. In the end, it doesn't actually DO that much. Musl libc is a nice counter-example: it has most of glibc's features, yet it's a small and nimble library that is easy to use and develop.



Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 7:19 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (27 responses)

Well, what glibc does is support a large amount of hardware and architectures, with all their weirdness.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 7:41 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (26 responses)

Yeah. It supports tons of stuff. Badly.

Here's an example of architectural support bits from musl: https://git.musl-libc.org/cgit/musl/tree/arch/powerpc64 - just under a thousand LOC for the usual stuff (atomics, syscall protocols, etc) and a bunch of constants extracted from the kernel defs. Sure, there are some additional assembly files to accelerate memory functions for some archs, but they are not essential.

Everything is nicely and logically organized. I have added musl support for Tilera (don't ask) and it required only a couple days of work, including learning its assembly. It was really that simple.

For comparison, this is the same architecture from glibc: https://github.com/bminor/glibc/tree/595c22ecd8e87a27fd19... - the amount of cruft is staggering. It's hard to even find out which parts go where.

I get it, this is a library designed in days when Hurd seemed like a good idea. There's a ton of obsolete and crufty stuff (e.g. its test suite). And it's not really getting any better.

If it were up to me, I'd put glibc into maintenance mode and start switching to musl instead.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 8:26 UTC (Fri) by immibis (subscriber, #105511) [Link] (15 responses)

You would switch everyone to a library which does not implement such a common basic function as dlclose?

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 8:38 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

Just for fun, I just now put a DTrace tracepoint in dlclose. Nothing is hitting it outside of Java (???).

In reality, dlclose() can NOT be implemented sanely. It's inherently racy and conflicts with things like TLS cleanup. E.g.: https://gitlab.gnome.org/GNOME/glib/-/issues/1311

And not implementing broken-by-design features is honestly why I love musl-libc.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 21:40 UTC (Fri) by immibis (subscriber, #105511) [Link] (12 responses)

Sorry, but an explicit inability to unload things you loaded is *also* broken by design. Imagine if you had fopen but no fclose. Or malloc but no free.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 22:10 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (11 responses)

Quoth POSIX[1]:

> An application writer may use dlclose() to make a statement of intent on the part of the process, but this statement does not create any requirement upon the implementation. When the symbol table handle is closed, the implementation may unload the executable object files that were loaded by dlopen() when the symbol table handle was opened and those that were loaded by dlsym() when using the symbol table handle identified by handle.
> [...]
> Although a dlclose() operation is not required to remove any functions or data objects from the address space, neither is an implementation prohibited from doing so. [...]

POSIX expressly permits dlclose to be a stub function that does nothing and returns zero. Any application which requires a different behavior is not portable. If you don't like that, go complain to the standards people.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/function...
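To make the portability point concrete, here is a minimal C++ sketch of code that treats dlclose() purely as a statement of intent. The helper name call_atoi_via_dlsym is hypothetical, and looking up atoi through the main program's handle (dlopen with a null path) is only an assumption chosen so that the sketch needs no separate library file. On some systems you may need to link with -ldl.

```cpp
#include <dlfcn.h>

// Signature of the function we look up; atoi is used purely for illustration.
using AtoiFn = int (*)(const char*);

// Hypothetical helper: resolves "atoi" at run time, calls it, then closes the
// handle. Returns true and stores the result in `out` if the lookup succeeded.
bool call_atoi_via_dlsym(const char* text, int& out) {
    // A null path asks for a handle to the main program and its dependencies.
    void* handle = dlopen(nullptr, RTLD_NOW);
    if (handle == nullptr) {
        return false;
    }

    bool ok = false;
    AtoiFn fn = reinterpret_cast<AtoiFn>(dlsym(handle, "atoi"));
    if (fn != nullptr) {
        out = fn(text);  // finish every use of the symbol BEFORE dlclose()
        ok = true;
    }

    // Statement of intent only: the implementation may unload, keep a
    // reference count, or do nothing at all. Portable code assumes none of
    // these, and simply never touches `fn` or `handle` again.
    dlclose(handle);
    return ok;
}
```

The only rule the caller relies on is the one POSIX actually states: pointers obtained from the handle are not used after dlclose(). Whether anything was unmapped is unobservable to conforming code.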

Ratiu: A tale of two toolchains and glibc

Posted Oct 2, 2021 23:19 UTC (Sat) by iainn (guest, #64312) [Link] (10 responses)

ISO/IEC 9899 expressly allows malloc to be a stub that simply returns NULL, but you'd rightly say that's unhelpful.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 0:13 UTC (Sun) by mpr22 (subscriber, #60784) [Link] (9 responses)

Can you point to (not merely assert the existence of) any real code that people are actually using to get things done that invokes dlclose() and actually expects it to do anything?

This is not a rhetorical question; I will cheerfully accept an answer of "yes, and here it is" :)

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 8:46 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (3 responses)

It is difficult to fathom such code, for the following reasons:

* The only formal change in semantics after calling dlclose() is that the application is no longer permitted to dereference certain pointers. Perhaps it's a tad obvious, but a conformant application must not dereference those pointers. Therefore, the application is not permitted to assume that dereferencing those pointers will, say, generate SIGSEGV, trip a guard page, or have any other desired or undesired effect, because the standard flatly forbids such dereferencing in the first place.
* The caller is expressly forbidden from interpreting the handle returned by dlopen "in any way." This presumably includes comparing it for equality with other handles returned by dlopen. Therefore, a conformant implementation may return the same handle every time you dlopen the same file, and keep an internal reference count (which the non-normative section of the dlclose standard explicitly calls out as a thing that implementations may do). If dlclose does nothing, then you just omit the reference count.
* Conformant implementations are also permitted to reuse closed handles, and a conformant implementation could even keep track of which object files were opened in the past and conspire to reuse their handle values if they are ever reopened in the future. Of course, if dlclose does nothing, then that's not really much of a "conspiracy."
* Maybe you're short on memory and trying to reclaim it? Well, that's not a very good reason at all. The pages which dlclose would free are backed by an object file on disk. If those pages are not in active use, the kernel should drop them automatically under memory pressure.
* Maybe you're trying to implement some crazy mechanism where you can replace object files without stopping and restarting the applications which are using them? Eh, that's probably a pipe dream anyway. Stopping and restarting your app is way easier than carefully shutting down an entire module of your program and then starting it up again. Also, the stop/restart dance is a general pattern, well supported by tools such as APT and systemd.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 12:26 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (1 responses)

> * Maybe you're trying to implement some crazy mechanism where you can replace object files without stopping and restarting the applications which are using them? Eh, that's probably a pipe dream anyway. Stopping and restarting your app is way easier than carefully shutting down an entire module of your program and then starting it up again. Also, the stop/restart dance is a general pattern, well supported by tools such as APT and systemd.

Generally, I would think that the programming environment would need to explicitly support hot reloading of code. Something like Erlang comes to mind. Python attempts to support it with `reload()`, but things get…weird and I wouldn't really recommend it without a long list of caveats. Native code-targeting systems usually don't have the safety rails needed for such things.

Ratiu: A tale of two toolchains and glibc

Posted Oct 9, 2021 8:31 UTC (Sat) by sionescu (subscriber, #59410) [Link]

We rely on hot-reloading of C libraries when developing Common Lisp FFI wrappers: https://github.com/cffi/cffi/blob/master/src/libraries.li....

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 23:09 UTC (Sun) by immibis (subscriber, #105511) [Link]

The really obvious thing to do with dlclose is to unload a plugin that's no longer in use. Especially if you have a long-running server application that may be reconfigured with SIGHUP.

Beyond that, I know one application that uses it for hot software upgrade - specifically UnrealIRCD.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 12:57 UTC (Sun) by iainn (guest, #64312) [Link] (3 responses)

No, sorry, I was being a bit facetious.

But dlclose being unusable genuinely baffles me, coming from a high level (e.g. .NET) perspective. *Obviously* you want to be able to unload a plugin when you're done with it. In .NET you just use an AssemblyLoadContext.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 13:50 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

It's funny that you mention .NET.

Because it does NOT support unloading of individual assemblies. You can "unload" the whole context but not individual assemblies.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 14:31 UTC (Sun) by iainn (guest, #64312) [Link] (1 responses)

I don't get what's so funny. You can spin an isolated AssemblyLoadContext, for an individual plugin.

You later Release() the whole context, which also cleans up any dependencies. That's a good thing.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 22:27 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, AssemblyLoadContext is a fairly new feature (starting from .NET Core 3); before that there was no way to unload assemblies at all.

Second, an ALC cannot be unloaded forcefully. If it's in use, then the "unload" method simply does nothing. This wholly depends on the GC being able to enumerate all the references to the context.

Ratiu: A tale of two toolchains and glibc

Posted Oct 9, 2021 8:28 UTC (Sat) by sionescu (subscriber, #59410) [Link]

Ratiu: A tale of two toolchains and glibc

Posted Oct 9, 2021 8:25 UTC (Sat) by sionescu (subscriber, #59410) [Link]

Lots of dynamic languages use dlclose(), including all the Common Lisp implementations that I know of, and probably many Scheme ones too.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 9:01 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (4 responses)

Ah yes musl… for those who like debugging all sorts of weird bugs and incompatibilities happening in their libc!

https://github.com/iron-io/dockers/issues/42#issuecomment...

https://bugs.python.org/issue32307

And let's not forget the slower memory allocation!

https://news.ycombinator.com/item?id=23080290

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 12:21 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (2 responses)

The first link is really a Python bug: cross-platform code assumed something that was not true in general. The second issue was fixed. And the third link explains that the musl allocator was optimized for minimal memory usage and robustness, not speed. If allocator speed is important, the application should link against jemalloc, which in general is faster than the glibc one.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 14:35 UTC (Fri) by Paf (subscriber, #91811) [Link] (1 responses)

“If allocator speed is important, the application should link against jemalloc, which in general is faster than the glibc one.”
Greater allocator speed is not the only consideration, so this is awfully glib.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 17:25 UTC (Fri) by Wol (subscriber, #4433) [Link]

But the OP was responding to a comment he considered glib - "musl isn't fast because speed wasn't an important optimisation criterion".

When faced with competing priorities, don't moan because someone else's priorities are different to yours ...

Cheers,
Wol

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 22:02 UTC (Mon) by gps (subscriber, #45638) [Link]

Thanks for the pointer to the CPython issue, I can move that forward.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 10:08 UTC (Fri) by jsm28 (subscriber, #104141) [Link]

Actually, there has been a great deal of work over the past ten years on replacing architecture-specific code in glibc with architecture-independent code, with as little duplication between architectures as possible. This work is ongoing, and the checklist for new architecture ports - https://sourceware.org/glibc/wiki/NewPorts - includes using generic code where possible. It isn't mentioned in the NEWS file because that file is about user-visible changes, not internal improvements.

Ratiu: A tale of two toolchains and glibc

Posted Oct 2, 2021 13:20 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

> There's a ton of obsolete and crufty stuff (e.g. its test suite).

You'd prefer a library with... fewer tests? On the grounds of an argument by (as far as I can see) pure assertion that some or all of the tests are "obsolete and crufty"? Even though tests by their very nature don't intrude on the library itself, so who honestly cares if they're crufty (if they even are: yes, some are complex: that's the nature of good tests).

Seriously, the glibc test suite is so good in some areas (particularly threading) that it gets used routinely to find bugs in *other libcs*.

There's no way ditching the testsuite is on the cards. If anything, other libcs need lots more.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 22:33 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> You'd prefer a library with... fewer tests?

No. I want a library with a test suite that is not a mess.

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 12:03 UTC (Mon) by nix (subscriber, #2304) [Link]

Oh good, you'll be happy that it's being cleaned up, then? Because from your comments earlier you appear to be completely unaware of all the changes in the testsuite over the last few years (common test skeletons, increasing use of containerization to ensure that tests run in an environment closer to that on a real system, user namespaces to test stuff as root without being root...).

Yes, some of the tests do look a bit all over the place and unsystematic. Have you ever looked at GCC's testsuite? Or Rust's? Or LLVM's? All testsuites have a lot of unsystematic tests in them, because that is the subset of tests derived directly from observed regressions, which are by their nature all over the place because users do all sorts of strange things. This is *good*.

Ratiu: A tale of two toolchains and glibc

Posted Apr 20, 2022 17:28 UTC (Wed) by prideauxx (guest, #158112) [Link]

> I have added musl support for Tilera (don't ask)

Please forgive the ask (I realize that was the advice). I am interested in Tilera myself, and a port of musl supporting it would be quite useful. Have you made the effort public (git), or would you consider doing so? I would be happy to contribute to the effort if desired/helpful.

Thank you!

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 8:22 UTC (Fri) by immibis (subscriber, #105511) [Link] (15 responses)

Musl famously doesn't implement dlclose. You can unref libraries, but they will never leave!

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 8:24 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Yeah, and so what? It's not an issue in practice.

It also doesn't implement the whole iconv morass and other half-baked ideas from glibc.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 13:49 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (13 responses)

Yeah, I don't think this is a problem in practice. Unloading libraries is nothing but a huge mess if there are any kinds of callback registrations or global registries involved. macOS just refuses to unload if the library so much as looks at thread-local storage. I also tend to just say "don't do that then" if issues are filed against the projects I work on upon closing the library. *Really* low-level libraries might be able to be unloaded, but even that I wouldn't really want to get too involved with the issues that can arise.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 8:25 UTC (Sun) by k8to (guest, #15413) [Link] (12 responses)

It's a practical problem for some software designs. People who have leaned heavily into plugin-style applications do actually unload their plugins at times. They often have phased operation where the plugins are clearly not in use after some point.

Granted, I am not at all a fan of the plug-in pattern. But it's a case where this lack would be harmful.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 12:28 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (11 responses)

Some have come to the project(s) I work on and have asked why unloading doesn't work. The fundamental problem is tracking resources by the library they live in is generally impossible (in C++). The project ends up registering to static registries across library boundaries and callbacks can live in other libraries as well. There's not really any support to help track these things and untangle what static global initialization ends up doing, so all I can say is "sorry, that use case was not considered in the design and there's not much we can do today; just don't even try to unload the library".

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 13:58 UTC (Sun) by HelloWorld (guest, #56129) [Link] (1 responses)

> Some have come to the project(s) I work on and have asked why unloading doesn't work. The fundamental problem is tracking resources by the library they live in is generally impossible (in C++). The project ends up registering to static registries across library boundaries and callbacks can live in other libraries as well.
That's not an argument against unloading a library but against static registries.

> There's not really any support to help track these things and untangle what static global initialization ends up doing,
That is, again, not an argument against unloading a library but against side effects in static global initialization.

> so all I can say is "sorry, that use case was not considered in the design and there's not much we can do today; just don't even try to unload the library".
No, it just means you need to take care in your design and not rely on bad ideas like side effects in static global initialization, global registries, etc. It's really not that hard to not use global variables! And it's not that hard to clean up after yourself properly in C++, where you have destructors that really do compose quite nicely (unlike half-baked solutions like e.g. Java's finally blocks).

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 1:01 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Alas, this library is already bad with passing around information (e.g., every class is intrusively refcounted because it was started back before the STL was a reliable thing). I'm just as against static registries as anyone else, but the designs available at the time did not afford simple solutions on the consuming side, e.g.:

    #if MPI_ENABLED
    if (mpi_is_being_used)
      use_mpi_aware_subclass();
    else
    #endif
      use_parent_class();

If, instead, the mere use of the parent class could use the MPI-aware one automatically, things are just nicer for everyone, at the cost of not being able to unload the library (not a big cost even in my mind today). This is done by the MPI-aware subclass hooking itself in at library load time, so that it is returned instead of the base class from the base class's "constructor" (actual constructors are not used in this library). Note that there are additional APIs available to MPI-aware subclasses, so it's not always feasible to just host the functionality in the base class itself (though the base class would most likely become a rat's nest of #if to support it, not to mention the fun everyone has with conditional dependencies/functionality in build systems).
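A rough C++ sketch of that hooking pattern, under stated assumptions: Controller, MpiController and install_mpi_override are hypothetical names, not the actual library's API, and the override is installed by an explicit call here, whereas in the design described above it would run from a static initializer when the MPI-enabled library is loaded.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class whose "constructor" is really a factory function.
class Controller {
public:
    using Factory = std::function<std::unique_ptr<Controller>()>;

    virtual ~Controller() = default;
    virtual std::string name() const { return "serial"; }

    // Consumers always call New(); they never mention the subclass.
    static std::unique_ptr<Controller> New() {
        Factory& factory = override_factory();
        if (factory) {
            return factory();
        }
        return std::unique_ptr<Controller>(new Controller());
    }

    // Called by a library that wants to substitute its own subclass.
    static void set_override(Factory factory) {
        assert(factory != nullptr);
        override_factory() = std::move(factory);
    }

private:
    // The static registry: one process-wide slot holding the override.
    static Factory& override_factory() {
        static Factory factory;
        return factory;
    }
};

// Hypothetical MPI-aware subclass living in a separate library.
class MpiController : public Controller {
public:
    std::string name() const override { return "mpi"; }
};

// Assumption: in the real design this runs at library load time.
void install_mpi_override() {
    Controller::set_override([] {
        return std::unique_ptr<Controller>(new MpiController());
    });
}
```

Once the override is installed, the registry holds a callable whose code lives in the MPI library, and every object it has produced carries a vtable pointer into that library. Nothing records those references, which is why unloading the library afterwards would leave dangling pointers behind.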

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 9:46 UTC (Mon) by immibis (subscriber, #105511) [Link] (8 responses)

Believe it or not, it *is* possible to write a plugin that unregisters in its shutdown function everything that it registers in its startup function. Nobody said C[++] programming was easy, or not full of footguns. Sure, the language doesn't provide fault isolation. Not every application that uses plugins uses them for fault isolation.

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 14:41 UTC (Mon) by HelloWorld (guest, #56129) [Link] (7 responses)

Actually this is the kind of thing that C++ is quite good at because of RAII. For every function that needs to be called when loading the plug-in, there should be a class that will call said function in its constructor and perform whatever cleanup is necessary in its destructor. Or better yet, the application you're writing a plugin for could provide these classes as the *only* way to call these functions.

Then you just make a class that has a bunch of members of these class types, and when the plugin is loaded, an instance of this class is created, running all those constructors and performing whatever initialization is necessary. Before unloading the plugin, the object is destroyed, running its destructor and cleaning everything up, after which it can safely be unloaded.
This is how you get things like that to work correctly *by construction*. It's basically impossible to get it wrong when you do it this way...
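A minimal sketch of that RAII arrangement, with hypothetical names throughout (FilterRegistry, FilterRegistration and ExamplePlugin are illustrations, not any real application's plugin API):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical host-side registry that plugins add filters to.
class FilterRegistry {
public:
    using Filter = std::function<int(int)>;

    void add(const std::string& name, Filter filter) {
        filters_[name] = std::move(filter);
    }
    void remove(const std::string& name) { filters_.erase(name); }
    bool contains(const std::string& name) const {
        return filters_.count(name) != 0;
    }
    std::size_t size() const { return filters_.size(); }

private:
    std::map<std::string, Filter> filters_;
};

// RAII guard: registers in the constructor, unregisters in the destructor.
// Ideally the host exposes this as the only way to register anything.
class FilterRegistration {
public:
    FilterRegistration(FilterRegistry& registry, std::string name,
                       FilterRegistry::Filter filter)
        : registry_(registry), name_(std::move(name)) {
        assert(!registry_.contains(name_));  // duplicate names are a bug
        registry_.add(name_, std::move(filter));
    }
    ~FilterRegistration() { registry_.remove(name_); }

    FilterRegistration(const FilterRegistration&) = delete;
    FilterRegistration& operator=(const FilterRegistration&) = delete;

private:
    FilterRegistry& registry_;
    std::string name_;
};

// The plugin's whole footprint in the host is the set of members of this
// class; destroying the object undoes every registration, in reverse order.
class ExamplePlugin {
public:
    explicit ExamplePlugin(FilterRegistry& registry)
        : invert_(registry, "invert", [](int v) { return 255 - v; }),
          twice_(registry, "twice", [](int v) { return 2 * v; }) {}

private:
    FilterRegistration invert_;
    FilterRegistration twice_;
};
```

The host would create the ExamplePlugin object right after loading the plugin and destroy it just before calling dlclose(). This only covers what goes through the registry; pointers the host copied out by other means are still the host's problem.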

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 16:56 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (6 responses)

Alas, C and C++ do not perform escape analysis and such pointers end up embedded everywhere at runtime. Instances of those classes from factories in the library are not tracked, static data pointers aren't tracked (e.g., strings), callbacks sometimes lack proper context data handling facilities ("I'm copying this" and "I'm done, please destroy"). One would want to have each of these things twiddle a refcount to block the `dlclose` call if any still exist (more APIs could revoke such things, but that is quite invasive). Not to mention that the system needs to track the library dependency chain and hold references from loading libraries being open too. But this is *way* expensive for the much more common case of "loaded via DT_NEEDED reference" where the system controls it all anyways. I just don't see the benefit being worth the complexity cost here.

FWIW, I don't think even Rust can do it because any interaction with the loaded library's APIs need an additional `'dlopen` lifetime attached. I don't know of any way to do this where the library is not "leaked" to be `'static` or done at the top-level rather than as some inner routine. Not to mention that APIs would now need to consider such lifetimes. My function taking a `fn() -> u32` now needs a lifetime attached to consider that it could have been loaded at runtime. Similar for any `&'static str` API which might take static data from a loaded library. Or can loaded APIs just not be used like that? Who is going to go and backfill all of this? Or are loaded libraries just going to be mostly useless?

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 22:35 UTC (Mon) by immibis (subscriber, #105511) [Link] (5 responses)

Believe it or not, it's possible to create a plug-in architecture where everything loaded from the plug-in is known to be associated with the plug-in. This might involve *not* passing around plug-in objects literally everywhere. GIMP, for example, should have no problem unloading pluggable filters as the filter will disappear from the menu; the "last used filter" variable will be reset if that filter was from that plugin, and no other references will be retained.

Yes, you have to be careful when holding a reference to something that could be from a plugin. Like I said, C[++] never claimed to be free of footguns.

Microsoft COM does it the other way around; the plugin has a function which returns whether the plugin still has any references to its refcountable objects, and the equivalent of dlclose is not called until this function says there are none.
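The COM-style approach can be sketched in portable C++ roughly as follows. This is modeled loosely on the idea behind COM's DllCanUnloadNow and is not the COM API itself; PluginModule, PluginObject and try_unload are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical plugin-side bookkeeping: counts objects the plugin has handed
// out that are still alive somewhere in the host.
class PluginModule {
public:
    void object_created() {
        live_objects_.fetch_add(1, std::memory_order_relaxed);
    }
    void object_destroyed() {
        int before = live_objects_.fetch_sub(1, std::memory_order_acq_rel);
        assert(before > 0);  // more destructions than creations is a bug
        (void)before;
    }
    // The plugin's answer to "may I unload you now?"
    bool can_unload_now() const {
        return live_objects_.load(std::memory_order_acquire) == 0;
    }

private:
    std::atomic<int> live_objects_{0};
};

// Every object exported by the plugin reports its lifetime to the module.
class PluginObject {
public:
    explicit PluginObject(PluginModule& module) : module_(module) {
        module_.object_created();
    }
    ~PluginObject() { module_.object_destroyed(); }

    PluginObject(const PluginObject&) = delete;
    PluginObject& operator=(const PluginObject&) = delete;

private:
    PluginModule& module_;
};

// Host side: the unload action (e.g. a dlclose() call) runs only if the
// plugin reports that nothing it created is still referenced.
template <typename Unload>
bool try_unload(const PluginModule& module, Unload unload) {
    if (!module.can_unload_now()) {
        return false;
    }
    unload();
    return true;
}
```

A host would typically call try_unload periodically or at reconfiguration time, and simply retry later when it returns false.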

Ratiu: A tale of two toolchains and glibc

Posted Oct 5, 2021 14:12 UTC (Tue) by HelloWorld (guest, #56129) [Link] (4 responses)

> Believe it or not, it's possible to create a plug-in architecture where everything loaded from the plug-in is known to be associated with the plug-in.
My point exactly. To me it increasingly sounds like mathstuf is thinking of some specific application that he would like to load code into at runtime, but he can't make it work because of architectural problems and thus concludes that it must be infeasible for everybody else as well.

Ratiu: A tale of two toolchains and glibc

Posted Oct 5, 2021 15:16 UTC (Tue) by immibis (subscriber, #105511) [Link] (3 responses)

I also had that thought. It is easy to get stuck in the trap of assuming something can't possibly work if you've never seen a successful example of it. Perhaps they have simply never had the luck to stumble across a well-working plugin system.

Ratiu: A tale of two toolchains and glibc

Posted Oct 5, 2021 15:24 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (2 responses)

That could very well be true, but it is just the nature of the kind of project I'm thinking of where bits in different libraries have to interact with each other in ways that just cannot be enumerated and locked down without compromising other goals of the project.

And I know that it may be possible to do such things in principle; I'm arguing that `dlclose` being a no-op is a reasonable design decision given the difficulty of Doing It Right™ in practice. And if you have examples of well-designed plugin systems in C or C++, I'd greatly appreciate pointers to them.

Ratiu: A tale of two toolchains and glibc

Posted Oct 5, 2021 18:11 UTC (Tue) by HelloWorld (guest, #56129) [Link] (1 responses)

He mentioned Gimp.

Ratiu: A tale of two toolchains and glibc

Posted Oct 7, 2021 12:10 UTC (Thu) by immibis (subscriber, #105511) [Link]

GIMP was just one hypothetical scenario. I'm not sure how GIMP plugins are *actually* implemented, but they *could* be implemented in a way that allows them to be unloaded. Most of them register tools whose code only runs when they are explicitly activated by the user.

