Ratiu: A tale of two toolchains and glibc
Posted Oct 1, 2021 3:46 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) In reply to: Ratiu: A tale of two toolchains and glibc by Paf
Parent article: Ratiu: A tale of two toolchains and glibc
Most of the complexity of glibc is entirely artificial. In the end, it doesn't actually DO that much. Musl libc is a nice counter-example: it has most of glibc's features, yet it's a small and nimble library that is easy to use or develop.
Posted Oct 1, 2021 7:19 UTC (Fri)
by LtWorf (subscriber, #124958)
[Link] (27 responses)
Posted Oct 1, 2021 7:41 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (26 responses)
Here's an example of architectural support bits from musl: https://git.musl-libc.org/cgit/musl/tree/arch/powerpc64 - just under a thousand LOC for the usual stuff (atomics, syscall protocols, etc) and a bunch of constants extracted from the kernel defs. Sure, there are some additional assembly files to accelerate memory functions for some archs, but they are not essential.
Everything is nicely and logically organized. I have added musl support for Tilera (don't ask) and it required only a couple days of work, including learning its assembly. It was really that simple.
For comparison, this is the same architecture from glibc: https://github.com/bminor/glibc/tree/595c22ecd8e87a27fd19... - the amount of cruft is staggering. It's hard to even find out which parts go where.
I get it, this is a library designed in the days when Hurd seemed like a good idea. There's a ton of obsolete and crufty stuff (e.g. its test suite). And it's not really getting any better.
If it were up to me, I'd put glibc into maintenance mode and start switching to musl instead.
Posted Oct 1, 2021 8:26 UTC (Fri)
by immibis (subscriber, #105511)
[Link] (15 responses)
Posted Oct 1, 2021 8:38 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (14 responses)
In reality, dlclose() can NOT be implemented sanely. It's inherently racy and conflicts with things like TLS cleanup. E.g.: https://gitlab.gnome.org/GNOME/glib/-/issues/1311
And not implementing broken-by-design features is honestly why I love musl-libc.
Posted Oct 1, 2021 21:40 UTC (Fri)
by immibis (subscriber, #105511)
[Link] (12 responses)
Posted Oct 1, 2021 22:10 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link] (11 responses)
> An application writer may use dlclose() to make a statement of intent on the part of the process, but this statement does not create any requirement upon the implementation. When the symbol table handle is closed, the implementation may unload the executable object files that were loaded by dlopen() when the symbol table handle was opened and those that were loaded by dlsym() when using the symbol table handle identified by handle.
POSIX expressly permits dlclose to be a stub function that does nothing and returns zero. Any application which requires a different behavior is not portable. If you don't like that, go complain to the standards people.
[1]: https://pubs.opengroup.org/onlinepubs/9699919799/function...
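To make the "stub that does nothing and returns zero" reading concrete, here is a minimal sketch. The names `stub_dlclose`, `Plugin`, and `release_plugin` are hypothetical and are not taken from musl or any other libc; the point is only that a portable caller drops its pointers before closing and never relies on the object actually being unmapped.

```cpp
#include <cassert>

// Hypothetical stand-in for an implementation's dlclose(): it accepts the
// handle, unloads nothing, and reports success.
int stub_dlclose(void *handle) {
    (void)handle;  // the caller's "statement of intent" is noted and ignored
    return 0;      // zero means success, as with the real dlclose()
}

// Hypothetical record a portable application might keep per plugin.
struct Plugin {
    void *handle = nullptr;       // opaque handle, as returned by dlopen()
    int (*entry)(int) = nullptr;  // symbol address, as returned by dlsym()
};

// Portable teardown: forget every pointer obtained through the handle
// before closing it, and never touch those pointers afterwards.
int release_plugin(Plugin &p) {
    p.entry = nullptr;
    int rc = stub_dlclose(p.handle);
    p.handle = nullptr;
    return rc;
}
```

An application written this way behaves the same whether the implementation really unmaps the object or not, which is all the standard lets it assume.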
Posted Oct 2, 2021 23:19 UTC (Sat)
by iainn (guest, #64312)
[Link] (10 responses)
Posted Oct 3, 2021 0:13 UTC (Sun)
by mpr22 (subscriber, #60784)
[Link] (9 responses)
This is not a rhetorical question; I will cheerfully accept an answer of "yes, and here it is" :)
Posted Oct 3, 2021 8:46 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
* The only formal change in semantics after calling dlclose() is that the application is no longer permitted to dereference certain pointers. Perhaps it's a tad obvious, but a conformant application must not dereference those pointers. Therefore, the application is not permitted to assume that dereferencing those pointers will, say, generate SIGSEGV, trip a guard page, or have any other desired or undesired effect, because the standard flatly forbids such dereferencing in the first place.
Posted Oct 3, 2021 12:26 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Generally, I would think that the programming environment would need to explicitly support hot reloading of code. Something like Erlang comes to mind. Python attempts to support it with `reload()`, but things get…weird and I wouldn't really recommend it without a long list of caveats. Native code-targeting systems usually don't have the safety rails needed for such things.
Posted Oct 9, 2021 8:31 UTC (Sat)
by sionescu (subscriber, #59410)
[Link]
Posted Oct 3, 2021 23:09 UTC (Sun)
by immibis (subscriber, #105511)
[Link]
Beyond that, I know one application that uses it for hot software upgrade - specifically UnrealIRCD.
Posted Oct 3, 2021 12:57 UTC (Sun)
by iainn (guest, #64312)
[Link] (3 responses)
No, sorry, I was being a bit facetious.
But dlclose being unusable genuinely baffles me, coming from a high-level (e.g. .NET) perspective. *Obviously* you want to be able to unload a plugin when you're done with it. In .NET you just use an AssemblyLoadContext.
Posted Oct 3, 2021 13:50 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Because it does NOT support unloading of individual assemblies. You can "unload" the whole context but not individual assemblies.
Posted Oct 3, 2021 14:31 UTC (Sun)
by iainn (guest, #64312)
[Link] (1 responses)
You later Release() the whole context, which also cleans up any dependencies. That's a good thing.
Posted Oct 3, 2021 22:27 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Second, an ALC cannot be unloaded forcefully. If it's in use, then the "unload" method simply does nothing. This wholly depends on the GC being able to enumerate all the references to the context.
Posted Oct 9, 2021 8:28 UTC (Sat)
by sionescu (subscriber, #59410)
[Link]
Posted Oct 9, 2021 8:25 UTC (Sat)
by sionescu (subscriber, #59410)
[Link]
Posted Oct 1, 2021 9:01 UTC (Fri)
by LtWorf (subscriber, #124958)
[Link] (4 responses)
https://github.com/iron-io/dockers/issues/42#issuecomment...
https://bugs.python.org/issue32307
And let's not forget the slower memory allocation!
Posted Oct 1, 2021 12:21 UTC (Fri)
by ibukanov (subscriber, #3942)
[Link] (2 responses)
Posted Oct 1, 2021 14:35 UTC (Fri)
by Paf (subscriber, #91811)
[Link] (1 responses)
Posted Oct 1, 2021 17:25 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
When faced with competing priorities, don't moan because someone else's priorities are different to yours ...
Cheers,
Posted Oct 4, 2021 22:02 UTC (Mon)
by gps (subscriber, #45638)
[Link]
Posted Oct 1, 2021 10:08 UTC (Fri)
by jsm28 (subscriber, #104141)
[Link]
Posted Oct 2, 2021 13:20 UTC (Sat)
by nix (subscriber, #2304)
[Link] (2 responses)
You'd prefer a library with... fewer tests? On the grounds of an argument by (as far as I can see) pure assertion that some or all of the tests are "obsolete and crufty"? Even though tests by their very nature don't intrude on the library itself, so who honestly cares if they're crufty (if they even are: yes, some are complex: that's the nature of good tests).
Seriously, the glibc test suite is so good in some areas (particularly threading) that it gets used routinely to find bugs in *other libcs*.
There's no way ditching the testsuite is on the cards. If anything, other libcs need lots more.
Posted Oct 3, 2021 22:33 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
No. I want a library with a test suite that is not a mess.
Posted Oct 4, 2021 12:03 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Yes, some of the tests do look a bit all over the place and unsystematic. Have you ever looked at GCC's testsuite? Or Rust's? Or LLVM's? All testsuites have a lot of unsystematic tests in them, because that is the subset of tests derived directly from observed regressions, which are by their nature all over the place because users do all sorts of strange things. This is *good*.
Posted Apr 20, 2022 17:28 UTC (Wed)
by prideauxx (guest, #158112)
[Link]
Please forgive the ask (I realize that was the advice). I am interested in Tilera myself and a port of musl supporting it would be quite useful. Have you made the effort public (git), or would you consider it? I would be happy to contribute to the effort if desired/helpful.
Thank you!
Posted Oct 1, 2021 8:22 UTC (Fri)
by immibis (subscriber, #105511)
[Link] (15 responses)
Posted Oct 1, 2021 8:24 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
It also doesn't implement the whole iconv morass and other half-baked ideas from glibc.
Posted Oct 1, 2021 13:49 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (13 responses)
Posted Oct 3, 2021 8:25 UTC (Sun)
by k8to (guest, #15413)
[Link] (12 responses)
Granted, I am not at all a fan of the plug-in pattern. But it's a case where this lack would be harmful.
Posted Oct 3, 2021 12:28 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link] (11 responses)
Posted Oct 3, 2021 13:58 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (1 responses)
> There's not really any support to help track these things and untangle what static global initialization ends up doing,
> so all I can say is "sorry, that use case was not considered in the design and there's not much we can do today; just don't even try to unload the library".
Posted Oct 4, 2021 1:01 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
Posted Oct 4, 2021 9:46 UTC (Mon)
by immibis (subscriber, #105511)
[Link] (8 responses)
Posted Oct 4, 2021 14:41 UTC (Mon)
by HelloWorld (guest, #56129)
[Link] (7 responses)
Then you just make a class that has a bunch of members of these class types, and when the plugin is loaded, an instance of this class is created, running all those constructors and performing whatever initialization is necessary. Before unloading the plugin, the object is destroyed, running its destructor and cleaning everything up, after which it can safely be unloaded.
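A minimal sketch of that pattern, assuming hypothetical names throughout (`Resource`, `PluginState`, `plugin_create`, `plugin_destroy`); the log parameter exists only so the construction and cleanup order can be observed:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource owned by a plugin; it cleans up in its destructor.
class Resource {
public:
    Resource(std::string name, std::vector<std::string> &log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("init " + name_);
    }
    ~Resource() { log_.push_back("cleanup " + name_); }
    Resource(const Resource &) = delete;
    Resource &operator=(const Resource &) = delete;

private:
    std::string name_;
    std::vector<std::string> &log_;
};

// All plugin state lives in one object: no globals, no static initializers.
class PluginState {
public:
    explicit PluginState(std::vector<std::string> &log)
        : config_("config", log), cache_("cache", log) {}

private:
    Resource config_;  // constructed first, destroyed last
    Resource cache_;   // constructed second, destroyed first
};

// Entry points the host would call after loading and before unloading.
PluginState *plugin_create(std::vector<std::string> &log) {
    return new PluginState(log);
}
void plugin_destroy(PluginState *state) { delete state; }
```

Members are destroyed in reverse declaration order, so cleanup mirrors initialization without any hand-written teardown list. What this does not cover is references to plugin code or data that the host still holds after `plugin_destroy`.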
Posted Oct 4, 2021 16:56 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (6 responses)
FWIW, I don't think even Rust can do it because any interaction with the loaded library's APIs needs an additional `'dlopen` lifetime attached. I don't know of any way to do this where the library is not "leaked" to be `'static` or done at the top-level rather than as some inner routine. Not to mention that APIs would now need to consider such lifetimes. My function taking a `fn() -> u32` now needs a lifetime attached to consider that it could have been loaded at runtime. Similar for any `&'static str` API which might take static data from a loaded library. Or can loaded APIs just not be used like that? Who is going to go and backfill all of this? Or are loaded libraries just going to be mostly useless?
Posted Oct 4, 2021 22:35 UTC (Mon)
by immibis (subscriber, #105511)
[Link] (5 responses)
Yes, you have to be careful when holding a reference to something that could be from a plugin. Like I said, C[++] never claimed to be free of footguns.
Microsoft COM does it the other way around; the plugin has a function which returns whether the plugin still has any references to its refcountable objects, and the equivalent of dlclose is not called until this function says there are none.
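Roughly, the scheme looks like the sketch below. This is a portable imitation, not the COM API: as far as I recall the real entry point is DllCanUnloadNow, and every name here (`Widget`, `plugin_make_widget`, `plugin_can_unload_now`, `host_try_unload`) is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Plugin-internal count of live refcounted objects (hypothetical).
static std::atomic<long> g_live_objects{0};

// A refcounted object handed out by the plugin.
class Widget {
public:
    Widget() { ++g_live_objects; }
    void add_ref() { ++refs_; }
    void release() {
        if (--refs_ == 0) delete this;  // last reference gone
    }

private:
    ~Widget() { --g_live_objects; }  // only release() may destroy a Widget
    std::atomic<long> refs_{1};      // the creator holds the first reference
};

Widget *plugin_make_widget() { return new Widget; }

// The plugin reports whether any of its objects are still alive.
bool plugin_can_unload_now() { return g_live_objects.load() == 0; }

// Host side: call the close function only once the plugin agrees.
bool host_try_unload(int (*close_fn)(void *), void *handle) {
    if (!plugin_can_unload_now()) return false;
    return close_fn(handle) == 0;
}
```

The host never has to guess: as long as anyone holds a reference to a plugin object, the close call is simply not made.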
Posted Oct 5, 2021 14:12 UTC (Tue)
by HelloWorld (guest, #56129)
[Link] (4 responses)
Posted Oct 5, 2021 15:16 UTC (Tue)
by immibis (subscriber, #105511)
[Link] (3 responses)
Posted Oct 5, 2021 15:24 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
And I know that it may be possible to do such things in principle; I'm arguing that `dlclose` being a no-op is a reasonable design decision given the difficulty of Doing It Right™ in practice. And if you have examples of well-designed plugin systems in C or C++, I'd greatly appreciate pointers to them.
Posted Oct 5, 2021 18:11 UTC (Tue)
by HelloWorld (guest, #56129)
[Link] (1 responses)
Posted Oct 7, 2021 12:10 UTC (Thu)
by immibis (subscriber, #105511)
[Link]
> [...]
> Although a dlclose() operation is not required to remove any functions or data objects from the address space, neither is an implementation prohibited from doing so. [...]
* The caller is expressly forbidden from interpreting the handle returned by dlopen "in any way." This presumably includes comparing it for equality with other handles returned by dlopen. Therefore, a conformant implementation may return the same handle every time you dlopen the same file, and keep an internal reference count (which the non-normative section of the dlclose standard explicitly calls out as a thing that implementations may do). If dlclose does nothing, then you just omit the reference count.
* Conformant implementations are also permitted to reuse closed handles, and a conformant implementation could even keep track of which object files were opened in the past and conspire to reuse their handle values if they are ever reopened in the future. Of course, if dlclose does nothing, then that's not really much of a "conspiracy."
* Maybe you're short on memory and trying to reclaim it? Well, that's not a very good reason at all. The pages which dlclose would free are backed by an object file on disk. If those pages are not in active use, the kernel should drop them automatically under memory pressure.
* Maybe you're trying to implement some crazy mechanism where you can replace object files without stopping and restarting the applications which are using them? Eh, that's probably a pipe dream anyway. Stopping and restarting your app is way easier than carefully shutting down an entire module of your program and then starting it up again. Also, the stop/restart dance is a general pattern, well supported by tools such as APT and systemd.
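The handle and reference-count behaviour described in the first two bullets can be sketched with a toy loader. `ToyLoader` and `Record` are hypothetical, nothing is actually mapped or unmapped, and this is not how any particular libc does it:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// One record per path, holding only a reference count (hypothetical).
struct Record {
    int refs = 0;
};

class ToyLoader {
public:
    // Returns the same opaque handle every time the same path is opened.
    void *open(const std::string &path) {
        Record &r = table_[path];  // std::map nodes have stable addresses
        ++r.refs;
        return &r;
    }

    // Drops one reference; the record is erased ("unloaded") at zero.
    int close(void *handle) {
        for (auto it = table_.begin(); it != table_.end(); ++it) {
            if (&it->second == handle) {
                if (--it->second.refs == 0) table_.erase(it);
                return 0;
            }
        }
        return -1;  // not a handle this loader issued
    }

    std::size_t loaded() const { return table_.size(); }

private:
    std::map<std::string, Record> table_;
};
```

A do-nothing dlclose corresponds to deleting the body of `close` and returning 0, at which point the reference count is dead weight and can go too.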
Greater allocator speed is not the only consideration, so this is awfully glib.
Wol
> I have added musl support for Tilera (don't ask)
That's not an argument against unloading a library but against static registries.
That is, again, not an argument against unloading a library but against side effects in static global initialization.
No, it just means you need to take care in your design and not rely on bad ideas like side effects in static global initialization, global registries, etc. It's really not that hard to not use global variables! And it's not that hard to clean up after yourself properly in C++, where you have destructors that really do compose quite nicely (unlike half-baked solutions like e.g. Java's finally blocks).
Alas, this library is already bad with passing around information (e.g., every class is intrusively refcounted because it was started back before the STL was a reliable thing). I'm just as against static registries as anyone else, but the designs available at the time did not afford simple solutions on the consuming side (e.g.
#if MPI_ENABLED
    if (mpi_is_being_used)
        use_mpi_aware_subclass();
    else
#endif
        use_parent_class();
If, instead, the mere use of the parent class could use the MPI-aware one automatically, things are just nicer for everyone at the cost of not being able to unload the library (not a big cost even in my mind today). This is done by the MPI-aware subclass hooking itself in at library load time, so that the baseclass "constructor" returns it instead of the baseclass (actual constructors are not used in this library). Note that there are additional APIs available to MPI-aware subclasses, so it's not always feasible to just host the functionality in the baseclass itself (which would most likely become a rat's nest of #if to support it, not to mention the fun everyone has with conditional dependencies/functionality in build systems).
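For readers who have not seen this kind of hook, here is a minimal sketch of the idea. It is not the library's actual code: `Solver`, `MpiSolver`, `MpiRegistrar`, `New`, and `set_override` are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>

class Solver {
public:
    using Factory = Solver *(*)();
    virtual ~Solver() = default;
    virtual std::string name() const { return "serial"; }

    // The "constructor" callers use; it returns the registered override
    // if there is one, and the plain baseclass otherwise.
    static Solver *New() { return override_() ? override_()() : new Solver; }
    static void set_override(Factory f) { override_() = f; }

protected:
    Solver() = default;  // instances are created only through New()

private:
    static Factory &override_() {
        static Factory f = nullptr;
        return f;
    }
};

class MpiSolver : public Solver {
public:
    std::string name() const override { return "mpi-aware"; }
    static Solver *Make() { return new MpiSolver; }
};

// A static instance of this inside the MPI-aware library would run when
// that library is loaded and install the hook.
struct MpiRegistrar {
    MpiRegistrar() { Solver::set_override(&MpiSolver::Make); }
};
```

Callers keep writing `Solver::New()` and never mention MPI. The cost is visible in the sketch too: once the hook is installed, the baseclass holds a function pointer into the MPI-aware library, so unmapping that library would leave the pointer dangling.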
This is how you get things like that to work correctly *by construction*. It's basically impossible to get it wrong when you do it this way...
My point exactly. To me it increasingly sounds like mathstuf is thinking of some specific application that he would like to load code into at runtime, but he can't make it work because of architectural problems and thus concludes that it must be infeasible for everybody else as well.
