
Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 7:19 UTC (Fri) by LtWorf (subscriber, #124958)
In reply to: Ratiu: A tale of two toolchains and glibc by Cyberax
Parent article: Ratiu: A tale of two toolchains and glibc

Well, what glibc does is support a great deal of hardware and many architectures, with all their weirdness.


Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 7:41 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (26 responses)

Yeah. It supports tons of stuff. Badly.

Here's an example of architectural support bits from musl: https://git.musl-libc.org/cgit/musl/tree/arch/powerpc64 - just under a thousand LOC for the usual stuff (atomics, syscall protocols, etc.) and a bunch of constants extracted from the kernel definitions. Sure, there are some additional assembly files to accelerate memory functions on some archs, but they are not essential.

Everything is nicely and logically organized. I have added musl support for Tilera (don't ask) and it required only a couple days of work, including learning its assembly. It was really that simple.
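To illustrate the kind of layering described above, where each port supplies a handful of primitives such as atomics and everything else is generic, here is a minimal sketch. It assumes a port provides only a compare-and-swap. The names `arch_cas`, `generic_fetch_add`, `generic_swap` and `atomics_selftest` are hypothetical. The idea is similar in spirit to the fallbacks in musl's internal atomics header, but this is not musl's code. C11 `<stdatomic.h>` stands in for the inline assembly a real port would use.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-architecture primitive: compare-and-swap returning the
 * value *p held before the call. Built on C11 atomics here; a real port
 * would typically implement it in inline assembly. */
static int arch_cas(atomic_int *p, int expected, int desired)
{
    /* On success `expected` is left unchanged (it equals the old value);
     * on failure it is overwritten with the current value. Either way it
     * now holds what *p contained before the call. */
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}

/* Generic code layered on arch_cas(): written once and shared by every
 * architecture that supplies only the primitive above. */
static int generic_fetch_add(atomic_int *p, int v)
{
    int old;
    do {
        old = atomic_load(p);
        /* add in unsigned arithmetic so wrap-around is well defined */
    } while (arch_cas(p, old, (int)((unsigned)old + (unsigned)v)) != old);
    return old;
}

static int generic_swap(atomic_int *p, int v)
{
    int old;
    do {
        old = atomic_load(p);
    } while (arch_cas(p, old, v) != old);
    return old;
}

/* Usage check: exercises both derived helpers on one counter. */
static void atomics_selftest(void)
{
    atomic_int counter = 5;
    assert(generic_fetch_add(&counter, 3) == 5); /* returns the old value */
    assert(atomic_load(&counter) == 8);
    assert(generic_swap(&counter, 1) == 8);
    assert(atomic_load(&counter) == 1);
}
```

With this split, a new port only has to get `arch_cas` right. The derived operations are correct by construction because they retry until the compare-and-swap observes the value they read.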

For comparison, this is the same architecture from glibc: https://github.com/bminor/glibc/tree/595c22ecd8e87a27fd19... - the amount of cruft is staggering. It's hard to even find out which parts go where.

I get it: this is a library designed in the days when Hurd seemed like a good idea. There's a ton of obsolete and crufty stuff (e.g. its test suite), and it's not really getting any better.

If it were up to me, I'd put glibc into maintenance mode and start switching to musl instead.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 8:26 UTC (Fri) by immibis (subscriber, #105511) [Link] (15 responses)

You would switch everyone to a library which does not implement such a common basic function as dlclose?

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 8:38 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

Just for fun, I put a DTrace probe on dlclose just now. Nothing is hitting it outside of Java (???).

In reality, dlclose() can NOT be implemented sanely. It's inherently racy and conflicts with things like TLS cleanup. E.g.: https://gitlab.gnome.org/GNOME/glib/-/issues/1311

And not implementing broken-by-design features is honestly why I love musl-libc.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 21:40 UTC (Fri) by immibis (subscriber, #105511) [Link] (12 responses)

Sorry, but an explicit inability to unload things you loaded is *also* broken by design. Imagine if you had fopen but no fclose. Or malloc but no free.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 22:10 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (11 responses)

Quoth POSIX[1]:

> An application writer may use dlclose() to make a statement of intent on the part of the process, but this statement does not create any requirement upon the implementation. When the symbol table handle is closed, the implementation may unload the executable object files that were loaded by dlopen() when the symbol table handle was opened and those that were loaded by dlsym() when using the symbol table handle identified by handle.
> [...]
> Although a dlclose() operation is not required to remove any functions or data objects from the address space, neither is an implementation prohibited from doing so. [...]

POSIX expressly permits dlclose to be a stub function that does nothing and returns zero. Any application which requires a different behavior is not portable. If you don't like that, go complain to the standards people.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/function...

Ratiu: A tale of two toolchains and glibc

Posted Oct 2, 2021 23:19 UTC (Sat) by iainn (guest, #64312) [Link] (10 responses)

ISO/IEC 9899 expressly allows malloc to be a stub that simply returns NULL, but you'd rightly say that's unhelpful.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 0:13 UTC (Sun) by mpr22 (subscriber, #60784) [Link] (9 responses)

Can you point to (not merely assert the existence of) any real code that people are actually using to get things done that invokes dlclose() and actually expects it to do anything?

This is not a rhetorical question; I will cheerfully accept an answer of "yes, and here it is" :)

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 8:46 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (3 responses)

It is difficult to fathom such code, for the following reasons:

* The only formal change in semantics after calling dlclose() is that the application is no longer permitted to dereference certain pointers. Perhaps it's a tad obvious, but a conformant application must not dereference those pointers. Therefore, the application is not permitted to assume that dereferencing those pointers will, say, generate SIGSEGV, trip a guard page, or have any other desired or undesired effect, because the standard flatly forbids such dereferencing in the first place.
* The caller is expressly forbidden from interpreting the handle returned by dlopen "in any way." This presumably includes comparing it for equality with other handles returned by dlopen. Therefore, a conformant implementation may return the same handle every time you dlopen the same file, and keep an internal reference count (which the non-normative section of the dlclose standard explicitly calls out as a thing that implementations may do). If dlclose does nothing, then you just omit the reference count.
* Conformant implementations are also permitted to reuse closed handles, and a conformant implementation could even keep track of which object files were opened in the past and conspire to reuse their handle values if they are ever reopened in the future. Of course, if dlclose does nothing, then that's not really much of a "conspiracy."
* Maybe you're short on memory and trying to reclaim it? Well, that's not a very good reason at all. The pages which dlclose would free are backed by an object file on disk. If those pages are not in active use, the kernel should drop them automatically under memory pressure.
* Maybe you're trying to implement some crazy mechanism where you can replace object files without stopping and restarting the applications which are using them? Eh, that's probably a pipe dream anyway. Stopping and restarting your app is way easier than carefully shutting down an entire module of your program and then starting it up again. Also, the stop/restart dance is a general pattern, well supported by tools such as APT and systemd.
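The handle semantics described in the bullets above can be modeled in a few lines. This is a toy, not a real dynamic loader. `toy_dlopen`, `toy_dlclose` and `toy_dlclose_stub` are hypothetical names, and "mapping" is just a flag. The sketch shows that the following behaviours all fit the same calling convention: returning the same handle for the same file, keeping an internal reference count, handing back a previously closed handle on reopen, and a close that does nothing and returns zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_OBJECTS 8 /* hypothetical fixed table size */

struct toy_object {
    const char *path; /* caller's string, stored as-is; a real loader would copy it */
    int refcount;     /* outstanding opens */
    bool mapped;      /* whether the object is "in the address space" */
};

static struct toy_object toy_table[TOY_MAX_OBJECTS];

/* Opening the same path again returns the same handle and bumps its count.
 * Slots are never freed, so a closed handle value is reused on reopen. */
static void *toy_dlopen(const char *path)
{
    struct toy_object *free_slot = NULL;
    for (size_t i = 0; i < TOY_MAX_OBJECTS; i++) {
        struct toy_object *obj = &toy_table[i];
        if (obj->path != NULL && strcmp(obj->path, path) == 0) {
            obj->refcount++;
            obj->mapped = true;
            return obj;
        }
        if (obj->path == NULL && free_slot == NULL)
            free_slot = obj;
    }
    if (free_slot == NULL)
        return NULL; /* table full: report failure like dlopen() would */
    free_slot->path = path;
    free_slot->refcount = 1;
    free_slot->mapped = true;
    return free_slot;
}

/* Refcounting variant: "unmap" only when the last reference goes away. */
static int toy_dlclose(void *handle)
{
    struct toy_object *obj = handle;
    assert(obj->refcount > 0); /* closing more often than opening is a caller bug */
    if (--obj->refcount == 0)
        obj->mapped = false;   /* slot kept so the handle value can be reused */
    return 0;
}

/* Equally permitted variant: dlclose() as a pure statement of intent. */
static int toy_dlclose_stub(void *handle)
{
    (void)handle; /* nothing is ever unloaded */
    return 0;
}
```

Any handle comparison done against this model only inspects the toy. As noted above, a portable application is not supposed to interpret real dlopen() handles at all.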

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 12:26 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (1 responses)

> * Maybe you're trying to implement some crazy mechanism where you can replace object files without stopping and restarting the applications which are using them? Eh, that's probably a pipe dream anyway. Stopping and restarting your app is way easier than carefully shutting down an entire module of your program and then starting it up again. Also, the stop/restart dance is a general pattern, well supported by tools such as APT and systemd.

Generally, I would think that the programming environment would need to explicitly support hot reloading of code. Something like Erlang comes to mind. Python attempts to support it with `reload()`, but things get…weird and I wouldn't really recommend it without a long list of caveats. Native code-targeting systems usually don't have the safety rails needed for such things.

Ratiu: A tale of two toolchains and glibc

Posted Oct 9, 2021 8:31 UTC (Sat) by sionescu (subscriber, #59410) [Link]

We rely on hot-reloading of C libraries when developing Common Lisp FFI wrappers: https://github.com/cffi/cffi/blob/master/src/libraries.li....

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 23:09 UTC (Sun) by immibis (subscriber, #105511) [Link]

The really obvious thing to do with dlclose is to unload a plugin that's no longer in use. Especially if you have a long-running server application that may be reconfigured with SIGHUP.
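Here is a minimal sketch of that SIGHUP-driven reconfiguration pattern, assuming a single-threaded server loop. The signal handler only sets a flag, and the actual reload happens later in normal context. That is where a real server would call dlclose() on plugins it no longer needs and dlopen() on new ones. `reload_plugins()`, `poll_reload()` and `reload_selftest()` are hypothetical placeholders, and `reload_plugins()` here only counts calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written from the signal handler; volatile sig_atomic_t is the safe type. */
static volatile sig_atomic_t reload_requested = 0;

static void on_sighup(int signo)
{
    (void)signo;
    reload_requested = 1; /* defer the real work to the main loop */
}

static int install_sighup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Hypothetical hook: a real server would re-read its configuration here,
 * dlclose() plugins that are no longer listed and dlopen() new ones. */
static int reload_count = 0;
static void reload_plugins(void)
{
    reload_count++;
}

/* One iteration of the main loop: act on a pending reload, if any. */
static void poll_reload(void)
{
    if (reload_requested) {
        reload_requested = 0;
        reload_plugins();
    }
}

/* Usage check: deliver SIGHUP to ourselves, then run the loop body twice. */
static void reload_selftest(void)
{
    assert(install_sighup_handler() == 0);
    raise(SIGHUP);   /* in a single-threaded process the handler runs before raise() returns */
    poll_reload();
    assert(reload_count == 1);
    poll_reload();   /* no new signal, so nothing happens */
    assert(reload_count == 1);
}
```

Deferring the work out of the handler matters because dlopen() and dlclose() are not async-signal-safe, so they must not be called from the handler itself.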

Beyond that, I know of one application that uses it for hot software upgrades, specifically UnrealIRCd.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 12:57 UTC (Sun) by iainn (guest, #64312) [Link] (3 responses)

No, sorry, I was being a bit facetious.

But dlclose being unusable genuinely baffles me, coming from a high level (e.g. .NET) perspective. *Obviously* you want to be able to unload a plugin when you're done with it. In .NET you just use an AssemblyLoadContext.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 13:50 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

It's funny that you mention .NET.

Because it does NOT support unloading of individual assemblies. You can "unload" the whole context but not individual assemblies.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 14:31 UTC (Sun) by iainn (guest, #64312) [Link] (1 responses)

I don't get what's so funny. You can spin an isolated AssemblyLoadContext, for an individual plugin.

You later Unload() the whole context, which also cleans up any dependencies. That's a good thing.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 22:27 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

First, unloadable AssemblyLoadContext is a fairly new feature (starting from .NET Core 3.0); before that there was no way to unload assemblies at all.

Second, an ALC cannot be unloaded forcefully. If it's still in use, calling Unload() does not actually unload anything: that wholly depends on the GC being able to determine that no references to the context remain.

Ratiu: A tale of two toolchains and glibc

Posted Oct 9, 2021 8:28 UTC (Sat) by sionescu (subscriber, #59410) [Link]

Ratiu: A tale of two toolchains and glibc

Posted Oct 9, 2021 8:25 UTC (Sat) by sionescu (subscriber, #59410) [Link]

Lots of dynamic languages use dlclose(): all the Common Lisp implementations that I know of, and probably many Scheme ones too.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 9:01 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (4 responses)

Ah yes, musl… for those who like debugging all sorts of weird bugs and incompatibilities in their libc!

https://github.com/iron-io/dockers/issues/42#issuecomment...

https://bugs.python.org/issue32307

And let's not forget the slower memory allocation!

https://news.ycombinator.com/item?id=23080290

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 12:21 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (2 responses)

The first link is really a Python bug: it assumed something in cross-platform code that was not true in general. The second issue was fixed. And the third link explains that the musl allocator was optimized for minimal memory usage and robustness, not speed. If allocator speed is important, the application should link against jemalloc, which in general is faster than glibc's allocator.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 14:35 UTC (Fri) by Paf (subscriber, #91811) [Link] (1 responses)

“ If the allocator speed important, the application should link against jemalloc which in general is faster than glibc one.”
Greater allocator speed is not the only consideration, so this is awfully glib.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 17:25 UTC (Fri) by Wol (subscriber, #4433) [Link]

But the OP was responding to a comment he considered glib: "musl isn't fast because speed wasn't an important optimisation criterion".

When faced with competing priorities, don't moan because someone else's priorities are different to yours ...

Cheers,
Wol

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 22:02 UTC (Mon) by gps (subscriber, #45638) [Link]

Thanks for the pointer to the CPython issue, I can move that forward.

Ratiu: A tale of two toolchains and glibc

Posted Oct 1, 2021 10:08 UTC (Fri) by jsm28 (subscriber, #104141) [Link]

Actually, there has been a great deal of work over the past ten years on replacing architecture-specific code in glibc with architecture-independent code, with as little duplication between architectures as possible, and this work is ongoing. The checklist for new architecture ports (https://sourceware.org/glibc/wiki/NewPorts) includes using generic code where possible. This work isn't mentioned in the NEWS file because that file covers user-visible changes, not internal improvements.

Ratiu: A tale of two toolchains and glibc

Posted Oct 2, 2021 13:20 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

> There's a ton of obsolete and crufty stuff (e.g. its test suite).

You'd prefer a library with... fewer tests? On the grounds of what is, as far as I can see, pure assertion that some or all of the tests are "obsolete and crufty"? Tests by their very nature don't intrude on the library itself, so who honestly cares if they're crufty (if they even are: yes, some are complex, but that's the nature of good tests)?

Seriously, the glibc test suite is so good in some areas (particularly threading) that it gets used routinely to find bugs in *other libcs*.

There's no way ditching the testsuite is on the cards. If anything, other libcs need lots more.

Ratiu: A tale of two toolchains and glibc

Posted Oct 3, 2021 22:33 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> You'd prefer a library with... fewer tests?

No. I want a library with a test suite that is not a mess.

Ratiu: A tale of two toolchains and glibc

Posted Oct 4, 2021 12:03 UTC (Mon) by nix (subscriber, #2304) [Link]

Oh good, you'll be happy that it's being cleaned up, then? Because from your comments earlier you appear to be completely unaware of all the changes in the testsuite over the last few years (common test skeletons, increasing use of containerization to ensure that tests run in an environment closer to that on a real system, user namespaces to test stuff as root without being root...).

Yes, some of the tests do look a bit all over the place and unsystematic. Have you ever looked at GCC's testsuite? Or Rust's? Or LLVM's? All testsuites have a lot of unsystematic tests in them, because that is the subset of tests derived directly from observed regressions, which are by their nature all over the place because users do all sorts of strange things. This is *good*.

Ratiu: A tale of two toolchains and glibc

Posted Apr 20, 2022 17:28 UTC (Wed) by prideauxx (guest, #158112) [Link]

> I have added musl support for Tilera (don't ask)

Please forgive me for asking (I realize the advice was not to). I am interested in Tilera myself, and a port of musl supporting it would be quite useful. Have you made the effort public (in git), or would you consider doing so? I would be happy to contribute if that would help.

Thank you!


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds