C library system-call wrappers, or the lack thereof

By Jonathan Corbet
November 12, 2018

User-space developers may be accustomed to thinking of system calls as direct calls into the kernel. Indeed, the first edition of The C Programming Language described read() and write() as "a direct entry into the operating system". In truth, user-level "system calls" are just functions in the C library like any other. But what happens when the developers of the C library refuse to provide access to system calls they don't like? The result is an ongoing conflict that has recently flared up again; it shows some of the difficulties that can arise when the system as a whole has no ultimate designer and the developers are not talking to each other.

Calling into the kernel is not like calling a normal function; a special trap into the kernel must be triggered with the system-call arguments placed as the kernel expects. At a minimum, the system-call "wrapper" provided by the C library must set up this trap. In many cases, more work than that is required; the functionality provided by the kernel does not always exactly match what the application (or the relevant standards) will expect. Features like POSIX threads further complicate the situation. The end result is that a lot of work can be happening between the application and the kernel when a system call is made. Doing that work is, in most cases, delegated to the C library.
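As a rough illustration of the minimum a wrapper has to do, the sketch below issues write() on Linux without going through the C library's write() function. The names raw_write() and my_write() are hypothetical. The inline assembly assumes the x86-64 Linux calling convention; other architectures fall back to syscall(). It ignores everything a real C library layers on top, such as thread cancellation handling.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: trap into the kernel for write(2) and return the
 * kernel's raw result, which is a negative errno value on failure. */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* x86-64 Linux convention: system-call number in rax, arguments in
     * rdi, rsi, rdx; the syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Assumption for other architectures: let libc set up the trap, then
     * convert back to the kernel's -errno convention. */
    long ret = syscall(SYS_write, fd, buf, len);
    return ret < 0 ? -(long)errno : ret;
#endif
}

/* The extra step applications expect from a wrapper: translate the
 * kernel's -errno result into errno plus a -1 return value. */
long my_write(int fd, const void *buf, size_t len)
{
    long ret = raw_write(fd, buf, len);
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

Calling my_write(-1, "x", 1) should return -1 with errno set to EBADF, matching what the library's own write() would report.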

System calls in glibc

Many Linux systems use the GNU C Library (glibc) in this role; glibc is often thought of as the Linux C library. When the kernel developers add a new system call, it is thus natural to expect that a corresponding wrapper will show up in glibc, but there is no guarantee that this will ever happen. The addition of wrappers to glibc is often slow and, in some cases, the glibc developers have refused to add the wrappers at all. In such cases, user-space developers must fall back on syscall() to access that functionality, an approach that is both non-portable and error-prone.
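For a system call without a wrapper, the syscall() fallback looks roughly like the sketch below, which uses gettid() as the example. The helper name my_gettid() is hypothetical, chosen to avoid clashing with any gettid() the C library might provide. Nothing checks the argument list here; syscall() is variadic, which is part of why the approach is error-prone.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the calling thread's kernel thread ID by
 * invoking the system call by number. */
pid_t my_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In the main thread of a process, the value returned should equal getpid().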

Recently, frustration with this situation led Daniel Colascione to ask:

Now that glibc is basically not adding any new system call wrappers, how about publishing an "official" system call glue library as part of the kernel distribution, along with the uapi headers? I don't think it's reasonable to expect people to keep using syscall(__NR_XXX) for all new functionality, especially as the system grows increasingly sophisticated capabilities (like the new mount API, and hopefully the new process API) outside the strictures of the POSIX process.

It is worth noting, as Michael Kerrisk did, that it's not really true that glibc is no longer adding wrappers; quite a few have found their way into recent releases. But there are some notable exceptions, the most glaring of which is probably gettid(), which has been under discussion for over a decade with no real resolution in sight. Kerrisk suggested that, in most cases, the problem was simply a lack of developers on the glibc side and said that kernel developers should take more responsibility for the creation of glibc wrappers for new system calls:

A converse question that one could ask is: why did a culture evolve whereby kernel developers don't take responsibility for working with the major libc to ensure that wrappers are added as part of the job of adding each new system call?

Glibc developer Florian Weimer stated clearly that it's not just a matter of developer time, though: "It's not a matter of resources or lack thereof". In another message he explained why many system calls lack glibc wrappers, with a number of specific examples. "A lot of the new system calls lack clear specifications or are just somewhat misdesigned". In other cases, new system calls — such as renameat2() — use names that glibc had already used for other functions. Reasons vary, but the end result is the same for a number of system calls: no glibc wrappers to go along with them.

According to Colascione (and others like Willy Tarreau), the proper answer is to provide low-level system-call wrappers with the kernel itself:

These objections illustrate my point. glibc development is not the proper forum for raising post-hoc objections to system call design. Withholding wrappers will not un-ship these system calls. Applications are already using them, via syscall(2). Developers and users would be better served by providing access to the system as it is, with appropriate documentation caveats, than by holding out for some alternate and more ideal set of system calls that may or may not appear in the future.

In this view, glibc would retain all of the higher-level C-library functions, ceding only the system-call wrappers to this new library. But, according to Weimer, it's not so simple: circumventing glibc for system calls would break features, many associated with threading. Or, as Zack Weinberg put it: "The trouble is that 'raw system call wrappers and arcane kernel-userland glue' turns out to be a lot more code, with a lot more tentacles in both directions, than you might think". It's not just a matter of breaking things out into a separate library.

Overcoming the impasse

Arguably, what this whole discussion is really showing is that there need to be better lines of communication between kernel and C-library developers. It takes developers from both groups to actually make a feature available to user space after all; it would make sense for kernel developers — who are not always known for the best API designs — to talk more with the library developers who must actually support an API for application developers. Those communications are better now than they were some years ago, but one could argue that this is a low bar that has not been surmounted by much.

One complication there is that glibc is not the only C library that runs over the Linux kernel; it's not even the most popular one if one looks at the number of installed copies — that title surely belongs to the bionic library used by Android. The Linux community would be well served by a forum where developers from all C libraries could interact with kernel developers to address API problems before they are set into stone. The linux-api mailing list ostensibly serves that purpose now, but it is underused even before considering the absence of C-library developers there.

Once upon a time, all operating systems had an overall architect who would be responsible for ensuring coordination between the various layers of the system, but Linux lacks such a person. So developers have to find ways to coordinate on their own. Arguably, one place where this should be happening is the Linux Plumbers Conference, which starts November 13 in Vancouver. There is indeed a relevant session on the agenda, but it's not clear how many of the necessary developers will be there.

Free-software projects tend to value their independence; their developers have little time for others who would tell them what to do. But few projects truly stand alone. Whenever developers decide to cooperate more fully with related projects, the result tends to be better software for the community as a whole. The design and delivery of system calls would appear to be one of those places where a higher level of communication and cooperation would be a healthy thing. That, rather than trying to absorb low-level wrappers into the kernel project, seems like the proper long-term solution to this problem.


C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 0:50 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (149 responses)

The correct answer here is to start deprecating glibc - it's just legacy crap that holds the industry back at this point.

The replacement should be:
1) Extremely simple, with most complicated stuff being string functions.
2) No getent, no NSS, no name resolution. All of this should be moved into a separate daemon with a simple RPC protocol.
3) No iconv either.
4) No support for anything but Linux. So no support for C89 compilers running on Tru64 on TI-86 to complicate things.

And finally, NO SYMBOL VERSIONING. At all. This was THE most braindead decision in the history of Linux.

musl is almost perfect right now, but it misses the NSS separation.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 0:53 UTC (Tue) by Paf (subscriber, #91811) [Link] (93 responses)

Why no symbol versioning?

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 1:29 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (92 responses)

Because your binaries stop being backwards compatible with libc. I.e. if you compile stuff on Ubuntu 18.04, you can't run it on Ubuntu 16.04 even if you don't use anything advanced from libc.
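A small diagnostic can make the build-versus-run mismatch visible. This is a sketch assuming glibc (it will not compile against other C libraries), and describe_glibc() is a hypothetical name. It reports the glibc version the binary was compiled against and the one it is running on. It cannot help when the dynamic loader refuses to start the binary at all, which is the failure described here.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/* Hypothetical helper: format the compile-time glibc version (from the
 * headers) next to the run-time version (from the loaded library). */
int describe_glibc(char *buf, size_t size)
{
    return snprintf(buf, size,
                    "built against glibc %d.%d, running on glibc %s",
                    __GLIBC__, __GLIBC_MINOR__, gnu_get_libc_version());
}
```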

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 2:55 UTC (Tue) by josh (subscriber, #17465) [Link] (72 responses)

That's not caused by symbol versioning, that's caused by doing symbol versioning *wrong*. You don't bump the version unless you make the symbol incompatible.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 11:35 UTC (Tue) by nix (subscriber, #2304) [Link] (71 responses)

It's not like this sort of thing is even fixable: symbol versioning never *claimed* to allow you to build on newer libraries and then run seamlessly on older ones. The case it solves is the opposite: reducing flag days when you have to rebuild everything because an soname has changed. (It can't eliminate it: shared data structures whose representations have changed will still sometimes force flag days unless very carefully managed. But it's a hell of a lot better than the converse.)

I would not want to live in Cyberax's world, even if I *didn't* make heavy use of NSS. We need *some* kind of pluggable name resolution system, though one that depends on dynamic linking is probably a bad idea, in hindsight, if only because of the nightmare of making it work at all with statically linked binaries.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 17:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (70 responses)

> It's not like this sort of thing is even fixable: symbol versioning never *claimed* to allow you to build on newer libraries and then run seamlessly on older ones. The case it solves is the opposite: reducing flag days when you have to rebuild everything because an soname has changed.
I have a simple program that uses ages-old epoll/read/print functions. Yet I can't compile it on recent distributions and run on RHEL6. This makes no sense whatsoever.

> I would not want to live in Cyberax's world, even if I *didn't* make heavy use of NSS. We need *some* kind of pluggable name resolution system
That's why it needs to live in a separate daemon.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:32 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (33 responses)

> Yet I can't compile it on recent distributions and run on RHEL6.

The alternative without symbol versioning is that binaries compiled on RHEL6 can't run on more recent versions of RHEL using system libraries linked against the latest glibc soname, which would be worse. You'd effectively need a RHEL6 container with all the transitive library dependencies to run older binaries, at which point you might as well just run RHEL6 in a container (and use that to build a properly symbol-versioned binary which runs natively on RHEL6 as well as modern versions).

The same problem would still apply in reverse, of course: anything built on modern RHEL without symbol versioning wouldn't run on RHEL6 using its system libraries since those libraries expect the older glibc soname and the binary expects the newer version. You can't reasonably link multiple versions of glibc into the same application. This is the problem that symbol versioning solves, if only in one direction (older binaries on newer systems). Absent symbol versioning you'd be forced to rebuild everything to match the system it's actually running on, not just the oldest system you support.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:38 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (32 responses)

> The alternative without symbol versioning is that binaries compiled on RHEL6 can't run on more recent versions of RHEL using system libraries linked against the latest glibc soname, which would be worse.
Yet somehow it works with other libraries, like libz. It also works on Windows which lacks the symbol versioning in the core libraries.

> Absent symbol versioning you'd be forced to rebuild everything to match the system it's actually running on, not just the oldest system you support.
Please stop spreading nonsense.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:40 UTC (Tue) by TomH (subscriber, #56149) [Link]

You do know that libz uses symbol versions right? Quite a few in fact:

Version definitions:
1 0x01 0x09d5f4e1 libz.so.1
2 0x00 0x0827e5c0 ZLIB_1.2.0
3 0x00 0x07e5cb32 ZLIB_1.2.0.2
        ZLIB_1.2.0
4 0x00 0x07e5cb38 ZLIB_1.2.0.8
        ZLIB_1.2.0.2
5 0x00 0x0827e5c2 ZLIB_1.2.2
        ZLIB_1.2.0.8
6 0x00 0x07e5cd33 ZLIB_1.2.2.3
        ZLIB_1.2.2
7 0x00 0x07e5cd34 ZLIB_1.2.2.4
        ZLIB_1.2.2.3
8 0x00 0x07e5ce33 ZLIB_1.2.3.3
        ZLIB_1.2.2.4
9 0x00 0x07e5ce34 ZLIB_1.2.3.4
        ZLIB_1.2.3.3
10 0x00 0x07e5ce35 ZLIB_1.2.3.5
        ZLIB_1.2.3.4
11 0x00 0x07e5d031 ZLIB_1.2.5.1
        ZLIB_1.2.3.5
12 0x00 0x07e5d032 ZLIB_1.2.5.2
        ZLIB_1.2.5.1
13 0x00 0x07e5c231 ZLIB_1.2.7.1
        ZLIB_1.2.5.2
14 0x00 0x0827e5c9 ZLIB_1.2.9
        ZLIB_1.2.7.1

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 22:35 UTC (Tue) by klossner (subscriber, #30046) [Link] (29 responses)

> It also works on Windows which lacks the symbol versioning in the core libraries.

Google "DLL hell".

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 22:40 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (28 responses)

There has never been DLL hell with core Windows libraries. They care about that stuff.

DLL hell happened when dependencies tried to install themselves into a central location.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 4:17 UTC (Wed) by Hello71 (subscriber, #103412) [Link] (27 responses)

That's crap. Windows is far, far worse in this regard. MSVCRT.DLL, the most similar thing to libc.so that Windows has, has no official stable ABI at all. Microsoft instead recommends that you use the versioned runtimes, which are basically like SONAMEs except you have to manually change your compile configuration every time you upgrade, you have to bundle all of them with your program, and new libraries don't work on (sufficiently) old versions of the OS.

The specific subset of DLL hell you are referring to is equivalent to programs overwriting the central libc.so, which is a separate problem entirely.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 6:12 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (26 responses)

> That's crap. Windows is far, far worse in this regard. MSVCRT.DLL, the most similar thing to libc.so
Incorrect. MSVCRT is just an ordinary application-level library, a runtime for a particular compiler.

The libc analog is kernel32.dll and user32.dll. They have the WinAPI definitions that are most direct libc equivalents. For example: HeapAlloc ( https://docs.microsoft.com/en-us/windows/desktop/api/heap... ), lstrcpy ( https://docs.microsoft.com/en-us/windows/desktop/api/Winb... ) and so on. These libraries have stayed stable for two _decades_.

The central difference between kernel32/user32 and msvcrt.dll is that it's near impossible to avoid using kernel32/user32, just like it's almost impossible to avoid using libc on Linux unless you compile every dependency in parallel.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 8:33 UTC (Wed) by pbonzini (subscriber, #60935) [Link] (16 responses)

It's certainly possible to write a program that compiles on the latest Visual Studio and doesn't run on say Vista due to missing kernel32 symbols.

Are you sure that the problem is due to symbol versioning, and not just to the usage of symbols that are absent in the RHEL6 libc?

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 8:35 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (15 responses)

> It's certainly possible to write a program that compiles on the latest Visual Studio and doesn't run on say Vista due to missing kernel32 symbols.
Of course, there are no questions about that.

> Are you sure that the problem is due to symbol versioning, and not just to the usage of symbols that are absent in the RHEL6 libc?
Yes, there were changes to memcpy and some other core functions that pretty much everything is using.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:47 UTC (Wed) by nix (subscriber, #2304) [Link] (14 responses)

> Yes, there were changes to memcpy and some other core functions that pretty much everything is using.

Yes -- and the reason those symbols were versioned is that the new memcpy had changed semantics for improved performance on some hardware, which was still within the language standard but which some older buggy programs were not happy with. So symbol versioning meant that they didn't suddenly painfully break just because you upgraded your glibc.

(And as for bumping glibc soname frequently as you have also proposed -- this leads, during the rebuild, to a long period in which you have lots of binaries using two libc sonames at once, which means two malloc heaps, which means *disaster*. I get the impression that you've never tried the things you propose, or you'd *know* what terrible ideas they are and wouldn't be proposing them. Those of us who have tried it, and got burned, know not to try again.)

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 18:58 UTC (Wed) by quotemstr (subscriber, #45331) [Link] (4 responses)

I wish we were more tolerant of the use of multiple heaps. There are some advantages to heap splitting --- e.g., better concurrency, better memory-use attribution, and the ability to use different heap policies for different components. The use of multiple heaps is completely normal on Windows, and packages like jemalloc have internal APIs for taking blocks of memory and using them as heaps. The only real downside (besides some slight fragmentation overhead) is disaster when an allocation from heap A is freed in heap B. But proper API design alleviates this problem: if a library exports a function for allocating a resource, it must export a function for freeing that resource.
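The pairing rule in the last sentence can be sketched in C as follows. The lib_buffer type and both function names are hypothetical. The point is only that the memory is released by code living in the same library that allocated it, so it always returns to the heap it came from.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library-owned resource. */
typedef struct lib_buffer {
    size_t len;
    char *data;
} lib_buffer;

/* Allocate a buffer holding a copy of text; returns NULL on failure. */
lib_buffer *lib_buffer_create(const char *text)
{
    lib_buffer *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->len = strlen(text);
    b->data = malloc(b->len + 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->data, text, b->len + 1);
    return b;
}

/* The matching release function: callers use this instead of calling
 * free() themselves, so the library's own allocator does the freeing. */
void lib_buffer_destroy(lib_buffer *b)
{
    if (b == NULL)
        return;
    free(b->data);
    free(b);
}
```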

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 19:48 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (2 responses)

There's no inherent problem with "using multiple heaps".

E.g., I'm maintaining a suricata fork which supports a client/server operating mode where the server runs on the 'host' of a system providing any number of (lxc-based) virtualized VPN servers which all have a local suricata instance for traffic analysis. The per-container instance is created by forking the server on the host after initialization so that all clients (and the server) end up sharing most of the (huge) read-only-after-init data structures of the program. Until after initialisation, a custom heap implementation is being used; afterwards, the program switches to using the glibc malloc heap. The idea behind this is that allocations by client instances don't end up dirtying pages allocated during init so that they remain shared.

But using multiple implementations of a set of functions called malloc, calloc, etc in the same program is obviously a recipe for disaster.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 20:44 UTC (Wed) by quotemstr (subscriber, #45331) [Link] (1 responses)

Sure, but what I meant is that I don't think it's even necessary for malloc and free and operator new and so on to refer to the same functions throughout the program. If module A wants to link with heap implementation X and module B wants to link against heap implementation Y, that should be okay. It's a special case of my support for macOS-like two-level symbol namespaces.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:29 UTC (Thu) by madscientist (subscriber, #16861) [Link]

I'm not really sure what you mean by "module". You can do this within a shared library boundary, by making the local malloc and free hidden symbols. We do this in one of our shared libraries which links jemalloc statically by compiling with the -fvisibility=hidden option; this ensures that programs that use our shared library don't also use our allocator. Doing it for each compilation unit is a whole other thing though.
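For individual symbols, GCC and Clang offer a visibility attribute that has the same effect as the -fvisibility=hidden option mentioned above. The sketch below assumes one of those compilers and an ELF target. All names are hypothetical, and malloc() stands in for a statically linked allocator such as jemalloc.

```c
#include <stdlib.h>

#define LIB_LOCAL  __attribute__((visibility("hidden")))
#define LIB_EXPORT __attribute__((visibility("default")))

/* Hidden: usable anywhere inside the shared library, but not exported,
 * so programs linking against the library never bind to these. */
LIB_LOCAL void *lib_alloc(size_t n) { return malloc(n); }
LIB_LOCAL void lib_free(void *p) { free(p); }

/* Exported entry point that uses the private allocator internally.
 * Returns 1 + 2 + ... + n, or 0 if n is not positive or allocation fails. */
LIB_EXPORT int lib_sum_to(int n)
{
    if (n <= 0)
        return 0;
    int *scratch = lib_alloc((size_t)n * sizeof *scratch);
    if (scratch == NULL)
        return 0;
    int sum = 0;
    for (int i = 0; i < n; i++) {
        scratch[i] = i + 1;
        sum += scratch[i];
    }
    lib_free(scratch);
    return sum;
}
```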

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 21:35 UTC (Wed) by excors (subscriber, #95769) [Link]

The biggest heap problem I've had on Windows is C++ libraries that think returning STL types like std::string and std::vector is proper API design. It *is* good design within an application, because RAII is good, and it usually works fine on Linux, but it's nasty for external library interfaces. Even if the STL implementations are compatible between library and application, the string is doing some hidden allocations in the library's heap and freeing them in the application's heap, so you get surprising crashes (if you're lucky) or memory corruption.

I think proper API design for C++ is to export malloc/free wrappers in your library's API, and write your own simple STL-ish RAII classes that use the library's heap functions, so it's guaranteed to work correctly across the interface - there's no way for the application to accidentally use the wrong heap, plus it will still avoid all leaks and double-frees, unlike a pure C interface. That's not hard to do; but it is kind of annoying that library interfaces are exactly the place where a standardised collection of data structures would be very useful, yet they're exactly the place where STL won't work.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 19:26 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

> And as for bumping glibc soname frequently as you have also proposed
Stop inventing nonsense. I never said that.

I said that new symbols should be added to libc. Want newer memcpy with different behavior? Add memcpy_fast or whatever, but don't touch the existing memcpy.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 20:42 UTC (Wed) by nybble41 (subscriber, #55106) [Link] (4 responses)

> Want newer memcpy with different behavior? Add memcpy_fast or whatever, but don't touch the existing memcpy.

No one is going to switch every memcpy call in their application to some platform-specific memcpy_fast which only exists in the first place because some non-standards-compliant software made incorrect assumptions about how memcpy handles overlapping regions. The only question is whether we let those non-compliant applications break (barring workarounds like LD_PRELOAD) or use symbol versioning to maintain compatibility with existing binaries.
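The standards-compliant fix for such software is a one-word change in the source: memmove() is defined for overlapping regions, while memcpy() is not. A minimal sketch, with a hypothetical helper name:

```c
#include <string.h>

/* Shift the first len bytes of buf one position to the right.
 * The source [buf, buf+len) and destination [buf+1, buf+1+len) overlap,
 * so memcpy() would be undefined behaviour; memmove() is required.
 * The caller must provide room for len + 1 bytes. */
void shift_right_by_one(char *buf, size_t len)
{
    memmove(buf + 1, buf, len);
}
```

Applied to an 8-byte array holding "abcdef" with len 6, the array ends up holding "aabcdef".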

Since you've been holding up Windows as a model of an operating system which doesn't touch existing ABIs, you should have a look at C:\Windows\AppPatch some time. Instead of letting applications simply declare which version of a symbol they need, Windows attempts to track the signatures of the various applications which have been broken by Windows ABI changes over time (for example, because they depended on undocumented internal behavior) and when it detects such an application it modifies the behavior of the system DLLs to maintain compatibility. The effect is exactly the same as symbol versioning—unless your application isn't listed in Microsoft's database, in which case it will just break.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 20:45 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> No one is going to switch every memcpy call in their application to some platform-specific memcpy_fast
#define memcpy memcpy_fast

> Since you've been holding up Windows as a model of an operating system which doesn't touch existing ABIs, you should have a look at C:\Windows\AppPatch some time.
Windows ABI covers a LOT of ground, way more than any versioned API in Linux.

The core API (kernel32/user32) hasn't changed in ages. It's almost perfectly compatible all the way to Win32S.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 22:01 UTC (Wed) by dtlin (subscriber, #36537) [Link] (2 responses)

> #define memcpy memcpy_fast
And how is that any different than symbol versioning? Compile with newer glibc, you link with memcpy_fast, and can't run on a system with older memcpy only.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 22:13 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

New memcpy_fast can be safely backported and the migration to memcpy_fast can be made over the period of many years. It's also easy to control:

#if defined(API_LEVEL) && API_LEVEL >= 20180101
#define memcpy memcpy_fast
#endif

That's basically how the migration to large files went without causing many issues.
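Assuming the comparison is to the large-file transition, the opt-in mechanism there is a feature-test macro: defining _FILE_OFFSET_BITS as 64 before any system header makes off_t a 64-bit type on 32-bit Linux systems. The helper name below is hypothetical.

```c
/* Must be defined before any system header is included. */
#define _FILE_OFFSET_BITS 64
#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t as seen by this translation unit. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

On 64-bit Linux off_t is already 64 bits wide, so the macro only changes anything on 32-bit targets.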

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 23:24 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> New memcpy_fast can be safely backported

There is nothing to prevent someone from backporting new versions of select glibc functions into older libraries. It just isn't necessary: If you were to transplant a modern build of glibc into RHEL6 all the old applications would still work, thanks to symbol versioning. Why maintain backports when you can just upgrade glibc?

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:37 UTC (Thu) by plugwash (subscriber, #29694) [Link] (2 responses)

> So symbol versioning meant that they didn't suddenly painfully break just because you upgraded your glibc.

Instead they suddenly painfully break when you next recompile them after you upgraded your glibc.

I'm not convinced rescuing old binaries (but not old sourcecode) from their own bugs is worth the price of being unable to run binaries built on new systems on older ones.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 18:52 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

The advantage of the current system becomes clear when you have a split between "system administrators" (SAs) whose expertise does not lie in maintaining complex software, and "developers" (Ds) whose time is considered valuable by management. SAs don't build new binaries - they maintain systems on which binaries from Ds are run. Ds work with the latest and greatest systems at the time they build software, and can debug anything that breaks.

In the current system, SAs can upgrade glibc, knowing that binaries that Ds have stopped maintaining will still work. If a mission-critical binary has no Ds working on it, it'll keep going until it's replaced by the new thing. If you take that away, SAs get told to not apply updates to systems, even critical security fixes, because the tradeoff between a potential security hole and a guaranteed failure of business-critical systems is rarely in favour of the security hole being fixed. OTOH, if the binary is being maintained (i.e. Ds are working on it - no other process produces new binaries), the Ds working on it will debug the failures with the new libraries.

In a good organisation, it's a non-issue either way round - you build from source regularly, and you debug anything that stops working after an update. System libraries advance whether you like it or not, and you just have to keep up. The question then is what sort of dysfunctional organisation is more common; one in which there's no-one available to debug an old codebase, but there are binaries that you can run, or one in which there's more issues with a rebuild from source being broken than old binaries suddenly ceasing to work?

C library system-call wrappers, or the lack thereof

Posted Nov 23, 2018 3:29 UTC (Fri) by j16sdiz (guest, #57302) [Link]

I think the current trend is to use Docker or other containers that SAs cannot upgrade for security fixes.
Application developers want full control.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 20:25 UTC (Wed) by rghetta (subscriber, #39444) [Link]

Well, I think it is incorrect to call MSVCRT 'just an ordinary application-level library'.
Almost everyone developing commercial Windows applications uses Microsoft Visual Studio, and thus msvcrt; you can use other compilers, but Visual Studio is THE standard way of building Windows applications, and this is true even for many high-profile open source projects.
So while you can surely build something without touching msvcrt, in practice it is akin to a system library, except that you need to ship it yourself, because every compiler release has its own version and using the wrong one quickly lands you in trouble.
And think of the manifest files, or the need to have the right version of the .NET component DLLs. BTW, to me this seems just another way to achieve semantic versioning, but more complex and obscure.
In fact, nowadays Windows maintains huge directories full of copies of every DLL it sees, plus copies of the installers, just to survive system updates (!), package installations and so on. A sort of system-level flatpak.
To me neither Linux nor Windows is perfect, but on the whole I find the Linux system more palatable.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 22:59 UTC (Wed) by wahern (guest, #37304) [Link] (4 responses)

MSVCRT is where routines like printf and malloc are defined; it's absolutely the closest analog to libc. And its versioning strategy was absolutely at the root of DLL hell, causing problems like an inability to free a pointer from one DLL that had been malloc'd from another DLL, each DLL being linked to different versions of the C runtime. Microsoft admitted as much when it recently pivoted to long-term CRT stability. See, e.g., https://blogs.msdn.microsoft.com/vcblog/2014/06/10/the-gr... and https://blogs.msdn.microsoft.com/vcblog/2015/03/03/introd...

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 23:16 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> MSVCRT is where routines like printf and malloc are defined; it's absolutely the closest analog to libc.
No. The native Win32 API analog is HeapAlloc, which is interoperable across multiple libraries. MSVCRT is merely a particular runtime for one of the compilers.

You can use alternative compilers like mingw that don't use MSVCRT.

> Microsoft admitted as much when it recently pivoted to long-term CRT stability. See, e.g., https://blogs.msdn.microsoft.com/vcblog/2014/06/10/the-gr... and https://blogs.msdn.microsoft.com/vcblog/2015/03/03/introd...
Yeah, MS basically admitted that bumping the version numbers is a bad idea.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 0:09 UTC (Thu) by rschroev (subscriber, #4164) [Link]

> > MSVCRT is where routines like printf and malloc are defined; it's absolutely the closest analog to libc.
> No. The native Win32 API analog is HeapAlloc, which is interoperable across multiple libraries. MSVCRT is merely a particular runtime for one of the compilers

Perhaps, but it *is* the most used C runtime library on Windows, and in that respect it is much like libc.

> You can use alternative compilers like mingw that don't use MSVCRT.

I'm afraid mingw *does* use MSVCRT. From http://mingw.org/: "MinGW provides a complete Open Source programming tool set which is suitable for the development of native MS-Windows applications, and which do not depend on any 3rd-party C-Runtime DLLs. (It does depend on a number of DLLs provided by Microsoft themselves, as components of the operating system; most notable among these is MSVCRT.DLL, the Microsoft C runtime library. Additionally, threaded applications must ship with a freely distributable thread support DLL, provided as part of MinGW itself)."

The Embarcadero family of development tools (previously Codegear, previously Borland) indeed do not depend on MSVCRT: they have their own C runtime.

I don't know what other compilers do.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 0:51 UTC (Thu) by wahern (guest, #37304) [Link]

My argument was ambiguous. Substitute MSVCRT with MSCV* in my previous post. The point is that A.DLL might link to MSVCRx.DLL and B.DLL might link to MSVCRy.DLL, which AFAIU would happen by default if compiling A.DLL and B.DLL with two different versions of Visual Studio without *explicitly* linking *both* to the same CRT, such as the system CRT, MSVCRT.DLL. In other words, the default behavior of Visual Studio is to link against the compiler-specific CRT, not the system CRT. DLL Hell comes from the fact that two vendors installing FOO.DLL into a global directory can easily break applications *even* *if* FOO.DLL was built from the exact same source code.

My point about malloc stands. If you know about DLL Hell you can change your programming style, such as using proprietary memory management interfaces. So what? The primary reason Microsoft finally relented on the issues of C99, C11, and CRT compatibility is because the majority of FOSS code did not use those alternative interfaces and was unlikely to ever be refactored to do so.

> Yeah, MS basically admitted that bumping the version numbers is a bad idea.

What it's belatedly admitting is that it needs to provide backward and cross compatibility between CRTs, because if you support dynamic linking at all there's no avoiding the need to implement both the policies and mechanisms for reliably resolving inevitable conflicts; there were too many ways to be shot in the foot and too few ways to avoid it.

One way to accomplish that is to settle on a singular internal implementation that *never* changes its ABI or data structures. That's obviously not viable because the various CRTs were *manifestly* too unstable. Another is to ensure that multiple runtimes will choose one internal implementation and proxy to it. One way to accomplish the latter is by using symbol versioning, which makes it trivial to provide backwards compatible stub functions that can forward requests to the newest implementation without having to recompile external, dependent code.

Of course, you could just resort to static linking, a la Go and Rust. But dynamic linking came about for legitimate reasons, and they weren't all based on storage space concerns. So if you support dynamic linking, good vendors can and should provide all the tools necessary for writing robust libraries. Symbol versioning in some incarnation is an obvious solution. That these mechanisms can be misused and abused is beside the point.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 7:20 UTC (Thu) by smurf (subscriber, #17840) [Link]

> Yeah, MS basically admitted that bumping the version numbers is a bad idea.

In Linux terms, they admitted that bumping the soname of a core library (i.e. any library used by other libraries) is a bad idea. I am not aware of anybody who disagrees here.

These articles do not state how they intend to deal with possibly-incompatible updates to built-ins. I have heard that they use a blacklist of programs which rely on some buggy behavior or other – which only works if you're buddies with M$, which open source developers generally are not.

This reminds me of the credits of Switcher (an early application-switcher for the Macintosh) which contained, somewhere at the bottom:

Incompatibilities: Our developers
Special Effects: Microsoft

C library system-call wrappers, or the lack thereof

Posted Nov 29, 2018 23:05 UTC (Thu) by njs (subscriber, #40338) [Link] (2 responses)

Windows recently switched to providing a single stable C runtime (the "UCRT") with the goal of eventually getting all programs to use it, just like glibc. And the UCRT uses symbol versioning just like glibc.

The implementation is a little convoluted, because there's no first-class symbol versioning in PE, but the effect is the same.

On Windows, symbols are looked up by (dll name, symbol name) pairs. Of course, C source files only include the symbol name, so when you're building, the linker needs to figure out what dll name to use for each symbol name. This is done using '.lib' files, which tell the linker which dll name to use for each symbol.

So the way the UCRT works is: they provide a 'ucrt.lib', which tells you the dll name for functions like memset. Currently, it says that memset lives in the file "api-ms-win-crt-string-l1-1-0.dll". (Notice that name has a version number in it.) So when you build your program, you end up with a binary that references the symbol (api-ms-win-crt-string-l1-1-0.dll, memset).

This file does not actually exist. But when you run the program, there's some sort of magic in the loader that recognizes that pair, and gives you the correct version of memset.

If they need to make a backwards-incompatible change to memset, they'll bump the version number in the dll name, and then update 'ucrt.lib' so that the next time you compile your program, it'll start using the new version of memset by default.

AFAICT the end result is completely identical to glibc's symvers.
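
To make the (dll name, symbol name) idea concrete, here is a minimal sketch in C. It is only a toy model of the lookup described above, not how the Windows loader is implemented; the DLL names "api-demo-l1-1-0.dll" and "api-demo-l1-1-1.dll", the symbol "frob", and the function resolve() are all made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: every import is a (dll name, symbol name) pair. */
typedef int (*impl_fn)(int);

struct export_entry {
    const char *dll;     /* versioned DLL name baked into the binary */
    const char *symbol;  /* plain symbol name from the C source */
    impl_fn     impl;    /* implementation selected by the pair */
};

static int frob_v0(int x) { return x + 1; }  /* original behaviour */
static int frob_v1(int x) { return x + 2; }  /* incompatible new behaviour */

/* Both versions stay available; only the DLL name tells them apart. */
static const struct export_entry exports[] = {
    { "api-demo-l1-1-0.dll", "frob", frob_v0 },
    { "api-demo-l1-1-1.dll", "frob", frob_v1 },
};

/* Return the implementation bound to (dll, symbol), or NULL if absent. */
impl_fn resolve(const char *dll, const char *symbol)
{
    for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++) {
        if (strcmp(exports[i].dll, dll) == 0 &&
            strcmp(exports[i].symbol, symbol) == 0)
            return exports[i].impl;
    }
    return NULL;
}
```

A binary built against the old import library keeps asking for the "-1-0" pair and keeps getting the old behaviour, while a rebuilt binary asks for the "-1-1" pair; in this model that is the same effect a versioned symbol gives you.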

C library system-call wrappers, or the lack thereof

Posted Dec 4, 2018 15:16 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

> This file does not actually exist. But when you run the program, there's some sort of magic in the loader that recognizes that pair, and gives you the correct version of memset.
> If they need to make a backwards-incompatible change to memset, they'll bump the version number in the dll name, and then update 'ucrt.lib' so that the next time you compile your program, it'll start using the new version of memset by default.
> AFAICT the end result is completely identical to glibc's symvers.

... but nothing other than the C library can use it, because it required special-purpose hacks in the loader. That's disgusting.

C library system-call wrappers, or the lack thereof

Posted Dec 5, 2018 9:35 UTC (Wed) by njs (subscriber, #40338) [Link]

If it makes you feel any better, I believe that the magic is purely an optimization, and you *could* create a set of real dlls that produced the same results.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 22:54 UTC (Tue) by madscientist (subscriber, #16861) [Link]

IMO it's not worthwhile arguing with Cyberax about this particular hobbyhorse. For many years I've been rebutting this claim that he makes every so often, and pointing out that in fact I've been shipping commercial, production systems that will run on any system with an environment at or newer than Red Hat EL 6.3 (by this I mean the glibc, etc. versions that came with that system, it doesn't have to be Red Hat: works fine with Ubuntu 12.04LTS through 18.04LTS as well) and it works just great. Note RHEL 6.3 is just my current choice for an oldest supported system: I've done the same with RHEL 5.x etc. in the past.

I can build this software on any Linux distribution (we regularly upgrade our build farm servers and we never worry about compiler or system library compatibility, and we don't have to mandate what distribution developers must use), and I don't use any containers or virtual machines, either. All I need is an unpacked version of the RHEL 6.3 system headers and libraries (I extract these from RPMs and put them into Git but there're plenty of equivalent ways to do it), and some simple flags added to the compiler/linker.

It definitely works, with no hassles. This is all possible precisely _because_ libc uses symbol versioning.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:44 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link] (31 responses)

> I have a simple program that uses ages-old epoll/read/print functions. Yet I can't compile it on recent distributions and run on RHEL6. This makes no sense whatsoever.

You believe this makes no sense.

And the people who introduced versioned symbols think this belief is mistaken: The compiled program was compiled (and supposedly tested) on a system where at least one glibc function needed by it has a behaviour which is not compatible with the behaviour of the glibc function of the same name on RHEL6. Hence, the program can't run there because a glibc version it's compatible with is not available. For this scenario, that's no different from linking with a library whose soname was libc.so.7 and trying to run the program on a system with a libc.so.6.

But there are two other scenarios

A program compiled on the newer system which doesn't use symbols not available on the older one can run on the older system. This wouldn't be possible if the soname had changed.
All programs compiled on the older system can run on the newer system without being recompiled. This wouldn't be possible if the soname had changed.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:49 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (30 responses)

> And the people who introduced versioned symbols think this belief is mistaken: The compiled program was compiled (and supposedly tested) on a system where at least one glibc function needed by it has a behaviour which is not compatible with the behaviour of the glibc function of the same name on RHEL6.
This is a fucking libc. The functions that I'm using have not materially changed for 15 years or so.

> A program compiled on the newer system which doesn't use symbols not available on the older one can run on the older system. This wouldn't be possible if the soname had changed.
You don't change the user-visible behavior of functions in THE FREAKING LIBC.

You can add NEW functions, and in this case software compiled on newer libcs will just fail to link with older libcs lacking the new functions.

> All programs compiled on the older system can run on the newer system without being recompiled. This wouldn't be possible if the soname had changed.
Wrong. Even in your model.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 19:09 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link] (29 responses)

> This is a fucking libc. The functions that I'm using have not materially changed for 15 years or so.

If you think so, you should determine which symbol was wrongly flagged as different and file a bug against glibc.

> You don't change the user-visible behavior of functions in THE FREAKING LIBC.

That's a policy you may consider sensible but the people who work "on the freaking libc" apparently don't. Eg, bugs are user-visible behaviour. Some people want them fixed.

Again, each individual change may or may not be appropriate and/or people might have different opinions about this. Without a specific example, this cannot be assessed.

>> All programs compiled on the older system can run on the newer system without being recompiled. This wouldn't be possible if the soname had changed.
>Wrong. Even in your model.

This is not "my model" as I'm not the GNU C library but that's how the model of the GNU C library is supposed to work: A program linked against version a.b.c of symbol frxblz[*] will use version a.b.c of symbol frxblz even if a newer version d.e.f is also available. Which wouldn't be possible if the soname had been changed instead.

Whether or not this works in a specific case is a different question. But that's what it's supposed to provide.

[*] coming up with a combination of letters which isn't some sort of possibly offensive insult in American English is surprisingly difficult :->

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 19:16 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (28 responses)

> That's a policy you may consider sensible but the people who work "on the freaking libc" apparently don't. Eg, bugs are user-visible behaviour. Some people want them fixed.
That's because glibc is written by very, very misguided people. Seriously.

There are non-idiotic libcs out there and somehow they work just fine without breaking stuff. Then we have Linux itself that somehow manages to preserve even more complicated ABI than glibc's.

> This is not "my model" as I'm not the GNU C library but that's how the model of the GNU C library is supposed to work: A program linked against version a.b.c of symbol frxblz[*] will use version a.b.c of symbol frxblz even if a newer version d.e.f is also available. Which wouldn't be possible if the soname had been changed instead.
You can add new symbols if you must. Windows does this (WriteFile, WriteFileEx, etc). But nothing requires you to change the user-visible behavior of existing symbols.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 20:06 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (27 responses)

> You can add new symbols if you must. Windows does this (WriteFile, WriteFileEx, etc).

That's just ad-hoc symbol versioning. Programs which use the new symbols still won't run on older systems. It's strictly worse than what glibc does.

As for your "simple program that uses ages-old epoll/read/print functions", the symbol versions for these APIs in glibc 2.27 are as follows:

epoll_create, epoll_wait, epoll_ctl - GLIBC_2.3.2
read, printf - GLIBC_2.2.5

That means any application using *only* these symbols would be able to run without modification on RHEL-3.9 (glibc 2.3.2). RHEL6 should not be a problem.
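
As a rough sketch of that reasoning: a binary can run on a given glibc if none of the symbol versions it references is newer than what that glibc provides. The helper names below (parse_ver, cmp_ver, can_run) are made up for this example, and the numeric comparison is a simplification; the real dynamic loader checks that each required version name is defined by the library rather than comparing numbers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct ver { int major, minor, patch; };

/* Parse "2.3.2" or "2.14"; missing components are treated as 0. */
static struct ver parse_ver(const char *s)
{
    struct ver v = { 0, 0, 0 };
    int got = sscanf(s, "%d.%d.%d", &v.major, &v.minor, &v.patch);
    if (got < 2)
        v.major = v.minor = v.patch = 0;  /* not a usable version string */
    return v;
}

/* Return <0, 0 or >0 when a is older than, equal to or newer than b. */
static int cmp_ver(struct ver a, struct ver b)
{
    if (a.major != b.major) return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch) return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* True if every needed symbol version is no newer than the provided glibc. */
bool can_run(const char *const needed[], size_t n, const char *provided)
{
    struct ver have = parse_ver(provided);
    for (size_t i = 0; i < n; i++) {
        if (cmp_ver(parse_ver(needed[i]), have) > 0)
            return false;
    }
    return true;
}
```

With the versions listed above, can_run() on {"2.3.2", "2.2.5"} against "2.12" (the glibc that, as far as I know, RHEL6 ships) returns true.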

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 21:18 UTC (Tue) by ldarby (guest, #41318) [Link] (26 responses)

The common problem that I suspect Cyberax is actually moaning about is if software uses other calls like memcpy() which on centos 7 gets a version of GLIBC_2.14:

readelf -a foo | grep memcpy
000000601020 000300000007 R_X86_64_JUMP_SLO 0000000000000000 memcpy@GLIBC_2.14 + 0
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND memcpy@GLIBC_2.14 (3)
55: 0000000000000000 0 FUNC GLOBAL DEFAULT UND memcpy@@GLIBC_2.14

and this doesn't work on centos 6:

ldd ./foo
./foo: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./foo)
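
For what it's worth, a commonly used workaround is to ask the assembler to bind memcpy references to the old symbol version. The sketch below assumes x86-64 glibc, where the old version is named GLIBC_2.2.5 (as in the readelf listing above); other architectures use different base versions, so the directive is guarded. copy_block() is just a made-up wrapper so the example has something to call.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && !defined(__ILP32__) && defined(__GLIBC__)
/* Bind every memcpy reference in this file to the pre-2.14 symbol version,
 * so the resulting binary does not require GLIBC_2.14 for memcpy. */
__asm__(".symver memcpy, memcpy@GLIBC_2.2.5");
#endif

/* Hypothetical wrapper; the size is not a compile-time constant here,
 * so the compiler normally emits a real call to memcpy. */
void *copy_block(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}
```

This only covers memcpy in the translation units where the directive appears; any other symbol with a newer default version would need the same treatment, which is why building against the oldest supported glibc's headers and libraries tends to be the more robust approach.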

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 22:12 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link] (22 responses)

That was eight years ago.

It's also unquestionable that increasing the symbol version was justified as this was a change which did cause "really existing software" to break.

But this is really a problem with no good solution, only different tradeoffs which all end up being detrimental to someone. Either no code which was released must ever be changed in a new version as applications will depend on undocumented and even on unintentional properties of it. Or loads of existing binaries suddenly break in interesting ways and there's the very real possibility that "recompile the code" is not an option. Or people developing new code who can recompile that because they have both the code and the necessary tools/environment available must restrict themselves to actually using systems said to be supported.

As to the change in question: My opinion on this is that this here

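#include <stddef.h>  /* size_t */

/* Copies byte by byte, from the last byte down to the first. */
void *memcpy(void *dst, const void *src, size_t n)
{
    char *d = dst;
    const char *s = src;

    while (n) --n, d[n] = s[n];

    return dst;
}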

is a nice, simple and portable implementation of memcpy and instead of terror-optimizing this algorithm using whatever "latest and greatest" CPU support for absurdly large block memory copies happens to be available, one should stick to something like this as default implementation and leave it to people to whom absurdly large block memory copies are a real performance problem to figure out how they can either avoid these or speed them up.

But that's just my opinion and certainly not a universal consensus.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 22:42 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (16 responses)

> It's also unquestionable that increasing the symbol version was justified as this was a change which did cause "really existing software" to break.
This just means you need to create another function - "memcpy_fast" or whatever.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 0:19 UTC (Wed) by nybble41 (subscriber, #55106) [Link] (15 responses)

> This just means you need to create another function - "memcpy_fast" or whatever.

There is already a function with the semantics of "memcpy_fast" included in the C standards. It's called "memcpy". There is approximately zero chance that a new, non-portable alternative to memcpy() would have been introduced just to maintain compatibility with applications abusing memcpy() in situations where they should have employed memmove().

Even if they did take that approach programs would need to be modified to use the new API, which would be a colossal undertaking as well as a major step backward in terms of portability. The main performance benefit of eliminating the extra branch required for memmove() semantics comes from the myriad *small* memory copies performed throughout any non-trivial application. The change would be mostly pointless if it didn't automatically encompass all existing memcpy() users.

Without symbol versioning the most likely resolution would have been to simply let the non-compliant programs break.
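
To illustrate the extra branch, here is a minimal sketch of the two semantics. The names copy_forward() and move_bytes() are made up for this example, and real libc implementations are far more elaborate; the point is only that memmove-style behaviour needs a direction check before copying, while a memcpy-style copy can skip it.

```c
#include <stddef.h>
#include <stdint.h>

/* memcpy-style: always copies low to high, no overlap check. */
void *copy_forward(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* memmove-style: one extra branch picks a safe copy direction. */
void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Forward is safe unless dst starts inside [src, src + n). */
    if ((uintptr_t)d <= (uintptr_t)s || (uintptr_t)d >= (uintptr_t)s + n)
        return copy_forward(dst, src, n);

    /* dst overlaps the tail of src: copy high to low instead. */
    while (n--)
        d[n] = s[n];
    return dst;
}
```

Copying 4 bytes from the start of "abcdef" to offset 2 gives "ababcd" with move_bytes() but "ababab" with copy_forward(), because the forward copy overwrites source bytes before it has read them.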

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 0:33 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

> There is already a function with the semantics of "memcpy_fast" included in the C standards. It's called "memcpy". There is approximately zero chance that a new, non-portable alternative to memcpy() would have been introduced just to maintain compatibility with applications abusing memcpy() in situations where they should have employed memmove().
This is EXACTLY why a new function should have been introduced. Handling of overlapping became an implicit part of the memcpy interface. It shouldn't have but it did.

So the ONLY correct choice is to stick with it and improve the language standard to include a new function that explicitly defines the handling of overlaps.

> Even if they did take that approach programs would need to be modified to use the new API,
And that's fine. Nobody died because one optimization of memcpy became impossible.

> which would be a colossal undertaking as well as a major step backward in terms of portability.
Nonsense. Other systems either have the implicit overlap handling guarantees of memcpy or the software is already non-portable to them.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:51 UTC (Wed) by nix (subscriber, #2304) [Link] (13 responses)

> This is EXACTLY why a new function should have been introduced. Handling of overlapping became an implicit part of the memcpy interface. It shouldn't have but it did.

Ah, but introducing new functions is *also* a compatibility problem, because there is only one namespace for symbols, and now you will collide, either at compile time or runtime, with other programs already using the function you chose. (When getline() was introduced, it broke compilation of a *lot*. But only compilation, because glibc was not written by idiots: at runtime it still worked, where your proposals would fail utterly. It appears you don't know the ELF symbol resolution rules.)

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 19:00 UTC (Wed) by quotemstr (subscriber, #45331) [Link] (12 responses)

We should move to a two-level namespace like macOS and Windows use. Instead of importing symbol X, you import symbol X *from some specific library* Y. This way, you only need (X, Y) pairs to be unique, not X generally. This approach resolves a lot of weird symbol conflict issues. It breaks LD_PRELOAD-style symbol interposition, but I think that's a misfeature anyway.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 21:07 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (10 responses)

Note that for things like plugins that can be loaded by multiple application binaries (e.g., Python modules), you should not directly link to an implementation of the plugin API. In the case of Python, the application provides the core Python API symbols and the module should just expect them to exist at runtime. Otherwise you have the problem of a Python module being compiled against the macOS libPython.dylib and then it's useless for macports' Python since then you're mixing Python interpreters. Fun for everyone involved when you have to recompile Python modules to use a different application.

The problem is spelled out here:

https://blog.tim-smith.us/2015/09/python-extension-module...

where it is hacked around by just using `-undefined dynamic_lookup` and not linking libPython.dylib so that it's just found at runtime. But, this also means that all missing symbols are ignored until runtime. I have this patch:

https://github.com/mathstuf/ld64/commit/4eebe0c07e8ab706e...

which is a better fix, but I don't know how to convince Xcode to build the damn project and I need to figure out how/where to send the patch to get it reviewed by Apple.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 10:51 UTC (Thu) by lgerbarg (guest, #57988) [Link] (9 responses)

It is nice to see someone working on ld64 patches, but ld64 can already build a bundle with the semantics you are trying to achieve. From the ld(1) man page on macOS:

-bundle_loader executable
This specifies the executable that will be loading the bundle output file being linked. Undefined symbols from the bundle are checked
against the specified executable like it was one of the dynamic libraries the bundle was linked with.

This will find exported symbols in the main executable and encode the binds such that dyld will apply them to whatever the main executable in the current process is. This has the same runtime semantics that the linked patch is trying to achieve, except in the error case that occurs if a newer version of the interpreter accidentally removes an exported symbol that your bundle needs. Using -bundle_loader it will fail immediately after searching the main executable, instead of continuing to search other images in the process. Given that you know the symbols are supposed to be provided by the interpreter, that is probably a preferable behavior.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 18:15 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (8 responses)

Hmm. I'll have to experiment with that. Thanks for the pointer.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 19:07 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (7 responses)

Using `-bundle_loader` doesn't work (at least as you've said):

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -Dmod_EXPORTS -I/System/Library/Frameworks/Python.framework/Versions/2.7/Headers -fPIC -MD -MT CMakeFiles/mod.dir/mod.c.o -MF CMakeFiles/mod.dir/mod.c.o.d -o CMakeFiles/mod.dir/mod.c.o -c ../mod.c

makes this:

$ otool -L mod.so
mod.so:
/System/Library/Frameworks/Python.framework/Versions/2.7/Python (compatibility version 2.7.0, current version 2.7.10)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1213.0.0)

which still references a specific Python and is wrong anyways:

$ /opt/local/bin/python2.7 -m mod # Use a MacPorts Python
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: No code object available for mod

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 23:25 UTC (Thu) by lgerbarg (guest, #57988) [Link] (6 responses)

It definitely works the way I described, and has since 10.1; it is how plugins for apps like Photoshop are built and usable across multiple revisions of those apps. I can’t tell you exactly what is going on in this case, because the line you pasted is not the linker invocation, it is a compiler invocation (it creates a .o from a .c file, it does not create the final MH_BUNDLE, and it does not actually have the -bundle_loader flag).

If you can find the actual linker invocation (either the driver with -Wl,-bundle_loader, or the actual call through to ld64) I can probably tell you what’s going wrong with the invocation, though off the top of my head I have no idea how to get CMake to pass through the correct flags.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 23:33 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (5 responses)

Oops, sorry. I did mean to grab the link line. Neither of these work (both come back with the same "no code object" error message):

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -bundle -Wl,-headerpad_max_install_names -o mod.so CMakeFiles/mod.dir/mod.c.o -bundle_loader /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -bundle -Wl,-headerpad_max_install_names -o mod.so CMakeFiles/mod.dir/mod.c.o -Wl,-bundle_loader,/System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib

C library system-call wrappers, or the lack thereof

Posted Nov 16, 2018 21:53 UTC (Fri) by lgerbarg (guest, #57988) [Link] (4 responses)

No problem. You are pointing -bundle_loader at a dylib; it should be pointed at an MH_EXECUTABLE. Maybe I misunderstood, but I thought you said the symbols were exported from the python executable itself?

C library system-call wrappers, or the lack thereof

Posted Nov 16, 2018 22:06 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (3 responses)

The symbols are provided to the module by loading a libpython.dylib first. The executable has barely any symbols in it at all. Passing the executable instead makes the linker complain that the symbols aren't defined.

C library system-call wrappers, or the lack thereof

Posted Nov 16, 2018 22:08 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

And it is libpython.dylib that does the dlopen, not the executable. I guess the flag is meant for static compilation?

C library system-call wrappers, or the lack thereof

Posted Nov 18, 2018 21:12 UTC (Sun) by lgerbarg (guest, #57988) [Link] (1 responses)

The flag is meant to allow plugins to refer back to symbols exported by an application in order to support a plugin API. Where the dlopen() happens doesn't really impact it, but the fact that the API is exported by a dylib rather than the interpreter itself is what complicates the situation. I could go into a bunch of historical reasons for why it behaves that way, but the short answer is that it dates back to classic macOS and the plugin mechanisms used there.

Now that I understand what you are trying to do a bit more clearly (sorry about the confusion) -bundle_loader is not appropriate unless you can re-export the symbols from the main executable. I think that would be the best option from a technical perspective, but it is probably unreasonable since it would require changes and back ports to all of the python interpreters.

Assuming that you need to get this to work without making changes to python itself, I think you should pass ld "-undefined dynamic_lookup". It is kind of gross, but it should work for your use case. If there is ever a desire to improve this behavior for future pythons, there are two changes to the interpreter that would make modules work better:

1) Reexporting the symbols from libpython out of the python interpreter itself and using -bundle_loader
or
2) Renaming or symlinking the library exporting the API to have an unversioned name (like "libpython.dylib") and then having python set an LC_RPATH to the directory containing the dylib/symlink. Then modules could link to them via "@rpath/libpython.dylib" .

C library system-call wrappers, or the lack thereof

Posted Nov 20, 2018 15:16 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Hmm. OK. So how is this meant to work with something like ParaView which supports different applications (for different use cases) and uses shared libraries to hold the actual API (and I mean that there can be 100+ of them)? What's the best way to reexport symbols of all linked libraries from an executable?

> I think you should pass ld "-undefined dynamic_lookup." It is kind of gross

That works for today. And projects which don't only target macOS are almost never going to do what seems like a bad legacy behavior. The "gross" part of this solution is what led to the patch to ld64.

> Renaming or symlinking the library exporting the API to have an unversioned name (like "libpython.dylib") and then having python set an LC_RPATH to the directory containing the dylib/symlink. Then modules could link to them via "@rpath/libpython.dylib" .

The name of the file doesn't matter. The binary must be edited using `install_name_tool -id` to change what gets embedded in the linking binary. And RPATH on macOS is awful. It only applies if explicitly requested and if something says "use me via rpath", no tools (I know of) add the paths automatically and they must be manually added.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:13 UTC (Thu) by nix (subscriber, #2304) [Link]

You can do that already with DT_GROUP -- but that wouldn't help in this case, because the conflict was often at *compile-time*: two C identifiers named getline() with different semantics (sometimes differing prototypes, sometimes one of them wasn't a function at all but a variable or something).

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 0:29 UTC (Wed) by ldarby (guest, #41318) [Link] (3 responses)

> It's also unquestionable that increasing the symbol version was justified as this was a change which did cause "really existing software" to break.

Not sure I agree with that. If Linus "We don't break userspace" Torvalds had won the argument over Ulrich "buggy programs cannot be allowed to prevent a correct implementation" Drepper (sorry, paraphrasing), they would have just made a 2nd change of aliasing memcpy to memmove, un-breaking the buggy programs and not had any of this version hassle to deal with now.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 16:38 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (2 responses)

Again, a solution to this is surprisingly simple: Create a file with the following content:

#include <sys/types.h>

void *memmove(void *, void *, size_t);
int puts(char *);

void *memcpy(void *d, void *s, size_t n)
{
    puts("honk");
    return memmove(d, s, n);
}

compile it with

gcc -fpic -shared -o x.so x.c

and use it together with another compiled program calling memcpy via LD_PRELOAD. Voila --- memcpy aliased to memmove without penalizing code not relying on undefined behaviour (the puts obviously just exists to demonstrate that the function is indeed being called).

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 19:54 UTC (Wed) by ldarby (guest, #41318) [Link] (1 responses)

Not sure what you're trying to achieve here. That fixes neither the "GLIBC_2.14' not found" error nor the general case of random buggy software that's used by non-technical users, who wouldn't have a damn clue what this "surprisingly simple" solution even means.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 20:01 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]

This would transparently cause such a "random buggy binary" to call memmove instead of memcpy, hence sidestepping the issue when it exists instead of forcing all users of memcpy to call memmove instead.

The "non-technical users" also "don't have a damn clue" how the other software they're using works. But as they're just using it and not developing it, this should be ok, don't you think so?

C library system-call wrappers, or the lack thereof

Posted Nov 18, 2018 10:07 UTC (Sun) by roblucid (guest, #48964) [Link]

But the idea of a simple, universal default C implementation defeats the idea of a library providing added value, whether through a superior implementation or simply because it can be compiled and installed knowing exactly what hardware and OS it is running on.

In the past, applications were often shared over a network, so the executables ran on different CPUs and OS versions. Even Linux distros would install differing versions of glibc, to allow the library to use Pentium 2 features whilst the installed packages were universal i386.

From my experience, the problem raised is moot: the error experienced is simply due to an improper build environment, which needs to produce executables linked for the oldest target system. On openSUSE there was a way of installing LCD (lowest common denominator) library versions to build packages against. Later, a build service could build native packages for various systems, making it unnecessary to trouble with that locally.

If an application stops working with a new version of a library, it's typically a bug caused by relying on undocumented features. The mentioned example of applications unportably calling memcpy rather than memmove is such a case. Where libraries change the ABI, their names change, e.g. KDE3 to KDE4 or the big changes to GTK. Calling functions not present in older target systems is again a portability error.

memcpy re-versioning

Posted Nov 14, 2018 12:41 UTC (Wed) by smurf (subscriber, #17840) [Link] (2 responses)

Happiness.

IMHO, any program that depends on *documented* nonsense like overlapping ranges in memcpy() (the manpage says "must not", dammit) deserves to die a fiery death and should suffer a segfault (triggered by a check in memcpy), instead of saddling everybody with a symbol version uptick that blocks backwards compatibility.

memcpy re-versioning

Posted Nov 14, 2018 16:48 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> should suffer a segfault (triggered by a check in memcpy)

That check would incur just as much overhead as aliasing memcpy to memmove, thus eliminating the undefined behavior altogether. The performance advantage of memcpy comes from *not* checking for overlapping regions.

It could be a useful debug aid, though—perhaps a validating memcpy implementation could be substituted during testing with LD_PRELOAD.
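A minimal sketch of such a validating wrapper follows. The names regions_overlap and checked_memcpy are hypothetical and not part of glibc; a real LD_PRELOAD shim would have to export the check under the name memcpy and copy the bytes itself or via memmove. The sketch aborts on overlapping ranges instead of copying silently.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if [a, a+n) and [b, b+n) share at least one byte, else 0. */
int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    if (n == 0)
        return 0;
    /* Two equal-length ranges overlap when their start addresses
     * are closer together than the length being copied. */
    return (pa > pb ? pa - pb : pb - pa) < n;
}

/* Hypothetical debugging replacement: refuse overlapping copies loudly. */
void *checked_memcpy(void *dst, const void *src, size_t n)
{
    if (regions_overlap(dst, src, n)) {
        fprintf(stderr, "memcpy: overlapping ranges %p and %p (%zu bytes)\n",
                dst, src, n);
        abort();
    }
    return memcpy(dst, src, n);
}
```

The check is one subtraction and one comparison per call, which is roughly the same cost memmove pays to choose a copy direction; that is why it only makes sense as a testing aid.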

memcpy re-versioning

Posted Nov 15, 2018 16:03 UTC (Thu) by nix (subscriber, #2304) [Link]

The dependency is almost always unintentional: the author of some function that calls memcpy() doesn't realise that sometimes it accepts pointers that are in aliased ranges, and the caller doesn't realise that the function they're calling calls memcpy().

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:55 UTC (Tue) by mm7323 (subscriber, #87386) [Link]

> I have a simple program that uses ages-old epoll/read/print functions. Yet I can't compile it on recent distributions and run on RHEL6. This makes no sense whatsoever.

I completely agree. We can compile code to older standards (c89, c99), even switch the architecture (-m32). But as you say, building a simple program with age-old library calls to an earlier standard just isn't allowed, even though the same code can be built on an older distro and then run on something newer.

In the past this promoted static linking, bad for security, until that was made hard too by the removal of static packages - and nss.

So now we have Docker, Snaps, Flatpak et al. as a completely crazy way around this (among other benefits). I've often thought that the rise of VMs is symptomatic of the failure of operating systems to provide effective multi-process separation. In a similar vein, the rise of cheap containerisation is symptomatic of the failure of distros and libraries to offer compatible execution environments.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 15:43 UTC (Wed) by lambda (subscriber, #40735) [Link] (1 responses)

> I have a simple program that uses ages-old epoll/read/print functions. Yet I can't compile it on recent distributions and run on RHEL6. This makes no sense whatsoever.

The problem is not symbol versioning, it's just that the linker always opts to use the latest version of symbols and doesn't give you a way to opt in to a particular compatibility level. If you could opt in to using older symbols when you want to compile on a newer distro and have compatibility with RHEL6, then you could have your cake and eat it too.

> That's why it needs to live in a separate daemon.

This is definitely true. NSS as dynamic libraries that get called by your process when it's doing name resolution or calling getent was a terrible idea, and really needs to be replaced.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 15:04 UTC (Thu) by cortana (subscriber, #24596) [Link]

> If you could opt in to using older symbols when you want to compile on a newer distro and have compatibility with RHEL6, then you could have your cake and eat it too.

I think you can do this with the .symver directive?

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 14:56 UTC (Thu) by cortana (subscriber, #24596) [Link]

> I have a simple program that uses ages-old epoll/read/print functions.

If you're missing the declaration of these functions, copy them into your source code.

If you're unable to link, use .symver to tell the linker that "when I say epoll_create, give me epoll_create1@GLIBC_2.9 rather than the default".
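As a sketch of the .symver suggestion, the directive can sit at file scope in a C source file. The example below binds memcpy rather than epoll_create, reusing the memcpy@GLIBC_2.2.5 version that appears elsewhere in this thread. That version tag is assumed to exist only on x86_64 glibc, so the directive is guarded; copy_bytes is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && !defined(__ILP32__) && defined(__GLIBC__)
/* Bind every reference to memcpy in this file to the older symbol
 * version instead of the default (memcpy@@GLIBC_2.14). */
__asm__(".symver memcpy, memcpy@GLIBC_2.2.5");
#endif

/* Hypothetical helper: a plain memcpy call that the directive above
 * redirects at link time when the compiler emits a real call. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}
```

This only affects dynamically linked builds, and only calls the compiler does not expand inline. Inspecting the result with objdump -T should show which version each imported symbol ended up requiring.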

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 14:58 UTC (Tue) by hkario (subscriber, #94864) [Link]

It's insane to think that libraries provide forward compatibility. No library I know of does that, and glibc definitely does not provide it.

You can't run on 16.04 stuff compiled on 18.04 because it is not something that is safe, even if it works. And more often than not, that breakage will be subtle and hidden deep, if the application actually runs.

Not talking hypotheticals here: I know of examples in which use of an old library, when compiled against the new API, would result in an uninitialised pointer dereference.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 21:28 UTC (Tue) by mchapman (subscriber, #66589) [Link] (17 responses)

> Because your binaries stop being backwards compatible with libc. I.e. if you compile stuff on Ubuntu 18.04, you can't run it on Ubuntu 16.04 even if you don't use anything advanced from libc.

It seems to me that this could be solved by linker config saying something along the lines of "hide all glibc symbols whose version is >= GLIBC_x.y.z". That way you could compile a program as if you had an older glibc, yet still have it work on that glibc plus all following ones (as old versions of symbols should never be dropped).

So I don't see this as a problem with symbol versioning. I see it more as a problem that it's not too easy to specify exactly which version of a symbol you want when you're linking.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 22:48 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> "hide all glibc symbols whose version is >= GLIBC_x.y.z"
It won't work unless you do this with all the dependent libraries.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 7:01 UTC (Wed) by mjthayer (guest, #39183) [Link]

It works for me (though I handle each problem symbol - fortunately there are not too many - case by case). The dependent libraries are generally dynamically linked, so the versions on your target system do not use the new symbols. If you are building the dependent libraries yourself, no additional issue. I seem to recall that you can handle static libraries too somehow, as the symbol version is fixed in the final link, but I will not re-check the details of that now, so I might be wrong.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:44 UTC (Wed) by smurf (subscriber, #17840) [Link] (2 responses)

That unfortunately does not work in the general case, because some C macros-or-whatever require the newer version. Simply restricting the linker is too late.

If you want to be able to run your program on old systems, install the development packages from an old system in a chroot. It's simple, really.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 16:10 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (1 responses)

It should be sufficient to have the old headers available and select a "library compatibility version" one wants to use. This whole issue really strikes me as more of a problem of poor/nonexistent support for using non-default symbol versions. And that's not even so poor. Assuming that the file the C library resides in is /lib/x86_64-linux-gnu/libc.so.6, the command below

readelf --dyn-syms /lib/x86_64-linux-gnu/libc.so.6 | perl -ne '/(\S+[^@])\@GLIBC_2\.2\.5$/ and printf("__asm__(\".symver %s, %s\@GLIBC_2.2.5\");\n", $1, $1)' >glibc-2.2.5.i

creates a file glibc-2.2.5.i which can be included into a .c source file and causes that to use the 2.2.5 version of all symbols whose default version differs from 2.2.5. Compared to the mountains of complaints about this on the internet, this isn't much (code) text :-).

C library system-call wrappers, or the lack thereof

Posted Nov 18, 2018 11:35 UTC (Sun) by rweikusat2 (subscriber, #117920) [Link]

Come to think of it, this approach is too simplistic: the older library could use an older version as the default symbol, which could have been replaced by a newer version in a later library. One would need to use a list of all symbols in the older library to generate the include file.

The output of

readelf --dyn-syms /lib/x86_64-linux-gnu/libc.so.6

could be piped into

my (%v, $v, $n, @l);

while (<>) {
    /^ *\d+:/ or next;
    
    @l = split;
    
    $v = $l[7];
    $v =~ s/.*@// or next;
    next if $v eq 'GLIBC_PRIVATE';

    ($n) = $l[7] =~ /([^@]+)@/;
    push(@{$v{$v}}, $n);
}

for $v (sort keys %v) {
    print("__asm__ (\".symver $_, $_\@$v\");\n") for @{$v{$v}};
}

to accomplish this.

Just in case someone doesn't recognize this: This is written in a much maligned programming language called Perl. But it's also very useful.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 15:22 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (11 responses)

Symbol versions are just arbitrary strings. There's no order associated with them. You'd rather want a whitelist with priority order for the symbol versions to search.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 15:34 UTC (Wed) by TomH (subscriber, #56149) [Link] (10 responses)

That's not exactly true: each version has a parent version (or dependency) that it inherits from, though I'm not clear if anything actually uses that. One source I just found suggests it is only there because Solaris had it and that Linux doesn't actually use it.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 15:58 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (9 responses)

Huh. Looking in the library, there's a `.gnu.version_d` section which reifies the linker script hierarchy. So yes, there is an order then.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 16:41 UTC (Wed) by TomH (subscriber, #56149) [Link] (8 responses)

Yes it gets encoded in the ELF but I'm not sure it has any practical effect. I believe the linker just links to the undecorated version of the symbol which points at the most recent version and then includes that version as the required version for the symbol in the linked program. So it doesn't actually have to walk back up the hierarchy to find a match. To go back to the memcpy example, the static symbol table has:

000000000008bc80 l   i   .text	00000000000000c2              memcpy
000000000008bc80 g   i   .text	00000000000000c2              memcpy@@GLIBC_2.14
00000000000a4b30 g     F .text	000000000000002c              memcpy@GLIBC_2.2.5

So the undecorated memcpy is at the same address as the most recent (and default) version, while there is also an older version. The dynamic symbol table used for resolving references at run time is missing the undecorated version:

000000000008bc80 g   iD  .text	00000000000000c2  GLIBC_2.14  memcpy
00000000000a4b30 g    DF .text	000000000000002c (GLIBC_2.2.5) memcpy

Because by the time that is used the linker has marked the reference in the executable with the required version.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 20:55 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (7 responses)

This subthread is in the context of "hide all glibc symbols whose version is >= GLIBC_x.y.z", which is possible using that table because it defines an ordering on the symbol versions. This would allow the linker to have a `--max-symbol-version GLIBC_2.2.5` flag, and if a symbol has a version which is greater than that, it either finds an older one or errors that the function is not available. Note that symbol versions which don't have a relation to any given max-symbol-version would be allowed (e.g., LIBSTDCXX_3.4.25).

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 13:22 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link] (6 responses)

I don't think the idea of hard-coding a particular naming convention used by glibc in the linker would fly.

But this really isn't difficult to deal with at all if the library uses symbol versions consistently. Assume the version symbol associated with the targeted library version is YYZ. The dynamic symbol table of some newer version of the library will contain symbols labelled as @YYZ if and only if a newer, incompatible version exists (the default symbol version is denoted with @@). Hence, it's (as shown above) pretty trivial to create an include file fixing all changed symbols at version YYZ automatically.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:10 UTC (Thu) by nix (subscriber, #2304) [Link] (5 responses)

> I don't think the idea of hard-coding a particular naming convention used by glibc in the linker would fly.

You don't need to do that. You simply walk the symbol version table and note which symbol versions are ordered later than the requested maximum: then discard any symbol whose version is in that blacklisted set, and pick the latest version which is not blacklisted (determined from the same ordering). There's no dependency on the naming convention used by glibc at all in that, just a use of the already-recorded symbol version hierarchy.
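The selection rule described here can be sketched in a few lines. This is only an illustration: struct version_node, descends_from and pick_version are hypothetical names, and the table is an assumed in-memory form of the hierarchy recorded in .gnu.version_d, not anything an existing linker exposes.

```c
#include <stddef.h>

/* One node of a version hierarchy. parent is an index into the same
 * table, or -1 for a root. Hypothetical in-memory representation. */
struct version_node {
    const char *name;
    int parent;
};

/* Return 1 if `ancestor` is reachable from `v` by following parent
 * links (v itself does not count), else 0. */
int descends_from(const struct version_node *t, int v, int ancestor)
{
    for (v = t[v].parent; v >= 0; v = t[v].parent)
        if (v == ancestor)
            return 1;
    return 0;
}

/* Among the versions a symbol is defined at, pick the latest one that
 * is not ordered after `max`. Returns a table index, or -1 if every
 * candidate is newer than the requested maximum. */
int pick_version(const struct version_node *t, const int *cands,
                 size_t n, int max)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        int c = cands[i];

        if (descends_from(t, c, max))
            continue;           /* ordered after max: hidden */
        if (best < 0 || descends_from(t, c, best))
            best = c;           /* c is ordered after the current best */
    }
    return best;
}
```

With a table of GLIBC_2.2.5, GLIBC_2.3 and GLIBC_2.14 chained in that order, a symbol defined at 2.2.5 and 2.14 resolves to the 2.2.5 definition when the maximum is GLIBC_2.3. Versions unrelated to the maximum are never hidden, because only the recorded parent links are consulted and the version names are never compared.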

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 17:27 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link] (4 responses)

There is no such ordering: Version tags are just text.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 18:14 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (3 responses)

On their own, true, but there is a section of binaries which contains the hierarchy of symbol versions in the library. That is what gives the ordering, not the text of the symbol version itself.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 18:49 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link] (2 responses)

Version nodes don't need to have any kind of hierarchy.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 18:57 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)

Need? No. But glibc does provide it and I don't see why it couldn't be used.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 22:22 UTC (Thu) by nix (subscriber, #2304) [Link]

Quite. If they don't provide a hierarchy in their version script, I think it's reasonable to assume that there is no ordering over the versions (which is a legitimate use of symbol versioning, though glibc doesn't use it where it should: GLIBC_PRIVATE is declared to depend upon the highest version currently in use rather than standing alone as it probably should.)

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 1:55 UTC (Tue) by fartman (guest, #128226) [Link] (15 responses)

For NSS, I've thought of two things:

*) There's an NSSS project already from Laurent that provides NSS compatibility for it https://github.com/skarnet/nsss

*) Another is using FUSE modules that provide an NSS API to existing modules and on read requests show a merged version of everything in the specific files (for example, a fuse-passwd could be mounted on top of the regular /etc/passwd and any requests to read() mean it will supply a merged version of all user databases and then the libc function just parses from it as usual).

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 2:10 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

NSSS is the correct approach. FUSE is not going to work, a lot of NSS modules can't enumerate entries, only resolve them.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 11:36 UTC (Tue) by nix (subscriber, #2304) [Link]

Even there, you could expose such databases as directories with the execute bit turned off. (But... nothing would know how to read such things, and you could indeed not export it as /etc/passwd, so you lose backward-compatibility, and while very Plan 9, I'm not sure that 'could' in this case corresponds to 'should'.)

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 15:41 UTC (Tue) by quotemstr (subscriber, #45331) [Link] (1 responses)

What if my user database contains tens of thousands of entries? Why force programs to read (and parse!) all of those entries just to maintain the fiction that the user database is wholly described by a text file called /etc/passwd? NSS's goal is the right one. I'd implement NSS out-of-process instead of loading a DSO, but that's an implementation detail. A virtual text file is absolutely the wrong approach.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 16:28 UTC (Tue) by smurf (subscriber, #17840) [Link]

> What if my user database contains tens of thousands of entries?

That's benign. The really interesting part is when your user database is not enumerable, e.g. because you don't have the rights to do so.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 0:34 UTC (Wed) by sbaugh (guest, #103291) [Link] (10 responses)

glibc already provides an IPC-based API for NSS requests: nscd. If you disable caching, it's exactly what you want. nscd is backwards compatible to 1998 and is rock-solid. A number of platforms which allow shipping packages using a different glibc version from the "base platform" are already using nscd to achieve this separation: Nix, Guix, and some container platforms.

If you are enthusiastic about replacing NSS with an IPC-based API, please, promote use of nscd by default in Debian and Fedora! IMO, that's the only realistic route to achieve the goal.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 3:09 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

Thanks for the reminder. I'd completely forgotten about nscd! I agree that deprecating all in-process NSS modules except for the nscd client would be a great path forward.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 6:16 UTC (Wed) by drag (guest, #31333) [Link] (8 responses)

> 1998 and is rock-solid.

That's news to me.

I probably had to restart nscd thousands of times across hundreds of machines. It was flakier than ntpd, which is saying a lot.

Nobody should be installing nscd anymore. It has always been terrible and it's always going to be terrible.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:42 UTC (Wed) by nix (subscriber, #2304) [Link] (7 responses)

It's been running nonstop for me without incident for many years (I use it to reduce the overhead of a big /etc/services so I only need to parse it once).

It is clearly not unreliable for many people, given that glibc upstream has been talking about replacing nss with nscd by default for some time now.

I think you need to investigate more....

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:53 UTC (Wed) by smurf (subscriber, #17840) [Link] (6 responses)

It's been flaky in the past. I wouldn't use the nscd from 199x in any kind of production environment.

Today? no problem IMHO, and it does speed things up (a lot, for some installations).

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 15:25 UTC (Wed) by zdzichu (subscriber, #17118) [Link] (3 responses)

Except when it doesn't work, which is a daily occurrence:

https://github.com/systemd/systemd/issues/10740

When allocating a dynamic user, a lookup is done in systemd, which fails (because the user doesn't exist, and systemd is going to allocate a dynamic uid for it), but then that answer is cached and after the dynamic user is set up, nscd will still say the user isn't created.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 16:40 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

You can disable caching for some or all services in /etc/nscd.conf.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 19:01 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

That's just an argument for moving nscd functionality into systemd. It's a coordination problem, not a conceptual problem with the protocol.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:05 UTC (Thu) by nix (subscriber, #2304) [Link]

systemd should flush the nscd cache when necessary (via nscd -i) if DynamicUser= is in use. This is not exactly subtle: it's half the contents of the nscd manpage in the man-pages package. :)

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 17:05 UTC (Wed) by drag (guest, #31333) [Link] (1 responses)

I would rather just use sssd than any variant of nscd.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:07 UTC (Thu) by nix (subscriber, #2304) [Link]

nscd has one advantage which I don't know if sssd replicates: efficiency. Programs can query the cached names without actually talking to nscd at all, because it exports an mmapped region which programs can just read from as needed. Zero context switches! :)

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 2:59 UTC (Tue) by mangix (guest, #126006) [Link] (8 responses)

Unfortunately because of systemd, this will not happen.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 3:02 UTC (Tue) by fartman (guest, #128226) [Link]

Yeah, and there's the recent dbus-broker project too that hard depends on glibc now (the launcher, not the message broker binary), which will probably end up being as integral as systemd in the coming days (when it replaces the reference implementation).

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 6:44 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

Why?

systemd depends on glibc but it's not a hard dependency, you can compile it with musl with only minor changes (like adding strndupa).

It also seems to be the perfect place for a bus1-broker.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 11:04 UTC (Tue) by judas_iscariote (guest, #47386) [Link] (5 responses)

Systemd needs a C library that provides what glibc does, and one that BEHAVES like glibc, two different requirements.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 12:22 UTC (Tue) by smurf (subscriber, #17840) [Link] (3 responses)

Those problems can be fixed – either by systemd or by the non-glibc library.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 19:44 UTC (Tue) by jccleaver (guest, #127418) [Link] (2 responses)

> Those problems can be fixed – either by systemd or by the non-glibc library.

If you think systemd gives a whit about anyone-besides-its-own-developers' needs, I have some land in Florida you might be interested in.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 19:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

systemd fixed (some) of the issues raised by people trying to port it to musl, so your hate is misdirected.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 11:55 UTC (Wed) by hummassa (subscriber, #307) [Link]

Almost every hatred directed at systemd is misdirected. Not because it is a perfect piece of software (it isn't) but because the hatred is seldom really justified.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 17:39 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Uhm. No?

systemd can be compiled with musl right now, with only a couple of tweaks.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 11:45 UTC (Tue) by sorokin (guest, #88478) [Link] (16 responses)

> 2) No getent, no NSS, no name resolution. All of this should be moved into a separate daemon with a simple RPC protocol.

How moving name resolution into a separate daemon makes things better? Won't it be everything that glibc has now plus a RPC protocol?

> 3) No iconv either.

What is wrong with having iconv in glibc?

> And finally, NO SYMBOL VERSIONING. At all. This was THE most braindead decision in the history of Linux.

Although I'm not an expert in symbol versioning at all, this statement sounds too categorical for me. What is the alternative? Sometimes you just need to adjust the behavior of a function for new code while preserving old behavior for already build programs.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 14:46 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

> How moving name resolution into a separate daemon makes things better? Won't it be everything that glibc has now plus a RPC protocol?

Principally, it means that we don't need to do dlopen()s or other shared library operations when NSS ops are requested, which removes a whole pile of complexity from the dynamic linker since right now people assume that they can do name lookups in statically linked programs as well. This is slated to be removed soonish, but I do wonder how much this will break. (There are hardly any statically linked programs left, not against glibc anyway, so perhaps not much.)

This does add extra complexity of its own, notably that one has to track the lifetime of this process without disturbing the caller's waitpid()dery, signalfds or SIGCHLD dispatch, and that one must arrange to communicate with this process without the extra fds one must hold open to do so disrupting the rest of the process (which tends to assume that libc holds no fds and does nothing that might affect signals outside a single function call unless explicitly requested). Solving these problems in a backward-compatible way seems likely to be difficult, but it would benefit not only libc but every other library out there which needs to do similar things.

> What is wrong with having iconv in glibc?

For a long time it was outdated enough that anything that actually cared about localization had to use another library. This isn't true any more, though: iconv maintenance is one of many areas where the new glibc governance model has really shone.

(I agree with you that ditching symbol versioning is madness. We need much *more* of it in the Linux world, not much less.)

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:14 UTC (Tue) by pm215 (subscriber, #98099) [Link]

One fairly common statically linked against glibc program is QEMU, specifically in its "user-mode emulator" version -- it's usual to statically link the qemu-arm or similar executable so you can drop it into a foreign-architecture chroot without also needing to drop in a pile of dynamic libraries which might even conflict with those of the chroot.

(As an aside, I really should get round to filing a bug against glib (no 'c'!) for putting utility functions we need in our static build in the same source file as functions which do NSS lookups -- this results in spurious linker warnings for a static build because the linker pulls in the whole .o file.)

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 16:23 UTC (Tue) by MarcB (subscriber, #101804) [Link] (5 responses)

> How moving name resolution into a separate daemon makes things better? Won't it be everything that glibc has now plus a RPC protocol?

One problem with NSS is sandboxing, for example via seccomp.

A program could suddenly require completely different syscalls and/or access completely different files due to NSS.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:34 UTC (Wed) by nix (subscriber, #2304) [Link] (4 responses)

Though seccomp has bigger problems than that. I have previously noted here how BIND's named suddenly hung at boot on upgrade to glibc 2.25 because it was sensitive to the removal of the pid cache! (It wasn't expecting its seccomped process to call the getpid() syscall because the pid cache in glibc was satisfying it -- until that cache went away.)

seccomp suddenly means that every program that uses it is sensitive to the set of syscalls made by all its libraries, and that is a level of compatibility that *no* library has ever guaranteed.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 19:04 UTC (Wed) by quotemstr (subscriber, #45331) [Link] (3 responses)

Right, which is why I believe security through system call whitelisting is fragile and unnecessary. Access, IMHO, should be controlled with capabilities and resources. Just as in concurrency programming: protect data, not code.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 1:34 UTC (Thu) by wahern (guest, #37304) [Link] (1 responses)

It should be, but Linux still lacks any Capsicum support. Linux has seccomp and a complex eBPF subsystem, while support for one of Capsicum's simplest features, pdfork() (via a new CLONE_FD flag to clone()), remains unmerged AFAICT, even though it has significant general utility. (See https://lwn.net/Articles/638613/ for history.) Unless some organization steps up with serious cash and developer time to strongly push Capsicum (or Capsicum-light?) the future doesn't look very bright.

C library system-call wrappers, or the lack thereof

Posted Nov 16, 2018 9:34 UTC (Fri) by fartman (guest, #128226) [Link]

A process descriptor patch set is in the works; you should see it posted soon.

C library system-call wrappers, or the lack thereof

Posted Nov 18, 2018 21:16 UTC (Sun) by thestinger (guest, #91827) [Link]

The intended purpose of seccomp-bpf is reducing kernel attack surface. The kernel is the weakest point in decent sandbox implementations and seccomp-bpf offers a way to substantially reduce the exposed functionality. It wasn't designed and implemented as a way to build the main layer of the sandbox. It was provided as a way to reinforce existing sandboxes built on semantic-based sandboxing features like namespaces, SELinux and the POSIX permission model.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 17:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

> How moving name resolution into a separate daemon makes things better? Won't it be everything that glibc has now plus a RPC protocol?
Currently NSS can silently load arbitrary libraries (including SSL and LDAP implementations), read files anywhere on the system and in general do all kinds of mayhem. Ditto for PAM.

This is just a bad design, it can introduce blocking in unexpected places and foul the seccomp filters and so on. Oh, and NSS is also synchronous.

It also severely limits what is available. For example, you can't easily validate a password against /etc/shadow without having access to it.

Preventing static linking is just a cherry on top.

> What is wrong with having iconv in glibc?
Lots of crap with an unusable interface. It should either be removed or re-done correctly.

> Although I'm not an expert in symbol versioning at all, this statement sounds too categorical for me. What is the alternative? Sometimes you just need to adjust the behavior of a function for new code while preserving old behavior for already build programs.
Don't adjust user-visible behavior of existing functions in the freaking glibc.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:42 UTC (Tue) by fartman (guest, #128226) [Link] (3 responses)

> Ditto for PAM.

and this is going to be a lot harder to fix, because some PAM modules (your very own pam_systemd and pam_selinux) exploit the property that it runs in the same address space and context as that of the caller.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> and this is going to be a lot harder to fix, because some PAM modules (your very own pam_systemd and pam_selinux) exploit the property that it runs in the same address space and context as that of the caller.
This should probably be replaced by special hooks in SSH or other software that starts user sessions.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:37 UTC (Wed) by nix (subscriber, #2304) [Link]

We have those special hooks. They are called PAM.

Or are you seriously proposing to go back to the old world where every new authentication mechanism required independently hacking every program that needed it? Because that is not going to fly with anyone with remotely complex needs (forget enterprise LDAP, even people with yubikeys will be calling for your demise :) )

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 15:13 UTC (Thu) by cortana (subscriber, #24596) [Link]

Remember that PAM is used for authentication (auth), authorization (account), changing passwords (passwd) and generally monkeying with the invoking process (session). We can imagine a world where we make an API/RPC/whatever call out for the first three, and keep the last one in-process.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:35 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

> Don't adjust user-visible behavior of existing functions in the freaking glibc.

That equates to 'do not fix bugs', since a major use of symbol versioning is to keep an older version of a buggy symbol when existing binaries are believed to depend upon this bug.

Not a tenable proposition.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 19:44 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> That equates to 'do not fix bugs', since a major use of symbol versioning is to keep an older version of a buggy symbol when existing binaries are believed to depend upon this bug.
It doesn't mean that. You can't fix bugs that break existing programs. Just like the kernel.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:01 UTC (Thu) by nix (subscriber, #2304) [Link]

Well, that's strictly worse than what we currently have with symbol versioning, where you can fix such bugs fairly freely, and old programs are only bitten by the changes when they are recompiled (at which point one can be certain that the person bitten by it is able to recompile it and therefore has the source code and can fix it).

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 12:36 UTC (Tue) by smurf (subscriber, #17840) [Link] (11 responses)

> And finally, NO SYMBOL VERSIONING. At all. This was THE most braindead decision in the history of Linux.

No it was not. Symbol versioning solved real problems. If you don't want symbol versions, ultimately you have to rebuild the world whenever there's an incompatible change (because libc is not only used by your application, but also by other libraries you're linking with). The BSDs can do that. Linux? Not so much.

If your libc is so simple / feature complete as to never require any incompatible change, good for you, but glibc was not such a library when we decided to switch to it, and there was no nearly-as-feature-complete alternative.

Right now? maybe musl is a good answer, I don't know – does it support pthreads, cancellation, C++-ish constructors and destructors, and all the other interesting features that complexify glibc (thread-local 'errno' et al.)?

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 16:26 UTC (Tue) by quotemstr (subscriber, #45331) [Link]

To be fair, symbol versioning might be too fine-grained. The problem with the soname is that it applies to the *whole library*. The problem with symbol versions is that they apply to individual symbols and are largely invisible until something goes wrong. An intermediate level --- a vector of soname-like semver-valued version tags, for example --- might be more comprehensible.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 18:00 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> No it was not. Symbol versioning solved real problems. If you don't want symbol versions, ultimately you have to rebuild the world whenever there's an incompatible change (because libc is not only used by your application, but also by other libraries you're linking with).
No. Windows somehow survives just fine without symbol versioning for its core libraries. So I know that it can be done.

It's not rocket science, it's just a freaking libc. You simply don't change its visible behavior.

> Right now? maybe musl is a good answer, I don't know – does it support pthreads, cancellation, C++-ish constructors and destructors, and all the other interesting features that complexify glibc (thread-local 'errno' et al.)?
musl supports pthreads and cancellation (which should also be removed) and all the sane features (non-TLS errno shouldn't even exist).

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 19:44 UTC (Tue) by HenrikH (subscriber, #31152) [Link] (5 responses)

>No. Windows somehow survives just fine without symbol versioning for its core libraries. So I know that it can be done.

I'd rather have symbol versioning than the vcredist hell that they have on Windows.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 19:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

What hell? You just bundle MSVCRT libraries along with your application and it works.

You can't bundle glibc along with your app, btw.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 20:00 UTC (Tue) by excors (subscriber, #95769) [Link]

It stopped being that simple since around VC++ 2005, where the CRT refused to run on newer versions of Windows unless you figured out the undocumented process of bundling the correct manifest file too.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 4:22 UTC (Wed) by rkeene (guest, #88031) [Link] (1 responses)

You can indeed bundle glibc along with your application. I do this with AppFS, my FUSE-based package manager.

They're all just files on disk.

I'd be happy to demonstrate to you if you wish.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 6:14 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> You can indeed bundle glibc along with your application.
You'll break NSS modules unless your glibc is compatible with the system one.

It's possible if you bundle a whole custom environment: glibc, NSS modules and other crap. Or if you don't use anything that touches the name resolution.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 19:57 UTC (Wed) by HenrikH (subscriber, #31152) [Link]

And how fun is that when the vcredist package is much larger than the base install of your application? Also I've seen examples of single applications that required several different vcredist packages at the same time.

Combine that with the fact that vcredist is not covered by Windows Update, so old insecure versions will linger on users' drives forever. More fun is when you discover that no Microsoft-shipped software uses the vcredist libraries (as usual they do not eat their own dog food).

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 0:22 UTC (Thu) by wahern (guest, #37304) [Link] (2 responses)

> maybe musl is a good answer, I don't know – does it support pthreads, cancellation, C++-ish constructors and destructors, and all the other interesting features that complexify glibc (thread-local 'errno' et al.)?

Yes, it supports all of those, and does so in a less bug prone manner than glibc.

Arguably the biggest functional differences between glibc and musl are in locale support (musl only supports C and UTF-8), and missing NSS and PAM functionality, though musl recently gained nscd support.

IME the biggest technical difference is that glibc heavily relies on linker and runtime hacks, which have been the source of many races in glibc's threading runtime, some still unresolved (I've confirmed and reported at least 2, IIRC). musl libc eliminated an entire class of threading bugs by implementing pthreads entirely within libc and discarding the distinction between threaded and non-threaded runtimes, whereas glibc dedicates a tremendous amount of complexity to supporting dynamic initialization of pthreads when dlopen'ing a libpthread-dependent library.

Locale, NSS, PAM, and dynamic pthread initialization are the primary sources of complexity (disregarding programming style and conventions) that make glibc source nearly impenetrable as compared to musl libc. Similarly, the implementation of those features also account for much of glibc's complex linking hacks.

I think symbol versioning is a great idea. Where it falls short is in poor integration with the toolchain, and in particular compilers, which require writing ad hoc macro and assembly code to mark versions. It would also be nice to have a tool which could scan libraries and report on symbol versioning information, including potential inconsistencies--where versioning could be used and where it's being used improperly. Symbol versioning will continue to remain niche and poorly appreciated unless and until it's better integrated with toolchains. Unfortunately, that seems increasingly less likely judging by the choices languages like Go and Rust are making, sentiments which seem to be shared by alternative toolchains like LLVM. Which is what it all really boils down to--the quality of toolchain support is often the ultimate arbiter of what people come to believe to be "proper" solutions.

That said, musl libc itself has less need for symbol versioning because it adheres tightly to POSIX and, of course, lacks glibc's legacy baggage--both self-inflicted and unavoidable. It would be nice for musl's linker to gain symbol versioning support, though.

One slightly annoying thing about musl is that it actively *prevents* leaking version information via headers. It's difficult if not impossible to support header-only feature detection with musl libc. There are few interfaces which musl lacks, though that's partly an artifact of most musl environments being reliably modern--you can get away with assuming that musl supports, e.g., dup3 simply because if you're compiling against musl at all it's invariably a very recent release matched to an equally recent kernel. OTOH, many *other* environments may lack support for a particular feature, which means that in portable code rather than asserting the use of the few libcs (including musl) that support feature X, you have to assert the non-use of all the *other* libcs. In other words, because you can't positively assert musl directly, you must assert it by negative implication, requiring enumeration of unsupported environments.[1] In effect, musl libc's policy can have the perverse (yet *known* and *intended*) effect of requiring the use of autoconf-style feature detection. I say perverse because over the past 20 years POSIX has almost completely eliminated subtle API and ABI incompatibilities, and led to a much more uniform C environment. Most extensions, such as epoll/kqueue, are easily and reliably detected via headers[2], so the use of autoconf-style feature detection has never been less necessary, musl's policy notwithstanding.

[1] That's not always a bad thing, and often the better approach. But it would be nice to have a choice. And in any event the fact remains that you can't *exclude* older musl releases, either, which will become a problem if musl begins to see wider adoption.

[2] This is why common C convention is still to use macro constants rather than enums, which will continue until we can enjoy compiler support for type detection similar to the __has_include built-in.
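
To make the header-only detection idea above concrete, here is a rough sketch in C. It assumes a compiler that provides __has_include (recent GCC and Clang do); the HAVE_* macro names and the libc_family() helper are invented for this illustration and are not part of any libc. Note how glibc can be asserted positively through __GLIBC__, whereas musl ends up in the "unknown" branch because it defines no identifying macro.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* expose O_CLOEXEC and friends on glibc */
#endif
#include <fcntl.h>           /* O_CLOEXEC, if the platform has it */
#include <limits.h>          /* any libc header; pulls in __GLIBC__ on glibc */

/* Probe for a header without running a configure script. */
#if defined(__has_include)
#  if __has_include(<sys/epoll.h>)
#    define HAVE_EPOLL 1
#  endif
#endif
#ifndef HAVE_EPOLL
#  define HAVE_EPOLL 0       /* header missing, or compiler cannot tell */
#endif

/* Macro constants are visible to the preprocessor; enum constants are not. */
#ifdef O_CLOEXEC
#  define HAVE_O_CLOEXEC 1
#else
#  define HAVE_O_CLOEXEC 0
#endif

/* Hypothetical helper: report which libc the headers positively identify. */
static const char *libc_family(void)
{
#if defined(__GLIBC__)
    return "glibc";
#else
    return "unknown (musl, a BSD libc, or something else)";
#endif
}
```

A caller would branch on HAVE_EPOLL or HAVE_O_CLOEXEC at compile time; what it cannot do from headers alone is say "this is musl, release X or later".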

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:31 UTC (Thu) by nix (subscriber, #2304) [Link]

> This is why common C convention is still to use macro constants rather than enums, which will continue until we can enjoy compiler support for type detection similar to the __has_include built-in.

C11 _Generic is probably what you want here, I think. (... or perhaps not, since all the types in the _Generic must exist at compile-time.)
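
For anyone who hasn't used it, a minimal sketch of what _Generic does, assuming a C11 compiler; the TYPE_ID macro and its numeric codes are made up for this example. It also shows the limitation mentioned above: every type named in the association list must already exist, so it cannot test whether a type is declared.

```c
/* Hypothetical macro: map the type of an expression to a small integer.
 * _Generic picks the association matching the type of (x) at compile time. */
#define TYPE_ID(x) _Generic((x), \
    int:     1,                  \
    long:    2,                  \
    double:  3,                  \
    default: 0)

/* Adding an association such as "struct not_declared_anywhere *: 4" would
 * not help detect a missing type: the compiler must be able to parse every
 * listed type name, so _Generic selects among types but cannot probe for
 * their existence. */
```

So TYPE_ID(1L) evaluates to 2 and TYPE_ID("text") falls through to the default.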

C library system-call wrappers, or the lack thereof

Posted Nov 18, 2018 17:42 UTC (Sun) by rweikusat2 (subscriber, #117920) [Link]

The problem with symbol versioning as it's provided is that it's not really versioning at all, just tagging symbols with arbitrary other strings: Neither a symbol history nor an ability to determine a consistent tag snapshot based on a certain symbol tag is necessarily available. If version node dependencies are being used and if the provided dependency graph enables determining a total ordering of symbol tags, both features could be made available but neither use of version node dependencies is mandatory.

The provided feature is really just good enough to support the "never again increment the major version" glibc policy.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 17:14 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]

That's not going to solve the problem of certain system calls being only accessible by number via the syscall function.
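
For readers who haven't had to do this: a minimal sketch of reaching a system call by number, assuming Linux and the syscall() function declared in <unistd.h>. gettid is used as the example because glibc shipped no wrapper for it at the time of this discussion; the helper name raw_gettid is invented here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* make sure unistd.h declares syscall() */
#endif
#include <sys/syscall.h>     /* SYS_gettid */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* syscall() */

/* Hypothetical helper: invoke gettid by its number. syscall() is variadic
 * and returns long, so argument types and the return value are entirely
 * the caller's responsibility, which is where the "error-prone" part
 * comes from. */
static pid_t raw_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```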

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 4:10 UTC (Tue) by jreiser (subscriber, #11027) [Link] (4 responses)

> in some cases, the glibc developers have refused to add the wrappers at all. In such cases, user-space developers must fall back on syscall() to access that functionality, an approach that is both non-portable and error-prone.

Rather than using syscall(), there are developers who write wrappers themselves. Neither glibc nor any other library has a monopoly on the appropriate resources (software talent, time, ...).

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 14:14 UTC (Tue) by k8to (guest, #15413) [Link] (3 responses)

For use in their own software? I've done that.

For upstreaming to glibc? I'd be very impressed. It's sort of the right thing to do but when you need them you typically need them on current deployments, which upstreaming won't help with. Add to that the challenge of absorbing a new codebase and I would not elect many to follow this path.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 14:28 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link] (2 responses)

" It's sort of the right thing to do but when you need them you typically need them on current deployments, which upstreaming won't help with"

It will future proof your work and is often less expensive than committing resources to a soft fork.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 14:41 UTC (Tue) by k8to (guest, #15413) [Link] (1 responses)

If you write a wrapper in your own software, and handle the not present case, then your work is both future proofed and past proofed.

Linux is quite serious about not breaking system calls.

I don't think anyone would hack a private version of libc, that would be weird.
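
The "wrapper in your own software that handles the not-present case" approach might look roughly like this. It is only a sketch: getrandom is picked as the example call, my_getrandom is a made-up name, and falling back to /dev/urandom is one possible policy, not the only one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* syscall(), O_CLOEXEC */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>     /* SYS_getrandom, if the headers know it */
#include <sys/types.h>       /* ssize_t */
#include <unistd.h>

/* Hypothetical wrapper: fill buf with random bytes. Prefer the getrandom
 * system call; fall back to /dev/urandom when the running kernel does not
 * implement it (ENOSYS) or the build headers predate it. */
static ssize_t my_getrandom(void *buf, size_t len)
{
#ifdef SYS_getrandom
    long ret = syscall(SYS_getrandom, buf, len, 0);
    if (ret >= 0 || errno != ENOSYS)
        return (ssize_t)ret;     /* success, or a real error */
#endif
    /* The "not present" case. */
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    int saved = errno;           /* keep read()'s errno across close() */
    close(fd);
    errno = saved;
    return n;
}
```

Because the kernel does not break existing system calls, the same binary keeps working on newer kernels, and the ENOSYS branch covers older ones.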

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 14:48 UTC (Tue) by nix (subscriber, #2304) [Link]

> I don't think anyone would hack a private version of libc, that would be weird.

Several major vendors have done just that, in addition to 'stub libcs' so they can compile programs that can run against old glibcs on systems that only have newer versions.

(I've also done the same myself, but thankfully everything I was doing that for has been upstreamed now. However, it did take many years for me to get around to it and a dozen or so patch rounds before it was ready for integration, so in that window I had no real alternative. :) More generally, everyone adding a feature to glibc hacks a private version until it is upstreamed!)

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 4:22 UTC (Tue) by jreiser (subscriber, #11027) [Link] (3 responses)

> in some cases, the glibc developers have refused to add the wrappers at all.

Even worse, the wrapper for getpid() has a bug which glibc considers to be a feature. "No one will ever call it during the window when it returns the [cached] pid of the parent instead of the pid of the child." WRONG: my software does. I had to re-implement getpid().
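
A re-implementation of that kind can be as small as the sketch below, which assumes Linux and syscall(); the name uncached_getpid is invented for illustration and is not the code from the software mentioned above. Going through the system call by number means no user-space cache can get in the way.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* make sure unistd.h declares syscall() */
#endif
#include <sys/syscall.h>     /* SYS_getpid */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for the PID on every call instead
 * of trusting whatever the C library may have cached. */
static pid_t uncached_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```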

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 4:27 UTC (Tue) by demfloro (guest, #106936) [Link] (2 responses)

The cache got removed in glibc 2.25: https://sourceware.org/glibc/wiki/Release/2.25

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 16:35 UTC (Tue) by cyphar (subscriber, #110703) [Link] (1 responses)

Interestingly, the main justification of the PID namespace semantics (from my understanding) is that glibc would do this pid caching (as well as other programs saving the PID and assuming it won't change) and thus you couldn't change the pid from underneath a process. So I guess PID namespaces will now always just cost a fork(), despite the underlying glibc behaviour having been fixed.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 17:59 UTC (Tue) by nybble41 (subscriber, #55106) [Link]

> (as well as other programs saving the PID and assuming it won't change)

I would imagine this part is still true, and thus the PID namespace semantics are still needed even if glibc no longer caches the PID.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 12:33 UTC (Tue) by ermo (subscriber, #86690) [Link] (12 responses)

It's great that Linux has achieved success as a kernel in the Linux <-> GNU marriage of convenience.

But one can't help but wonder whether this exact problem highlights an aspect of the *nix kernel <-> libc relationship where the BSD development model (kernel and libc developed in lockstep) actually holds an advantage. Doesn't each BSD have its own libc?

One could argue that systemd is approaching this advantage via an alternative route in the Linux-specific part of the ecosystem, but in either case, it seems that there are certain advantages to having all the plumbing parts (including kernel and libc) under one roof. Note that I'm not suggesting that there aren't any disadvantages in this approach.

I have no dog in the fight, just making a (possibly terribly ill-informed!) observation.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 15:21 UTC (Tue) by pctammela (guest, #126687) [Link] (7 responses)

> But one can't help but wonder whether this exact problem highlights an aspect of the *nix kernel <-> libc relationship where the BSD development model (kernel and libc developed in lockstep) actually holds an advantage. Doesn't each BSD have its own libc?

Yes, but what's your point?

The fact that the libc is part of the project doesn't shield it from having the same problem as glibc and Linux.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 16:26 UTC (Tue) by pm215 (subscriber, #98099) [Link] (5 responses)

I would expect the BSD model to mean that the kernel and libc developers were more likely to be the same people, or on the same mailing lists, or at least talking to each other more often before and during implementation of new APIs, which ought to lead to fewer instances of "we added this API to the kernel but it doesn't work at the libc level"...

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 19:47 UTC (Tue) by mirabilos (subscriber, #84359) [Link] (4 responses)

The same people.

We don’t even separate kernel and libc in thinking. Commits are done across the entire source tree, oftentimes touching both in the same commit.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 12:40 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

FWIW, this is the primary reason why I stopped using the BSDs for anything. The endless ABI breaks whenever libc bumped soname *yet again* were just too tiresome. (Yes, yes, make world rebuilds everything, *if* everything you use is in the ports system and *if* you don't mind a transient but possibly lengthy period of breakage while the thing rebuilds. That's great, but I was using this system as a firewall and mail relay and my MTA is being built as about the 2000th package and now all my mail is bouncing until the effing thing catches up.)

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 22:25 UTC (Wed) by kmeyer (subscriber, #50720) [Link] (2 responses)

I don't know when you used BSD last, but the FreeBSD libc SONAME hasn't changed in years (maybe a decade?). All symbols are versioned and backwards compatibility is maintained with quite old versions of FreeBSD libc when APIs change. E.g., https://github.com/freebsd/freebsd/blob/master/lib/libc/g... . (FreeBSD 7 is from 2008.)

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 23:25 UTC (Wed) by mirabilos (subscriber, #84359) [Link]

That’s FreeBSD, not BSD ;-)

A bit more seriously, FreeBSD “feels” like trying to imitate what GNU/Linux does, just a bit behind the times and selectively. They also do a lot of rewrites (one could say NIH) and are very removed from the other BSDs (even the two active forks of it), code-wise.

C library system-call wrappers, or the lack thereof

Posted Nov 15, 2018 16:02 UTC (Thu) by nix (subscriber, #2304) [Link]

Yeah, this was OpenBSD I think. What's their libc soname up to now... libc.so.92, good grief.

C library system-call wrappers, or the lack thereof

Posted Nov 13, 2018 16:38 UTC (Tue) by cyphar (subscriber, #110703) [Link]

Well, given that Solaris (and I believe the BSDs) only provide *libc* ABI compatibility and not syscall compatibility, this allows them to change the syscall interfaces over time by adding glue code to libc. In theory this is something that cannot reasonably be done on Linux (especially since quite a few people are forced to use syscall() to overcome glibc bugs or missing implementations), but can be done on the BSDs because there is only *one* libc for each project.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 1:24 UTC (Wed) by marduk (subscriber, #3831) [Link] (3 responses)

Linux did once have its own libc (a fork of glibc). And then they went (back) to glibc. I think the thinking at the time was that it was undesirable to maintain a libc and the kernel and that the kernel devs were only interested in certain aspects of the C library whereas some other parts that "real" people use weren't getting enough attention.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 16:40 UTC (Wed) by matthewbauer (guest, #128608) [Link] (1 responses)

I think you’re referring to klibc. And it still is in use - just internally with Linux. I don’t think it was ever intended as a general libc replacement.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 16:55 UTC (Wed) by marduk (subscriber, #3831) [Link]

I'm talking about "linux libc" (I don't think there was an official name for it).

http://freesoftwaremagazine.com/articles/history_of_glibc...

I remember the pains of migrating a Slackware box from "libc5" (linux libc) to "libc6" (GNU libc 2).

C library system-call wrappers, or the lack thereof

Posted Nov 29, 2018 9:10 UTC (Thu) by hensema (guest, #980) [Link]

> I think the thinking at the time was that it was undesirable to maintain a libc and the kernel and that the kernel devs were only interested in certain aspects of the C library whereas some other parts that "real" people use weren't getting enough attention.

That's why a split of the C library is needed. One part is the syscall stuff (let's call it liblinux for now) and the other part is the C library with all of its string manipulation, memory management, pam, etc, etc.

This would let kernel devs also develop the userspace counterparts of their system calls while not burdening them with all the other stuff a C library must do. liblinux could even be statically linked into glibc if you want to avoid changing all Makefiles in the world.

Another advantage is that liblinux would become the new ABI, whereas the userspace/kernelspace interface could undergo changes once in a while. And we'd have a REAL reason to increase the kernel major version ;-)

Linux Plumbers Conference?

Posted Nov 13, 2018 21:21 UTC (Tue) by martin.langhoff (subscriber, #61417) [Link] (1 responses)

It seems to me that glibc maintainers and Linux devs are not talking.

Don't we have LPC for this? Sponsor Linux-friendly glibc developers' attendance... breaking bread and sharing a drink with a fellow programmer makes thoughts and collaboration flow.

I understand, glibc isn't _only_ for Linux, but it is counterproductive to have such miscommunication with its most widely used kernel.

Linux Plumbers Conference?

Posted Nov 16, 2018 1:20 UTC (Fri) by siddhesh (guest, #64914) [Link]

This Plumbers had a toolchain microconference (admittedly a bit hurriedly cobbled together) where we had the beginnings of discussions around this and other topics. Some of us are interested in doing this every year, so hopefully we will bridge this communication gap somewhat in the coming years.

C library system-call wrappers, or the lack thereof

Posted Nov 14, 2018 14:30 UTC (Wed) by jani (subscriber, #74547) [Link]

The DRM subsystem effectively requires a real world user before adding uapi. Not just some ad-hoc test tool thrown to github, but basically patches to the userspace graphics stack and acks that the approach and the uapi is sane.

The article makes it sound like syscalls can be added if it sounds like a good idea to the kernel folks alone. Should we have harder open source userspace requirements before adding syscalls?

C library system-call wrappers, or the lack thereof

Posted Nov 19, 2018 10:16 UTC (Mon) by tdz (subscriber, #58733) [Link] (2 responses)

I wonder why glibc is expected to wrap Linux system calls. AFAICT the C library should rather implement POSIX plus maybe some useful extensions. If new system calls help with that, no problem. But if a program wants to use Linux-specific functionality it becomes de-facto a Linux-specific program. Why push that dependency into libc?

C library system-call wrappers, or the lack thereof

Posted Nov 21, 2018 1:57 UTC (Wed) by flussence (guest, #85566) [Link] (1 responses)

Good observation. It might be better for long-term health of the ecosystem to split the non-POSIX stuff off into a liblinux, like there's already a libbsd for functions like arc4random. It's not just about fixing Linux-specific dependencies but glibc-specific ones too; bionic and musl exist. Users of the library ought to have to make a conscious choice to write non-portable code.

C library system-call wrappers, or the lack thereof

Posted Nov 24, 2018 17:23 UTC (Sat) by nix (subscriber, #2304) [Link]

This has been intermittently proposed but hasn't gone anywhere yet as far as I know. (The name of the library is nice: following on from libiberty, the proposal is to call this one 'libinux', hence gcc -o foo foo.o bar.o ... -linux ... :)