Glibc and the kernel user-space API

By Michael Kerrisk
January 30, 2013

We are accustomed to thinking of a system call as being a direct service request to the kernel. However, in reality, most system call invocations are mediated by wrapper functions in the GNU C library (glibc). These wrapper functions eliminate work that the programmer would otherwise need to do in order to employ a system call. But it turns out that glibc does not provide wrapper functions for all system calls, including a few that see somewhat frequent use. The question of what (if anything) to do about this situation has arisen a few times in the last few months on the libc-alpha mailing list, and has recently surfaced once more.

A system call allows a program to request a service—for example, open a file or create a new process—from the kernel. At the assembler level, making a system call requires the caller to assign the unique system call number and the argument values to particular registers, and then execute a special instruction (e.g., SYSENTER on modern x86 architectures) that switches the processor to kernel mode to execute the system-call handling code. Upon return, the kernel places the system call's result status into a particular register and executes a special instruction (e.g., SYSEXIT on x86) that returns the processor to user mode. The usual convention for the result status is that a non-negative value means success, while a negative value means failure. A negative result status is the negated error number (errno) that indicates the cause of the failure.
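The register-level sequence just described can be sketched in C using GCC-style inline assembly. This is only an illustration, and it assumes an x86-64 Linux target, where the switching instruction is SYSCALL rather than the SYSENTER used on 32-bit x86. The helper names raw_syscall0() and raw_syscall1() are hypothetical. On any other architecture the sketch falls back to syscall() and reconstructs the raw result status from errno.

```c
#define _GNU_SOURCE        /* make sure syscall() is declared in strict -std modes */
#include <errno.h>
#include <sys/syscall.h>   /* SYS_getpid, SYS_close */
#include <unistd.h>

/*
 * Hypothetical helpers that return the kernel's raw result status:
 * non-negative on success, the negated error number on failure.
 */
#if defined(__x86_64__) && defined(__LP64__)

long raw_syscall0(long nr)
{
    long ret;

    /* The system call number goes in rax and the result comes back
       in rax. The SYSCALL instruction itself clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (nr)
                      : "rcx", "r11", "memory");
    return ret;
}

long raw_syscall1(long nr, long arg1)
{
    long ret;

    /* The first system call argument is passed in rdi. */
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (nr), "D" (arg1)
                      : "rcx", "r11", "memory");
    return ret;
}

#else  /* Fallback: let syscall() do the work, then undo its errno handling. */

long raw_syscall0(long nr)
{
    long ret = syscall(nr);
    return (ret == -1) ? -errno : ret;
}

long raw_syscall1(long nr, long arg1)
{
    long ret = syscall(nr, arg1);
    return (ret == -1) ? -errno : ret;
}

#endif
```

With these helpers, raw_syscall1(SYS_close, -1) yields -EBADF rather than -1; translating that value into errno is left to the caller.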

All of the details of making a system call are normally hidden from the user by the C library, which provides a corresponding wrapper function and header file definitions for most system calls. The wrapper function accepts the system call arguments as function arguments on the stack, initializes registers using those arguments, and executes the assembler instruction that switches to kernel mode. When the kernel returns control to user mode, the wrapper function examines the result status, assigns the (negated) error number to errno in the case of a negative result, and returns either -1 to indicate an error or the non-negative result status as the return value of the wrapper function. In many cases, the wrapper function is quite simple, performing only the steps just described. (In those cases, the wrapper is actually autogenerated from syscalls.list files in the glibc source that tabulate the types of each system call's return value and arguments.) However, in a few cases the wrapper function may do some extra work such as repackaging arguments or maintaining some state information inside the C library.
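The final step performed by a simple wrapper, turning the kernel's result status into the familiar "-1 plus errno" convention, might look roughly like the following. The function name syscall_result() is hypothetical, and the range check is an assumption based on the common Linux convention that only the values -4095 to -1 denote errors; the description above says only that a negative value means failure.

```c
#include <errno.h>

/*
 * Hypothetical helper: convert a raw kernel result status into the
 * C library convention (-1 with errno set, or the result itself).
 */
long syscall_result(long raw)
{
    /* Assumption: only -4095..-1 are error numbers, so that a large
       "negative" success value is not mistaken for a failure. */
    if (raw < 0 && raw >= -4095) {
        errno = (int) -raw;   /* store the positive error number */
        return -1;
    }
    return raw;               /* success: pass the result through */
}
```

A wrapper built this way would end with something like "return syscall_result(raw);", where raw is the value the kernel left in the result register.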

The C library thus acts as a kind of gatekeeper on the API that the kernel presents to user space. Until the C library provides a wrapper function, along with suitable header files that define the calling signature and any constant and structure definitions used by the system call, users must do some manual work to make a system call.

That manual work includes defining the structures and constants needed by the system call and then invoking the syscall() library function, which handles the details of making the system call—copying arguments to registers, switching to kernel mode, and then setting errno once the kernel returns control to user space. Any system call can be invoked in this manner, including those for which the C library already provides a wrapper. Thus for example, one can bypass the wrapper function for read() and invoke the system call directly by writing:

    nread = syscall(SYS_read, fd, buf, len);

The first argument to syscall() is the number of the system call to be invoked; SYS_read is a constant whose definition is provided by including <sys/syscall.h>. The syscall() function itself is declared in <unistd.h>.
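Packaged as a small function, the call above might look like this. It is a sketch rather than a recommended replacement for read(); the name direct_read() is hypothetical.

```c
#define _GNU_SOURCE        /* make sure syscall() is declared in strict -std modes */
#include <errno.h>         /* errno, for callers that check failures */
#include <sys/syscall.h>   /* SYS_read */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: invoke read() without using the glibc wrapper. */
ssize_t direct_read(int fd, void *buf, size_t len)
{
    /* syscall() returns a long; on failure it returns -1 and sets
       errno, just as the read() wrapper would. */
    return (ssize_t) syscall(SYS_read, fd, buf, len);
}
```

The caller sees the same behavior as with read(): a byte count on success, or -1 with errno set (for example, to EBADF for an invalid file descriptor).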

The C library used by most Linux developers is of course the GNU C library. Normally, glibc tracks kernel system call changes quite closely, adding wrapper functions and suitable header file definitions to the library as new system calls are added to the kernel. Thus, manually coding system calls is normally only needed when trying to use the latest system calls that have not yet appeared in the most recent iteration of glibc's six-month release cycle or when using a recent kernel on a system that has a significantly older version of glibc.

However, for some system calls, glibc support never appears. The question of how the decision is made on whether to support a particular system call in glibc has once again become a topic of discussion on the libc-alpha mailing list. The most recent discussion started when Kees Cook, the implementer of the recently added finit_module() system call, submitted a rudimentary patch to add glibc support for the system call. In response, Joseph Myers and Mike Frysinger noted various pieces that were missing from the patch, with Joseph adding that "in the kexec_load discussion last May / June, doubts were expressed about whether some existing module-related syscalls really should have had functions in glibc."

The module-related system calls—init_module(), delete_module(), and so on—are among those for which glibc does not provide support. The situation is in fact slightly more complex in the case of these system calls: glibc does not provide any header file support for these system calls but does, through an accident of history, export a wrapper function ABI for the calls.

The earlier discussion that Joseph referred to took place when Maximilian Attems attempted to add a header file to glibc to provide support for the kexec_load() system call, stating that his aim was "to axe the syscall maze in kexec-tools itself and have this syscall supported in glibc." One of the primary glibc maintainers, Roland McGrath, had a rather different take on the necessity of such a change, stating "I'm not really convinced this is worthwhile. Calling 'syscall' seems quite sufficient for such arcane and rarely-used calls." In other words, adding support for these system calls clutters the glibc ABI and requires (a small amount of) extra code in order to satisfy the needs of a handful of users who could just use the syscall() mechanism.

Andreas Jaeger, who had reviewed earlier versions of Maximilian's patch, noted that "linux/syscalls.list already [has] similar esoteric syscalls like create_module without any header support. I wouldn't object to do this for kexec_load as well". Roland agreed that the kexec_load() system call is a similar case, but felt that this point wasn't quite germane, since adding the module system calls to the glibc ABI was a "dubious" historical step that can't be reversed for compatibility reasons.

But in the recent discussion of finit_module(), Mike Frysinger spoke in favor of adding full glibc support for module-related system calls such as init_module(). Dave Miller made a similar argument even more succinctly:

It makes no sense for every tool that wants to support doing things with kernel modules to do the syscall() thing, propagating potential errors in argument signatures into more than one location instead of getting it right in one canonical place, libc.

In other words, employing syscall() can be error prone: there is no checking of argument types nor even checking that sufficient arguments have been passed.
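A typed wrapper restores that checking, which is the point of having one canonical definition. The sketch below uses ioprio_get(), one of the system calls lacking a glibc wrapper, as the example. The names my_ioprio_get() and MY_IOPRIO_WHO_PROCESS are hypothetical, and the constant's value is an assumption copied from the kernel's definitions, since no glibc header supplies it.

```c
#define _GNU_SOURCE        /* make sure syscall() is declared in strict -std modes */
#include <errno.h>         /* errno, for callers that check failures */
#include <sys/syscall.h>   /* SYS_ioprio_get */
#include <unistd.h>        /* syscall() */

/* Assumption: mirrors the kernel's IOPRIO_WHO_PROCESS value. */
#define MY_IOPRIO_WHO_PROCESS 1

/*
 * Hypothetical typed wrapper. The prototype lets the compiler check
 * the number and types of the arguments, which a bare syscall()
 * invocation cannot do.
 */
int my_ioprio_get(int which, int who)
{
    return (int) syscall(SYS_ioprio_get, which, who);
}
```

A call such as my_ioprio_get(MY_IOPRIO_WHO_PROCESS, 0) queries the calling thread. Passing the wrong number of arguments to my_ioprio_get() is rejected at compile time, whereas the equivalent mistake in a direct syscall() invocation compiles silently.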

Joseph Myers felt that the earlier kexec_load() discussions hadn't fully settled the issue, and was interested in having some concrete data on how many system calls don't have glibc wrappers. Your editor subsequently donned his man-pages maintainer hat and grepped the man pages in section 2 to determine which system calls do not have full glibc support in the form of a wrapper function and header files. The resulting list turns out to be quite long, running to nearly 40 Linux system calls. However, the story is not quite so simple, since some of those system calls are obsolete (e.g., tkill(), sysctl(), and query_module()) and others are intended for use only by the kernel or glibc (e.g., restart_syscall()). Yet others have wrappers in the C library, although the wrappers have significantly different names and provide some piece of extra functionality on top of the system call (e.g., rt_sigqueueinfo() has a wrapper in the form of the sigqueue() library function). Clearly, no wrapper is required for those system calls, and once they are excluded there remain perhaps 15 to 20 system calls that might be candidates to have glibc support added.

Motohiro Kosaki considered that the remaining system calls could be separated into two categories: those used by only one or a few applications and those that seemed to him to have more widespread application use. Motohiro was agnostic about whether the former category (which includes the module-related system calls, kcmp(), and kexec_load()) required a wrapper. However, in his opinion the system calls in the latter category (which includes system calls such as ioprio_set(), ioprio_get(), and gettid()) clearly merited having full glibc support.

The lack of glibc support for gettid(), which returns the caller's kernel thread ID, is an especially noteworthy case. A long-standing glibc bug report requesting that glibc add support for this system call gained little traction with the previous glibc maintainer. However, excluding that system call is rather anomalous, since it is quite frequently used, the kernel exposes thread IDs via various /proc interfaces, and glibc exposes various kernel APIs that can employ kernel thread IDs (for example, sched_setaffinity(), fcntl(), and the SIGEV_THREAD_ID notification mode for POSIX timers).
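In the absence of a wrapper, programs typically carry a definition along the following lines. This is a minimal sketch; the name my_gettid() is hypothetical and was chosen to avoid clashing with a gettid() declaration, should the C library provide one.

```c
#define _GNU_SOURCE        /* make sure syscall() is declared in strict -std modes */
#include <sys/syscall.h>   /* SYS_gettid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall() */

/* Hypothetical wrapper: return the caller's kernel thread ID. */
pid_t my_gettid(void)
{
    /* gettid() takes no arguments and does not fail. */
    return (pid_t) syscall(SYS_gettid);
}
```

In the main thread of a process the value returned equals getpid(); in other threads it differs, and it is that per-thread value that interfaces such as sched_setaffinity() accept.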

The discussion has petered out in the last few days, despite Mike Frysinger's attempt to further push the debate along by reading and summarizing the various pro and contra arguments in a single email. As noted by various participants in the discussion, adding glibc wrappers for some currently unsupported system calls would seem to have some worthwhile benefits. It would also help to avoid the confusing situation where programmers sometimes end up searching for a glibc wrapper function and header file definitions that don't exist. It remains to be seen whether these arguments will be sufficient to persuade Roland in the face of his concerns about cluttering the glibc ABI and adding extra code to the library for the benefit of what he believes is a relatively small number of users.


Glibc and the kernel user-space API

Posted Jan 30, 2013 16:23 UTC (Wed) by justincormack (subscriber, #70439) [Link]

There are other issues with the syscall(2) interface too, like having to split up 64 bit arguments manually on 32 bit architectures.

And the documentation in man(2) is very schizophrenic, as it is sometimes about the syscall and sometimes about the glibc interface or quirks. Ideally there should be entries in man 2 and man 3 for some more calls (or all of them?).

Sometimes glibc makes very odd decisions, like caching the output of getpid() so it returns the incorrect value if you call clone(2) yourself.

Sometimes however there can be useful bug fixes, and lots of compatibility wrappers which are useful. But again documentation in man 3 would be useful...

Glibc and the kernel user-space API

Posted Jan 30, 2013 16:30 UTC (Wed) by glandium (subscriber, #46059) [Link]

> There are other issues with the syscall(2) interface too, like having to split up 64 bit arguments manually on 32 bit architectures.

Or having to handle different ways to pass arguments for syscalls with a lot of them on some architectures.

Glibc and the kernel user-space API

Posted Jan 31, 2013 0:09 UTC (Thu) by mkerrisk (subscriber, #1978) [Link]

> And the documentation in man(2) is very schizophrenic, as it is sometimes about the syscall and sometimes about the glibc interface or quirks. Ideally there should be entries in man 2 and man 3 for some more calls (or all of them?).

There's some relevant thoughts about this here. The main point is that the user shouldn't (as far as I am concerned) need to consult two pages to get the information they need.

Glibc and the kernel user-space API

Posted Jan 31, 2013 15:22 UTC (Thu) by paulj (subscriber, #341) [Link]

Yeah, there really ought to be a separate section (2s?) to document the actual kernel syscall.

Glibc and the kernel user-space API

Posted Feb 1, 2013 6:17 UTC (Fri) by vapier (subscriber, #15768) [Link]

section 2 is the standard location for syscalls (or rather, for the ABI the kernel provides). but the contention isn't the location, it's splitting information across two pages for end developers (section 2 and section 3). mkerrisk posted a link that explains a bit more.

Glibc and the kernel user-space API

Posted Feb 1, 2013 11:37 UTC (Fri) by paulj (subscriber, #341) [Link]

I'm aware what sections 2 and 3 are traditionally for. However, you can have "sub-sections", indicated by a letter suffix on the section. E.g. several sections have POSIX specific sub-sections. Compare "man 1 sh" to "man 1p sh", or "man nan" to "man 3p nan".

There's no reason you couldn't have a "k" sub-section (so "2k" and/or "3k"), to document kernel specific things about programming interfaces. There's no reason why you can't have *both* a glibc and Linux kernel specific version for a man-page.

Glibc and the kernel user-space API

Posted Feb 2, 2013 22:19 UTC (Sat) by vapier (subscriber, #15768) [Link]

using subsections instead of the proper (and already standardized) main sections doesn't address the stated concern in any way

Glibc and the kernel user-space API

Posted Jan 30, 2013 16:46 UTC (Wed) by michaeljt (subscriber, #39183) [Link]

From the article, quoting Dave Miller:

> It makes no sense for every tool that wants to support doing things with kernel modules to do the syscall() thing, propagating potential errors in argument signatures into more than one location instead of getting it right in one canonical place, libc.

Does that canonical place have to be libc though? Why not e.g. some liblinux which could live in the kernel source tree?

Glibc and the kernel user-space API

Posted Jan 30, 2013 17:59 UTC (Wed) by ballombe (subscriber, #9523) [Link]

Do we even really need a library ?
A header file with the proper macro wrappers set and prototype should be sufficient.

Glibc and the kernel user-space API

Posted Jan 30, 2013 18:31 UTC (Wed) by samroberts (guest, #46749) [Link]

Wasn't there an article in the last year about the kernel perhaps including a userspace library as a part of its source? That seems a decent place to put "raw" syscall wrappers.

I think C callable functions with prototypes would offer better type-checking than macros that generate syscall(), and one reason to NOT use syscall directly is to get the arg types correct.

Glibc and the kernel user-space API

Posted Jan 30, 2013 19:14 UTC (Wed) by kpfleming (subscriber, #23250) [Link]

A header file with the appropriate 'static inline' functions would serve this purpose nicely; it doesn't need to be compiled and distributed as a library, but the function calls are type-checked and can be written more easily than preprocessor macros.

Glibc and the kernel user-space API

Posted Jan 30, 2013 21:48 UTC (Wed) by justincormack (subscriber, #70439) [Link]

You would have to namespace them though, as they mainly have the same names as existing libc functions...

Glibc and the kernel user-space API

Posted Jan 30, 2013 23:41 UTC (Wed) by and (subscriber, #2883) [Link]

> You would have to namespace them though, as they mainly have the same names as existing libc functions...

The same is true if you would use macros

Glibc and the kernel user-space API

Posted Jan 31, 2013 1:19 UTC (Thu) by skissane (subscriber, #38675) [Link]

Officially GNU LibC is not just to support the Linux kernel, but other kernels as well. Of course, in reality Linux is the main kernel it gets used with, but the portability is a nice idea to maintain.

This suggests it is useful to distinguish new functionality into two groups:

  1. functions that another kernel could well implement, even if no other kernel does so right now; functionality that could potentially be standardised as part of SUS/POSIX, even if it isn't right now
  2. functionality which is inherently very kernel-specific, and while other kernels may provide the functionality, they are likely to do so with rather different API designs. These kinds of functions are unlikely to ever be standardised, and there is little value in providing cross-kernel APIs for them

I think module loading, kexec, etc., really belong to group (2). That then suggests, that these functions don't belong in LibC, but instead in some separate Linux-specific library or header file (a "liblinux").

Glibc and the kernel user-space API

Posted Jan 31, 2013 3:36 UTC (Thu) by raven667 (subscriber, #5198) [Link]

While I'm no software engineer, it seems to me that there are inherent complexities and inefficiencies in trying to abstract away non-standard features and in preventing them from being used in the main body of code, to be relegated to a separate support library. You can't pervasively make full use of possibly useful features because you have to support more impoverished environments which might not have the feature, or might have the feature but with a highly incompatible API. If you try to use the local APIs of each OS then you may end up with a different implementation for each OS, wrapped in ifdefs.

I think there is a real debate that should happen to figure out where the best place is to draw the lines for cross platform compatibility. Should one use the OpenBSD model, where there is a core project that takes full advantage of all the OpenBSD-only features along with a separate project for porting that to other systems? What about the rest of user space, of desktops: where should things be tied to the particular OS by using OS features, where should things be designed for the lowest common denominator, and where is it acceptable to provide multiple implementations with ifdef?

Glibc and the kernel user-space API

Posted Jan 31, 2013 14:26 UTC (Thu) by bkw1a (subscriber, #4101) [Link]

I'm currently experimenting with the Ceph distributed filesystem, which can make use of "syncfs" if it's available. I'm running a kernel that supports syncfs (The ElRepo kernel-ml-3.7.5-1), but on a distribution (CentOS 6.3) that doesn't have a glibc that knows about this function. It would be nifty if the kernel came with a "liblinux" that implemented things like this, instead of the daunting (non-starter, really) prospect of upgrading to a new glibc just to get syncfs.

Glibc and the kernel user-space API

Posted Jan 30, 2013 18:09 UTC (Wed) by jwarnica (guest, #27492) [Link]

Forgive my ignorance in the details of shared objects... but why would GlibC not include these (more obscure) syscalls in some auxiliary library? One which has a disclaimer like "this is guaranteed to break, but is better than nothing, STFU".

Or maybe then the discussion would just go from "exist?", to "promote to supported?".

Glibc and the kernel user-space API

Posted Feb 1, 2013 0:01 UTC (Fri) by wahern (subscriber, #37304) [Link]

I think the argument is that it's more trouble than it's worth. That is, more time and anxiety is spent providing the wrapper than that spent by the few people actually using the syscall writing their own wrapper. It's not so much about maintenance, but the upfront work and pile of junk wrappers.

Writing your own wrapper is utterly trivial, unless you're a masochist and are intent on supporting multi-arch systems. For syscalls like gettid, there's absolutely nothing wrong with just writing syscall(SYS_gettid). It can even be cleaner if you're writing portable apps, because you can just `#ifdef SYS_gettid', rather than testing for the existence of glibc or __linux, or having to bring autotools into the equation (if it's not already).

Glibc and the kernel user-space API

Posted Feb 1, 2013 12:06 UTC (Fri) by bgoglin (subscriber, #7800) [Link]

All numa related syscalls are also missing in glibc (mbind, move_pages, migrate_pages). You have to link with libnuma to get them even when you don't use the libnuma API.

Glibc and the kernel user-space API

Posted Feb 7, 2013 11:30 UTC (Thu) by pebolle (subscriber, #35204) [Link]

0) It is interesting that this article mentions Maximilian Attems's attempt to add a header file to glibc, because directly after that attempt he submitted the patch that became mainline commit 29a5c67e7a78815fda0567a867adce467f6e6e5a ("kexec: export kexec.h to user space"): https://lkml.org/lkml/2012/5/24/107.

1) As a result of that commit include/uapi/linux/kexec.h now exports a declaration of kexec_load(). But this declaration doesn't come with a matching definition of that function, so it doesn't help much to include kexec.h if one wants to use kexec_load().

2) The patch ended up in the mainline tree without any discussion, as far as I can tell. A previous attempt, two years earlier, did generate discussion but didn't end up in mainline: https://lkml.org/lkml/2010/6/19/106. Note that this earlier attempt didn't export a declaration of kexec_load().

3) I noticed all that because ever since v3.5 (which shipped that commit) the headers_check utility complains about kexec.h:
[...]include/linux/kexec.h:49: userspace cannot reference function or variable defined in the kernel

(This warning isn't entirely correct, of course, as there is no matching definition of kexec_load() in the kernel tree.)

4) To me it seems this declaration of kexec_load() in kexec.h should be dropped. It serves no real purpose. Am I correct?


Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds