KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 11, 2007 18:41 UTC (Tue) by nix (subscriber, #2304)
Parent article: KS2007: The greater kernel ecosystem and user-space APIs

> [...] the man pages, as currently written, document the system call interface as presented by the C library. But the API exported directly by the kernel can be different, and often is. Which API should be documented?
There's already a scheme for this, and long has been. The syscall docs go into section 2: the docs for the C interface go into section 3.


KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 13, 2007 11:14 UTC (Thu) by michaelk (subscriber, #1978)

> > [...] the man pages, as currently written, document the system call interface as presented by the C library. But the API exported directly by the kernel can be different, and often is. Which API should be documented?
> There's already a scheme for this, and long has been. The syscall docs go into section 2: the docs for the C interface go into section 3.
Life is not so simple, on Linux at least (and I suspect the same is true of a number of other Unix implementations): there is a fairly close intertwining of kernel and (g)libc interfaces. Often the glibc wrapper for a system call adds nothing, or very little, on top of the kernel interface. But sometimes the wrapper makes significant changes (e.g., does some manipulation of arguments). Where that is done, the application programmer is almost always interested in the (g)libc interface, rather than the raw kernel interface. The alternative would be to have two man pages for each system call: one in section 2 describing the raw kernel interface, and one in section 3 describing the (g)libc interface. That is kind of clumsy for the following reasons:
  • often the section 3 page will describe no difference from the section 2 page (i.e., the wrapper does nothing except invoke the syscall); and
  • in cases where the wrapper does add something to the syscall, the reader needs to read two man pages to get the full picture.

My preference (already embodied in some pages) is to describe all syscalls in section 2 pages, and, if the (g)libc wrapper provides a different behavior/interface, to document that interface in the main text of the section 2 page and include a NOTE that describes the differences for the raw kernel interface.

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 13, 2007 15:25 UTC (Thu) by nix (subscriber, #2304)

In the wrapper-and-syscall-nearly-identical case, you could describe the
differences in a NOTE in the section 3 page. It just seems clumsy to have
user-callable stuff documented in section 2: on other Unixes that's not
what it's for.

But you're the manpage maintainer and I'm just a hanger-on, so ignore
me. :)

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 16, 2007 6:31 UTC (Sun) by michaelk (subscriber, #1978)

> In the wrapper-and-syscall-nearly-identical case, you could describe the differences in a NOTE in the section 3 page.
Yes, but what I want to avoid is people having to look in two places to get all the information they need. Or looking in just one of those two places and not getting all the info that they require (and not realizing that they don't have all the information, if for example they only look in the section 2 page). Ideas are always welcome!
> It just seems clumsy to have user-callable stuff documented in section 2: on other Unixes that's not what it's for.
It is not clear to me that other Unix implementations always have a clear .2 / .3 divide. Lacking the source, it's not easy to be sure what is done in libc before a syscall is invoked.

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 16, 2007 11:07 UTC (Sun) by nix (subscriber, #2304)

Er, why would people need to simultaneously know the details of the
kernel-level interface (only of interest to people writing libcs) and of
the POSIX interface (only of interest to people using libcs)?

It seems to me that your division is of most use only to libc authors :/
everyone else will need either one half of the info, or the other half.

(Of course this is relevant only for the small minority of syscall/libc
calls that differ significantly, and as I said, I'm not doing the *work*,
so my opinion is worth basically nothing :) )

Solaris has a clear .2 / .3 divide: it's only that it then subdivides
section 3 into enough subsections that you're then left guessing which of
*those* your page might be in. Let's not do *that*. :)

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 16, 2007 15:16 UTC (Sun) by michaelk (subscriber, #1978)

> Er, why would people need to simultaneously know the details of the kernel-level interface (only of interest to people writing libcs) and of the POSIX interface (only of interest to people using libcs)?
The majority audience for man pages is of course userland programmers. I suppose that 99.99% (give or take a 9) of those userland programmers use a libc, rather than invoking syscalls directly, and let's say that 99% of them use glibc, and are thus interested in the glibc interface. In terms of documenting the APIs, these are the choices I see:
  1. Document the details of the system call in .2, and have .3 pages that note just the differences in the (g)libc API. I dislike this option, because the (userland) programmer must look at two pages to put together the information they need.
  2. Document the details of the system call in .2, and have .3 pages that fully document the (g)libc API, reproducing all of the details that also appeared in the corresponding .2 page. I dislike this solution because of the duplication involved. Furthermore, for the many interfaces where the glibc wrapper does nothing, the .2 and .3 pages would be exactly the same.
  3. Have .2 pages which include details of the (g)libc API, but clearly indicate those parts where the raw syscall API differs.
So far, I prefer option 3, but I realize it's not perfect, for various reasons, some of which you mention. It may be that someone comes up with a better solution than any of these three.
> It seems to me that your division is of most use only to libc authors :/ everyone else will need either one half of the info, or the other half. (Of course this is relevant only for the small minority of syscall/libc calls that differ significantly, and as I said, I'm not doing the *work*, so my opinion is worth basically nothing :) )
But you're polite, and interested, so I can't help but respond ;-).
> Solaris has a clear .2 / .3 divide
What I'm suggesting (and it's just a guess) is that maybe the divide on Solaris is no more real than that on Linux. Is *everything* documented in .2 on Solaris a raw syscall? Is *anything* documented in .3 in fact a syscall? I don't know the definitive answer to either question, but I wouldn't be surprised to find that the answer to both questions is "yes".

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 16, 2007 17:10 UTC (Sun) by nix (subscriber, #2304)

Well, option 2 is implementable by having the nearly-identical subset of
section 2 and 3 manpages generated from a common source (it'd be pretty
trivial to sed out markers that indicate that `this bit is section 2 only'
and `this bit is section 3 only').

But I really will shut up now until I have actual patches implementing
this (medical crud means it may be some time, biology is best observed
from a long way away).

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds