Kernel documentation

By Jonathan Corbet
November 4, 2015

2015 Kernel Summit

Kernel documentation made the list of topics for discussion during the core day at the 2015 Kernel Summit. Your editor, who has been the documentation maintainer for about one year now, led this discussion. Among other things, that means that notes from this session are, well, nonexistent, so this writeup is entirely from memory. If any inconvenient things have been left out, it's purely accidental.

One thing your editor has found over the course of the last year is that it's often not clear where the responsibility for documentation patches lies. Most kernel subsystems are well contained within their own directory subtrees — except that many of them also have files under Documentation/. Some maintainers want to manage documentation patches that relate to their subsystems, while others are happy to leave it to the documentation maintainer. There is a slow-moving effort underway to document these preferences in the MAINTAINERS file. If nothing else, that will help reduce your editor's email load; thanks to the wonders of get_maintainer.pl, he is copied on every patch that touches anything in the documentation tree.

There was a bit of talk about whether it would make sense to split the documentation out into the various subsystem trees, but people seemed to feel that it would make things harder to find. It seems that kernel developers often use the "grep the documentation subtree" technique to search for information.

The bulk of the session, though, was concerned with the structured documentation found in the DocBook subdirectory. A document here starts as a DocBook template file, which is read by the docproc utility to determine which source-code files to extract documentation comments from. Those files are passed to kernel-doc, which, using its own Perl-based C parser, finds all the symbol names of interest and passes them back to docproc. Then docproc invokes kernel-doc again to actually extract the documentation and do some basic markup. The end result is patched into the template file and passed to xmlto for formatting into HTML files, PDF files, man pages, and more.
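For readers who have not run into one, a documentation comment of the sort that kernel-doc extracts looks roughly like the sketch below. The function, demo_count_set_bits(), and its parameters are invented for illustration and appear nowhere in the kernel; only the comment layout (the opening `/**`, the symbol name with a short description, one `@name:` line per parameter, and a `Return:` section) follows the kernel-doc convention.

```c
#include <stddef.h>

/**
 * demo_count_set_bits() - Count the bits set in a buffer.
 * @buf: Buffer to scan.
 * @len: Length of @buf in bytes.
 *
 * Walks @buf one byte at a time and counts the bits that are set.
 * This is a hypothetical helper used only to show the comment format.
 *
 * Return: The number of set bits in the first @len bytes of @buf.
 */
size_t demo_count_set_bits(const unsigned char *buf, size_t len)
{
	size_t count = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char byte = buf[i];

		/* Clear the lowest set bit until none remain. */
		while (byte) {
			byte &= (unsigned char)(byte - 1);
			count++;
		}
	}
	return count;
}
```

A template file that names the source file containing such a comment would, after the docproc and kernel-doc passes described above, end up with the description, parameter list, and return value marked up for formatting.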

Various features have been added to this mechanism; 4.3, for example, adds a simple facility for automatically adding cross-references within a single template file. In the end, the kernel community has spent years slowly building up its own special document-formatting system. With all due respect to the people in the room, your editor said, they just might not be the right crowd for that particular project. The whole thing is a bit of a house of cards; one need not look far to find kernel developers who have given up on making this toolchain actually work.

There is a desire to develop things further, though; a current patch set adds the ability to format the in-code documentation as Markdown text. The patches adding this feature are relatively straightforward, but they depend on the pandoc utility to do the Markdown formatting. An attempt to install pandoc on a Fedora system led to a demand to drag in no less than 70 Haskell-language dependencies. Somehow that didn't seem like something the kernel community would be thrilled about.

Your editor's question was: do we want to merge that work, or maybe consider more wide-ranging changes to the documentation toolchain, preferably in a direction that uses more standard tools supported by others? There was little enthusiasm for adding a pandoc dependency, but little consensus otherwise. Linus said that, while he finds the in-code documentation comments useful, he has never seen any real point in building the formatted manuals. As far as he is concerned, that feature could just be removed. There was some agreement with that position, but others seem to find some value in the formatted documents.

The session was not particularly conclusive in the end. There is general agreement that documentation is good, and a certain preference for documentation that is maintained as comments within the code itself. For the most part, the community will continue to muddle along, producing documentation as well as it can with the time that is available.


Kernel documentation

Posted Nov 5, 2015 4:55 UTC (Thu) by mathstuf (subscriber, #69389)

> An attempt to install pandoc on a Fedora system led to a demand to drag in no less than 70 Haskell-language dependencies.

Well, this is due to using shared libraries. If we used static libraries, the package would be larger and would probably have a release version of ~50 (or not, as updating one of the smaller, lower-level packages would turn into a mass Haskell rebuild). Using stack (or cabal) might be a better option, but that just trades out the deps for GHC itself.

Kernel documentation

Posted Nov 6, 2015 5:12 UTC (Fri) by xanni (subscriber, #361)

Would asciidoc be a workable alternative to Markdown?

Kernel documentation

Posted Nov 6, 2015 8:32 UTC (Fri) by bronson (subscriber, #4806)

I think the problem is pandoc, not Markdown. I'm curious how difficult it would be to shove a simple markdown generator into that toolchain. Hoping someone knows offhand because, though the desire exists, I'm not sure where I'd find the time to play with this.

Kernel documentation

Posted Nov 6, 2015 8:54 UTC (Fri) by xanni (subscriber, #361)

Yes, but using the asciidoc format might obviate the need for pandoc.

Kernel documentation

Posted Nov 6, 2015 16:05 UTC (Fri) by bronson (subscriber, #4806)

Markdown doesn't need pandoc either.

Kernel documentation

Posted Nov 6, 2015 16:42 UTC (Fri) by nybble41 (subscriber, #55106)

> I think the problem is pandoc, not Markdown.

I don't think the problem is pandoc. The problem is the aversion to shared library dependencies.

Yes, Haskell programs tend to use lots of libraries. The community encourages small libraries that do just one thing. As an extreme example, there is a library which defines a "Void" type—a placeholder type with no constructors, and thus no objects. Besides the type, it includes just a couple of simple functions, variations on "Void -> a".[1] This Void type is surprisingly useful, and referenced by several common libraries, so the void library is a dependency for many Haskell programs. It's a bit like having a C library just for "void*" and NULL.

If having 70 library dependencies is a problem, pandoc could always be statically linked instead. This has been the default mode for building Haskell programs for some time, but distributions tend to prefer shared libraries. This would only mean statically linking the Haskell dependencies, not any external libraries. Another option would be to gather the foundation libraries from the Haskell Platform together into a single package.

[1] For example, given a value x of type "Either Void Int" and the library function "absurd :: Void -> a" you could write "case x of { Left v -> absurd v; Right a -> a; }", which handles every case and yet is statically guaranteed to always produce the value in the Right constructor. ("Either Void Int" is isomorphic to "Int", since you can't produce the Void object needed to use the Left constructor.) Of course, you wouldn't write this directly, but it can be useful when dealing with parametric types, e.g. generic code that does error handling with the "Either err val" type. You can make the "err" type Void to indicate that errors are impossible, and if the code requires you to provide a function to handle errors, you can supply "absurd".

Kernel documentation

Posted Nov 6, 2015 21:39 UTC (Fri) by bronson (subscriber, #4806)

That may be true, but I think you'd still get pushback even if it were a single Haskell executable.

Incorporating Haskell into the Linux build system is an uphill push. Not because anyone dislikes Haskell, just because the toolchain is managed by a small group of people with a finite capability (and desire) to learn new technologies. Next someone wants to merge some Go code, and someone else some Scala...

Kernel documentation

Posted Nov 6, 2015 23:18 UTC (Fri) by mathstuf (subscriber, #69389)

This would be for the documentation, not the main build step (like Perl is (was?)). And the documentation is readable in its source form (unlike raw HTML or docbook).

Kernel documentation

Posted Nov 12, 2015 2:58 UTC (Thu) by tbird20d (subscriber, #1901)

It would be nice to do some kind of "checkdoc" functionality as part of pre-testing patches prior to their acceptance into mainline. Linus mentioned that he'd like to see some documentation moved from Documentation into the source files (or at least some attempt to capture information that is now only available in source comments, besides the structure and function docs). In any event, it would be nice to keep documentation operations lightweight and avoid too many build system dependencies. I'd vote against a dependency on Haskell (although I kind of like pandoc).

Kernel documentation

Posted Nov 13, 2015 20:50 UTC (Fri) by robbe (guest, #16131)

Why would Linux (documentation) maintainers start hacking on pandoc? It's just a build tool. Neither are most Linux hackers banging on GCC code, even though it is written in C.

Now, device drivers written in Haskell, that would mean serious business.

Kernel documentation

Posted Nov 6, 2015 23:20 UTC (Fri) by mathstuf (subscriber, #69389)

> Another option would be to gather the foundation libraries from the Haskell Platform together into a single package.

This had been done, but as packages move into the platform (e.g., text), it's easier to just have a subpackage than to do the Requires/Obsoletes dance (at least in RPM). Especially if things move back out of the platform (though I don't know of any examples).

I suppose we could link the libraries as shared objects but statically link the executables, to minimize the need to do whole-stack rebuilds on minor library updates.

Kernel documentation

Posted Nov 12, 2015 7:05 UTC (Thu) by dvdeug (guest, #10998)

Right now, under Debian unstable, trying to install xmonad demands 657 megabytes of space. About half of that is the single GHC package. It's not about the number of dependencies, it's about the size, at least for me. I suppose objectively 657 MB is not much, but it was impossible on my last system to spare that much space on the /usr partition, and it grates a bit to spend that on a rarely used subsystem.

I don't think the comparisons to Go and Scala are on target, as Go has a 30 MB runtime library and Scala has a 10 MB runtime library. (The JVM weighs in at over 100 MB, so Scala is more comparable if you think Linux developers don't have the JVM installed.)

Kernel documentation

Posted Nov 12, 2015 15:59 UTC (Thu) by nybble41 (subscriber, #55106)

> Right now, under Debian unstable, trying to install xmonad demands 657 megabytes of space. About half of that is the single GHC package.

Xmonad is a bit of a special case, though, since it's a plug-in system designed to be customized with arbitrary user-supplied Haskell code. Naturally, that means that it depends on having the GHC compiler and development versions of a number of Haskell libraries. Pandoc, on the other hand, is fairly self-contained.

On my Debian unstable system, with no /ghc/ or /haskell/ packages previously installed (other than haskell-mode, which is written in Emacs Lisp rather than Haskell), installing pandoc would require downloading 6.5 MiB and take an estimated 50 MiB of disk space. (This version appears to be statically linked; most of that size is the main binary, and there are no Haskell library dependencies.) Not a small package, to be sure, but nowhere near the 650 MiB required for xmonad. At 50 MiB it's in the same range as calibre, qemu-user, mysql-client-5.6 or gimp-data, smaller than emacs24-common, and around 1/3 the size of gcc-5. Considering the number of languages supported, that doesn't seem unreasonable.

Kernel documentation

Posted Nov 12, 2015 18:26 UTC (Thu) by mathstuf (subscriber, #69389)

It is possible to package the default configuration as a binary which doesn't need the compiler to compile your own xmonad.hs. This is what Fedora does (xmonad-basic). Since I have my own xmonad.hs, I install ghc-xmonad-contrib-devel and xmonad-core and go from there.

Kernel documentation

Posted Nov 12, 2015 14:30 UTC (Thu) by vasvir (subscriber, #92389)

Obviously I am missing something, but why has the use of doxygen been ruled out?

Kernel documentation

Posted Nov 19, 2015 18:47 UTC (Thu) by ksandstr (guest, #60862)

There are a few things to be said against Doxygen and tools like it.

Doxygen's output is worse than useless because it separates documentation from code. It's like the antithesis of an extreme interpretation of Knuth's literate programming.

Doxygen also encourages dirtying up otherwise clean and readable programs and header files by encouraging rules-oriented documentation, leading to things like documenting a buffer pointer and its length variable separately (as "the buffer that does X" and "the length of the buffer that does X") because The Rules say that each parameter must have Doxygen entries. The problem being that every programmer past the novice stage understands that a "x_p" and "x_len", in that order, is a matched pair to that effect; therefore documentation (Doxygen's input) making the fact explicit is often also worse than useless (noise, potentially misleading or outright wrong).

Many corporations have "coding standards" enforcing rules such as that above, and I'm guessing that if a tool like Doxygen were adopted then more would follow.

Kernel documentation

Posted Jan 8, 2016 22:23 UTC (Fri) by elvis_ (guest, #63935)

"If any inconvenient things have been left out, it's purely accidental." Comedy Gold :-)