The present and future of formatted kernel documentation

By Jonathan Corbet
January 13, 2016

The kernel source tree comes with a substantial amount of documentation, believe it or not. Much of that can be found in the Documentation tree as a large set of rather haphazardly organized plain-text files. But there is also quite a bit of documentation embedded within the source code itself that can be extracted and presented in a number of formats. There has been an effort afoot for the better part of a year to improve the capabilities of the kernel's formatted-documentation subsystem; it's a good time for a look at the current state of affairs and where things might go.

Anybody who has spent much time digging around in the kernel source will have run across the specially formatted comments used there to document functions, structures, and more. These "kerneldoc comments" tend to look like this:

    /**
     * list_add - add a new entry
     * @new: new entry to be added
     * @head: list head to add it after
     *
     * Insert a new entry after the specified head.
     * This is good for implementing stacks.
     */

This comment describes the list_add() function and its two parameters (new and head). It is introduced by the "/**" marker and follows a number of rules; see Documentation/kernel-doc-nano-HOWTO.txt for details. Normal practices suggest that these special comments should be provided for all functions meant to be used outside of the defining code (all functions that are exported to modules, for example); some subsystems also use kerneldoc comments for internal documentation.
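To show what "following the rules" means to a tool, here is a rough C sketch that pulls the name, the one-line summary, and the parameter names out of a comment like the list_add() one. It is a simplification under stated assumptions, not kernel-doc's actual logic. The names struct kdoc and kdoc_parse() are hypothetical, and the input is assumed to be the comment's body lines with the opening and closing markers already removed. Multi-line descriptions, structures, and the many other cases the real Perl script handles are ignored.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of a parsed kerneldoc comment. */
struct kdoc {
    char name[64];        /* e.g. "list_add" */
    char summary[128];    /* text after " - " on the first line */
    int nparams;
    char params[8][32];   /* names taken from "@name:" lines */
};

/* Skip the leading " * " decoration of one comment line. */
static const char *strip_star(const char *line)
{
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '*')
        line++;
    while (*line == ' ')
        line++;
    return line;
}

/*
 * Parse the body lines of a kerneldoc comment (opening and closing markers
 * already removed). Returns 0 on success, or -1 if the first line lacks the
 * required "name - summary" form.
 */
static int kdoc_parse(const char *lines[], int n, struct kdoc *out)
{
    memset(out, 0, sizeof(*out));
    for (int i = 0; i < n; i++) {
        const char *s = strip_star(lines[i]);

        if (i == 0) {
            const char *dash = strstr(s, " - ");
            if (!dash)
                return -1;
            snprintf(out->name, sizeof(out->name), "%.*s",
                     (int)(dash - s), s);
            snprintf(out->summary, sizeof(out->summary), "%s", dash + 3);
        } else if (*s == '@' && out->nparams < 8) {
            const char *colon = strchr(s, ':');
            if (colon)    /* keep only the name between '@' and ':' */
                snprintf(out->params[out->nparams++],
                         sizeof(out->params[0]), "%.*s",
                         (int)(colon - s - 1), s + 1);
        }
    }
    return 0;
}
```

Fed the body lines of the list_add() comment, this sketch would report the name list_add, the summary "add a new entry", and the two parameters new and head.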

The documentation subsystem is able to extract these comments and render them into documents in a number of formats, including plain text, man pages, HTML, and PDF files. This can be done in a kernel source tree with a command like "make mandocs" or "make pdfdocs". There is also a copy of the formatted documentation on kernel.org, which includes the rendered version of the comment above. The results are not going to win any prizes for beautiful design, but many developers find them helpful.

Inside kernel-doc

The process of creating formatted documents starts with one of a number of "template files," found in the Documentation/DocBook directory. These files (there are a few dozen of them) are marked up in the DocBook format; they also contain a set of specially formatted (non-DocBook) lines marking the places where documentation from the source should be stuffed into the template. Thus, for example, kernel-api.tmpl contains a line that reads:

    !Iinclude/linux/list.h

The !I directive asks for the documentation for all functions that are not exported to modules. It is used rather than !E (which grabs documentation for exported functions) because the functions, being defined in a header file, do not appear in an EXPORT_SYMBOL() directive.
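The choice between the two directives comes down to a membership test against the list of exported symbols. A minimal C sketch of that rule follows. It is not docproc's real code, and the enum directive type and the is_exported() and wanted() helpers are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical encoding of the two template directives. */
enum directive { DIR_EXPORTED, DIR_INTERNAL };   /* !E and !I */

/* Is fn among the names collected from EXPORT_SYMBOL() lines? */
static bool is_exported(const char *fn, const char *exported[], int nexp)
{
    for (int i = 0; i < nexp; i++)
        if (strcmp(fn, exported[i]) == 0)
            return true;
    return false;
}

/* Should fn's documentation be emitted for directive d? */
static bool wanted(enum directive d, const char *fn,
                   const char *exported[], int nexp)
{
    bool exported_fn = is_exported(fn, exported, nexp);
    return d == DIR_EXPORTED ? exported_fn : !exported_fn;
}
```

In the list.h case the exported list is empty, so every function falls on the !I side. That is why the template uses !I for that header.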

Turning a template file into one or more formatted documents is a lengthy process that starts with a utility called docproc, found in the scripts directory. This program (written in C) reads the template file, finds the special directives, and, for each of those directives, it does the following:

A pass is made through the named source file; each EXPORT_SYMBOL() directive found therein is parsed, and the named function is added to the list of exported symbols.
A call is made to scripts/kernel-doc (a 2,700-line Perl script) to locate all of the functions, structures, and more that are defined in the source file. kernel-doc tries to parse the C code well enough to recognize the definitions of interest; in the process, it attempts to deal with some of the kernel's macro trickery without actually running the source through the C preprocessor. It will output a list of the names it found.
docproc calls kernel-doc again, causing it to parse the source file a second time; this time, though, the output is the actual documentation for the functions of interest, with some minimal DocBook formatting added.
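The first of those steps, collecting exported names, is simple enough to sketch. The hypothetical scan_exports() function below works on a source file held in memory and records the name in each EXPORT_SYMBOL(name) line. Accepting EXPORT_SYMBOL_GPL(name) as well is this sketch's own assumption. It also ignores complications such as directives split across lines. docproc is written in C too, but this is not its code.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical sketch of docproc's first pass: scan source text line by
 * line and record the symbol named in each EXPORT_SYMBOL(name) or
 * EXPORT_SYMBOL_GPL(name) line. Returns the number of names stored.
 */
static int scan_exports(const char *src, char names[][64], int max)
{
    int n = 0;
    const char *line = src;

    while (line && *line && n < max) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *p = strstr(line, "EXPORT_SYMBOL");

        /* Only accept a match that lies within the current line. */
        if (p && (size_t)(p - line) < len) {
            p += strlen("EXPORT_SYMBOL");
            if (strncmp(p, "_GPL", 4) == 0)
                p += 4;
            if (*p == '(') {
                size_t k = 0;

                p++;
                /* Collect a C identifier, leaving room for the NUL. */
                while ((isalnum((unsigned char)p[k]) || p[k] == '_') && k < 63)
                    k++;
                if (k > 0 && p[k] == ')') {
                    memcpy(names[n], p, k);
                    names[n][k] = '\0';
                    n++;
                }
            }
        }
        line = end ? end + 1 : NULL;
    }
    return n;
}
```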

The formatted output is placed into the template file in the indicated spot. If the target format is HTML, the kernel-doc-xml-ref script is run to generate cross-reference links. This feature, only added in 4.3, can only generate links within one template file; cross-template links are not supported.
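The single-template limitation follows from the fact that a link can only point at an anchor the tool knows exists. Here is a hypothetical C sketch of that decision. The emit_ref() and in_template() helpers are illustrative names. emit_ref() turns a reference into a DocBook <link> only when the symbol is documented in the same template. The "API-" prefix on the linkend is an assumption, not necessarily the id scheme kernel-doc-xml-ref actually uses.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Is sym documented somewhere in the current template? */
static bool in_template(const char *sym, const char *local[], int nlocal)
{
    for (int i = 0; i < nlocal; i++)
        if (strcmp(sym, local[i]) == 0)
            return true;
    return false;
}

/*
 * Write a reference to sym into out: a <link> if the target is in this
 * template, otherwise plain text, since cross-template links are not
 * supported. The "API-" linkend prefix is an illustrative assumption.
 */
static void emit_ref(const char *sym, const char *local[], int nlocal,
                     char *out, size_t outlen)
{
    if (in_template(sym, local, nlocal))
        snprintf(out, outlen, "<link linkend=\"API-%s\">%s()</link>",
                 sym, sym);
    else
        snprintf(out, outlen, "%s()", sym);
}
```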

The final step is to run the documentation-formatting tool to actually create the files in the format of interest. Most of the time, the xmlto tool is used for this purpose, though there are some provisions in the makefile for using other tools.

In other words, this toolchain looks just like what one might expect from a documentation system written by kernel developers. It gets the basic job done, but it is not particularly pretty or easy to use. It is somewhat brittle, making it easy for developers to break the documentation build without knowing it. Numerous developers have said that they have given up on trying to actually get formatted output from it; depending on one's distribution, getting all of the pieces in place is not always easy. And a lot of potentially desirable features, like cross-file links, indexing, or formatting within the in-source comments, are not present.

Formatted comments

The latter issue — adding formatting to the kerneldoc comments — has been the subject of some work in recent times. Daniel Vetter has a long-term goal of putting much more useful graphics-subsystem information into those comments, but has found the lack of formatting to be an impediment once one gets beyond documenting function prototypes. To fix that, Intel funded some work that, among other things, produced a patch set allowing markup in the comments. Nobody really wants to see XML markup in C source, though, so the patch took a different approach, allowing markup to be done using the Markdown language. Using Markdown allowed a fair amount of documentation to be moved to the source from the template file, shedding a bunch of ugly XML markup on the way.

This work has not yet been merged into the mainline. Daniel has his own hypothesis as to why:

Unfortunately it died in a bikeshed fest due to an alliance of people who think docs are useless and you should just read the code, and others who didn't even know how to convert the kerneldoc into something pretty.

Your editor (who happens to be the kernel documentation maintainer, incidentally) has a different hypothesis. Perhaps this work remains outside because: (1) it is a significant change affecting all kernel developers that shouldn't be rushed; (2) it used pandoc, requiring, on your editor's Fedora test box, the installation of 70 Haskell dependencies to run; (3) it had unresolved problems stemming from disagreements between pandoc and xmlto regarding things like XML entity escaping; and (4) there is a certain natural reluctance to add another step to the kernel documentation house of cards. All of these concerns led to a discussion at the 2015 Kernel Summit and a lack of enthusiasm for quick merging of this change.

All that notwithstanding, there is no doubt that there is interest in adding formatting to the kernel's documentation comments. Your editor thinks that there might be a better way to do so, perhaps involving the removal of xmlto (and DocBook) entirely in favor of a Markdown-only solution or a system like Sphinx. Unfortunately, your editor has proved to be thoroughly unable to find the time to actually demonstrate that such an approach might work, and nobody else seems ready to jump in and do it for him. Meanwhile, the Markdown patches have been reworked to use AsciiDoc (which can be thought of as a rough superset of Markdown) instead. That change gets rid of the Haskell dependency (replacing it with a Python dependency) and improves some formatting features at the cost of slowing the documentation build considerably. Even if it is arguably not the best solution, it is out there and working now.

As a result, these patches will probably be pulled into the documentation tree (and, thus, into linux-next) in the next few weeks, with an eye toward merging in 4.6 if all looks well. It has been said many times that a subsystem maintainer's first job is to say "no" to changes. Sometimes, though, the right thing is to say "yes," even if said maintainer thinks that a better solution might be possible. A good-enough solution that exists now should not be held up overly long in the hopes that vague ideas for something else might turn into real, working code.


The present and future of formatted kernel documentation

Posted Jan 14, 2016 2:33 UTC (Thu) by JohnLenz (guest, #42089)

The pandoc dependency problem is just a relic of how Fedora packages pandoc. Haskell programs are statically linked and do not require any Haskell dependencies to be executed. Indeed, the Debian pandoc package (https://packages.debian.org/jessie/pandoc) has as dependencies libc, libgmp, libicu, liblua, libpcre, and libyaml. Of those, I would suspect only libicu and libyaml are not already installed on most Debian machines, so pandoc is quite cheap to install and use. You could try using alien to install the Debian package, or look at modifying the Fedora pandoc package to have separate binary and development packages.

The present and future of formatted kernel documentation

Posted Jan 14, 2016 22:05 UTC (Thu) by cortana (subscriber, #24596)

As a user, I'm not sure I'd care very much about whether a distro-provided pandoc package uses static or dynamic linking. That's what the package manager is for!

Now, for upstream-provided packages, I'd definitely prefer static linking, or dynamic linking with all the dynamic library dependencies, bar those that are _very_ stable, shipped together. That way, when I update my distro I don't have to worry about the program stopping working.

The present and future of formatted kernel documentation

Posted Jan 14, 2016 3:56 UTC (Thu) by eternaleye (guest, #67051)

AsciiDoc also has the advantage of having a direct semantic relationship to DocBook proper, whereas Markdown does not. In addition, Asciidoctor is a from-scratch Ruby reimplementation, if people prefer that.

The present and future of formatted kernel documentation

Posted Jan 14, 2016 15:01 UTC (Thu) by TheGopher (subscriber, #59256)

Agreed, the DocBook conversion is what makes AsciiDoc a far better format than Markdown. Markdown does have a larger toolchain, though, maybe due to its prevalence on GitHub.

The present and future of formatted kernel documentation

Posted Jan 14, 2016 14:59 UTC (Thu) by jezuch (subscriber, #52988)

> A call is made to scripts/kernel-doc (a 2,700-line Perl script)

Ouch.

How about a compiler plugin instead?

The present and future of formatted kernel documentation

Posted Jan 14, 2016 16:54 UTC (Thu) by tshow (subscriber, #6411)

<bikeshed>
Have you looked at NaturalDocs?

http://www.naturaldocs.org/

I find the source documentation format much more readable, and much more flexible. Your list_add in NaturalDocs would look like:

    /*
     * Function: list_add
     * Add a new entry. Insert a new entry after the specified head. This is good for implementing stacks.
     *
     * Arguments:
     * new - new entry to be added
     * head - list head to add it after
     */

The way it works is neat; once it sees a heading it knows ("Function:" in this case), it adds the symbol ("list_add") to its list of known symbols, and then documents to the end of the comment. It extracts the first sentence after the heading as the summary, and paragraph-formats the subsequent comment text until it either runs out of comment, or until it hits another keyword-looking thing. Any symbols it knows can be linked to with angle brackets:

    ...
     * See Also:
     * <list_add>
    ...

The neat thing is, you could have just about anything as a heading; as long as it ends in a colon, NaturalDocs treats it as a new heading and formats accordingly. The result is source documentation that is easy to write *and* read, and isn't polluted with @param style stuff.

</bikeshed>

The present and future of formatted kernel documentation

Posted Jan 14, 2016 17:08 UTC (Thu) by corbet (editor, #1)

A quick grep shows 67,483 "/**" comments in the kernel source tree.

Oops — hold on — git pull — 67,827 now.

Any documentation system change that requires reformatting the existing kerneldoc comments is simply not going anywhere. At least, I don't have the courage to go to Linus with that kind of pull request...

The present and future of formatted kernel documentation

Posted Jan 14, 2016 17:20 UTC (Thu) by tshow (subscriber, #6411)

Fair enough.

The present and future of formatted kernel documentation

Posted Jan 15, 2016 0:07 UTC (Fri) by gerdesj (subscriber, #5446)

"Fair enough."

No need to give in. Simply rewrite the parser or create a preprocessor for NaturalDocs. You managed to translate the given example pretty easily. If it really is that easy, then what you probably need is some sort of OS with an embarrassment of riches in tools to fiddle with text and some way of passing the output of one app to the input of another 8)

It's either that or 70,000+ patches and attempting to re-educate rather a lot of devs which will probably end up with a damn good kicking and a lengthy stay in hospital or perhaps some negative comments. On the bright side, they'll get to you before Mr T does ...

The present and future of formatted kernel documentation

Posted Jan 15, 2016 0:32 UTC (Fri) by corbet (editor, #1)

The current parser is kernel-doc. If your desire is to change the backend, that is fairly easily done, yes. But then the format shown in the original comment (which was the point of that comment) is fairly well irrelevant, since only machines would see it.

The present and future of formatted kernel documentation

Posted Jan 15, 2016 1:01 UTC (Fri) by gerdesj (subscriber, #5446)

"The current parser is kernel-doc. If your desire is to change the backend, that is fairly easily done, yes. But then the format shown in the original comment (which was the point of that comment) is fairly well irrelevant, since only machines would see it."

There's a good chance I'm missing something here. My point was supposed to be merely that tshow seemed to be able to easily translate the input that is within the kernel source into the input that NaturalDocs requires, for a happy outcome. So if (s)he wanted NaturalDocs output then that could be achieved by a preprocessor or some other simple mechanism, activated by a switch perhaps. No need to fiddle with the source much, let alone established working practices.

Cheers
Jon

The present and future of formatted kernel documentation

Posted Jan 15, 2016 17:45 UTC (Fri) by tshow (subscriber, #6411)

My interest is more in the format of the comments in the source, because that's still where people are going to spend most of their time. Having worked with a lot of documentation generators, I still find NaturalDocs to have the most pleasant source format, both for reading and writing.

The present and future of formatted kernel documentation

Posted Jan 15, 2016 1:08 UTC (Fri) by robclark (subscriber, #74945)

> The neat thing is, you could have just about anything as a heading; as long as it ends in a colon, NaturalDocs treats it as a new heading and formats accordingly. The result is source documentation that is easy to write *and* read, and isn't polluted with @param style stuff.

tbh, I don't really think @param is the issue so much. Really we are wanting something where we can include tables/formulas/etc, and other appropriate formatting more easily in the output. (I.e., for some things we might want a big comment block w/ sub-headings/etc, or in other cases diagrams/tables/etc.)

The Sphinx suggestion in the article is interesting. We've been using it for gallium docs (http://gallium.readthedocs.org), there are some unmerged NIR docs (http://people.freedesktop.org/~cwabbott0/nir-docs/), etc. I've used it a bit, and the graphviz and math plugins are quite nice. Plus it generates some nice-looking output (html / html single-page / pdf / etc). I wouldn't say that I am fully versed in all the various alternatives, but I've been happy with Sphinx.

No mention of doxygen?

Posted Jan 15, 2016 19:45 UTC (Fri) by bokr (guest, #58369)

http://www.doxygen.org/index.html
https://github.com/doxygen/doxygen

No mention of doxygen?

Posted Jun 24, 2019 22:36 UTC (Mon) by kendra (guest, #132800)

What's your point?

The present and future of formatted kernel documentation

Posted Jan 21, 2016 20:37 UTC (Thu) by tbird20d (subscriber, #1901)

A good-enough solution that exists now should not be held up overly long in the hopes that vague ideas for something else might turn into real, working code.

Jon - I have no idea how you can call yourself a kernel maintainer with this sensible attitude. :-)