Kernel documentation
One thing your editor has found over the course of the last year is that it's often not clear where the responsibility for documentation patches lies. Most kernel subsystems are well contained within their own directory subtrees — except that many of them also have files under Documentation/. Some maintainers want to manage documentation patches that relate to their subsystems, while others are happy to leave it to the documentation maintainer. There is a slow-moving effort underway to document these preferences in the MAINTAINERS file. If nothing else, that will help reduce your editor's email load; thanks to the wonders of get_maintainer.pl, he is copied on every patch that touches anything in the documentation tree.
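To see why a single entry covering the documentation tree pulls in so much mail, here is a rough Python sketch of path matching against MAINTAINERS-style entries. It is not how get_maintainer.pl is implemented; the entries, names, and addresses are placeholders, and the helper functions `parse_entries`, `pattern_matches`, and `maintainers_for` are invented for illustration. Only the `M:` and `F:` tags and the "trailing slash covers the whole subtree" convention are taken from the real file.

```python
from fnmatch import fnmatch

# Hypothetical, heavily trimmed MAINTAINERS-style entries. The "M:" and "F:"
# tags mirror the real file; the names and addresses are placeholders.
MAINTAINERS = """\
DOCUMENTATION
M: Docs Maintainer <docs@example.org>
F: Documentation/

EXAMPLE SUBSYSTEM
M: Subsystem Maintainer <subsys@example.org>
F: drivers/example/
F: Documentation/example/
"""


def parse_entries(text):
    """Split the text into entries, each with maintainers and file patterns."""
    entries = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        entry = {"name": lines[0], "maintainers": [], "patterns": []}
        for line in lines[1:]:
            tag, _, value = line.partition(":")
            value = value.strip()
            if tag == "M":
                entry["maintainers"].append(value)
            elif tag == "F":
                entry["patterns"].append(value)
        entries.append(entry)
    return entries


def pattern_matches(pattern, path):
    """A trailing slash covers the directory and everything below it."""
    if pattern.endswith("/"):
        return path.startswith(pattern)
    # Simplification: other patterns are treated as plain shell-style globs.
    return fnmatch(path, pattern)


def maintainers_for(path, entries):
    """Collect the maintainers of every entry with a matching pattern."""
    found = []
    for entry in entries:
        if any(pattern_matches(p, path) for p in entry["patterns"]):
            found.extend(entry["maintainers"])
    return found
```

With these sample entries, a patch touching `Documentation/example/api.txt` matches both entries, so the documentation maintainer is copied even though the subsystem entry also claims the file.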
There was a bit of talk about whether it would make sense to split the documentation out into the various subsystem trees, but people seemed to feel that it would make things harder to find. It seems that kernel developers often use the "grep the documentation subtree" technique to search for information.
The bulk of the session, though, was concerned with the structured documentation found in the DocBook subdirectory. A document here starts as a DocBook template file, which is read by the docproc utility to determine which source-code files to extract documentation comments from. Those files are passed to kernel-doc, which, using its own Perl-based C parser, finds all the symbol names of interest and passes them back to docproc. Then docproc invokes kernel-doc again to actually extract the documentation and do some basic markup. The end result is patched into the template file and passed to xmlto for formatting into HTML files, PDF files, man pages, and more.
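The extraction step can be illustrated with a small sketch. It is written in Python rather than the Perl that kernel-doc uses, and it is in no way the real parser: it assumes only the conventional comment layout (a `/**` opener, a `name - summary` line, and `@param:` lines). The function `extract_kernel_doc` and the `SAMPLE` source are made up for this example.

```python
import re

# Hypothetical sample input following the usual kernel-doc comment layout.
SAMPLE = '''
/**
 * frob_widget - frobnicate a widget
 * @w: the widget to operate on
 * @flags: behavior flags
 *
 * Returns zero on success.
 */
int frob_widget(struct widget *w, unsigned int flags);

/* An ordinary comment; not kernel-doc, so it is skipped. */
int helper(void);
'''

# A documentation block opens with "/**" on its own line and ends at "*/".
BLOCK_RE = re.compile(r"/\*\*\s*\n(.*?)\*/", re.DOTALL)


def extract_kernel_doc(source):
    """Return one dict per kernel-doc style comment found in source."""
    docs = []
    for match in BLOCK_RE.finditer(source):
        # Strip the leading " * " decoration from each comment line.
        lines = [re.sub(r"^\s*\*\s?", "", line).rstrip()
                 for line in match.group(1).splitlines()]
        if not lines or " - " not in lines[0]:
            continue
        name, summary = lines[0].split(" - ", 1)
        params = {}
        body = []
        for line in lines[1:]:
            param = re.match(r"@(\w+):\s*(.*)", line)
            if param:
                params[param.group(1)] = param.group(2)
            elif line:
                body.append(line)
        docs.append({
            "name": name.strip(),
            "summary": summary,
            "params": params,
            "body": " ".join(body),
        })
    return docs
```

Calling `extract_kernel_doc(SAMPLE)` yields a single entry for `frob_widget`; a real toolchain would then apply markup to each entry and splice the result into the template.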
Various features have been added to this mechanism; the 4.3 release, for example, adds a simple facility for automatically adding cross-references within a single template file. In the end, the kernel community has spent years slowly building up its own special document-formatting system. With all due respect to the people in the room, your editor said, they just might not be the right crowd for that particular project. The whole thing is a bit of a house of cards; one need not look far to find kernel developers who have given up on making this toolchain actually work.
There is a desire to develop things further, though; a current patch set adds the ability to format the in-code documentation as Markdown text. The patches adding this feature are relatively straightforward, but they depend on the pandoc utility to do the Markdown formatting. An attempt to install pandoc on a Fedora system led to a demand to drag in no fewer than 70 Haskell-language dependencies. Somehow that didn't seem like something the kernel community would be thrilled about.
Your editor's question was: do we want to merge that work, or maybe consider more wide-ranging changes to the documentation toolchain, preferably in a direction that uses more standard tools supported by others? There was little enthusiasm for adding a pandoc dependency, but little consensus otherwise. Linus said that, while he finds the in-code documentation comments useful, he has never seen any real point in building the formatted manuals. As far as he is concerned, that feature could just be removed. There was some agreement with that position, but others seem to find some value in the formatted documents.
The session was not particularly conclusive in the end. There is general agreement that documentation is good, and a certain preference for documentation that is maintained as comments within the code itself. For the most part, the community will continue to muddle along, producing documentation as well as it can with the time that is available.
Index entries for this article | |
---|---|
Kernel | Documentation |
Conference | Kernel Summit/2015 |
Posted Nov 5, 2015 4:55 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Well, this is due to using shared libraries. If we used static libraries, the package would be larger and would probably have a release version of ~50 (or not, as updating one of the smaller, lower-level packages would turn into a mass Haskell rebuild). Using stack (or cabal) might be a better option, but that just trades out the deps for GHC itself.
Posted Nov 6, 2015 5:12 UTC (Fri)
by xanni (subscriber, #361)
[Link] (12 responses)
Posted Nov 6, 2015 8:32 UTC (Fri)
by bronson (subscriber, #4806)
[Link] (11 responses)
Posted Nov 6, 2015 8:54 UTC (Fri)
by xanni (subscriber, #361)
[Link] (1 responses)
Posted Nov 6, 2015 16:05 UTC (Fri)
by bronson (subscriber, #4806)
[Link]
Posted Nov 6, 2015 16:42 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link] (8 responses)
I don't think the problem is pandoc. The problem is the aversion to shared library dependencies.
Yes, Haskell programs tend to use lots of libraries. The community encourages small libraries that do just one thing. As an extreme example, there is a library which defines a "Void" type—a placeholder type with no constructors, and thus no objects. Besides the type, it includes just a couple of simple functions, variations on "Void -> a".[1] This Void type is surprisingly useful, and referenced by several common libraries, so the void library is a dependency for many Haskell programs. It's a bit like having a C library just for "void*" and NULL.
If having 70 library dependencies is a problem, pandoc could always be statically linked instead. This has been the default mode for building Haskell programs for some time, but distributions tend to prefer shared libraries. This would only mean statically linking the Haskell dependencies, not any external libraries. Another option would be to gather the foundation libraries from the Haskell Platform together into a single package.
[1] For example, given a value x of type "Either Void Int" and the library function "absurd :: Void -> a" you could write "case x of { Left v -> absurd v; Right a -> a; }", which handles every case and yet is statically guaranteed to always produce the value in the Right constructor. ("Either Void Int" is isomorphic to "Int", since you can't produce the Void object needed to use the Left constructor.) Of course, you wouldn't write this directly, but it can be useful when dealing with parametric types, e.g. generic code that does error handling with the "Either err val" type. You can make the "err" type Void to indicate that errors are impossible, and if the code requires you to provide a function to handle errors, you can supply "absurd".
Posted Nov 6, 2015 21:39 UTC (Fri)
by bronson (subscriber, #4806)
[Link] (3 responses)
Incorporating Haskell into the Linux build system is an uphill push. Not because anyone dislikes Haskell, just because the toolchain is managed by a small group of people with a finite capability (and desire) to learn new technologies. Next someone wants to merge some Go code, and someone else some Scala...
Posted Nov 6, 2015 23:18 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Nov 12, 2015 2:58 UTC (Thu)
by tbird20d (subscriber, #1901)
[Link]
Posted Nov 13, 2015 20:50 UTC (Fri)
by robbe (guest, #16131)
[Link]
Now, device drivers written in Haskell, that would mean serious business.
Posted Nov 6, 2015 23:20 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
This had been done, but as packages move into the platform (e.g., text), it's easier to just have a subpackage than to do the Requires/Obsoletes dance (at least in RPM). Especially if things move back out of the platform (though I don't know of any examples).
I suppose we could link libraries as shared but link executables statically, to minimize the need to do whole-stack rebuilds on minor library updates.
Posted Nov 12, 2015 7:05 UTC (Thu)
by dvdeug (guest, #10998)
[Link] (2 responses)
I don't think the comparisons to Go and Scala are on target, as Go has a 30 MB runtime library and Scala has a 10 MB runtime library. (The JVM weighs in at over 100 MB, so Scala is more comparable if you think Linux developers don't have the JVM installed.)
Posted Nov 12, 2015 15:59 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link]
Xmonad is a bit of a special case, though, since it's a plug-in system designed to be customized with arbitrary user-supplied Haskell code. Naturally, that means that it depends on having the GHC compiler and development versions of a number of Haskell libraries. Pandoc, on the other hand, is fairly self-contained.
On my Debian unstable system, with no /ghc/ or /haskell/ packages previously installed (other than haskell-mode, which is written in Emacs Lisp rather than Haskell), installing pandoc would require downloading 6.5 MiB and take an estimated 50 MiB of disk space. (This version appears to be statically linked; most of that size is the main binary, and there are no Haskell library dependencies.) Not a small package, to be sure, but nowhere near the 650 MiB required for xmonad. At 50 MiB it's in the same range as calibre, qemu-user, mysql-client-5.6 or gimp-data, smaller than emacs24-common, and around 1/3 the size of gcc-5. Considering the number of languages supported, that doesn't seem unreasonable.
Posted Nov 12, 2015 18:26 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 12, 2015 14:30 UTC (Thu)
by vasvir (subscriber, #92389)
[Link] (1 responses)
Posted Nov 19, 2015 18:47 UTC (Thu)
by ksandstr (guest, #60862)
[Link]
Doxygen's output is worse than useless because it separates documentation from code. It's like the antithesis of an extreme interpretation of Knuth's literate programming.
Doxygen also encourages dirtying up otherwise clean and readable programs and header files by encouraging rules-oriented documentation, leading to things like documenting a buffer pointer and its length variable separately (as "the buffer that does X" and "the length of the buffer that does X") because The Rules say that each parameter must have Doxygen entries. The problem being that every programmer past the novice stage understands that a "x_p" and "x_len", in that order, is a matched pair to that effect; therefore documentation (Doxygen's input) making the fact explicit is often also worse than useless (noise, potentially misleading or outright wrong).
Many corporations have "coding standards" enforcing rules such as that above, and I'm guessing that if a tool like Doxygen were adopted then more would follow.
Posted Jan 8, 2016 22:23 UTC (Fri)
by elvis_ (guest, #63935)
[Link]