Treating documentation as code
At FOSDEM 2024, the "Tool the docs" devroom hosted several talks about free and open-source tools for writing, managing, testing, and rendering documentation. The central concept was to treat documentation as code, which makes it possible to incorporate various tools into documentation workflows in order to maintain high quality.
One software-development best practice is to have a continuous-integration (CI) setup for a project. By automatically running a formatter, linter, and tests on every code change committed to the project's repository, developers can maintain uniform code quality. In her talk "Open Source DocOps", Lorna Jane Mitchell made an argument for applying the same approach to documentation projects.
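The CI setup Mitchell described could look something like the following GitHub Actions sketch; the job layout and tool invocations are illustrative assumptions, not taken from the talk:

```yaml
# .github/workflows/docs.yml — hypothetical example
name: docs
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Lint the Markdown sources; style violations fail the build
      - run: npx markdownlint-cli '**/*.md'
```

Any of the other checks discussed below (link checking, prose linting) would slot in as additional steps in the same job.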
This DocOps approach, short for "documentation operations", derives from a well-known definition of DevOps, which calls it "a set of practices, tools, and a cultural philosophy that automate and integrate the processes between software development and IT teams". Instead of applying these things to software development, DocOps applies them to documentation. Mitchell characterized DocOps as "allowing documentation to be created, maintained, and published collaboratively and in an efficient manner."
Treating docs as code
DocOps builds on Docs-as-code, an approach that stipulates that documentation should be written in the same manner as code. Mitchell stressed the importance of tools and techniques borrowed from software development:
Using Git is key for many documentation workflows, and writing your documentation in a text-based markup language that is then converted into HTML or other formats simplifies integrating documentation into source control.
Additional key aspects of Docs-as-code include automated tests using continuous integration and the ability to run the same tests with local tools. "If you need to push your documentation changes to Git to see whether the tests succeed, that can't be right", Mitchell said. The process should be both fast and frictionless.
Illustrating that last point, Mitchell emphasized that all documentation writers should be able to see a preview of their changes immediately. She showed the example of an OpenAPI description of an API in Visual Studio Code. The documentation's YAML source file was positioned on the left, accompanied by a live preview of the generated documentation on the right. "Writers need to have this immediate feedback when working on documentation."
Another indispensable tool is a link checker. Many docs contain links to web sites, which can break when those sites change their internal structure or go offline. A link checker automatically visits all links within the documentation and verifies that they are still valid. There are two ways to do this: check the links in the source format, or check them in the generated HTML files. "Either approach is valid", Mitchell said. "I like to check links in Markdown files with mlc, but you can also run an HTML link checker after the build process." She was referring to the MIT-licensed Markup Link Checker.
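The source-format approach can be sketched in a few lines of Python. This is an illustrative stand-in for a tool like mlc, not mlc itself: it extracts inline Markdown links with a regular expression and probes each external one with an HTTP HEAD request.

```python
import re
import urllib.request

# Matches inline Markdown links of the form [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

def extract_links(markdown_text: str) -> list[str]:
    """Return all inline-link targets found in a Markdown string."""
    return LINK_RE.findall(markdown_text)

def check_url(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except OSError:
        # DNS failures, timeouts, and HTTP errors all count as broken
        return False
```

A real checker would also handle reference-style links, anchors, and redirects; the point is only that the check runs directly against the source files, before any HTML is generated.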
However, Mitchell warned not to apply link checking indiscriminately. If a continuous-integration pipeline has strict link checks and aborts the documentation build process on every broken link, this could lead to some unexpected problems when a web site experiences temporary downtime: "Don't let someone else's downtime block your build or release." She suggested some strategies that could assist in this situation. For example, restrict the scope of the link checks: "Perhaps only check internal links, or only check links in changed source files." To still be able to find all of the broken links, Mitchell recommended scheduling a comprehensive link check on a weekly basis.
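One way to implement that scope restriction is to separate internal from external link targets, checking only the internal ones on every build and leaving the external ones to the scheduled weekly run. The classification below is a sketch of one possible split, not a method from the talk:

```python
def split_links(urls: list[str]) -> tuple[list[str], list[str]]:
    """Split link targets into (internal, external) lists.

    Internal links (relative paths and fragments) are cheap and safe to
    check on every build; external http(s) links can be deferred to a
    scheduled weekly job, so that someone else's downtime cannot block
    a build or release.
    """
    internal, external = [], []
    for url in urls:
        if url.startswith(("http://", "https://")):
            external.append(url)
        else:
            internal.append(url)
    return internal, external
```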
Validation, formatting, and prose linting
Just like a programming language, a markup language has a syntax that should be followed. But markup mistakes in the documentation's source files can yield non-obvious results, even upon reviewing the live preview. Thus, any documentation workflow needs some validation. For Markdown, Mitchell recommended David Anson's MIT-licensed markdownlint (various other tools with the same name exist). "Markdownlint is quite configurable. And for reStructuredText, the Sphinx documentation generator includes a linter." Redocly, Mitchell's employer, has developed an MIT-licensed linter for OpenAPI documentation, Redocly CLI.
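Markdownlint's configuration lives in a `.markdownlint.json` file at the repository root. The rule IDs below come from markdownlint's documented rule set; the specific values are illustrative choices, not recommendations from the talk:

```json
{
  "default": true,
  "MD013": { "line_length": 100 },
  "MD033": false
}
```

This enables all rules by default, relaxes the line-length rule (MD013) to 100 characters, and disables the no-inline-HTML rule (MD033) entirely.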
Linting a source file to ensure syntax validity is one thing, but even in valid files there's a wide range of possible ways to write something. A formatter can help in maintaining a consistent style across the source files. "A formatter adjusts things like newlines, spacing, indentation, line length, and a consistent syntax for bullet lists and tables", Mitchell explained. "Using consistent markup makes it considerably easier to spot problems in your source files."
Of course, there are concerns beyond just the markup language, as the natural language used in the text obviously also makes a difference. Other tools assist in quality assurance of the text, such as Vale. With Vale, documentation writers can catch common mistakes in their writing, such as repeated words and misspellings, but it can do a lot more than that. Mitchell noted that "Vale is especially useful for checking for the consistent use of the correct case and spelling of product names". An LWN article about Vale is coming soon.
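Vale is configured through a `.vale.ini` file that points at a directory of style rules. A minimal illustrative configuration, using only Vale's built-in style, might look like this:

```ini
# .vale.ini — minimal illustrative configuration
StylesPath = styles
MinAlertLevel = suggestion

[*.md]
BasedOnStyles = Vale
```

Projects typically add their own style directories under `StylesPath` to encode house terminology, as the Grafana example below shows.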
Regardless of the tools a project uses for its documentation, Mitchell underscored that they should be seamlessly integrated within the team's processes: "Incorporate them into your workflows, and make sure to use the same tools and configuration everywhere, across all users." To maintain documentation quality, she recommended applying the tools to every pull request, ensuring that the workflow never misses any check. "Furthermore, build previews on any pull request to assist reviewers", she added.
Opening a pull request is a big feedback loop, though. Mitchell reiterated that documentation writers also need small feedback loops. "They need to have all tools available locally, ideally fully integrated into their IDE." Of course, a local setup requires the user to install the tools. Consequently, Mitchell warned, the setup of all documentation tools should be clearly written down for onboarding new writers.
Useful tools for documentation
After Mitchell's introductory talk set the stage, several other talks delved into specific tools useful for documentation writers. Jack Baldry, a software engineer at Grafana Labs, explained how his company uses Vale to enforce a consistent style across Grafana's documentation:
If you're using a style guide for your documentation, memorizing each one of those rules can be quite a challenge. Vale assists you by flagging any rule violations it detects and providing a clear explanation.
The company's Writers' Toolkit has implemented rules for Vale that show a message whenever a writer uses terms such as "modal" instead of "dialog box", opts for future tense rather than present tense, or writes "alert manager" instead of the correct "Alertmanager".
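Terminology rules like these are expressed in Vale's YAML rule format; a substitution rule maps each discouraged term to its preferred replacement. The file below is an illustrative reconstruction based on the terms mentioned in the talk, not Grafana's actual rule file:

```yaml
# styles/Example/Terminology.yml — illustrative, not the Writers' Toolkit rule
extends: substitution
message: "Use '%s' instead of '%s'."
level: error
ignorecase: true
swap:
  modal: dialog box
  alert manager: Alertmanager
```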
Documentation not only consists of text, but often also uses diagrams or other pictures. Frank Vanbever, an electrical engineer working at embedded systems company Mind, described how Gaphor allows documentation writers to create diagrams:
The natural way to explain a complex system has always been walking toward a whiteboard and starting to draw boxes and arrows. Gaphor allows writers to replicate this process in their documentation.
The Apache 2-licensed software implements the UML, SysML, RAAML, and C4 modeling languages, allowing documentation writers to use block diagrams, state machines, software architecture diagrams, and other visualizations of software.
Gaphor can also be integrated with the Sphinx documentation generator, enabling the documentation build system to automatically update diagrams in documentation when the source model is changed. Furthermore, the tool has a Python API, which permits model testing with pytest and makes interactive exploration possible with a web-based development environment such as JupyterLab.
The final tool introduced in the devroom was Anton Zhiyanov's Apache 2-licensed codapi. Percona's Peter Zaitsev demonstrated how to make tutorials and how-to guides interactive by adding embeddable code playgrounds using codapi. While traditional documentation merely shows some example code and its result, with codapi the code snippet becomes a dynamic, runnable entity. As soon as the reader clicks on the Run button, the code runs and updates the output in the HTML page. Readers can even edit the code snippet and rerun it to explore the differences.
Codapi has two implementations: one completely browser-based and one with a Docker container on the backend. The first implementation uses WebAssembly to run Python, PHP, Ruby, Lua, or SQLite code in a sandboxed execution environment, completely in-browser. The other implementation connects to a Docker container on a server, which can run other language environments.
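Embedding a playground is a matter of pairing an ordinary code listing with a `codapi-snippet` element, as described in the codapi-js documentation. The sketch below follows that pattern; the script path is illustrative and the exact distribution details should be checked against the codapi documentation:

```html
<!-- A static code listing, made runnable by the codapi-snippet
     element that follows it; the script path is illustrative -->
<pre><code>print("hello, docs")</code></pre>
<codapi-snippet sandbox="python" editor="basic"></codapi-snippet>
<script src="snippet.js"></script>
```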
Conclusion
Mitchell's presentation underscored the advantages of applying various best practices from software development to documentation workflows. Additionally, "Tool the docs" provided documentation writers with a comprehensive overview of various available FOSS tools in this domain. Hopefully, this will help convince software projects to add these tools into their workflows. Every project stands to benefit from consistent, high-quality documentation.
| Index entries for this article | |
|---|---|
| GuestArticles | Vervloesem, Koen |
| Conference | FOSDEM/2024 |