Development

A common Markdown

By Nathan Willis
September 10, 2014

The Markdown text-markup format was created in 2004 by John Gruber, and has been widely adopted—especially in applications where some sort of text formatting is desirable, but full HTML is, for some reason, considered overkill. Despite its wide adoption, though, there have long been differing interpretations of various ambiguities in the canonical description of the format, leading to incompatible implementations. Now a small team of Markdown enthusiasts has decided to publish a more formal specification that can be used as a strict guidebook for implementers concerned about valid formatting.

The effort is being led by Jeff Atwood (most recently of Discourse fame), along with Etherpad creator David Greenspan, Vicent Marti of GitHub, Neil Williams of Reddit, Benjamin Dumke-von der Ehe of Stack Exchange, and John MacFarlane, creator of the pandoc text-file converter. Atwood and Greenspan first began discussing the need for a standardized Markdown specification in late 2012, after noting the incompatibilities between the various Markdown implementations available, the lack of Markdown test suites, and Gruber's apparent disinterest in leading, or even participating in, any effort to repair shortcomings in the official description of the format.

Mark questions

Gruber's description of the Markdown syntax is essentially a static document that resides at his personal site, and in many ways is informal. There is no version number and Gruber has not made additions to it, even when Markdown users have made suggestions. And the heavy users of Markdown on the web do indeed seem to find several markup options missing, which has led to several popular sites each supporting its own tweaked version of the format. Beyond wanting their suggestions adopted, many Markdown users seem to lament the fact that Gruber will not host any sort of open standard-writing process.

The GitHub flavor, for example, differs in a number of respects where code markup is concerned, such as ignoring intra-word underscore characters, which are used to indicate italics in the official Markdown, but which often need to be used in variable and function names (and, thus, should not be interpreted as markup). It also adds text strikethrough, table support, and indentation. The Stack Exchange variant also supports several additions of its own, such as automatic hyperlinking of URLs and [tag: ] elements.
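As a rough illustration of the intra-word underscore rule, the following Python sketch treats _text_ as emphasis only when the underscores do not touch a letter or digit on the outside. The function name and the regular expression are assumptions made for this example; this is not GitHub's actual parser, and it ignores code spans, escapes, and nested markup.

```python
import re

# Hypothetical pattern for illustration only: the opening "_" must not be
# preceded by a letter or digit, the closing "_" must not be followed by one,
# and the emphasized text must not start or end with whitespace.
_EMPHASIS = re.compile(r"(?<![A-Za-z0-9])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9])")


def render_underscore_emphasis(text: str) -> str:
    """Wrap _emphasized_ runs in <em> tags; leave intra-word underscores alone."""
    return _EMPHASIS.sub(r"<em>\1</em>", text)


if __name__ == "__main__":
    print(render_underscore_emphasis("call my_var_name, it is _important_"))
```

With this rule, an identifier such as my_var_name passes through unchanged, while _important_ surrounded by spaces or punctuation becomes emphasized text.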

More fundamentally, though, there are ambiguities with respect to how Markdown should be converted to HTML, which can lead to considerable variation in the output of competing Markdown conversion tools. For example, Gruber's description of Markdown does not indicate whether or not a blank line must precede a block quote or a header element. Most implementations do require a blank line, though, because Markdown's delimiter for these elements (the ">" and "#" characters, respectively) can easily appear as the first character on the line in normal, unformatted text. This is rarely the case for many other delimiters, such as sequences of multiple punctuation characters.

There are also no formal rules describing the precedence of markers for nested elements, how combinations of block-level and inline markup should be parsed, and which elements can be nested within others. In Markdown's early days, users would look to Gruber's original Markdown-conversion Perl script as the reference implementation of sorts, but it was inconsistent and buggy enough that it did not serve that function well.

Atwood ran a relatively simple Markdown sample through MacFarlane's online Markdown comparison tool Babelmark, and counted fifteen different output variations from the 22 Markdown converters tested. Admittedly, many of those differences are minor (such as the use of blank lines and white space), and some would seem to violate common-sense expectations (such as generating an unordered list from numbered input), but the point remains that HTML output is not entirely predictable for a given Markdown input. Moreover, there is only one known Markdown test utility, MDTest, and it encodes its own take on the ambiguities for the format, so few if any implementations pass it.

Standard Markdown

In a September 3 blog post, Atwood announced the public availability of the standardization work. Called, at that point, "Standard Markdown," the effort included a specification, inline examples, and reference implementations in both C and JavaScript.

Shortly after the announcement was made public, though, news of the project evidently reached Gruber, who was less than pleased. Atwood reported on September 5 that he and MacFarlane both received an email from Gruber describing the name "Standard Markdown" as "infuriating" because it implied an official status. Gruber asked them to rename the project, as well as to shut down the site (standardmarkdown.com) without redirecting it and to apologize to him. In the post, Atwood noted that he had invited Gruber to participate in the standardization project in 2012, but never heard back, and that he had also contacted Gruber in August asking for his feedback on the soon-to-be-published Standard Markdown work, but again received no reply.

Atwood agreed to Gruber's requests for a name change and apology, and sent him a list of candidate replacement names asking for his opinion. When Gruber again did not reply, Atwood and the others announced that their new name for the project would be "Common Markdown." Unfortunately, it seems, that choice was also not acceptable to Gruber. In an addendum to the September 5 post, Atwood said that Gruber had emailed again to say "that no form of the word 'Markdown' is acceptable to him in this case", and that, as a result, the project was now adopting the name CommonMark.

Gruber's apparent hostility to the notion of others contributing to the further development of Markdown is, at least in Atwood's view, not a new position. In 2009, Atwood had commented on Gruber's disinterest in making updates to Markdown, citing his "passive-aggressive interaction" with the user community as a persistent problem. It could certainly be argued that some of Atwood's comments about Gruber, particularly in 2009, exhibited a fair amount of rancor as well, but at least the public messaging about the launch of CommonMark has been almost entirely free of personal criticisms about Gruber. That may not be sufficient to earn Gruber's blessing, but the project appears to be more focused on attracting the input of other Markdown users than on winning over the format's original creator.

CommonMark

The CommonMark specification was written primarily by MacFarlane, based on the input of the other project participants. It is considerably more formal than Gruber's description of Markdown; each markup element is described in precise terms that specify not only which variations are permissible but also which are not, with examples of each case.

For example, the horizontal rule corresponding to the HTML <hr /> element is defined as:

A line consisting of 0-3 spaces of indentation, followed by a sequence of three or more matching -, _, or * characters, each followed optionally by any number of spaces, forms a horizontal rule.

In addition, the description includes 19 examples of correct and incorrect Markdown input along with the HTML output that compliant implementations should produce. If there was any doubt about whether or not spaces are allowed before, in between, or after the three -, _, or * characters, there is an unambiguous example of each.
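The quoted definition is precise enough to be approximated with a single regular expression. The Python sketch below is one reading of that sentence, not the project's reference implementation; the function name is made up for this example, and only space characters (not tabs) are treated as indentation or padding.

```python
import re

# One reading of the quoted rule:
#   0-3 spaces of indentation, then three or more matching "-", "_", or "*"
#   characters, each optionally followed by any number of spaces.
_HRULE = re.compile(r" {0,3}([-_*])(?: *\1){2,} *")


def is_horizontal_rule(line: str) -> bool:
    """Return True if the line forms a horizontal rule under the quoted rule."""
    return _HRULE.fullmatch(line.rstrip("\n")) is not None


if __name__ == "__main__":
    for sample in ["***", " - - -  ", "    ***", "--", "*-*"]:
        print(repr(sample), is_horizontal_rule(sample))
```

Under this reading, "***" and " - - -  " are accepted, while a line indented by four spaces, a line with only two characters, or a line mixing different characters is rejected.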

Similarly, the examples show how the horizontal rule element should be interpreted when used in combination with paragraph markers, lists, headers, and other elements. Moreover, in instances where existing Markdown implementations varied in their interpretations, the CommonMark specification explains the options and provides an explanation for its choice. In some cases, interestingly enough, that justification involves a reference to published comments by Gruber.

The result of this specificity is a lengthy document (well over 10,000 words, according to one count). Apart from the effort to disambiguate the existing Markdown standard, there are few changes to be found in this initial version of CommonMark. Some of the most often-used features of site-specific Markdown variants are included, such as leaving underscores within words untouched, the ability to choose a starting number for numbered lists, and the ability to escape any punctuation character with a backslash. Other popular variations are not included, though, such as automatically creating hyperlinks on plain-text URLs.
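The backslash-escape rule mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name is hypothetical, "punctuation" is taken to mean the ASCII punctuation characters in Python's string.punctuation, and the sketch does not account for contexts such as code spans, where a real parser would handle backslashes differently.

```python
import re
import string

# A backslash followed by any ASCII punctuation character stands for that
# character literally; a backslash before anything else is left untouched.
_ESCAPE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def unescape_punctuation(text: str) -> str:
    """Replace backslash-escaped ASCII punctuation with the bare character."""
    return _ESCAPE.sub(r"\1", text)


if __name__ == "__main__":
    print(unescape_punctuation(r"\*not emphasis\* and 1\. not a list"))
```

A full parser would apply this during inline parsing, so that an escaped character is never considered as a markup delimiter in the first place, rather than as a separate pass over the text.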

MacFarlane notes that he has also introduced a new syntax for line breaks and fenced code blocks, because he felt these were "core" features left out of Gruber's description. Despite the paucity of changes against Gruber's Markdown, MacFarlane does leave the door open for further additions, such as footnotes and definition lists.

The reference implementations include a portable C99 library, a command-line Markdown-to-HTML converter called stmd, and two JavaScript converters (one meant to be linked into a page and one to be run with Node.js). All are BSD-licensed. The project's repository also includes a Perl script test utility that can be run against any Markdown interpreter. The test input for this program is intended to be the CommonMark specification document itself (which, needless to say, is written in CommonMark-compliant Markdown). Finally, the project has released an online test tool for experimenting with small Markdown snippets.

Where CommonMark will go from here remains to be seen, but the project's leaders are committed to incorporating feedback from the public at large. There is a Discourse discussion site running at talk.commonmark.org where interested parties can weigh in on the specification, ask questions, and suggest changes. In the eight days since its launch, several hundred comments have already been posted. Some ask for new features, some are concerned about complexity and convenience, while others have raised interesting meta-questions, like whether CommonMark should use its own file extension rather than Markdown's .md.

Certainly the rise of a vocal community is a good sign for the continued success of CommonMark. The trickier subject will be what would happen if CommonMark grows into a substantial challenge to Gruber's existing Markdown. A fork-and-re-merge in such situations has happened in the past, but there is no guarantee that the original Markdown will fade out or adapt, just because a popular new variant has emerged.


Brief items

Quote of the week

But then I looked at those ninety-eight 200 OK URLs, too.

77 reported 200 OK, but were parked domains, advertising landing pages, or otherwise completely different content. This is link rot, too, just harder for an automated system to detect. I marked these as 210 OK But Gone.

That's 205 failures, an actual link rot figure of 91%, not 57%.

That leaves only 21 URLs as 200 OK and containing effectively the same content.

— Vitorio Miliano, reporting the discouraging results of his recent experiment to document link-rot in the wild.


LLVM 3.5 released

Version 3.5 of the LLVM compiler system is out. There is support for a number of new architecture versions and more. "Clang makes a considerable jump forward as well, including new warnings and better support for new standards: in addition to full support for the recently completed C++’14 standard, it includes initial support for 'C++1z' features. Additionally, it now supports generating “remarks” to indicate when optimizations like vectorization and inlining occur, allowing you to tune your programs more effectively." See the release notes for more information.


SystemTap 2.6 released

Version 2.6 of SystemTap is now available. Among the changes, this release adds on-the-fly arming and disarming of probes, support for multiple scripts, and many new probes.


Glibc 2.20 released

Version 2.20 of the GNU C Library is now available. Significant changes include support for file-private POSIX locks, removal of support for the _BSD_SOURCE and _SVID_SOURCE feature test macros (see this article for more information), various performance improvements, and more.


Newsletters and articles

Development newsletters from the past week

What's cooking in git.git (September 9)
LLVM Weekly (September 8)
OCaml Weekly News (September 9)
OpenStack Community Weekly Newsletter (September 5)
Perl Weekly (September 8)
PostgreSQL Weekly News (September 7)
Python Weekly (September 4)
Tor Weekly News (September 10)
Wikimedia Tech News (September 8)


Fedora Developer Announces New Partition Manager (Linux Magazine)

Linux Magazine takes a look at blivet-gui, a partition tool built from storage and configuration management tools used in Fedora’s Anaconda installer. "According to the developer, the Linux community needs a new partition tool because of all the new storage technologies that have appeared over the last few years. Traditional tools such as GParted no longer support the full range of Linux filesystem and storage options."


Page editor: Nathan Willis