Wiki markup isn't too bad
Posted Aug 14, 2025 12:23 UTC (Thu) by t-8ch (subscriber, #90907)
In reply to: Wiki markup isn't too bad by KJ7RRV
Parent article: Arch shares its wiki strategy with Debian
It's very complex and internally inconsistent, having grown through ad-hoc changes to the parsing logic over the years.
When I researched wikitext parsers a few years ago, the only comprehensive and robust one was Parsoid.
It has been developed by the Wikimedia Foundation since 2012 and ships with newer MediaWiki installations out of the box. It can convert between wikitext and annotated HTML, which is then very easy to handle with any HTML library.
As an example of the complexity of wikitext, Parsoid needs multiple parsing phases. Around five, if I recall correctly.
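To illustrate the point about annotated HTML being easy to handle with an ordinary HTML library, here is a minimal sketch using only Python's standard `html.parser`. The sample markup is hand-written in the style of Parsoid output, not fetched from a real wiki, and the `rel="mw:WikiLink"` and `rel="mw:ExtLink"` annotations follow my reading of the MediaWiki DOM Spec. The names `WikiLinkCollector` and `wiki_link_targets` are made up for this example.

```python
from html.parser import HTMLParser

# Hand-written sample in the style of Parsoid output (an assumption,
# not copied from a real page).
SAMPLE_HTML = (
    '<p>See <a rel="mw:WikiLink" href="./Arch_Linux" title="Arch Linux">Arch</a> '
    'and <a rel="mw:ExtLink" href="https://lwn.net/">LWN</a>, '
    'or <a rel="mw:WikiLink" href="./Debian" title="Debian">Debian</a>.</p>'
)


class WikiLinkCollector(HTMLParser):
    """Collects the href of every <a> element annotated as an internal wiki link."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel is a space-separated token list, so split before testing.
        rels = (attr.get("rel") or "").split()
        if "mw:WikiLink" in rels:
            self.targets.append(attr.get("href"))


def wiki_link_targets(html_text):
    """Return the targets of internal wiki links found in html_text."""
    collector = WikiLinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.targets


print(wiki_link_targets(SAMPLE_HTML))
```

No wikitext-specific parsing is needed here: internal and external links are told apart purely by an attribute on a plain `<a>` element.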
Posted Aug 18, 2025 14:27 UTC (Mon)
by paravoid (subscriber, #32869)
Parsoid is supposed to become the default for read view on Wikipedias by July 2026, and the default for MediaWiki 1.47 LTS (Nov 2026), with the legacy parser to be ultimately deprecated in 2028. These timelines have slipped multiple times before, and the language the WMF folks use to announce them is... careful ("tentatively scheduled", "we hope", etc.), so don't hold your breath. Fortunately, one can already benefit from it by using VisualEditor, including on the new Debian wiki.
https://wikimedia.eventyay.com/talk/wikimania2025/talk/UU... and, linked from there, https://docs.google.com/presentation/d/198_UG5VmHYMoO_38s... is probably the most recent update from the project.
Posted Aug 18, 2025 21:04 UTC (Mon)
by smurf (subscriber, #17840)
Posted Aug 22, 2025 18:50 UTC (Fri)
by cscott (guest, #178938)
And that's the case for wikitext, for sure. Parsoid has been in production use since 2012, and has powered all the mobile apps for almost as long. Many, many other WMF projects have been using Parsoid HTML for a decade. Before we completely ditch the old legacy parser, however, we need to make sure that 99.<some number of nines>% of Wikipedia's >100M pages are bug-for-bug compatible, because we take seriously our duty as custodians of the knowledge base which is Wikipedia and the WMF projects. At this point you might consider our work more 'archivist' than 'engineer', in that our main effort isn't the parser, per se, but preserving the rendering of existing articles.
The goal of Parsoid is to render pages into well-specified semantic HTML, which preserves all the meaningful information (template boundaries, template arguments, invisible constructs, etc.) of the original wikitext. This isn't *just* to allow us to use an HTML editor and round-trip back to the original wikitext; it also paves the way for other editors and markup languages in the future: as long as it can round-trip to and from "MediaWiki DOM Spec" HTML, you can use it to edit Wikipedia.
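As a rough illustration of how template boundaries and arguments can survive in the HTML, here is a minimal sketch that recovers a template call from a `data-mw` attribute. The sample markup is hand-built for this example and the JSON shape (`parts`, `template`, `target.wt`, `params`) is an assumption based on my reading of the MediaWiki DOM Spec, so check the spec before relying on it. `TransclusionCollector` and `template_calls` are hypothetical names.

```python
import html
import json
from html.parser import HTMLParser

# Assumed shape of the data-mw payload for a {{convert|5|km}} call.
DATA_MW = {
    "parts": [
        {
            "template": {
                "target": {"wt": "convert"},
                "params": {"1": {"wt": "5"}, "2": {"wt": "km"}},
                "i": 0,
            }
        }
    ]
}

# Hand-built sample: the JSON is escaped so it can sit inside an attribute.
SAMPLE_HTML = (
    '<p>About <span typeof="mw:Transclusion" about="#mwt1" data-mw="{}">'
    "5 kilometres</span>.</p>"
).format(html.escape(json.dumps(DATA_MW), quote=True))


class TransclusionCollector(HTMLParser):
    """Collects (template name, arguments) pairs from transclusion markers."""

    def __init__(self):
        super().__init__()
        self.templates = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "mw:Transclusion" not in (attr.get("typeof") or "").split():
            return
        # HTMLParser has already unescaped the attribute value.
        data = json.loads(attr.get("data-mw") or "{}")
        for part in data.get("parts", []):
            # Skip any non-template parts, such as plain strings.
            if isinstance(part, dict) and "template" in part:
                tpl = part["template"]
                args = {k: v.get("wt") for k, v in tpl.get("params", {}).items()}
                self.templates.append((tpl["target"]["wt"], args))


def template_calls(html_text):
    """Return a list of (name, args) for each template found in html_text."""
    collector = TransclusionCollector()
    collector.feed(html_text)
    collector.close()
    return collector.templates


print(template_calls(SAMPLE_HTML))
```

Because the original call is recoverable from the rendered element, an editor working purely on the HTML has what it needs to write the template back out as wikitext.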
More information: https://en.wikipedia.org/wiki/User:Cscott/Ideas/A_Dozen_V...