Fedora and fallback DNS servers
Posted Feb 25, 2021 17:23 UTC (Thu) by excors (subscriber, #95769)
In reply to: Fedora and fallback DNS servers by atnot
Parent article: Fedora and fallback DNS servers
For example, HTML4 parsing (as implemented in the real world) had the tolerance but not the specification. Early browsers made up their own error-handling rules in an attempt at DWIM, web sites started accidentally relying on those rules, and then new browsers would break on those sites and (to remain competitive and stop users defecting) had to reverse-engineer and emulate the old browsers' behaviour. The same happened with HTTP headers and other parts of the web platform. That led to many interoperability issues, and to security and privacy issues (because nobody could understand how the browsers actually behaved, so nobody could work out a coherent security model and verify whether the browsers followed it).
XHTML didn't have the tolerance: browsers would completely refuse to display an ill-formed document. But a vanishingly small minority of sites actually used XHTML properly; virtually everyone sent it with Content-Type: text/html, which meant it got parsed as HTML4 instead, relying on the browsers' error handling to cope with e.g. "<br/>", which is not valid HTML4. And almost every dynamically-generated site that did use XHTML properly (served as application/xhtml+xml) could be broken by e.g. a user posting a comment containing U+FFFF, which is not allowed in XML; the sites didn't realise that and would happily print it back out, completely breaking the page for every user (and, in some cases, also breaking the admin pages that were needed to delete the offending comment).
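That asymmetry can be sketched in a few lines of Python (purely illustrative; neither parser here is anything a real browser uses): the strict XML parser rejects the whole document over one forbidden character, while a tolerant HTML-style parser keeps going.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

page = "<p>user comment: \uffff</p>"

# Strict parsing: U+FFFF is not a legal XML character, so the whole
# document is rejected -- the XHTML failure mode described above.
try:
    ET.fromstring(page)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# Tolerant parsing: html.parser never raises; it just emits whatever
# events it can make of the input.
class Collector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)

c = Collector()
c.feed(page)

print(xml_ok)   # the strict parser rejected the page outright
print(c.text)   # the comment text survives under tolerant parsing
```

Note that the failure on the strict side is all-or-nothing: one bad character anywhere in the page takes down the entire document, which is exactly what made user-supplied content so dangerous for application/xhtml+xml sites.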
I suspect the main reason that browsers didn't implement a tolerant XHTML parser (which would make the browser significantly more usable on many sites, giving it a competitive advantage and attracting more users) is that nobody used XHTML so it wasn't worth the bother. It's not a good case study for the benefits of rejecting invalid inputs.
HTML5 precisely specified the HTML4 error-handling behaviour. You can pass /dev/urandom into any browser and it should get parsed the same way, and that way is based on the original browsers' DWIM behaviour. There are test suites to verify that, and browser developers care about following the specification, and there's little competitive advantage in violating the specification and DWIMing differently, so the implementations converged and that has greatly increased interoperability. And then it became possible to analyse security/privacy issues by looking at the specification (which is much easier than untangling the logic from source code) and verifying that it follows some proposed security model - it doesn't automatically solve the issues but it makes it possible to reason about them and begin to address them comprehensively. A similar process has happened with HTTP etc.
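The "never fail, and always produce the same result" contract can be illustrated with a sketch (Python's html.parser is not an HTML5 tree builder, but it follows the same never-raise philosophy): feeding it arbitrary garbage produces no exception, and the same bytes always yield the same event stream.

```python
import random
from html.parser import HTMLParser

def parse_events(text):
    """Parse text tolerantly and return the sequence of parse events."""
    events = []

    class P(HTMLParser):
        def handle_starttag(self, tag, attrs):
            events.append(("start", tag))

        def handle_endtag(self, tag):
            events.append(("end", tag))

        def handle_data(self, data):
            events.append(("data", data))

    p = P()
    p.feed(text)
    p.close()
    return events

# Stand-in for /dev/urandom: 4 KiB of pseudo-random characters.
random.seed(0)
garbage = "".join(chr(random.randrange(256)) for _ in range(4096))

# Two independent runs over the same garbage agree, and neither raises.
assert parse_events(garbage) == parse_events(garbage)
```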
(Then the web added a zillion more features, and the sheer quantity and complexity means that interoperability is very hard again. But at least it's been largely solved in specific areas.)
I think that lesson applies to most protocol-like technologies for large-scale communication between independent implementations. It doesn't apply to e.g. programming languages, where the person who writes the invalid input immediately sees a fatal error message and can fix it themselves. It may not apply to configuration files either, since interoperability isn't particularly relevant there. There is possibly a similar dynamic, though, between the person providing the invalid input (misconfiguring the DHCP/DNS server or whatever) and the person who just wants to get their work done, has no good way to convince the first person to fix the issue, and will eventually switch to a different distro that doesn't keep getting in their way.
