
Fedora and fallback DNS servers


Posted Feb 25, 2021 17:23 UTC (Thu) by excors (subscriber, #95769)
In reply to: Fedora and fallback DNS servers by atnot
Parent article: Fedora and fallback DNS servers

I think the lesson from the web is that interoperability comes from tolerantly handling invalid input, *and* precisely specifying how consumers must handle that invalid input (and having the major implementations follow that specification). The second part is crucial but is missing from Postel's law.

E.g. HTML4 parsing (as implemented in the real world) had the tolerance but not the specification. Early browsers made up their own rules for error handling in an attempt at DWIM, web sites started accidentally relying on those rules, then new browsers would break on those sites and (in order to remain competitive and stop users defecting) had to reverse-engineer and emulate the old browsers' behaviour. Similar for HTTP headers and other parts of the web platform. That led to many interoperability issues, and to security and privacy issues (because nobody could understand how the browsers actually behaved, so they couldn't work out a coherent security model and verify whether the browsers followed it).

XHTML didn't have the tolerance - browsers would completely refuse to display an ill-formed document. But a vanishingly small minority of sites actually used XHTML properly (virtually everyone sent it with Content-Type: text/html which meant it got parsed as HTML4 instead, relying on the browsers' error handling to cope with e.g. "<br/>" which is not valid HTML4). And almost every dynamically-generated site that used XHTML properly (as application/xhtml+xml) could be broken by e.g. users posting a comment containing a U+FFFF, which is not allowed in XML but the sites didn't realise and would happily print it out again, thus completely breaking the page for every user (and, in some cases, also breaking the admin pages that were needed to delete the offending comment).
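(To make the U+FFFF failure mode concrete, here is a minimal Python sketch of the check those sites were missing. The character ranges follow the Char production of XML 1.0; the helper names `is_xml_char` and `sanitize_for_xml` are made up for illustration, and substituting U+FFFD is only one possible policy.)

```python
def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def sanitize_for_xml(text: str, replacement: str = "\uFFFD") -> str:
    """Replace every character that XML 1.0 forbids (hypothetical helper)."""
    return "".join(ch if is_xml_char(ch) else replacement for ch in text)


# A user comment that ends in U+FFFF, as in the scenario above.
comment = "nice article\uFFFF"
safe = sanitize_for_xml(comment)
```

This only deals with forbidden code points. A real site would still need to escape markup characters such as `<` and `&` before emitting the comment.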

I suspect the main reason that browsers didn't implement a tolerant XHTML parser (which would make the browser significantly more usable on many sites, giving it a competitive advantage and attracting more users) is that nobody used XHTML so it wasn't worth the bother. It's not a good case study for the benefits of rejecting invalid inputs.

HTML5 precisely specified the HTML4 error-handling behaviour. You can pass /dev/urandom into any browser and it should get parsed the same way, and that way is based on the original browsers' DWIM behaviour. There are test suites to verify that, and browser developers care about following the specification, and there's little competitive advantage in violating the specification and DWIMing differently, so the implementations converged and that has greatly increased interoperability. And then it became possible to analyse security/privacy issues by looking at the specification (which is much easier than untangling the logic from source code) and verifying that it follows some proposed security model - it doesn't automatically solve the issues but it makes it possible to reason about them and begin to address them comprehensively. A similar process has happened with HTTP etc.

(Then the web added a zillion more features, and the sheer quantity and complexity means that interoperability is very hard again. But at least it's been largely solved in specific areas.)

I think that lesson applies to most protocol-like technologies for large-scale communication between independent implementations. It doesn't apply to e.g. programming languages, where the person who writes the invalid input can immediately see a fatal error message and fix it themselves. It may not apply to configuration files, since interoperability isn't particularly relevant there, though there's possibly a similar dynamic between the person providing the invalid input (misconfiguring the DHCP/DNS server or whatever) and the person who just wants to get their work done and who doesn't have a good way to convince the first person to fix the issue and will eventually switch to a different distro that doesn't keep getting in their way.



Fedora and fallback DNS servers

Posted Feb 25, 2021 17:54 UTC (Thu) by Sesse (subscriber, #53779) [Link] (5 responses)

> HTML5 precisely specified the HTML4 error-handling behaviour. You can pass /dev/urandom into any browser and it should get parsed the same way, and that way is based on the original browsers' DWIM behaviour.

“Should” is the word. I've read HTML5 parsers full of comments like “the spec says this, but Firefox does it differently, so we have to oblige”.

Fedora and fallback DNS servers

Posted Feb 26, 2021 6:49 UTC (Fri) by roc (subscriber, #30627) [Link] (4 responses)

Where?

The guy who owns the Gecko HTML5 parser is VERY diligent about avoiding this sort of thing. I can let him know.

Fedora and fallback DNS servers

Posted Feb 26, 2021 7:29 UTC (Fri) by Sesse (subscriber, #53779) [Link] (1 responses)

I no longer have access to the code in question, sorry. The point is that even in HTML5, you cannot assume consistent bug-by-bug compatibility of tag soup parsing.

Fedora and fallback DNS servers

Posted Feb 26, 2021 11:10 UTC (Fri) by roc (subscriber, #30627) [Link]

If you say so, but FWIW, my impression is that HTML parsing differences are far down the list of issues that cause compatibility problems.

Fedora and fallback DNS servers

Posted Feb 26, 2021 14:25 UTC (Fri) by jkingweb (subscriber, #113039) [Link] (1 responses)

My own experience agrees with this. Last year I found a bug in the Encoding spec test suite, which resulted in bugs being filed for Gecko, WebKit, and Chromium, because they passed the incorrect test. The Gecko and WebKit bugs were fixed promptly. The Chromium bug, unsurprisingly, is still open. They don't seem to care whether they decode characters the same as everyone else (they have tons of decoder bugs), so I wouldn't be surprised if that extends up the parsing chain. Mozilla, though? Doesn't seem to be a problem.

Maybe Sesse was referring to code which predated the parsing test suite, however.

Fedora and fallback DNS servers

Posted Feb 26, 2021 15:57 UTC (Fri) by excors (subscriber, #95769) [Link]

If I remember correctly, there is very little "code which predated the [HTML5] parsing test suite" - the first reasonably-comprehensive test suites (including tests for a lot of the error handling) were developed in parallel with the first public parser implementation (html5lib, I think?) and in parallel with the specification itself. That was valuable for detecting and fixing any unspecified or ambiguous behaviour in the specification, and then the specification plus test suites were a strong foundation for the subsequent browser implementations, which at least in Mozilla's case was basically a from-scratch rewrite.

I'm sure it wasn't perfect and there were still bugs, and probably things have changed a lot since I last looked at it seriously (a worryingly large number of years ago), but my impression at the time was that it was very successful at achieving interoperability across all the browsers and several non-browser parser implementations. (And it was enormously more successful than HTML4's approach of "here's the specification of a valid document, and how browsers should handle it. Huh, invalid document? Why would anyone do that? Just fix your document" and XHTML's approach of "Invalid document? YELLOW SCREEN OF DEATH".)

(Of course parsing is only a tiny part of the web platform, and probably one of the easiest parts for this kind of comprehensive specification and testing because it's a nice self-contained platform-independent linear transformation from bytes to a tree of elements (ignoring fiddly bits like document.write). But similar principles were applied with some success to other parts of the platform too, and I think the lesson is that it's a significant improvement over Postel's law.)

Fedora and fallback DNS servers

Posted Feb 25, 2021 18:49 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (3 responses)

> (virtually everyone sent it with Content-Type: text/html which meant it got parsed as HTML4 instead, relying on the browsers' error handling to cope with e.g. "<br/>" which is not valid HTML4)

To clarify: That technically *is* valid HTML4, it's just the wrong HTML4. Formally, it's equivalent to <br>> (i.e. the tag ends at the slash, as part of a more general <foo/bar/ syntax which is allegedly "easier" than writing <foo>bar</foo>), but I think approximately three people in the entire history of the universe have actually wanted it to be interpreted that way, so all the browsers cheated and ignored the slash. Then XHTML came along and said "actually, you need the slash" and made everything even worse (because as you say, everyone was serving XHTML with text/html and it was then getting parsed as HTML4).

Then HTML5 came along and had to fix this mess. So they decided to specify that the slash is optional and has no semantic meaning (i.e. <br> and <br/> are exactly equivalent), and while they were going to the trouble of doing that, they also specified that </br> is illegal (the tag is always empty, so no need to close it), as is <p/> (arbitrary self-closing tags are not supported). Both of those misfeatures had been legal in XHTML, but approximately nobody had been using them, so it was reasonably safe to just yank them from the spec before they could turn into an attractive nuisance.

Fedora and fallback DNS servers

Posted Feb 25, 2021 22:31 UTC (Thu) by pbonzini (subscriber, #60935) [Link] (2 responses)

Maybe not <p/>, but <td/> was certainly quite common in XHTML.

Fedora and fallback DNS servers

Posted Feb 26, 2021 0:42 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (1 responses)

HTML5 solves that problem by specifying that you can just write <td> and then a closing tag is inferred at the next <td> or <th>, or at the end of the <tr>. This is not considered "error handling," either. It's perfectly legal to omit the closing tag, and the result is considered a well-formed HTML5 document. This was certainly never the case in XHTML, although I'm not sure how HTML4 handled this sort of chicanery.

If you do include the slash, it just strips it out and emits a parse error, so you would end up with an unclosed <td>. But as discussed in the previous paragraph, that <td> will probably close itself anyway, and so it hardly matters.

Incidentally, you can also do this with <p>, meaning you can write prose like this:

<p>
Here is a paragraph of text...
<p>
Here is a second paragraph...
<p>
[and so on]

This is also considered well-formed HTML5.
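As a rough illustration of the inference involved, here is a toy Python sketch built on the standard library's `html.parser`. It is not the HTML5 tree-construction algorithm and it only knows the single rule "a new <p> closes an open <p>"; the class name `ParagraphCloser` and the `events` list are assumptions made for the example.

```python
from html.parser import HTMLParser


class ParagraphCloser(HTMLParser):
    """Toy sketch: infer </p> when a new <p> starts while one is open."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.events = []  # tags seen, plus the end tags we inferred

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.in_paragraph:
            # The previous paragraph was never closed explicitly.
            self.events.append("implied </p>")
        if tag == "p":
            self.in_paragraph = True
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False
        self.events.append(f"</{tag}>")

    def close(self):
        super().close()
        if self.in_paragraph:
            # End of input also closes a paragraph that is still open.
            self.events.append("implied </p>")
            self.in_paragraph = False


parser = ParagraphCloser()
parser.feed("<p>First paragraph<p>Second paragraph")
parser.close()
events = parser.events
```

The real specification has many more cases (other block-level start tags also close an open <p>, for instance), so treat this as a picture of the idea rather than a reference.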

Fedora and fallback DNS servers

Posted Feb 26, 2021 1:20 UTC (Fri) by jkingweb (subscriber, #113039) [Link]

> HTML5 solves that problem by specifying that you can just write <td> and then a closing tag is inferred at the next <td> or <th>, or at the end of the <tr>. This is not considered "error handling," either. It's perfectly legal to omit the closing tag, and the result is considered a well-formed HTML5 document. This was certainly never the case in XHTML, although I'm not sure how HTML4 handled this sort of chicanery.

This has been a design feature of HTML from its earliest days. Many end tags are optional, as are some start tags, including those for html, body, and tbody.

That last is perhaps lesser-known: in an HTML (but not XHTML) document, <tr> is never a child of <table>; there is always an implicit tbody (or explicit thead or tfoot) element in between.
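A small Python sketch of that rule, assuming a deliberately simplified model of the open-element stack (the class name `ImplicitTbody` and its `paths` attribute are invented for the example, and this is nowhere near the full table-handling logic of the HTML specification):

```python
from html.parser import HTMLParser


class ImplicitTbody(HTMLParser):
    """Toy sketch: insert a tbody when <tr> appears directly in <table>."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, outermost first
        self.paths = []  # ancestor path recorded for every <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and self.stack and self.stack[-1] == "table":
            # No explicit thead/tbody/tfoot: supply the implicit tbody.
            self.stack.append("tbody")
        self.stack.append(tag)
        if tag == "tr":
            self.paths.append("/".join(self.stack))

    def handle_endtag(self, tag):
        # Pop up to and including the matching element, if it is open.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


parser = ImplicitTbody()
parser.feed("<table><tr><td>cell</td></tr></table>")
parser.close()
paths = parser.paths
```

With an explicit `<thead>` in the input, the same sketch records the row under `thead` and adds nothing.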

Fedora and fallback DNS servers

Posted Feb 25, 2021 23:14 UTC (Thu) by Wol (subscriber, #4433) [Link]

And then you throw PHBs into the mix.

Many moons ago, in the days back when ISPs actually knew what they were doing, our ISP upgraded their email servers, and they started sending "250 EHLO". Our MS Mail Gateway threw a hissy fit, and the PHB demanded that our ISP "fix" their sendmail, despite MS Mail completely ignoring the SMTP spec, namely that you MUST NOT quit in response to a command you don't recognise.

Cheers,
Wol

Fedora and fallback DNS servers

Posted Feb 26, 2021 6:51 UTC (Fri) by roc (subscriber, #30627) [Link]

As a former Mozilla distinguished engineer --- this is exactly right.

Fedora and fallback DNS servers

Posted Mar 6, 2021 17:58 UTC (Sat) by anton (subscriber, #25547) [Link]

> It doesn't apply to e.g. programming languages, where the person who writes the invalid input can immediately see a fatal error message and fix it themselves.

If only programming languages guaranteed a fatal error message on invalid input. We have that for syntax and so-called "static semantics" (things beyond context-free grammars that are checked by the compiler). But then there are run-time errors, which may be seen by a different person. And then there is undefined behaviour, where a newer version of the compiler than the one the code was tested with might compile the code differently; or worse, the same version of a library might choose to behave differently on some hardware than on the tested hardware (happened with memcpy).

