Posted Mar 30, 2012 19:18 UTC (Fri) by corbet (editor, #1)
Here's the deal: it seems there is a difference of opinion between Firefox and the Python HTMLParser module on what "<!-->" means. Firefox interprets it as both the beginning and the end of an HTML comment, while HTMLParser thinks it's only the beginning. That allowed our well-named prankster to slip a bit of otherwise prohibited markup past the checker.
I've done a bit of digging, and I think HTMLParser is right by the standard. The "<!--" sequence starts a comment; a second "--" is needed to end it. But Firefox, at least, disagrees, with the results seen here.
We have just put in a quick patch to disable HTML comments altogether, so this particular problem should not afflict us again. Thanks to "slashdot" for bringing the problem to our attention, though it must be said we would prefer a nice email.
Posted Mar 30, 2012 19:35 UTC (Fri) by geofft (subscriber, #59789)
Of course, if you were to actually put a free license on it and put up a git repo, I wouldn't complain. :-)
Posted Mar 30, 2012 19:36 UTC (Fri) by Darkmere (subscriber, #53695)
Posted Mar 30, 2012 19:48 UTC (Fri) by slashdot (guest, #22014)
Take that, distributions who want to embargo security holes for weeks or months, despite having orders of magnitude more employees and security bug experience than LWN!
Back to the topic at hand, the fundamental issue is that you seem to be copying the HTML verbatim after a check is passed, which, as you can see, is a rather dangerous practice, because it can fail catastrophically when the check doesn't quite work as expected.
A much better approach, which should guarantee security, is to parse the comment and then generate completely new HTML from the logical parse tree, making sure the code only outputs fixed safe HTML snippets (such as "<b>") or HTML-escaped strings. Finally, check that the newly generated HTML is a well-formed XHTML fragment and, if it isn't, automatically report the issue to the site administrator, with rate limiting.
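As a rough illustration of the parse-and-regenerate idea, here is a minimal sketch using Python 3's standard `html.parser`, `html.escape` and `xml.etree.ElementTree`. The `ALLOWED_TAGS` set, the `CommentRebuilder` class and the `sanitize` and `is_well_formed` helpers are hypothetical names invented for this example; this is not LWN's actual code, and the rate-limited report to the administrator is left out.

```python
import html
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical whitelist; a real site would choose its own set of tags.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "pre", "blockquote"}


class CommentRebuilder(HTMLParser):
    """Rebuild a comment from parser events instead of copying the input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of freshly generated HTML
        self.open_tags = []  # stack of whitelisted tags that are still open

    def handle_starttag(self, tag, attrs):
        # Emit a fixed snippet; attributes are never copied through.
        if tag in ALLOWED_TAGS:
            self.out.append("<%s>" % tag)
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close only tags we opened, unwinding the stack to keep nesting valid.
        if tag in ALLOWED_TAGS and tag in self.open_tags:
            while True:
                top = self.open_tags.pop()
                self.out.append("</%s>" % top)
                if top == tag:
                    break

    def handle_data(self, data):
        # Text is always escaped, so a stray "<" or "&" cannot become markup.
        self.out.append(html.escape(data, quote=False))

    # Comments, declarations and processing instructions are dropped,
    # because the inherited handlers for them do nothing.

    def result(self):
        self.close()  # flush any input the parser is still buffering
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def sanitize(text):
    """Return newly generated HTML for an untrusted comment body."""
    parser = CommentRebuilder()
    parser.feed(text)
    return parser.result()


def is_well_formed(fragment):
    """Final check: does the generated fragment parse as XML inside a wrapper?"""
    try:
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    cleaned = sanitize('<!--><script>alert(1)</script><b>hello</b>')
    print(cleaned, is_well_formed(cleaned))
```

The point of the design is that nothing from the input reaches the output verbatim: however a given parser version interprets "<!-->", the result can only contain whitelisted tags written by the code itself plus escaped text.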
Posted Mar 30, 2012 20:00 UTC (Fri) by slashdot (guest, #22014)
Browsers use the HTML parsing rules for this page, while Python's HTMLParser either uses the XHTML ones, or is simply broken.
Posted Mar 31, 2012 15:53 UTC (Sat) by pdewacht (subscriber, #47633)
Posted Mar 31, 2012 22:37 UTC (Sat) by foom (subscriber, #14868)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds