
Closing off cross-site scripting holes

When writing web applications, it is easy to lose track of the fact that HTML is not quite the same as plain text. As a result, erroneous characters (such as an unescaped "&") can easily slip into a web page. They can result in poorly rendered pages, RSS files that fail to load, and lots of email suggesting that the author buy and read a copy of "HTML for drooling morons." Trust us, we know.

As annoying as that sort of problem can be, it fades into insignificance when compared to the other issue that arises when text is treated as if it were HTML: cross-site scripting. If an outside attacker can get your web application to present arbitrary HTML to another user, that attacker can often get the victim to disclose information or carry out an unwanted action. Cross-site scripting problems have afflicted many applications, and they are unlikely to go away anytime soon. It is just too easy for a web application programmer to make a mistake and let unquoted text slip through.

Version 0.6 of the Quixote web application framework, which saw its first beta release last week, includes an interesting approach to the cross-site scripting problem. Quixote (which is the framework used by LWN) includes a nice "template" feature which allows an easy and natural mixing of HTML text and Python code. Text generated by a template is passed back to the web browser as an HTML document. In the current Quixote release, as in most web frameworks, text is sent directly back without processing or quoting. After all, web templates need to be able to include HTML tags in their output, and things would not work very well if those tags were quoted. Quixote provides a function for the safe quoting of untrusted text, but the programmer must remember to use it in all the relevant places. Sooner or later, most programmers forget.
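To make the "remember to quote" burden concrete, here is a minimal sketch in plain Python. It does not use Quixote at all: the quoting function is the standard library's html.escape, and the two render_comment functions are hypothetical names invented for this illustration.

```python
from html import escape

def render_comment_unsafe(author, body):
    # The programmer forgot to quote: whatever the user typed
    # goes straight into the page as markup.
    return "<p><b>%s</b>: %s</p>" % (author, body)

def render_comment_safe(author, body):
    # Every untrusted value has to be quoted by hand, every time.
    return "<p><b>%s</b>: %s</p>" % (escape(author), escape(body))

payload = "<script>alert('xss')</script>"
print(render_comment_unsafe("mallory", payload))  # script tag survives intact
print(render_comment_safe("mallory", payload))    # script tag is neutralized
```

Both functions produce identical output for well-behaved input, which is why the missing call tends to go unnoticed.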

Version 0.6, instead, has two kinds of text. Anything which appears in a literal, quoted string is of type "htmltext," and it is assumed to be exactly as the programmer wanted it to be (since he or she wrote it that way). Anything which takes the form of an ordinary Python string, however, is assumed to need quoting on its way to the browser; this quoting happens automatically as the template is executed.
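The two-kinds-of-text idea can be sketched in a few lines of ordinary Python. This is an illustration of the general technique only, not Quixote's implementation: the SafeHTML class, the quote() helper, and the greeting() template are all hypothetical names, and real templates would not need the explicit SafeHTML(...) wrappers, since the article says literal strings get the trusted type automatically.

```python
from html import escape

class SafeHTML:
    """Hypothetical 'trusted markup' type; a stand-in, not Quixote's htmltext."""

    def __init__(self, markup):
        self.markup = markup

    def __add__(self, other):
        # Anything joined onto trusted markup is quoted unless already trusted.
        return SafeHTML(self.markup + quote(other).markup)

    def __radd__(self, other):
        # Handles the case where a plain string is on the left-hand side.
        return SafeHTML(quote(other).markup + self.markup)

    def __str__(self):
        return self.markup

def quote(value):
    """Return value as SafeHTML, escaping it unless it is already trusted."""
    if isinstance(value, SafeHTML):
        return value
    return SafeHTML(escape(str(value)))

def greeting(name):
    # The literals are trusted; 'name' is an ordinary string and gets quoted.
    return SafeHTML("<p>Hello, <b>") + name + SafeHTML("</b>!</p>")

print(greeting("<i>Sancho</i>"))            # tags in the plain string are escaped
print(greeting(SafeHTML("<i>Sancho</i>")))  # trusted markup passes through
```

The point of the design is that the default is the safe one: forgetting to do anything results in quoting, and passing markup through requires an explicit act.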

The result is that text that comes from a database or other external source is automatically quoted, and thus cannot be used for a cross-site scripting attack. The programmer no longer needs to worry about quoting every bit of text that passes through the application. This is, of course, the way things should be done from a security standpoint: assume that everything is suspect in the absence of an explicit statement to the contrary. This approach, too, can create bugs - HTML tags may end up being quoted when they should be passed through directly. But that kind of bug is immediately evident, while a failure to quote is usually invisible - until it bites you. The new Quixote HTML template mechanism errs on the side of security and makes failures happen in the right way.



python language extension and active content

Posted Jan 16, 2003 8:10 UTC (Thu) by scottt (subscriber, #5028)

I think the article should make it more clear that Quixote provides a domain-specific language extension for Python, implemented through the 'compiler' module in the standard library. So Quixote users have a new __convenient__ notation to go with the new security feature when generating HTML output.

Also to convince the reader of the danger of untrusted html, it should be sufficient to simply mention 'javascript'.
A nice talk related to this topic:
"Active Content: Really Neat Technology or Impending Disaster"
http://technetcast.ddj.com/tnc_play_stream.html?stream_id=627

By the way, when can I see the source code operating lwn.net ? :)

TAL is pretty safe too

Posted Jan 16, 2003 14:10 UTC (Thu) by fergal (subscriber, #602)

Just thought I'd mention that TAL (part of Zope and also available for Perl) escapes everything by default too and you have to use the "structure" keyword in order to insert a chunk of unescaped text.

A fundamentally different approach

Posted Jan 16, 2003 17:08 UTC (Thu) by nas (subscriber, #17)

The Quixote "htmltext" approach marks data as safe at its source. With the method you describe, data is marked safe at the point it is used. That's a very different thing (although it appears to be similar at first).

Imagine writing a template that takes arguments that could be the value of another template or data from the user, etc. When writing the template, should you use the "structure" keyword? If you do, someone might pass unescaped, possibly malicious, data to it. If you don't, then the template is less useful, since programmers can't pass markup they know to be safe to it.
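A rough sketch of the mark-at-source idea, in plain Python rather than Quixote or TAL: the Trusted type, render() and box() are hypothetical names made up for this example. The template never has to decide whether its argument is safe, because that decision travels with the value.

```python
from html import escape

class Trusted(str):
    """Hypothetical 'already safe' marker, attached where the data is created."""

def render(value):
    # Trusted markup passes through; everything else is escaped.
    return value if isinstance(value, Trusted) else Trusted(escape(value))

def box(content):
    # Works for user data and for the output of other templates alike.
    return Trusted('<div class="box">' + render(content) + "</div>")

inner = box("fish & chips")   # user text: escaped once
outer = box(inner)            # output of another template: not escaped again
print(inner)
print(outer)
```

With a mark-at-use scheme, box() itself would have to pick one behavior for all callers, which is the dilemma described above.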

Spanish typo

Posted Jan 24, 2003 20:30 UTC (Fri) by Max.Hyre (subscriber, #1054)

Quixote HTML template machanism

Shouldn't that be ``Manchanism''? :-)

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds