Cross-site scripting attacks

Posted Apr 13, 2006 6:19 UTC (Thu) by jwb (guest, #15467)
Parent article: Cross-site scripting attacks

I used to hunt around the web looking for cross-site scripting attacks at famous sites (stock
brokerages, banks, ISPs, etc) in 2000 and 2001. It is very sad to me that, five years later, I can
open a web browser, use the same screening techniques, and still find XSS vulnerabilities at
major commercial sites at the rate of several per day.

To protect a site from XSS, the developer really must abandon the decade-old practice of
communicating with the browser using the print function. The days of print "<html>" are over.
That technique should be shunned by any self-respecting developer. A better
implementation, which is immune to XSS, is to build your web pages on the server side using the
Document Object Model. The DOM can then be serialized to HTML (or XML) and sent to the
browser.

Why is the DOM method immune to XSS? Suppose you have some user-provided input which
contains malicious JavaScript. If you print(tainted), that malicious code will be sent verbatim to
the browser and executed. If you instead use document.createTextNode(tainted), the tainted
input will be harmlessly added to the document tree as a text node (which is what you want;
there's no way for a text node to have any structural meaning in an XML document.) Later, when
you serialize the DOM to a byte stream, all text nodes will be harmlessly escaped.
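
A minimal sketch of that behaviour in Python, assuming the standard library's xml.dom.minidom
stands in for whatever server-side DOM implementation is in use. The helper name render_comment
is hypothetical and only for illustration.

```python
from xml.dom.minidom import getDOMImplementation


def render_comment(tainted: str) -> str:
    """Wrap untrusted text in a <p> element and serialize it."""
    doc = getDOMImplementation().createDocument(None, "p", None)
    # The input becomes a text node, so it cannot introduce new elements.
    doc.documentElement.appendChild(doc.createTextNode(tainted))
    # Serialization escapes markup characters such as < and > in text nodes.
    return doc.documentElement.toxml()


if __name__ == "__main__":
    print(render_comment("<script>alert(1)</script>"))
```

The script tag reaches the browser as escaped text rather than as an element, which is the
point of the argument above.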

Now, you might say that you want your users to be able to provide "rich" input, meaning you
want them to be able to enter a subset of HTML tags, usually for basic formatting. That's fine
and can be solved in the DOM method. You simply parse the user input into a new Document on
the server side, walk the document, and prune any nodes which are found to not be on a pre-
approved list of allowed nodeType/tagName combinations.
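
The pruning step might look roughly like this in Python, again using xml.dom.minidom as a
stand-in. Everything named here (ALLOWED_TAGS, prune, sanitize) is an assumption made for the
sketch, and so is the choice to drop every attribute on allowed elements, since attributes such
as onclick can carry script too. The input is wrapped in a div so that a fragment parses as one
document; input that is not well-formed XML raises a parse error and has to be handled by the
caller.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical allow-list of formatting tags; adjust to taste.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


def prune(node) -> None:
    """Remove every child that is not plain text or an allowed element."""
    for child in list(node.childNodes):  # copy: we mutate while walking
        if child.nodeType == Node.TEXT_NODE:
            continue
        if child.nodeType == Node.ELEMENT_NODE and child.tagName in ALLOWED_TAGS:
            # Strip all attributes (assumption: none are needed for formatting).
            for name in list(child.attributes.keys()):
                child.removeAttribute(name)
            prune(child)
        else:
            # Disallowed elements, comments, CDATA, etc. are dropped entirely.
            node.removeChild(child)


def sanitize(fragment: str) -> str:
    """Parse an untrusted fragment, prune it, and serialize the result."""
    doc = parseString("<div>" + fragment + "</div>")
    prune(doc.documentElement)
    return doc.documentElement.toxml()


if __name__ == "__main__":
    print(sanitize('hi <b onclick="x()">bold</b><script>alert(1)</script>'))
```

Note that this version discards a disallowed element together with its contents; keeping the
inner text instead would be a different policy.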

This may sound like a lot of programming, but it really isn't. Java, Perl, and C have perfectly
serviceable DOM implementations, and I'm sure other languages also have that feature. And
you'll find after you adopt this method that most server-side web programming is much easier.
The spaghetti code of print() calls drops away, and amazing new features, like actually removing
elements from your page, become possible.

That's just my suggestion, anyway.



Cross-site scripting attacks

Posted Apr 13, 2006 6:39 UTC (Thu) by Dom2 (guest, #458)

Personally, I think our tools are to blame. I wrote "The Wrong Defaults" a little while back to try and explain why.

-Dom

Cross-site scripting attacks

Posted Apr 13, 2006 6:52 UTC (Thu) by jwb (guest, #15467)

Having read your blog entry, it seems like you would agree that something like
document.createTextNode() does the right thing by default, no? If you stick to the DOM, there's no
way to inadvertently do something stupid. Everything, stupid or otherwise, is done explicitly.

Regarding your example of SQL placeholders, even that wisdom has not trickled down to the great
programming masses. The vast majority of PHP code out there in the wild builds up SQL queries using
string concatenation and explicit escaping. Usually this means no or insufficient escaping. PHP
only recently acquired a decent interface for interacting with SQL databases, and the use of it is not
yet widespread.
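
For comparison, here is what placeholders look like, sketched in Python with the standard
sqlite3 module rather than PHP. The table layout and the names make_db and find_user are made
up for the example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")
    return conn


def find_user(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes inside `name` cannot change the structure of the query.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    db = make_db()
    print(find_user(db, "alice"))
    print(find_user(db, "' OR '1'='1"))
```

The second lookup is treated as a search for a user with a very odd name, not as extra SQL.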

Cross-site scripting attacks

Posted Apr 13, 2006 9:23 UTC (Thu) by Dom2 (guest, #458)

Yes, document.createTextNode() does do the right thing. But I was thinking more in terms of server side solutions like PHP, ASP and JSP. They default to "insecure".

-Dom

Cross-site scripting attacks

Posted Apr 13, 2006 14:48 UTC (Thu) by kingdon (subscriber, #4526)

Yes, yes, yes! Thank you for saying this.

Some systems that get the quoting right: DOM, tinytemplate, XmlWriter, Amrita (a ruby template engine), probably a few others.

Some systems that get the quoting wrong: jsp, velocity, rhtml (a ruby template engine, alas more popular than Amrita), print statements, m4 (or anything else not specific to XML/HTML), etc, etc, etc.

Maybe others can augment these lists with some of the popular engines out there for python and others.

Cross-site scripting attacks

Posted Apr 13, 2006 20:49 UTC (Thu) by iabervon (subscriber, #722)

For many applications, it's nicer to just have a print function that quotes everything it gets, and a separate printTag function that can be used to insert non-text. It's an easier conversion than switching to DOM, doesn't inherently require that the whole document be stored at once, and still fails safely: if you call printTag on a non-tag, it gives you an error; if you call print on a tag, it gives you the escaped version. Either way, bugs in the normal case are quick to find, and in attacks nothing happens.
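
A rough sketch of such a pair in Python, assuming a streaming writer object; SafeWriter, print_tag, and the deliberately narrow idea of what counts as a tag (a bare open or close tag with no attributes) are all inventions for this example.

```python
import html
import io
import re

# Assumption: a "tag" is a bare open or close tag without attributes.
_TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*>")


class SafeWriter:
    """Streams output; text is always escaped, markup must look like a tag."""

    def __init__(self, out):
        self.out = out

    def print(self, text: str) -> None:
        # Everything passed here is treated as text and escaped.
        self.out.write(html.escape(text))

    def print_tag(self, tag: str) -> None:
        # Anything that is not a single well-formed tag is rejected loudly.
        if not _TAG_RE.fullmatch(tag):
            raise ValueError(f"not a tag: {tag!r}")
        self.out.write(tag)


if __name__ == "__main__":
    buf = io.StringIO()
    w = SafeWriter(buf)
    w.print_tag("<p>")
    w.print("<script>alert(1)</script>")
    w.print_tag("</p>")
    print(buf.getvalue())
```

Because each call writes straight to the output stream, nothing requires the whole page to be held in memory.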

The harder thing is actually cases where you want to permit some markup but not scripts, especially if what you're accepting is HTML fragments. (Not that people don't often screw up the easy cases.)

Of course, these problems should really be called HTML injection attacks, since they're essentially the same as SQL injection attacks: some content which is supposed to be a string literal is treated as structure. Admittedly, trying to do the equivalent of a prepared statement would be a bit less practical (use AJAX to get each variable region as a separate request and insert it as the appropriate type?).

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds