Cross-site scripting attacks
Posted Apr 13, 2006 6:19 UTC (Thu) by jwb
Parent article: Cross-site scripting attacks
I used to hunt around the web looking for cross-site scripting attacks at famous sites (stock
brokerages, banks, ISPs, etc.) in 2000 and 2001. It is very sad to me that, five years later, I can
open a web browser, use the same screening techniques, and still find XSS vulnerabilities at
major commercial sites at the rate of several per day.
To protect a site from XSS, the developer really must abandon the decade-old practice of
communicating with the browser using the print function. The days of print "<html>" are over.
That technique should be shunned by any self-respecting developer. A better
implementation, which is immune to XSS, is to build your web pages on the server side using the
Document Object Model. The DOM can then be serialized to HTML (or XML) and sent to the
browser.
Why is the DOM method immune to XSS? Suppose you have some user-provided input which
contains markup. If you simply print it into the page, that markup will be sent to
the browser and executed. If you instead use document.createTextNode(tainted), the tainted
input will be harmlessly added to the document tree as a text node (which is what you want;
there's no way for a text node to have any structural meaning in an XML document). Later, when
you serialize the DOM to a byte stream, all text nodes will be escaped automatically.
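As a rough illustration of the text-node approach, here is a minimal sketch in Python using the standard library's xml.dom.minidom. The choice of Python and the function name render_comment are my own assumptions for the example; the same calls exist in the Java, Perl, and C DOM bindings under similar names.

```python
from xml.dom.minidom import getDOMImplementation


def render_comment(tainted: str) -> str:
    """Build a tiny page with the DOM and serialize it.

    `tainted` is untrusted user input; `render_comment` is a hypothetical
    helper name used only for this sketch.
    """
    impl = getDOMImplementation()
    doc = impl.createDocument(None, "html", None)

    body = doc.createElement("body")
    doc.documentElement.appendChild(body)

    para = doc.createElement("p")
    body.appendChild(para)

    # The untrusted string goes in as a text node, never as markup.
    para.appendChild(doc.createTextNode(tainted))

    # Serialization escapes <, > and & inside text nodes.
    return doc.documentElement.toxml()


print(render_comment("<script>alert(1)</script>"))
# prints <html><body><p>&lt;script&gt;alert(1)&lt;/script&gt;</p></body></html>
```

Note that nothing in the calling code ever escapes anything by hand; the serializer does it for every text node.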
Now, you might say that you want your users to be able to provide "rich" input, meaning you
want them to be able to enter a subset of HTML tags, usually for basic formatting. That's fine
and can be solved in the DOM method. You simply parse the user input into a new Document on
the server side, walk the document, and prune any node that is not on a pre-approved list of
allowed nodeType/tagName combinations.
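The parse, walk, and prune steps might look roughly like the sketch below, again in Python with xml.dom.minidom. The ALLOWED_TAGS set and the names prune and sanitize are hypothetical. Two further assumptions are mine rather than part of the description above: the input is required to be well-formed XML (real-world HTML would need an HTML parser instead, and malformed input raises a parse error here), and all attributes are stripped from the elements that are kept, since the list above only covers nodeType/tagName.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical allowlist of formatting tags; adjust to taste.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


def prune(node) -> None:
    """Recursively remove every child that is not text or an allowed element."""
    for child in list(node.childNodes):  # copy: we mutate while walking
        if child.nodeType == Node.TEXT_NODE:
            continue
        if child.nodeType == Node.ELEMENT_NODE and child.tagName in ALLOWED_TAGS:
            # Assumption of this sketch: keep no attributes at all,
            # so things like onclick="..." cannot survive.
            for name in list(child.attributes.keys()):
                child.removeAttribute(name)
            prune(child)
        else:
            # Disallowed element, comment, CDATA, etc.: drop the whole subtree.
            node.removeChild(child)


def sanitize(fragment: str) -> str:
    """Parse user input into its own Document, prune it, and serialize it."""
    # Wrap in a dummy root so a fragment with several top-level nodes parses.
    doc = parseString("<div>" + fragment + "</div>")
    prune(doc.documentElement)
    return "".join(c.toxml() for c in doc.documentElement.childNodes)


print(sanitize("<b>hi</b><script>alert(1)</script>"))
# prints <b>hi</b>
```

The pruned nodes could just as well be imported into the main page's Document instead of being serialized on their own; returning a string here only keeps the example short.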
This may sound like a lot of programming, but it really isn't. Java, Perl, and C have perfectly
serviceable DOM implementations, and I'm sure other languages also have that feature. And
you'll find after you adopt this method that most server-side web programming is much easier.
The spaghetti code of print() calls drops away, and amazing new features, like actually removing
elements from your page, become possible.
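To make the "removing elements" point concrete, here is a small sketch in the same Python/minidom style. The TEMPLATE markup, the id values, and the render function are made up for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical page template; the ids are invented for this example.
TEMPLATE = (
    '<div><p id="greeting">Hello</p>'
    '<p id="login-prompt">Please log in</p></div>'
)


def render(logged_in: bool) -> str:
    """Serialize the template, dropping the login prompt for logged-in users."""
    doc = parseString(TEMPLATE)
    if logged_in:
        # minidom returns a static list here, so removing while iterating is safe.
        for p in doc.getElementsByTagName("p"):
            if p.getAttribute("id") == "login-prompt":
                p.parentNode.removeChild(p)
    return doc.documentElement.toxml()


print(render(logged_in=True))
# prints <div><p id="greeting">Hello</p></div>
```

With print-style output the equivalent change means threading a conditional through whatever code emitted that paragraph; here it is a single tree operation done after the page is built.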
That's just my suggestion, anyway.