Leaking browser history
Browser history is fairly sensitive information for most people. If there were a way for random web sites to grab a list of other sites you have visited recently, it would cause a fair amount of concern. Unfortunately, a longstanding problem in the HTML Document Object Model (DOM) makes for an information leak nearly as bad as that.
The problem stems from a handy browser feature: showing you which links you have already visited. Browsers do this by applying the "visited" style (the CSS :visited pseudo-class) to any link whose target is in your history, which usually shows it in a different color. Many sites, such as LWN, then change the default colors for both visited and unvisited links via the site's Cascading Style Sheet (CSS). The resulting style is recorded in the DOM for the page, where it can be queried from Javascript (for example, with getComputedStyle()).
Because of the nature of the leak, scripts cannot get a full dump of the browser's history, but they can get the visited status for a set of sites they are interested in. A web site that wishes to gather this kind of information need only add a link to each site of interest—often in an unreadable font size or color—and send over a bit of Javascript to read the DOM status for each link.
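The probing step can be sketched in TypeScript. This is a minimal sketch, not anyone's actual exploit code. To keep it runnable outside a browser, the style lookup is passed in as a function (`getColor`), standing in for reading `getComputedStyle(link).color` on a real page; `PROBE_VISITED_COLOR`, `ProbeLink`, and `probeHistory` are hypothetical names, and the color is whatever the probing site chose for `:visited` in its own CSS.

```typescript
// Hypothetical color the probing site gives a:visited in its own stylesheet,
// e.g.  a.probe:visited { color: rgb(255, 0, 0); }
const PROBE_VISITED_COLOR = "rgb(255, 0, 0)";

// Minimal stand-in for an anchor element so the sketch runs without a DOM.
interface ProbeLink {
  href: string;
}

// Returns the candidate URLs whose probe link reports the visited color.
// On a real page, getColor would wrap window.getComputedStyle(link).color.
function probeHistory(
  candidates: string[],
  getColor: (link: ProbeLink) => string,
): string[] {
  const visited: string[] = [];
  for (const href of candidates) {
    // One (possibly unreadable) link per site of interest.
    const link: ProbeLink = { href };
    if (getColor(link) === PROBE_VISITED_COLOR) {
      visited.push(href);
    }
  }
  return visited;
}
```

Nothing in the check depends on the link being readable; it only compares a style value, which is why the links can be hidden in a tiny font or a background-matching color.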
The problem has been known since at least 2002, but there is no easy fix that keeps a browser compliant with the CSS standard, so most, if not all, browsers are vulnerable. It has recently been in the news because it is being used in a benign, or at least semi-benign, way.
These days many news sites and blogs have small images that correspond to various social networking sites—digg, reddit and the like—that allow voting on particular stories or postings. Those images are buttons that register a vote or submission of the site that displays them. With the proliferation of these sites, a great deal of screen real estate was being taken up by these icons, many of which were not useful because the person viewing them never visited those particular sites.
To reduce the clutter, Aza Raskin created some Javascript code to determine which of the social networking sites a particular user had visited so that only the icons for those sites were displayed. Many people would find that a useful and fairly unintrusive hack, which, at some level, it is. Others, with stricter views on personal privacy, might find it more than a bit creepy.
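Once a page has some visited check, the icon filtering reduces to a simple filter. This is an illustrative sketch, not Raskin's code: the `SocialSite` list and its URLs are assumptions, and `wasVisited` stands in for whatever probe the page actually uses, such as a computed-style check.

```typescript
interface SocialSite {
  label: string;
  homeUrl: string;
}

// Illustrative candidate list; a real page would probe the sites it supports.
const SOCIAL_SITES: SocialSite[] = [
  { label: "digg", homeUrl: "http://digg.com/" },
  { label: "reddit", homeUrl: "http://reddit.com/" },
  { label: "del.icio.us", homeUrl: "http://del.icio.us/" },
];

// Keep only the voting buttons for sites the probe reports as visited.
function buttonsToShow(
  sites: SocialSite[],
  wasVisited: (url: string) => boolean,
): string[] {
  return sites.filter((s) => wasVisited(s.homeUrl)).map((s) => s.label);
}
```

The same result that decides which buttons to draw could just as easily be sent back to a server, which is where the privacy concern comes from.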
Reducing clutter is one thing, but this technique can be used to gather much more sensitive information than which of the many social networking "news" sites you visit. It is tempting to remind readers of the NoScript Firefox extension, but it has become increasingly difficult to do nearly anything on the web without enabling Javascript. Many sites essentially hide their content behind a Javascript test, refusing to display it unless Javascript is enabled.
This makes it difficult to avoid giving away some of your browsing history to dodgy sites (or those with cross-site scripting vulnerabilities) other than by avoiding them entirely. It is an unfortunate side effect of a useful property that, as the discussion on the Mozilla bugzilla shows, will be difficult to completely eliminate. It should be noted that the links do not have to be obfuscated: by adding a dash of Javascript, LWN could find out whether you have visited digg or reddit from ordinary, visible links. But, of course, we don't force Javascript on our readers.
Index entries for this article
Security: Document Object Model (DOM)
Security: Web browsers
Posted Jun 26, 2008 2:39 UTC (Thu)
by cventers (guest, #31465)
Posted Jun 26, 2008 3:41 UTC (Thu)
by elanthis (guest, #6227)
Posted Jun 26, 2008 5:35 UTC (Thu)
by jwb (guest, #15467)
Posted Jun 26, 2008 5:42 UTC (Thu)
by jhs (guest, #12429)
Posted Jun 26, 2008 5:48 UTC (Thu)
by cventers (guest, #31465)
Posted Jun 26, 2008 8:52 UTC (Thu)
by jamesh (guest, #1159)
Posted Jun 27, 2008 0:34 UTC (Fri)
by wahern (subscriber, #37304)
Posted Jun 26, 2008 13:31 UTC (Thu)
by Jonno (subscriber, #49613)
Posted Jun 26, 2008 16:26 UTC (Thu)
by iabervon (subscriber, #722)
Alternatively, browsers could say that the domain or URL of the page containing the link (or something else suitable) is part of the identity of the link for purposes of determining whether you've previously visited it, and therefore only disclose to sites whether you previously clicked on this very link, rather than disclosing whether you've visited the target at all. (In general, sites can easily collect information on which of their links you've used with an "onclick" event handler, and I don't think people expect privacy with respect to the source site there.) This change would mean that links to sites you visit from sites you haven't visited look new, but I don't think that would be an unwelcome change for users.
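iabervon's proposal amounts to changing the key of the history lookup. A minimal sketch, assuming the browser records a (source, target) pair whenever a link is followed, where "source" is the domain or URL of the page containing the link; the class and method names are hypothetical.

```typescript
// Visited state keyed by where the link appeared *and* where it points,
// so a site can only learn about clicks made from its own pages.
class PerSourceHistory {
  private readonly clicks = new Set<string>();

  // JSON-encode the pair so no choice of separator can make keys collide.
  private key(source: string, target: string): string {
    return JSON.stringify([source, target]);
  }

  // Called when the user follows a link to target from a page on source.
  recordClick(source: string, target: string): void {
    this.clicks.add(this.key(source, target));
  }

  // What :visited would report for a link to target shown on source.
  isVisited(source: string, target: string): boolean {
    return this.clicks.has(this.key(source, target));
  }
}
```

A probing site only ever sees pairs whose source is itself, so it learns nothing beyond what an onclick handler on its own links would already have told it.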
Posted Jun 26, 2008 20:37 UTC (Thu)
by droundy (subscriber, #4559)
Posted Jun 28, 2008 19:34 UTC (Sat)
by man_ls (guest, #15091)
Posted Jun 26, 2008 6:40 UTC (Thu)
by dlang (guest, #313)
Posted Jun 26, 2008 5:38 UTC (Thu)
by jhs (guest, #12429)
Posted Jul 3, 2008 18:58 UTC (Thu)
by aquasync (guest, #26654)
Posted Jul 3, 2008 22:02 UTC (Thu)
by roc (subscriber, #30627)
Posted Jun 26, 2008 7:12 UTC (Thu)
by ekj (guest, #1524)
Posted Jun 26, 2008 7:56 UTC (Thu)
by khim (subscriber, #9252)
"After all, the situation is, from the page's perspective, completely identical to a browser with no saved history at all."
Not really. If "visited" links use significantly different fonts (20pt vs 10pt), JavaScript can still pull information about your history by calculating the sizes of visited and unvisited links. So you need to lie about the geometry of any object that includes the link. Further, you need to lie about the sizes of objects, the placement of objects (a big "visited" link can be forced to overlap with something while a small "unvisited" link will not), etc. In short: it's nothing like "completely identical to a browser with no saved history at all". It's almost impossible to do this "right". The only solution is to remove all style changes from "visited" links except safe ones (color and probably nothing else), and that's quite an intrusive change...
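The size-based variant khim describes can be sketched without touching colors at all. Assumptions: the probing site styles `:visited` probe links at 20pt and unvisited ones at 10pt, every probe link carries the same text, and `measureWidth` stands in for reading a layout property such as `offsetWidth`; the 1.5x threshold and all names are illustrative.

```typescript
// Hypothetical site CSS:
//   a.probe         { font-size: 10pt }
//   a.probe:visited { font-size: 20pt }
// With identical link text, a visited probe renders about twice as wide.

interface ProbeElement {
  href: string;
}

// Compare each probe's width with a reference link known to be unvisited
// (e.g. a freshly generated random URL) rendered with the same text.
function inferVisitedBySize(
  candidates: string[],
  referenceHref: string,
  measureWidth: (el: ProbeElement) => number,
): string[] {
  const baseline = measureWidth({ href: referenceHref });
  // 1.5x sits between the unvisited (1x) and visited (about 2x) cases.
  return candidates.filter((href) => measureWidth({ href }) > baseline * 1.5);
}
```

Because the measurement reads layout rather than style, a browser that only lied about computed colors would still leak through this path.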
Posted Jun 26, 2008 8:31 UTC (Thu)
by rvfh (guest, #31018)
Posted Jun 26, 2008 9:18 UTC (Thu)
by nix (subscriber, #2304)
Posted Jun 26, 2008 9:33 UTC (Thu)
by ekj (guest, #1524)
Posted Jun 26, 2008 12:23 UTC (Thu)
by NAR (subscriber, #1313)
Posted Jun 27, 2008 9:33 UTC (Fri)
by nix (subscriber, #2304)
Posted Jun 26, 2008 15:36 UTC (Thu)
by johnkarp (guest, #39285)
Posted Jun 26, 2008 20:12 UTC (Thu)
by mrshiny (guest, #4266)
Posted Jul 4, 2008 7:05 UTC (Fri)
by okeydoke (guest, #46751)
Leaking browser history
That is an interesting problem indeed... it sounds like the appropriate
fix might involve an option in the browser's configuration so that users
can choose between having a slightly broken DOM and a slight privacy leak.
Leaking browser history
I'm not really sure it would be a "broken DOM" - it would just be an option to willfully
choose to not store history in a way accessible by the DOM. It would be no different (from
the standpoint of the DOM and JavaScript) than a browser which does not remember history at
all.
I have to admit, there are only two or three places I ever use the colored history links. And
even those are just because it's only slightly more convenient to use them than to look at the
timestamps on the link text.
Leaking browser history
This would definitely break a specified behavior of the DOM. If you call getComputedStyle you
expect to get the computed style. If you can get the computed color, then you can get the
visited or unvisited status of a link. Simple as that.
Leaking browser history
Perhaps NoScript or another extension could have a new option along the lines of "Allow
Javascript, but disable/override privacy-leaking functions in a non-standard way"? The
wording is awkward but it might be a reasonable compromise for some situations.
Leaking browser history
Konqueror does something like this intelligently already. For "Open new
windows", you can choose "Allow", "Ask", "Deny" and "Smart". They also
have an "Allow" and "Ignore" for:
1. Resize window
2. Move window
3. Focus window
4. Modify status bar text
Presumably, they could add a 5:
5. Examine URL history
Leaking browser history
It depends on what you consider to be privacy-leaking functions.
If the CSS visited handling remains intact, getComputedStyle() is not the only way to get at
the information. If you specify a different font size for visited links, then the dimensions
of any parent element will leak the information.
Displaying all links as non-visited is pretty much the only way of fixing the bug. Applying
the browser's visited link colour when rendering while leaving the DOM as is might be an
option, but that leads to accessibility problems for sites that change font/background colours
(i.e. almost every site).
Leaking browser history
Not all links. Just links outside the domain.
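That rule can be sketched as a gate in front of the history lookup. Assumptions: hosts are compared exactly (so www.lwn.net and lwn.net count as different), the history is a plain set of URLs, and the regex-based `hostOf` is a naive stand-in for a browser's real URL parser; both function names are hypothetical.

```typescript
// Naive host extraction for illustration; a browser would use its own URL parser.
function hostOf(url: string): string {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(url);
  return m && m[1] ? m[1].toLowerCase() : "";
}

// Report the real :visited state only when the link points at the same host
// as the page showing it; every cross-domain link is reported as unvisited.
function reportedAsVisited(
  pageUrl: string,
  linkUrl: string,
  visitedUrls: Set<string>,
): boolean {
  const pageHost = hostOf(pageUrl);
  if (pageHost === "" || pageHost !== hostOf(linkUrl)) {
    return false; // cross-domain (or unparsable): never expose the real state
  }
  return visitedUrls.has(linkUrl);
}
```

A site can already log which of its own pages you fetch, so exposing same-host visited state arguably reveals little it could not learn from its server logs.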
Leaking browser history
There is one solution to this problem that would not break the DOM model, but it would
introduce a loss of (minor) functionality. The browser simply doesn't set the 'visited'
pseudoclass on any links!
That means all links look like they are unvisited, both in the UI and for any scripts. So the
user loses the usual visual clue to whether a link has already been visited but gains some
privacy.
Actually, it's completely conforming for a browser to simply never say a link has been visited (and render it in the :link style), or to claim to have rendered it in the non-visited style while showing it to the user in the visited style (not that this couldn't social-engineer the user into disclosing the information). See The CSS spec.
Leaking browser history
This sounds to me like a perfect solution. It maintains most of the currently used (and
useful) functionality, while at the same time closing the hole, as far as I can see. Does
anyone have an idea whether this is under discussion by the folks at mozilla?
Seconded. It could get a little annoying if link colors depended on whether I brought up www.lwn.net, lwn.net, https://lwn.net, and so on. But most of the time it would be fine, and a few heuristics (such as storing just the second-level domain) would do the rest.
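The second-level-domain heuristic from that comment could be used to normalize the source in iabervon's per-source scheme. A sketch under assumptions: the function name is hypothetical, and the last-two-labels rule is deliberately naive; it lumps every site under a multi-label suffix such as co.uk into one key, which a real implementation would need to handle, for example with a list of public suffixes.

```typescript
// Reduce a URL to a coarse "site" key: the last two labels of its host,
// ignoring scheme, port, and path. Naive on purpose: "bbc.co.uk" and
// "example.co.uk" would both collapse to "co.uk".
function siteKey(url: string): string {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(url);
  if (!m || !m[1]) return "";
  const labels = m[1].toLowerCase().split(".");
  return labels.slice(-2).join(".");
}
```

With this key, www.lwn.net, lwn.net, and https://lwn.net all map to "lwn.net", which addresses the annoyance the comment raises.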
Leaking browser history
This wouldn't be the first time that safety requires 'breaking' the official standard of
something (just about all anti-spam functionality involves 'breaking' the initial SMTP
standards, although recent revisions may have been changed to allow current behavior).
Changing the implementation so that the status of the links (and anything else that the
browser sets based on its private information) cannot be queried by any code sent as
part of that page is a smart thing to do, and once it's done by a few browsers it will get
written into the next version of the standard (as an optional mode of operation).
As noted, the result will not look any different to the page than if the browser didn't have
any relevant history.
NoScript does help
Well, the thing about NoScript is you whitelist only the sites you trust (or at least, sites
which you have to use regularly). After you build up a whitelist for a week or so, the web is
basically usable again.
Having said that, I eventually disabled NoScript for personal use since it is indeed quite a
price to pay. (I still use Flashblock, however.) But the security benefits are real. Just
because it's a bit much for home use doesn't mean it's not a good component of a
defense-in-depth strategy for Government, Military, or some other sensitive situation.
NoScript does help
While I haven't tested it, I'd presume it's also possible to harvest this information server
side, solely with CSS, by accessing uniquely named zero-sized images in the appropriate
styles.
NoScript does help
That's correct.
Leaking browser history
Would it be possible for the browser to lie without much consequence?
Couldn't the browser -actually- render links according to their visited-status, but
nevertheless in the javascript-accessible DOM *pretend* that all links are unvisited?
Okay, so that might make it technically noncompliant, but from the point of view of the page,
everything should work perfectly.
After all, the situation is, from the page's perspective, completely identical to a browser with
no saved history at all.
It's not as simple as that.
Thanks for that: this was the main information missing from the article...
But that means there is really no solution then, except not remembering which sites were
visited at all?
Mind you, the browser could apply (and report) the same style to visited and not visited
links, and add some local information (like special background) in the rendering of the
page...
Are there any sites relying on the visited links to render their page? Like: if you visited
the 'New stuff' page then the news are not the latest but the greatest kind of thing? I expect
some must do?
It's not as simple as that.
But 99.9% of users do not change anything but the link colour. Having it lie about that alone
seems perfectly practical (yes, if the page modifies its background colour such that all links
are visible even if the user has changed their colour, that code will break, but *that* is
done by essentially nobody).
It's not as simple as that.
You misunderstand. It's not what the -user- has changed. It's what the -site- has changed.
If I design a malicious site with the goal of spying out which pages you have visited, there's
nothing stopping ME from specifying that visited links should be in 20pt and nonvisited ones in
10pt, and then using javascript to figure out which ones fall in which category. (The links can
be invisible to you, say by having the same color as the background or by being in a div that has
display:none.)
I'm with the other reply: Render -ALL- links as if they're nonvisited, then apply some local
(not influenced by site-CSS) decoration-change that is invisible in the DOM.
Yes, this means sites lose the possibility of specifying how visited links should look. That is an
advantage anyway, since most use it only to make them look the same as nonvisited links in the
first place, because it offends "designers" to have lists of links that have two different looks.
It's not as simple as that.
I might be missing something, but e.g. Opera has a feature to disable the site CSS and use a
"user CSS", and this would eliminate this bug. Of course, in this case the web starts to look
quite different than it used to :-)
It's not as simple as that.
Ah, yes. I must agree with that. I actually use a per-user stylesheet to
force visited links to be rendered as I like, dammit, specifically because
of annoying sites like you describe.
Leaking browser history
Couldn't you add a data tainting mechanism to JS/DOM, such that the client
side can use the history, but nothing derived from it can be sent to a
server?
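The tainting idea can be shown with a toy model. This is a sketch under assumptions, not an API any browser implements: every value carries a taint flag, anything computed from a tainted value stays tainted, and a hypothetical `sendToServer` refuses tainted payloads. All names are illustrative.

```typescript
// A value plus a flag recording whether it was derived from browser history.
class Tainted<T> {
  readonly value: T;
  readonly tainted: boolean;

  constructor(value: T, tainted: boolean) {
    this.value = value;
    this.tainted = tainted;
  }

  // Anything computed from a tainted value is itself tainted.
  map<U>(f: (v: T) => U): Tainted<U> {
    return new Tainted(f(this.value), this.tainted);
  }

  // Combining two values taints the result if either input was tainted.
  combine<U, R>(other: Tainted<U>, f: (a: T, b: U) => R): Tainted<R> {
    return new Tainted(f(this.value, other.value), this.tainted || other.tainted);
  }
}

// Stand-in for the DOM handing a script a link's visited status.
function readVisitedStatus(isVisited: boolean): Tainted<boolean> {
  return new Tainted(isVisited, true);
}

// Stand-in for any channel back to a server (request, image URL, form post).
// Returns what would have been sent, or throws if the data is tainted.
function sendToServer(payload: Tainted<string>): string {
  if (payload.tainted) {
    throw new Error("blocked: payload derived from browser history");
  }
  return payload.value;
}
```

The model only tracks explicit data flow; as the reply below notes, layout side effects such as container sizes would also have to be tainted for this to close the leak.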
Leaking browser history
The problem is that you can deduce the status of visited links indirectly without accessing
the link in the dom. This is because a link which contains text is rendered in a way that
takes up space on the page. If a visited link changes the size of its container you'd be able
to deduce that a link was visited by examining the container. You'd need to taint the entire
dom at that point.
Just use SafeCache and SafeHistory.
Leaking browser history