Anonymous metrics collection from Firefox

From:  Benjamin Smedberg <benjamin-AT-smedbergs.us>
To:  "mozilla.dev.planning group" <dev-planning-AT-lists.mozilla.org>
Subject:  Anonymous metrics collection from Firefox
Date:  Mon, 06 Feb 2012 13:56:53 -0500
Message-ID:  <4F302275.8020406@smedbergs.us>
Archive-link:  Article, Thread

A project has been under way for some time to collect metrics from 
Firefox installations in an "on by default" manner. This is different 
from the existing off-by-default telemetry. I became aware of this project 
recently when I was asked to review some implementation code, and I have 
some concerns about our privacy stance in this feature. Because the bugs 
are getting a bit out of hand, I wanted to move the discussion to the 
proper newsgroup.

For background, the feature page (not strictly a feature page) is here: 
https://wiki.mozilla.org/MetricsDataPing

Note that this page contains data from several different authors and 
isn't a coherent proposal page any more. See the wiki history for 
context if necessary.

The tracking bug is https://bugzilla.mozilla.org/show_bug.cgi?id=718066 
from which several other bugs (core implementation, preference UI) are 
available.

I understand that this opt-out data collection is vastly superior to 
telemetry in terms of collecting a representative sample and controlling 
for bias. But it's not clear to me why that makes it "ok" from a privacy 
perspective, compared with telemetry, to make this opt-out instead of 
opt-in. Just from my personal experience, I would be surprised by any 
data submitted by Firefox to Mozilla which was not part of regular 
Firefox functionality (app update seems pretty straightforward, 
extension update also, crash submission is opt-in). It seems that if 
this data submission contains any information which is potentially 
personally identifying, then it would be a "surprise". As already 
identified in the bug, there are so many different ways in which data 
can be potentially identifying:

* unique sets of themes (theme collection was removed)
* unique sets of addons (addon collection is still proposed)
* the unique IDs used to keep track of particular installations can 
potentially track data back to users (note that the UUID proposal has 
changed somewhat due to privacy concerns, but that there is still a 
local ID -> remote data mapping)
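
To make the addon-set concern concrete, here is a minimal sketch in 
Python. It is an illustration only: the install labels, the addon names, 
and the helper unique_addon_sets are all hypothetical, and nothing here 
is taken from the MetricsDataPing proposal. It simply counts how many 
installs share each exact set of addons and reports the ones whose set 
appears only once.

```python
from collections import Counter


def unique_addon_sets(installs):
    """Return labels of installs whose exact addon set appears only once.

    `installs` maps a (hypothetical) install label to a list of addon names.
    """
    # Count how many installs report each exact combination of addons.
    counts = Counter(frozenset(addons) for addons in installs.values())
    # An install is distinguishable if nobody else shares its combination.
    return sorted(
        label
        for label, addons in installs.items()
        if counts[frozenset(addons)] == 1
    )


# Made-up sample data, for illustration only.
installs = {
    "install-a": ["adblock", "firebug"],
    "install-b": ["adblock", "firebug"],
    "install-c": ["adblock", "firebug", "rare-corporate-addon"],
    "install-d": [],
    "install-e": [],
}

print(unique_addon_sets(installs))  # ['install-c']
```

In this toy data set, a single unusual addon is enough to single out one 
install even though no ID is attached to the submission.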

A fair bit of the proposal is focused on how we would be protecting and 
anonymizing the data. But if we're not actually collecting personally 
identifiable data, why couldn't we make the entire server system public 
and queryable? It seems that any system that requires server-side 
anonymization to meet user privacy expectations is an unexpected privacy 
risk. Might it also open up our users to potential tracking via court 
order (search warrants) from both U.S. courts and whatever countries we 
put data centers in?

It seems as if we are saying that since we already collect most of this 
data via various product features, that makes it ok to also collect this 
data in a central place and attach an ID to it. Or, that because we 
*need* this data in order to make the product better, it's ok to collect 
it. This makes me intensely uncomfortable. At this point I think we'd be 
better off either collecting only the data which cannot be used to track 
individual installs, or not implementing this feature at all.

Note that while Ben Bucksch has also brought up legal concerns about 
whether German or European law forbids this kind of data collection, I'm 
not particularly interested in that portion of the discussion because 
very few of us in the project are legal experts who can have an informed 
opinion. So please let's avoid ratholing on those legal issues and focus 
instead on the basic privacy issue.

--BDS



Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds