Anonymous metrics collection from Firefox
[Posted February 8, 2012 by jake]
From: Benjamin Smedberg <benjamin-AT-smedbergs.us>
To: "mozilla.dev.planning group" <dev-planning-AT-lists.mozilla.org>
Subject: Anonymous metrics collection from Firefox
Date: Mon, 06 Feb 2012 13:56:53 -0500
Message-ID: <4F302275.8020406@smedbergs.us>
A project has been under way for some time to collect
metrics from Firefox installations in an "on by default" manner. This is
different from off-by-default telemetry. I became aware of this project
recently when I was asked to review some implementation code, and I have
some concerns about our privacy stance in this feature. Because the bugs
are getting a bit out of hand, I wanted to move the discussion to the
proper newsgroup.
For background, the feature page (not strictly a feature page) is here:
https://wiki.mozilla.org/MetricsDataPing
Note that this page contains data from several different authors and
isn't a coherent proposal page any more. See the wiki history for
context if necessary.
The tracking bug is https://bugzilla.mozilla.org/show_bug.cgi?id=718066
from which several other bugs (core implementation, preference UI) are
available.
I understand that this opt-out data collection is vastly superior to
telemetry in terms of collecting a representative sample and controlling
for bias. But it's not clear to me why that makes it "ok" from a privacy
perspective, compared with telemetry, to make this opt-out instead of
opt-in. Just from my personal experience, I would be surprised by any
data submitted by Firefox to Mozilla which was not part of regular
Firefox functionality (app update seems pretty straightforward,
extension update also, crash submission is opt-in). It seems that if
this data submission contains any information which is potentially
personally identifying, then it would be a "surprise". As already
identified in the bug, there are so many different ways in which data
can be potentially identifying:
* unique sets of themes (theme collection was removed)
* unique sets of addons (addon collection is still proposed)
* the unique IDs used to keep track of particular installations can
potentially track data back to users (note that the UUID proposal has
changed somewhat due to privacy concerns, but that there is still a
local ID -> remote data mapping)
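To make the addon point concrete, here is a minimal sketch of how a reported addon list can act as a fingerprint. Everything in it is hypothetical: the install names, the addon lists, and the `anonymity_set_sizes` helper are invented for illustration and are not taken from the MetricsDataPing proposal or its code. It counts how many installs share each exact addon set; an install whose set is shared by nobody else can be singled out from that set alone, even with no ID attached.

```python
from collections import Counter


def anonymity_set_sizes(installs):
    """Map each install to the number of installs reporting its exact addon set."""
    # frozenset makes the addon list order-independent and hashable.
    counts = Counter(frozenset(addons) for addons in installs.values())
    return {name: counts[frozenset(addons)] for name, addons in installs.items()}


# Hypothetical submissions: install label -> reported addons.
installs = {
    "install-a": ["Adblock Plus", "Firebug"],
    "install-b": ["Firebug", "Adblock Plus"],
    "install-c": ["Adblock Plus", "Firebug", "Greasemonkey", "Tree Style Tab"],
    "install-d": [],
    "install-e": [],
}

sizes = anonymity_set_sizes(installs)

# Installs whose addon set is shared with no other install.
unique = sorted(name for name, size in sizes.items() if size == 1)
print(unique)
```

In this made-up sample only one install has a set nobody else shares, but the longer and more unusual an addon list gets, the smaller the group it can hide in.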
A fair bit of the proposal is focused on how we would be protecting and
anonymizing the data. But if we're not actually collecting personally
identifiable data, why couldn't we make the entire server system public
and queryable? It seems that any system that requires server-side
anonymization to meet user privacy expectations is an unexpected privacy
risk. Might it also open up our users to potential tracking via court
order (search warrants) from both U.S. courts and whatever countries we
put data centers in?
It seems as if we are saying that since we already collect most of this
data via various product features, that makes it ok to also collect this
data in a central place and attach an ID to it. Or, that because we
*need* this data in order to make the product better, it's ok to collect
it. This makes me intensely uncomfortable. At this point I think we'd be
better off either collecting only the data which cannot be used to track
individual installs, or not implementing this feature at all.
Note that while Ben Bucksch has also brought up legal concerns about
whether German or European law forbids this kind of data collection, I'm
not particularly interested in that portion of the discussion because very
few of us in the project are legal experts who can have an informed
opinion. So please let's focus on the basic privacy issue rather than
ratholing on those legal issues.
--BDS