Re: Anonymous metrics collection from Firefox
[Posted February 8, 2012 by jake]
From: Ben Bucksch <ben.bucksch.news-AT-beonex.com>
To: dev-planning-AT-lists.mozilla.org
Subject: Re: Anonymous metrics collection from Firefox
Date: Mon, 06 Feb 2012 20:47:27 +0100
Message-ID: <4F302E4F.9050308@beonex.com>

On 06.02.2012 19:56, Benjamin Smedberg wrote:
> There has been a project being worked on for some time to collect
> metrics from Firefox installations in an "on by default" manner. This
> is different from off-by-default telemetry. I became aware of this
> project recently when I was asked to review some implementation code,
> and I have some concerns about our privacy stance in this feature.
> Because the bugs are getting a bit out of hand, I wanted to move the
> discussion to the proper newsgroup.
>
> For background, the feature page (not strictly a feature page) is
> here: https://wiki.mozilla.org/MetricsDataPing
>
> Note that this page contains data from several different authors and
> isn't a coherent proposal page any more. See the wiki history for
> context if necessary.
>
> The tracking bug is
> https://bugzilla.mozilla.org/show_bug.cgi?id=718066 from which several
> other bugs (core implementation, preference UI) are available.
>
> I understand that this opt-out data collection is vastly superior to
> telemetry in terms of collecting a representative sample and
> controlling for bias. But it's not clear to me why that makes it "ok"
> from a privacy perspective, compared with telemetry, to make this
> opt-out instead of opt-in. Just from my personal experience, I would
> be surprised by any data submitted by Firefox to Mozilla which was not
> part of regular Firefox functionality (app update seems pretty
> straightforward, extension update also, crash submission is opt-in).
> It seems that if this data submission contains any information which
> is potentially personally identifying, then it would be a "surprise".
> As already identified in the bug, there are so many different ways in
> which data can be potentially identifying:
>
> * unique sets of themes (theme collection was removed)
> * unique sets of addons (addon collection is still proposed)
> * the unique IDs used to keep track of particular installations can
> potentially track data back to users (note that the UUID proposal has
> changed somewhat due to privacy concerns, but that there is still a
> local ID -> remote data mapping)
Thanks, Benjamin.
A few additions:
* Fingerprinting: The data we submit under the current proposal from
the Metrics group is highly fingerprintable. For example, it has not
only the list of addons (which in many cases will already be unique
in its combination, or even pinpoint company association with custom
addons), but also install date of each addon.
* UUID: The "document UUID" proposal (actually simply a submission ID)
sends the previous submission ID as well, which allows the server to
trivially connect them together and still have a server-side UUID.
The submission ID may have some advantages in some cases, but it
doesn't remove the ability to track individual users.
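To illustrate the UUID point, here is a minimal sketch of how a server could rebuild a per-installation identifier from chained submission IDs. The field names `id` and `previousId` and the function `linkSubmissions` are hypothetical, chosen for illustration; they are not taken from the actual proposal, and the sketch assumes submissions arrive in order.

```javascript
// Hypothetical sketch: each submission carries its own ID plus the ID of the
// previous submission from the same installation. Field names are assumptions.
function linkSubmissions(submissions) {
  const chainOf = new Map(); // submission id -> server-side chain label
  let nextChain = 0;
  for (const s of submissions) {
    // If the previous ID is already known, this submission joins that chain;
    // otherwise it starts a new one (a first-time submission).
    const chain = chainOf.has(s.previousId)
      ? chainOf.get(s.previousId)
      : nextChain++;
    chainOf.set(s.id, chain);
  }
  return chainOf;
}

const chains = linkSubmissions([
  { id: "a", previousId: null },
  { id: "x", previousId: null },
  { id: "b", previousId: "a" },
  { id: "c", previousId: "b" },
]);
```

In this sketch the chain label behaves like a server-side UUID: submissions "a", "b" and "c" all end up with the same label even though no single ID was ever reused.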
To fingerprinting: I doubt that we really critically need all of that
data to answer the most pressing questions. More data can always be nice
and justified somehow, but it's not necessarily critical.
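As a rough illustration of why install dates matter, the sketch below builds a comparison key from an addon list, with and without install dates. The function `fingerprint`, the field names, and the addon IDs are hypothetical and only serve the example; they do not describe the real payload format.

```javascript
// Hypothetical sketch: derive a comparison key from the submitted addon data.
// Two profiles with equal keys are indistinguishable on this data alone.
function fingerprint(addons, { includeInstallDate }) {
  return addons
    .map((a) => (includeInstallDate ? `${a.id}@${a.installDate}` : a.id))
    .sort()
    .join("|");
}

// Two made-up profiles with the same addons installed on different days.
const profileA = [
  { id: "addon-a", installDate: "2011-03-02" },
  { id: "addon-b", installDate: "2010-11-17" },
];
const profileB = [
  { id: "addon-b", installDate: "2011-06-30" },
  { id: "addon-a", installDate: "2012-01-05" },
];

const sameWithoutDates =
  fingerprint(profileA, { includeInstallDate: false }) ===
  fingerprint(profileB, { includeInstallDate: false });
const sameWithDates =
  fingerprint(profileA, { includeInstallDate: true }) ===
  fingerprint(profileB, { includeInstallDate: true });
```

Without dates the two profiles collapse into one key; with dates they become distinguishable, so every additional field of this kind makes a unique fingerprint more likely.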
To UUID: I also think that there are solutions that do not track
individual users. I proposed one that even lets us see when users
stop using Firefox. See
https://wiki.mozilla.org/MetricsDataPing#Anonymous_altern...
---
Another way to limit the privacy impact is to take only a
representative sample. Instead of collecting data from all
200,000,000 users, we pick a random (!) sample of 10,000.
Concretely: if ( ! pref.userSet()) pref.set(Math.random() * 20000 < 1). If
true, submit; otherwise, no data collection. Because the sample is
random, it is statistically representative.
It makes a huge difference whether you collect data from 200,000,000
people or just 10,000.
Again, you can find arguments why it's better to get a lot more data,
but when you consider the users' interest in privacy, I think that's a
fair balance of needs.
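The sampling rule can be sketched in a few lines. This is a minimal sketch under stated assumptions: `prefs` is a plain `Map` standing in for a preference store (not the real Firefox preferences API), and the pref name `metrics.inSample` is made up for illustration. Selecting 10,000 out of 200,000,000 users means each installation is chosen with probability 1 in 20,000.

```javascript
const POPULATION = 200000000; // user count quoted in the text
const SAMPLE_SIZE = 10000; // desired sample size quoted in the text
const ONE_IN = POPULATION / SAMPLE_SIZE; // 20000

// Decide once per installation whether it belongs to the sample.
// `prefs` is a stand-in Map; `random` is injectable so the rule can be tested.
function decideParticipation(prefs, random = Math.random) {
  const KEY = "metrics.inSample"; // hypothetical pref name
  if (!prefs.has(KEY)) {
    // random() is uniform in [0, 1), so this is true with probability 1/ONE_IN.
    prefs.set(KEY, random() * ONE_IN < 1);
  }
  // An existing value (set by an earlier run or by the user) is never overwritten.
  return prefs.get(KEY);
}

const prefs = new Map();
if (decideParticipation(prefs)) {
  // submit metrics
} else {
  // no data collection
}
```

Storing the decision keeps the sample stable across runs, and a value the user has already set takes precedence over the random draw.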
---
I would like to add that this feature has a serious potential to
actively decrease Firefox's market share. Firefox is biggest in Europe,
where it still has its largest market share, from what I know. The
reason why people here in Europe use Firefox is mostly philosophical,
including privacy. It is not so much pure technical merit that wins
users; that is only the second priority. Now, if users get the
idea that Firefox is not dramatically and fundamentally different from,
say, Google Chrome, then they see no reason to be loyal to Firefox,
and switch to Chrome.
This project will make for very bad news, that is almost certain. The
Telemetry question already gave a bad impression.
This project has a very real risk of actively decreasing the market
share that it is trying to preserve.
---
There are other ways to get the needed data without offending users. I
propose to 1) remove the UUID and use the algorithm I proposed, which
still allows gathering the critically needed data, but without tracking
users, 2) remove any data which has a high likelihood of being unique
when fingerprinting, and 3) reduce the collected sample to a random
sample of 10,000.
If all 3 are done, I would have a good conscience that this is a good
balance between the need for data for product decisions and users'
interest in privacy, and I'd even be OK with an opt-out. But only if the
tracking of individual users is removed and the sample is limited to 10,000.
Ben