Re: Anonymous metrics collection from Firefox

From:  Ben Bucksch <ben.bucksch.news-AT-beonex.com>
To:  dev-planning-AT-lists.mozilla.org
Subject:  Re: Anonymous metrics collection from Firefox
Date:  Mon, 06 Feb 2012 20:47:27 +0100
Message-ID:  <4F302E4F.9050308@beonex.com>

On 06.02.2012 19:56, Benjamin Smedberg wrote:
> There has been a project being worked on for some time to collect 
> metrics from Firefox installations in an "on by default" manner. This 
> is different from off-by-default telemetry. I became aware of this 
> project recently when I was asked to review some implementation code, 
> and I have some concerns about our privacy stance in this feature. 
> Because the bugs are getting a bit out of hand, I wanted to move the 
> discussion to the proper newsgroup.
>
> For background, the feature page (not strictly a feature page) is 
> here: https://wiki.mozilla.org/MetricsDataPing
>
> Note that this page contains data from several different authors and 
> isn't a coherent proposal page any more. See the wiki history for 
> context if necessary.
>
> The tracking bug is 
> https://bugzilla.mozilla.org/show_bug.cgi?id=718066 from which several 
> other bugs (core implementation, preference UI) are available.
>
> I understand that this opt-out data collection is vastly superior to
> telemetry in terms of collecting a representative sample and 
> controlling for bias. But it's not clear to me why that makes it "ok" 
> from a privacy perspective, compared with telemetry, to make this 
> opt-out instead of opt-in. Just from my personal experience, I would 
> be surprised by any data submitted by Firefox to Mozilla which was not 
> part of regular Firefox functionality (app update seems pretty 
> straightforward, extension update also, crash submission is opt-in). 
> It seems that if this data submission contains any information which 
> is potentially personally identifying, then it would be a "surprise". 
> As already identified in the bug, there are so many different ways in 
> which data can be potentially identifying:
>
> * unique sets of themes (theme collection was removed)
> * unique sets of addons (addon collection is still proposed)
> * the unique IDs used to keep track of particular installations can 
> potentially track data back to users (note that the UUID proposal has 
> changed somewhat due to privacy concerns, but that there is still a 
> local ID -> remote data mapping)

Thanks, Benjamin.

A few additions:

  * Fingerprinting: The data we would submit under the current proposal
    from the Metrics group is highly fingerprintable. For example, it
    includes not only the list of addons (which in many cases will
    already be unique in its combination, or will even pinpoint a
    company association through custom addons), but also the install
    date of each addon.
  * UUID: The "document UUID" proposal (actually just a submission ID)
    also sends the previous submission ID, which lets the server
    trivially chain submissions together and so still reconstruct a
    server-side UUID. The submission ID may have some advantages in some
    cases, but it doesn't remove the ability to track individual users.
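To make the chaining point concrete, here is a minimal JavaScript sketch. It assumes a hypothetical server-side store in which each ping carries its own random submission ID (`id`) plus the previous one (`previousId`). The field names, the helper `linkHistories` and the sample data are invented for illustration and are not taken from the MetricsDataPing proposal. Following the `previousId` links regroups the pings per installation, so the first ID in each chain effectively works as a server-side UUID.

```javascript
// Hypothetical pings as a server might store them: each carries its own
// random submission ID plus the ID of the same installation's previous ping.
const submissions = [
  { id: "a1", previousId: null },
  { id: "b7", previousId: null },
  { id: "c4", previousId: "a1" },
  { id: "d9", previousId: "c4" },
  { id: "e2", previousId: "b7" },
];

// Rebuild one history per installation by following previousId links.
function linkHistories(pings) {
  // Index each ping by the ID it points back to.
  const byPrevious = new Map();
  for (const p of pings) {
    if (p.previousId !== null) byPrevious.set(p.previousId, p);
  }
  const histories = [];
  for (const p of pings) {
    // Start chains only at pings with no predecessor (first submissions).
    if (p.previousId !== null) continue;
    const chain = [p.id];
    let next = byPrevious.get(p.id);
    while (next !== undefined) {
      chain.push(next.id);
      next = byPrevious.get(next.id);
    }
    histories.push(chain);
  }
  return histories;
}

console.log(JSON.stringify(linkHistories(submissions)));
// prints [["a1","c4","d9"],["b7","e2"]]
```

No per-installation identifier is ever sent, yet the server still ends up with two complete per-installation histories.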


On fingerprinting: I doubt that we critically need all of that data to
answer the most pressing questions. More data is always nice and can be
justified somehow, but it is not necessarily critical.
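One rough way to check how identifying a set of fields is: count how many reports share each combination of values, and flag combinations that occur only once (an anonymity set of size 1). The JavaScript sketch below applies this to hypothetical add-on reports. The add-on names, dates and the helpers `fingerprint` and `countUnique` are made up for illustration and are not part of any Mozilla code.

```javascript
// Hypothetical reports: the add-ons one installation sends, each paired
// with its install date (as the current proposal would include).
const reports = [
  { addons: [["addon-a", "2011-03-02"], ["addon-b", "2011-07-19"]] },
  { addons: [["addon-a", "2011-05-11"], ["addon-b", "2011-07-19"]] },
  { addons: [["addon-a", "2010-12-01"]] },
  { addons: [["addon-a", "2011-01-20"]] },
  { addons: [["addon-a", "2011-09-30"], ["examplecorp-sso", "2011-10-03"]] },
];

// Canonical key for a report: sorted add-on IDs, optionally with dates.
function fingerprint(report, withDates) {
  return report.addons
    .map(([id, date]) => (withDates ? `${id}@${date}` : id))
    .sort()
    .join("|");
}

// Number of reports whose key occurs exactly once (anonymity set of size 1).
function countUnique(reports, withDates) {
  const counts = new Map();
  for (const r of reports) {
    const key = fingerprint(r, withDates);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return reports.filter((r) => counts.get(fingerprint(r, withDates)) === 1)
    .length;
}

console.log("unique with dates:", countUnique(reports, true)); // 5
console.log("unique without dates:", countUnique(reports, false)); // 1
```

With install dates included, all five reports are unique. Without the dates, only the report carrying the company-specific add-on is still unique. This is why dropping fields, not just collecting them more carefully, is what reduces fingerprintability.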

On the UUID: I also think there are solutions that do not track
individual users. I proposed one that even allows us to see when users
stop using Firefox. See
https://wiki.mozilla.org/MetricsDataPing#Anonymous_altern...

---

Another way to limit the privacy impact is to take only a
representative sample. Instead of collecting data from all
200,000,000 users, we pick a random (!) sample of 10,000, i.e. one
installation in 20,000. Concretely:
if (!pref.userSet()) pref.set(Math.random() * 20000 < 1). If the pref
is true, submit; otherwise, collect no data. Because the sample is
random, it is statistically representative, within ordinary sampling
error. It makes a huge difference whether you collect data from
200,000,000 people or just 10,000.
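The sampling rule above can be sketched in a few lines of JavaScript. This is only an illustration. A plain `Map` stands in for Firefox's preference store (the real preferences API differs), and `metrics.sampled` and `isInSample` are hypothetical names. The decision is drawn once, stored, and reused, so an installation stays in or out of the sample across restarts.

```javascript
// Hypothetical stand-in for Firefox's preference store; the real prefs API
// differs. A pref that was never set is simply absent from the Map.
const prefStore = new Map();
const SAMPLE_PREF = "metrics.sampled"; // hypothetical pref name
const SAMPLING_RATE = 10000 / 200000000; // 1 in 20,000 installations

// Draw the sampling decision once per installation and persist it, so the
// same installation stays in (or out of) the sample on every later run.
function isInSample(store, random = Math.random) {
  if (!store.has(SAMPLE_PREF)) {
    // Same test as Math.random() * 20000 < 1: true with probability 1/20,000.
    store.set(SAMPLE_PREF, random() < SAMPLING_RATE);
  }
  return store.get(SAMPLE_PREF);
}

if (isInSample(prefStore)) {
  console.log("in sample: submit the metrics ping");
} else {
  console.log("not in sample: no data collection");
}
```

`Math.random()` is uniform on [0, 1), so the expected sample from 200,000,000 installations is 200,000,000 × 1/20,000 = 10,000, with a binomial standard deviation of about 100. Passing a fixed `random` function, for example `isInSample(new Map(), () => 0)`, makes the behavior easy to test.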

Again, you can find arguments for why more data is better, but given
users' interest in privacy, I think this is a fair balance of needs.

---

I would like to add that this feature has serious potential to actively
decrease Firefox's market share. Firefox is strongest in Europe, where,
as far as I know, it still has the largest market share. The reason
people here in Europe use Firefox is mostly philosophical, including
privacy. It is not so much technical merit that wins users; that is
only a secondary priority. Now, if users get the idea that Firefox is
not dramatically and fundamentally different from, say, Google Chrome,
they will see no reason to stay loyal to Firefox and will switch to
Chrome.

This project will almost certainly make for very bad news. The
Telemetry question already gave a bad impression.

This project has a very real risk of actively decreasing the market 
share that it is trying to preserve.

---

There are other ways to get the needed data without offending users. I
propose to 1) remove the UUID and use the algorithm I proposed, which
still allows gathering the critically needed data without tracking
users, 2) remove any data that is highly likely to be unique when
fingerprinting, and 3) reduce the collected sample to a random sample
of 10,000.

If all 3 are done, I would have a good conscience that this is a good
balance between the need for data for product decisions and users'
interest in privacy, and I'd even be OK with opt-out. But only if the
tracking of individual users is removed and the sample is limited to
10,000.

Ben




Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds