
Re: Anonymous metrics collection from Firefox

From:  Justin Lebar <justin.lebar-AT-gmail.com>
To:  Daniel E <deinspanjer-AT-gmail.com>
Subject:  Re: Anonymous metrics collection from Firefox
Date:  Tue, 7 Feb 2012 10:51:53 -0500
Message-ID:  <CAFWcpZ5kPijewb3F+4MZrK-4Qe8oMNJsLs4nN1U5k_heGx-=sA@mail.gmail.com>
Cc:  dev-planning-AT-lists.mozilla.org

>> Now Telemetry has been very carefully designed to have privacy
>> characteristics that suit Mozilla's stated privacy principles and
>> those characteristics have been bragged about. And then another team
>> comes along, treats that design as a bug, and wants to send a per-user ID
>> to enable longitudinal study. If doing what this metrics feature
>> suggests to be done was OK, surely Telemetry would already have UUIDs
>> and support for "longitudinal study".
>
>We definitely spent a lot of time looking at Telemetry and working
>with that team.  The data that Telemetry collects and the purpose that
>it exists for are different, though.  Telemetry was designed to enable
>developers to understand the performance characteristics of individual
>features or code paths "in the wild".  It does not require retention
>or the same sort of longitudinal data that MDP proposes to meet those
>requirements.  Putting those characteristics into Telemetry would be
>doing the very thing that several people have spoken out against,
>adding data to a system that is not directly needed by that system.
>
>There is significant value in judiciously partitioning data by
>purpose.  It enables better policy governing the data.  It allows
>finer control over what data is collected and how it is reviewed.  It
>allows walls to be put up to prevent associations from being made
>where the organization does not wish them to be made (for instance
>tying usage data directly to crash reports).

I feel like we're talking past each other.  This does not address
Henri's point, nor my earlier point.

When Telemetry was started, we explicitly decided not to include a
UUID, not because it wouldn't be useful -- in fact, it would have been
extremely useful! -- but because we decided that doing so would have
violated our principles.

So what Henri was saying is, we decided that a UUID is not acceptable
in a ping-type thing.  This metrics ping is a ping-type thing.  And
yet we did not apply our earlier decisions about ping-type things to
the metrics ping.  Why is that?

Again, this has nothing to do with what you're going to use the
metrics ping data for, or what Telemetry was designed for.  It has
nothing to do with partitioning data.  It has nothing to do with
metrics' requirements for MDP.  And most importantly, it has nothing
to do with what is or isn't needed in order to generate useful data.

As a point of comparison, when telemetry started sending the list of
add-ons, Sid insisted we re-prompt every user who had agreed to
telemetry with new text explicitly saying we were sending the list of
add-ons.  So again, here we have a decision made about sending the
list of add-ons in a ping-type thing, that we cannot do it without
explicit permission, even for people who already opted in to data
collection.  Henri's point is that we're reversing this decision, yet
not explicitly acknowledging that we're doing so.

Opt-out, UUID, and every other piece of data in this proposed
telemetry ping needs to be shown to be consistent with our privacy
principles, absent any appeal to why it's needed.  So far, nobody has
attempted to do so in this thread.

-Justin

On Tue, Feb 7, 2012 at 9:32 AM, Daniel E <deinspanjer@gmail.com> wrote:
> On Feb 6, 8:02 pm, "David E. Ross" <nob...@nowhere.invalid> wrote:
>> An enterprise the size of Mozilla must surely have attorneys on staff or
>> retainer.  You should find out if what is proposed is legal before
>> expending any efforts to implement it.  Besides Germany, there might be
>> other nations with laws impacting on this concept.
>>
>> Furthermore, where such laws do not exist, Mozilla needs to have a firm
>> policy on how the organization would respond to a warrant or subpoena
>> for the data.  That policy must be in place before the data collection
>> begins and should address not only a government's request for the data
>> but also a request resulting from a civil lawsuit.
>>
>
> We do have a legal team, and we also engaged outside legal counsel
> specifically on the question of European and German law for this
> project.  We have asked the legal and privacy teams to share the
> results of their reviews.
>
>
>
>
> On Feb 7, 1:28 am, "Justin Wood (Callek)" <Cal...@gmail.com> wrote:
>> Using this logic, SeaMonkey should gather all data about all users we
>> possibly can, because we have been losing market share heavily ever
>> since we became SeaMonkey from "the Mozilla Suite".
>>
>
> Please reconsider the phrase "should gather all data about all users
> we possibly can".  This project is not about gathering all data
> possible.  It has a very specific list of the minimal data that was
> determined to be required to answer the questions determined as
> necessary to answer.  There has been a lot of information shared about
> what those questions are and the justifications for most of the data
> points on other mediums such as the bugs and the wiki.  I am happy to
> continue to work toward sharing justifications and considerations for
> any of the data listed.  It is right for Mozilla and the community to
> ask for those explanations.  It is difficult to maintain a productive
> discussion where everyone has a clear picture of the facts when using
> exaggerated phrases though.
>
>>  From where I sit, the largest fault of our market share is the fact
>> that Google has heavy brand awareness, and is doing LOTS of expensive
>> advertising campaigns, and well-done in most cases. So "Google Chrome"
>> is interesting to the ignorant-of-computer users.
>>
>> Also Microsoft is (Finally) developing a Sane IE, which means less
>> reason for people to install a different web browser on Windows.
>>
>
> Both of these are great concerns that tie in to this project.  These
> changes in the market are significant changes that primarily deal with
> a large class of mainstream users that are under-represented in our
> current understanding.  These other companies are focusing a lot of
> attention on understanding how the browser is used by mainstream
> users.  We are striving to improve our own understanding.
>
> We don't want to just do things the same way as others though.  We
> have tried to develop a project that can analyze usage without
> collecting personally identifying information.  We have worked with
> the privacy and legal teams to propose policies to mitigate the
> unavoidable PII such as ensuring that IP addresses are never tied to
> the data and that we don't leave any easy way to associate identifying
> information such as an e-mail address or name with the data.  We have
> also put into the project a set of goals around giving the users
> visibility, functionality, and control of the data generated by their
> browser.
>
>
>
>
> On Feb 7, 3:25 am, Henri Sivonen <hsivo...@iki.fi> wrote:
>> ...
>> Now Telemetry has been very carefully designed to have privacy
>> characteristics that suit Mozilla's stated privacy principles and
>> those characteristics have been bragged about. And then another team
>> comes along, treats that design as a bug, and wants to send a per-user ID
>> to enable longitudinal study. If doing what this metrics feature
>> suggests to be done was OK, surely Telemetry would already have UUIDs
>> and support for "longitudinal study".
>
> We definitely spent a lot of time looking at Telemetry and working
> with that team.  The data that Telemetry collects and the purpose that
> it exists for are different, though.  Telemetry was designed to enable
> developers to understand the performance characteristics of individual
> features or code paths "in the wild".  It does not require retention
> or the same sort of longitudinal data that MDP proposes to meet those
> requirements.  Putting those characteristics into Telemetry would be
> doing the very thing that several people have spoken out against,
> adding data to a system that is not directly needed by that system.
>
> There is significant value in judiciously partitioning data by
> purpose.  It enables better policy governing the data.  It allows
> finer control over what data is collected and how it is reviewed.  It
> allows walls to be put up to prevent associations from being made
> where the organization does not wish them to be made (for instance
> tying usage data directly to crash reports).
>
>
>> As for the Germany/EU aspect: (Note the rest of this paragraph says
>> nothing about law. I'm not trying to play a lawyer here.) Even if
>> sending a UUID had no real privacy impact, sending a UUID would be
>> bad publicity in Europe. The usage share of Firefox is in decline.
>> Europe in general and Germany in particular is a place where the usage
>> share of Firefox is high. It seems like a bad idea to hurt that market
>> share in order to study metrics related to it.
>
> I just want to clarify precisely what is being discussed when we say
> "sending a UUID".  MDP is generating cumulative data on the client
> and submitting that data as a document.  That document is given a new
> UUID and the client retains that document ID.  Every time a new
> submission is made, it will have a new document identifier.  It is
> even possible for the identifier to not be part of the URL (which is
> sent using SSL).  If the user wishes to delete the usage data for
> their installation, the browser submits a delete request with the last
> submitted ID.  When a new document is generated on another day and
> submitted, the client also sends the old document ID to be deleted so
> that there are not two copies of the data on the server.  This allows
> us to look at retention.  If a document is older than N days, we know
> that there have been no further submissions from that installation.
> This implementation does still require policy and trust.  It requires
> that we not record IP addresses with the data set.  It requires that
> we do not longitudinally track location.  There might be further ways
> we can make it easier to follow those policies.
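
The rotating document ID scheme described in the quoted paragraph above can be sketched roughly as follows. This is only an illustration under stated assumptions, not MDP's actual implementation: the class names `MetricsServer` and `Installation`, the method names, the in-memory dictionary, and the payload shape are all hypothetical. It shows a fresh UUID per submission, the old document being deleted when the new one arrives, and retention being inferred from a document's age.

```python
import uuid
from datetime import date, timedelta


class MetricsServer:
    """Hypothetical in-memory stand-in for the collection server."""

    def __init__(self):
        self.documents = {}  # document ID -> (submission date, payload)

    def submit(self, doc_id, payload, today, replaces=None):
        # Drop the previous document so only one copy per installation remains.
        if replaces is not None:
            self.documents.pop(replaces, None)
        self.documents[doc_id] = (today, payload)

    def delete(self, doc_id):
        # User-initiated deletion of the last submitted document.
        self.documents.pop(doc_id, None)

    def inactive_count(self, today, max_age_days):
        # A document older than N days was never replaced, so that
        # installation has made no further submissions.
        cutoff = today - timedelta(days=max_age_days)
        return sum(1 for submitted, _ in self.documents.values()
                   if submitted < cutoff)


class Installation:
    """Hypothetical client side: remembers only the last document ID."""

    def __init__(self, server):
        self.server = server
        self.last_doc_id = None

    def submit(self, payload, today):
        new_id = str(uuid.uuid4())  # new identifier for every submission
        self.server.submit(new_id, payload, today, replaces=self.last_doc_id)
        self.last_doc_id = new_id
        return new_id

    def delete_my_data(self):
        if self.last_doc_id is not None:
            self.server.delete(self.last_doc_id)
            self.last_doc_id = None
```

Note that nothing in this sketch enforces the policy requirements mentioned in the quoted text, such as not recording IP addresses alongside the documents; those remain a matter of server-side policy and trust.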
>
>
>
> On Feb 7, 6:19 am, Gervase Markham <g...@mozilla.org> wrote:
>> On 06/02/12 22:16, Daniel E wrote:
>>
>> > It is an unfortunate fact that even in the other data available to us
>> > today, there are occasional ways in which a user can modify their
>> > system or browser such that some private information is leaked out.
>> > One of the best examples I can give of that is the ability to change
>> > variables that are used in the update or blocklist checks.  There are
>> > requests to those systems that have an e-mail address in the place of
>> > the product name ("Firefox").  There are systems that have a changeset
>> > or bug number or username in the channel or distribution name.
>>
>> I have no reason to doubt you that this happens, but there is a big
>> difference between designing your system to request particular data, and
>> accidentally receiving some of it because a user mis-configures their
>> browser.
>>
>> If I have a web "contact me" form, and someone pastes their entire
>> medical history into it and hits Submit, I probably want to delete the
>> data - but I don't have to engineer my data handling process for content
>> coming from that form so that it's robust for handling medical data!
>>
>
> We need the legitimate data that is expected to be in those
> variables.  We are designing the system to be able to use that data.
> We do not want to be burdened by illegitimate data that is available
> as the result of a mistake on the part of a developer or user, so we
> have made sure that the system has checks and features to restrict and
> eliminate that data easily.
>
>
>> > It was critical for us when we proposed this system to have data
>> > collection that was focused on the browser installation rather than
>> > any attempt to learn anything about an individual person.
>>
>> I'm not sure that's a distinction we can make. I am the only user of my
>> browser, and I'm sure that's true of lots of other people too. What can
>> you tell about me from my list of installed add-ons? I won't give you
>> the full list, but I suspect you could tell:
>>
>> - I do web development of RESTful services using JSON
>> - I work for Mozilla
>> - I care about my privacy
>
> I believe that it is important to consider even the worst cases, but
> please keep in mind that this is not a normal case.  The system is
> designed such that it would have no way of telling that Gerv is a web
> developer who works for Mozilla and cares about privacy.  There are
> specific policies and features put in place to prevent the system from
> ever being able to associate those conclusions with a person.  We
> don't keep IP addresses with the data to prevent the possibility of
> using that IP address to identify the person using an installation.
> We use a document identifier so that even if one document ID were ever
> leaked or shared by you (say via an e-mail), the ID would change at
> the next submission so we would not be able to use that ID to look up
> the data from your installation next month and see if you still care
> about privacy.
>
>
> _______________________________________________
> dev-planning mailing list
> dev-planning@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-planning




Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.