By Jake Edge
February 8, 2012
User tracking is always contentious. There are real advantages to
gathering lots of information on how an application is used, but there are also
serious drawbacks in terms of privacy. Many applications or distributions
have "opt-in" mechanisms that report back, but that makes the data somewhat
suspect because it comes from a self-selected group. But "opt-out" data
gathering is frowned upon by privacy advocates and privacy-conscious
users. As a recent discussion in the Mozilla dev-planning group shows,
though, some believe that the need for data may outweigh some privacy
concerns.
Mozilla is understandably concerned with Firefox's decline in market share
and would like to try to determine what the underlying causes are. That
has led to a proposal for a feature
called MetricsDataPing that would collect a wide variety of information
about the browser, its add-ons, and how it is used. That information would
be sent to Mozilla over HTTPS each day that the browser is used.
Crucially, the proposal is that MetricsDataPing would be an opt-out feature,
which would require users to know about the feature and disable it if they
didn't want to share that data.
This stands in contrast to Telemetry, which gathers data on browser
performance and differs from MetricsDataPing in two crucial ways. First,
it is opt-in, so users have to actively enable it. Second, it tries to
avoid gathering any personally identifiable information (PII). It does not
store IP addresses (though it does geolocate each address and store the
result), and it generates a new ID every time the browser is restarted.
MetricsDataPing, on the other hand, would gather a much wider range of
information, so "fingerprinting" a user based solely on the data
gathered would be a real possibility. The list of installed add-ons alone is
probably nearly unique; adding the installation date for each add-on, as
MetricsDataPing does, would almost certainly make it unique. Information
about the search sources used, the number of searches done, and the like
also rings alarm bells for those concerned about privacy. MetricsDataPing
also uses a "document ID" to identify the data sent to the server, which
would allow users to delete their data from the Mozilla servers. But the
document ID could also serve as a persistent user ID, because the previous
document ID is always sent with the current update so that the older one
can be deleted.
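The article does not describe MetricsDataPing's actual wire format, so what follows is only a minimal sketch, assuming a simplified scheme with hypothetical names (`Ping`, `Client`, `Server`, `previous_id`), of why chaining document IDs undermines the fresh-ID approach. Each upload carries a new random ID, and the server dutifully deletes the superseded record, yet the old-to-new link it sees along the way is enough to reconstruct one installation's entire history.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ping:
    """One daily upload under a hypothetical chained-ID scheme."""
    document_id: str
    previous_id: Optional[str]  # ID of the upload this one replaces
    payload: dict


class Client:
    """Hypothetical browser side: every ping gets a fresh random ID,
    but also names the previous ID so the server can delete it."""

    def __init__(self) -> None:
        self.last_id: Optional[str] = None

    def make_ping(self, payload: dict) -> Ping:
        ping = Ping(str(uuid.uuid4()), self.last_id, payload)
        self.last_id = ping.document_id
        return ping


class Server:
    """Hypothetical server: keeps only the latest ping, yet can
    still record which ID replaced which."""

    def __init__(self) -> None:
        self.store: dict[str, Ping] = {}
        self.successor: dict[str, str] = {}  # old ID -> new ID

    def receive(self, ping: Ping) -> None:
        if ping.previous_id is not None:
            # Honor the deletion request...
            self.store.pop(ping.previous_id, None)
            # ...but the link between the two IDs is still visible.
            self.successor[ping.previous_id] = ping.document_id
        self.store[ping.document_id] = ping

    def history(self, first_id: str) -> list[str]:
        """Follow the links to list every ID one installation used."""
        chain = [first_id]
        while chain[-1] in self.successor:
            chain.append(self.successor[chain[-1]])
        return chain


# Usage: three daily pings from one installation.
client, server = Client(), Server()
first = client.make_ping({"day": 1})
server.receive(first)
for day in (2, 3):
    server.receive(client.make_ping({"day": day}))
print(len(server.store), len(server.history(first.document_id)))  # 1 3
```

Only one record survives on the server, but the three IDs are trivially linked. By contrast, a scheme like Telemetry's, which draws a fresh ID on restart and never references the old one, gives the server no such link to follow.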
There are efforts to anonymize the data that would be stored, but, as we
have seen before, it is very difficult to truly anonymize collected data.
Some of this also applies to Telemetry, which has added fingerprintable
data since its initial roll-out, but the key difference is that its users
have willingly chosen to share that data. That is the main difficulty some
see with the MetricsDataPing proposal. Benjamin Smedberg started off the
discussion with a posting of his concerns:
It seems as if we are saying that since we already collect most of this
data via various product features, that makes it ok to also collect this
data in a central place and attach an ID to it. Or, that because we *need*
this data in order to make the product better, it's ok to collect it. This
makes me intensely uncomfortable. At this point I think we'd be better off
either collecting only the data which cannot be used to track individual
installs, or not implementing this feature at all.
But others, especially on the Mozilla metrics team, believe that the
information gathered is critical. Blake Cutler described it this way:
The Metrics Data Ping is an attempt to apply scientific principles to
product design and development. Mozilla relies too much on gut
decisions, which directly translates to poor product decisions.
Firefox analytics are stuck in the dark ages. It shows.
Ben Bucksch made several suggestions on how to improve the privacy of the
data gathered, but he also worries that gathering data to figure out why
Firefox usage is declining will actually drive more users away, because
they will perceive the browser as intruding on their privacy. The data may
be important and useful, but, according to Justin Lebar, there are other
considerations:
Yeah, it sucks that we can't tell why people stop using Firefox. But
our [principles] are more important than that.
To that end, the discussion shouldn't center on why these metrics are
important or difficult to obtain another way. The discussion is about
whether we can at once collect the proposed metrics and stay true to
our values. If we can't, then we can't collect the data, no matter
how important it may be.
There was some discussion of technical measures to try to reduce the PII
content of the messages, but problems like fingerprinting remain. If
you gather enough information (of the kind the metrics team thinks it
needs), you are very likely to be able to track users. Even if the data is
massaged in some fashion (aggregated, for example), the perception of
privacy invasion will still be present, as Boris Zbarsky pointed out:
One problem is that some people will assume that if data is being sent then
it's being used, no matter what we actually do with it and say we do with
it. So if we _can_ design things such that we couldn't misuse them even if
we were to want to, we should. I understand that in general this is pretty
difficult....
Even for opt-in services like Telemetry, gathering additional information
requires user agreement. When the list
of add-ons was added to the information that Telemetry supplied, users were
required to opt back in to Telemetry after being informed of that
change. As Lebar noted: "So again, here we have a decision made about sending the
list of add-ons in a ping-type thing, that we cannot do it without
explicit permission, even for people who already opted in to data
collection." But MetricsDataPing would, seemingly, gather that
information without asking the user even once.
Early in the thread, Mike Beltzner pointed to a posting on the Mozilla
privacy blog that, he said, committed Mozilla "to a basic policy of 'no
surprises, real choices, sensible settings, limited data, and user control'".
It's a bit hard to see how MetricsDataPing fits into that framework. For
some Linux distributions (which are probably not where Mozilla's
market-share concerns lie), it could easily be seen as a misfeature that
should be removed from the code, though that might lead to more "iceweasels"
due to Mozilla trademark issues.
In the end, Mozilla may need to find a way to satisfy its data needs with
an opt-in feature, or find a very convincing argument that user tracking is
impossible with the data it does collect. There is also the argument that
an opt-out feature introduces a subtle bias of its own, since the most
privacy-conscious users will remove themselves from the sample. In what ways
does the data get skewed by eliminating them? It is certainly understandable
that the metrics team (and Mozilla as a whole) wants the data, but, like
Linux distributions, it may have to settle for indirect measurements or
accept some self-selection bias.