By Jake Edge
August 29, 2012
Data about us—our habits, associates, purchases, and so on—is
collected all the time. That's been true at smaller scales for hundreds or
even thousands of
years, but today's technology makes it much easier to gather, store, and analyze
that data. While some of the results of that analysis may make (some) people's
lives better—think tailored search results or Amazon's
recommendations—there is a strong temptation to secretly, or at least
quietly, use the collected data in other, less benign, ways.
Because data collection and analysis are typically done without any fanfare, they
often fly under the radar. So it
makes sense to stop and think about what it all means from a privacy
perspective.
A recent essay
by Alistair Croll does exactly that. He notes that we have reached a point
where the old database constraint of "big, fast, and varied—pick any two"
no longer holds. Because of that, it is common for data to be
collected without any particular plan for how it will be used, under the
assumption that some use will eventually be found. Collecting it doesn't cost
much, which has led to the rise of "big data".
There are some eye-opening things that can be done using big data. It is
not difficult to determine someone's race, gender, and sexual orientation
using just the words in their Twitter or Facebook feeds, for example. Much
of that
information is completely public, and could be mined fairly easily by
banks, insurance companies, prospective employers, and so on. The derived
attributes could then be used to set rates, deny coverage, decide whether or
not to interview a candidate, and more.
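To make that concrete, here is a minimal sketch of how such inference typically works: a bag-of-words classifier trained on posts with known labels. Everything in it is a hypothetical illustration rather than anything from Croll's essay; the scikit-learn pipeline, the tiny corpus, and the placeholder "A"/"B" labels are assumptions, and real studies use thousands of labeled accounts and richer features, but the mechanics are similar.

    # Hypothetical sketch: inferring a demographic attribute from the words in
    # short posts with a bag-of-words model. The posts and the "A"/"B" labels
    # are invented for illustration only.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    posts = [
        "heading to the gym before work, then a long day of meetings",
        "anyone else completely hooked on this new series?",
        "another brutal commute, the highway was a parking lot",
        "made pancakes for the kids before the school run",
    ]
    labels = ["A", "B", "A", "B"]   # placeholder attribute labels

    # Turn each post into word counts, then fit a simple linear classifier.
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(posts, labels)

    # "Predict" the attribute for an unseen post.
    print(model.predict(["stuck in traffic again, meetings all afternoon"]))

Nothing in that sketch is specific to Twitter or Facebook; any sufficiently large pile of words, paired with known labels for some of the people who wrote them, will do.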
It is easy to forget that the data collection is even happening. "Loyalty"
cards
that provide a discount at grocery and other stores gather an enormous
amount of information about our habits, for example. Deriving race,
gender, family size, and other
characteristics from that data should not be very difficult. If that
information
is used to give
discounts on other products one might be likely to buy, it may seem
relatively harmless. But if it is being sold to others to help determine
voting patterns, foreclosure likelihood, or credit-worthiness, things are
definitely amiss. As Croll points out, though, that is exactly what happens
to that data at times.
Croll notes several examples in his essay, but they are not
hard to come by elsewhere. Almost every day, it seems, there are new abuses
of big data, or worries about such abuses. People in Texas are concerned about the
kinds of data that would be collected by "smart" electricity
meters—to the point of running off the smart meter installers. Mitt
Romney's campaign for the US Presidency is using a secretive organization
to analyze data to find
potential donors—President Obama's
campaign is certainly doing much the same.
Another example is the "anonymized" data sets that have been released for
various purposes over the past few years. Those releases have shown that it
is quite difficult to truly
anonymize data. When researchers try to derive a signal from the data (movie
recommendations for Netflix, for example), surprising correlations can be
made. That demonstrates the power of big data even when the publisher is
trying not to
reveal our secrets in a data set. A new
technique may help by providing a way to release data without compromising
privacy.
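As for why such releases tend to unravel in the first place, here is a minimal sketch of a linkage attack: joining a scrubbed data set to a public record on quasi-identifiers. The records, field names, and voter-roll framing below are invented for illustration; the well-known Netflix de-anonymization linked on much noisier signals (movie ratings and their dates), but the principle is the same.

    # Hypothetical linkage-attack sketch: an "anonymized" release with names
    # removed is joined to a public record on quasi-identifiers (ZIP code,
    # birth date, sex), re-identifying individuals. All records are invented.

    anonymized_release = [
        {"zip": "78701", "birth": "1975-03-12", "sex": "F", "diagnosis": "asthma"},
        {"zip": "78745", "birth": "1982-11-02", "sex": "M", "diagnosis": "diabetes"},
    ]

    public_voter_roll = [
        {"name": "J. Doe", "zip": "78701", "birth": "1975-03-12", "sex": "F"},
        {"name": "R. Roe", "zip": "78702", "birth": "1990-06-30", "sex": "M"},
    ]

    QUASI_IDENTIFIERS = ("zip", "birth", "sex")

    # A scrubbed record whose quasi-identifiers match a public record is
    # effectively re-identified, names or no names.
    for record in anonymized_release:
        for person in public_voter_roll:
            if all(record[k] == person[k] for k in QUASI_IDENTIFIERS):
                print(f"{person['name']} appears to be the person "
                      f"with diagnosis '{record['diagnosis']}'")

The more columns a release contains, the more likely each combination of values is to be unique, which is why adding data to a set tends to make re-identification easier rather than harder.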
The real problems may come when these disparate data sets are combined. Truly
personally identifiable information correlated from multiple sources is
likely to give a distressingly accurate picture of an individual. It could
be used by companies and other organizations for a wide range of purposes.
Those could be relatively harmless, even helpful, or downright malicious
depending on one's
perspective and privacy consciousness. One organization that is likely
quite interested in this kind of data is the same one that some would like to
turn to for protection from abuses of big data: government.
There are clearly good uses that such data can be put to. Croll identifies
things like detecting and tracking disease outbreaks, improving learning,
reducing commute times, etc. But the "Big Brother" overtones are worrisome
as well. It's not at all clear how regulations would impact the collection
and analysis of big data, and governments' interest in using it (for good
or "bad" purposes) makes
for an interesting conundrum. Until and unless a solid chunk of people
are concerned about the problem—and express that concern to their
governments and to other organizations in some visible way—things
will continue much as they
are. In that respect, the problem is little different from many other privacy
issues; those who truly care are going to have to jealously guard their
privacy themselves, as best they can.