By Nathan Willis
June 26, 2013
Back in February, Mozilla implemented
a new cookie policy for Firefox, designed to better protect users'
privacy—particularly with respect to third-party user tracking.
That policy was to automatically reject cookies that originate from
domains that the user had not visited (thus catching advertisers and
other nefarious entities trying to sneak a cookie into place). The
Electronic Frontier Foundation (EFF) lauded
the move as "standing up for its users in spite of powerful
interests," but also observed that cross-site tracking cookies
were low-hanging fruit compared to all of the other methods used to
track user behavior.
But as it turns out, nipping third-party cookies in the bud is not
as simple as Mozilla had hoped. The policy change was released in
Firefox nightly builds, but Mozilla's Brendan Eich pointed
out in May that the domain-based approach caused too many false
positives and too many false negatives. In other words,
organizations that deliver valid and desirable cookies from a
different domain name—such as a content distribution network (CDN)—were being unfairly blocked by the policy, while sites that a user
might visit only once—perhaps even accidentally—could set
tracking cookies indefinitely. Consequently, the patch was held back
from the Firefox beta channel. Eich commented that the project was
looking for a more granular solution, and would post updates within
six weeks about how it would proceed.
Six weeks later, Eich has come back with a description
of the plan. Because the naïve has-the-site-been-visited policy
produced both false positives and false negatives, he said, Mozilla
concluded that an exception-management system was required. But the
system could not rely solely on the user to manage
exceptions—Eich noted that Apple's Safari browser has a similar
visited-based blocking policy, and that the usual advice offered to users
who hit a false positive is to switch off the blocking functionality
entirely. This leaves a blacklist/whitelist approach as the
"only credible alternative," with a centralized service
to moderate the lists' contents.
Perhaps coincidentally, on June 19 (the same day Eich posted his
blog entry), Stanford University's Center for Internet and Society (CIS)
unveiled just such a centralized cookie listing system, the Cookie Clearinghouse (CCH). Mozilla
has committed to using the CCH for its Firefox cookie-blocking policy,
and will be working with the CCH Advisory Board (a group that also
includes a representative from Opera, and is evidently still sending
out invitations to other prospective members). Eich likens the CCH exception
mechanism to Firefox's anti-phishing
features, which also use a whitelist and blacklist maintained
remotely and periodically downloaded by the browser.
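As a rough illustration of how that list-based mechanism might look from the browser's side, the TypeScript sketch below periodically fetches allow and block lists and caches them locally, much as the anti-phishing lists are refreshed. It is purely hypothetical: the endpoint URL, the JSON format, and the names are invented, since CCH has not published any of these details.

    // Hypothetical sketch of periodically refreshing centrally maintained
    // cookie allow/block lists; the endpoint URL and JSON format are invented.
    interface CookieLists {
      allow: Set<string>;  // domains explicitly permitted to set cookies
      block: Set<string>;  // domains explicitly denied
    }

    async function refreshLists(endpoint: string): Promise<CookieLists> {
      const response = await fetch(endpoint);
      const raw = (await response.json()) as { allow: string[]; block: string[] };
      return { allow: new Set(raw.allow), block: new Set(raw.block) };
    }

    // Refresh roughly once a day, the way anti-phishing lists are updated.
    let currentLists: CookieLists = { allow: new Set(), block: new Set() };
    setInterval(async () => {
      currentLists = await refreshLists("https://cch.example.org/lists.json");
    }, 24 * 60 * 60 * 1000);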
Codex cookius
As the CCH site describes it, the system will publish blacklists
(or "block lists") and whitelists (a.k.a. "allow lists") based on
"objective, predictable criteria"—although it is
still in the process of developing those criteria.
There are already four presumptions made about how browsers will
use the lists. The first two more-or-less describe the naïve
"visited-based"
approach already tried by Safari and Firefox: if a user has visited a
site, allow the site to set cookies; do not set cookies originating
from sites the user has not visited. Presumption three is that if a
site attempts to save a Digital Advertising Alliance (DAA) "opt-out cookie,"
that opt-out cookie (but not others) will be set. Presumption four is
that cookies should be set when the user consents to them. The site
also notes that CCH is contemplating adding a fifth presumption to
address sites honoring the Do Not Track (DNT) preference.
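Taken together, the presumptions amount to a small decision procedure. The sketch below shows one way a browser might order them, with the centrally maintained lists overriding the defaults; it is illustrative only, since the type and function names are invented and the real criteria are still being drafted.

    // Illustrative only: one possible ordering of the CCH presumptions.
    type CookieDecision = "allow" | "block";

    interface CookieRequest {
      originDomain: string;       // domain attempting to set the cookie
      userHasVisited: boolean;    // presumptions one and two
      isDaaOptOutCookie: boolean; // presumption three
      userConsented: boolean;     // presumption four
    }

    function decideCookie(req: CookieRequest,
                          allowList: Set<string>,
                          blockList: Set<string>): CookieDecision {
      // Centrally maintained lists override the default presumptions.
      if (blockList.has(req.originDomain)) return "block";
      if (allowList.has(req.originDomain)) return "allow";
      // Presumption four: explicit user consent.
      if (req.userConsented) return "allow";
      // Presumption three: a DAA opt-out cookie may be set even by unvisited sites.
      if (req.isDaaOptOutCookie) return "allow";
      // Presumptions one and two: the visited-based default.
      return req.userHasVisited ? "allow" : "block";
    }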
Obviously these presumptions are pretty simple on their own; the
real work will be in deciding how to handle the myriad sites that fall
into the false-match scenarios already encountered in the wild. To
that end, CCH reports that it is in the initial phase of drawing up
acceptance and blocking criteria, which will be driven by the Advisory
Board members. The project will then hammer out a file format for the
lists, and develop a process that sites and users can use to challenge
and counter-challenge a listing. Subsequently, the lists will be
published and CCH will oversee them and handle the challenge process.
The site's FAQ page
says the project hopes to complete drawing up the plans by Fall 2013
(presumably Stanford's Fall, that is).
The details, then, are pretty scarce at the moment. The Stanford
Law School blog posted
a news item with a bit more detail, noting that the idea for the CCH
grew out of the CIS team's previous experience working on DNT.
Indeed, that initial cookie-blocking patch to Firefox, currently
disabled, was written by a CIS student affiliate.
Still, this is an initiative to watch. Unlike DNT, which places
the onus for cooperation solely on the site owners (who clearly have
reasons to ignore the preference), CCH enables the browser vendor to
make the decision about setting the cookie whether the site likes it
or not. The EFF is right to point out that cookies are not required
to track users between sites, but there is no debating the fact that
user tracking via cookies is widespread. The EFF's Panopticlick illustrates
other means of uniquely identifying a particular browser, but it is
not clear that anyone (much less advertisers in particular) uses
similar tactics.
To play devil's advocate, one might argue that widespread adoption
of CCH would actually push advertisers and other user-tracking vendors
to adopt more surreptitious means of surveillance, arms-race style.
The counter-argument is that ad-blocking software—which is also
purported to undermine online advertising—has been widely
available for years, yet online advertising has shown little sign of
disappearing. Then again, if Firefox or other browsers adopt CCH
blacklists by default, they could instantly nix cookies on millions of
machines—causing a bigger impact than user-installed ad
blockers.
Other challenges to making CCH a plausible reality include the
overhead of maintaining the lists themselves. The anti-phishing
blacklists (as well as whitelists like the browser's root
Certificate Authority store) certainly change, but the sheer
number and variety of legitimate sites that employ user tracking
surely dwarf the set of malware sites. Anti-spam blacklists, which
might be a more apt comparison in terms of scale, have a decidedly
mixed record.
It is also possible that the "input" sought from interested parties
will complicate the system needlessly. For example, the third
presumption of CCH centers around the DAA opt-out cookie, but there
are many other advertising groups and data-collection alliances out
there—a quick search for "opt out cookie" will turn up an
interesting sample—each of which, no doubt, will have opinions
about the merits of its own opt-out system. And that does
not even begin to consider the potential complexities of the planned
challenge system—including the possibility of lawsuits from
blacklisted parties who feel their bottom line has been unjustly harmed.
Privacy-conscious web users will no doubt benefit from any policy
that restricts cookie setting, at least in the short term. But the
arms-race analogy is apt; every entity with something to gain by
tracking users through the web browser is already looking for the next
way to do so more effectively. Browser vendors can indeed stomp out
irritating behavior (pop-up ads spring to mind), but they cannot
prevent site owners from trying something different tomorrow.
By Nathan Willis
June 26, 2013
The WebRTC standard is much hyped for its ability to
facilitate browser-to-browser media connections—such as placing video
calls without the need to install a separate client application.
Relying solely on the browser is certainly one of WebRTC's strengths,
but another is the fact that the media stream is delivered directly
from one client to the other, without making a pit-stop at the
server. But direct connections are available for arbitrary binary data
streams, too, through WebRTC's RTCDataChannel. Now the first applications to take advantage of this
feature are beginning to appear, such as the Sharefest peer-to-peer
file transfer tool.
Changing the channel
WebRTC data channels are an extension to the original WebRTC
specification, which
focused first on enabling two-way audio/video delivery in
real time. The general WebRTC framework remains the same; sessions
are set up using the WebRTC session layer, with ICE
and STUN used to
connect the endpoints across NAT. Likewise, the RTCPeerConnection
object serves as the session controller, generating the offers and answers
used for signaling, negotiating options, and handling other such
miscellany. The original
justification for RTCDataChannel was that multimedia applications also
needed a conduit to simultaneously exchange ancillary information,
anything from text chat messages to real-time status updates in a
multiplayer game.
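For the curious, the calling side of that setup looks roughly like the TypeScript sketch below. The sendToPeer() helper and the STUN server URL are placeholders for whatever out-of-band signaling channel and servers an application actually uses; they are not part of the WebRTC API itself.

    // Sketch of the caller's side; sendToPeer() is a hypothetical stand-in
    // for the application's own signaling transport.
    declare function sendToPeer(message: unknown): void;

    const pc = new RTCPeerConnection({
      iceServers: [{ urls: "stun:stun.example.org" }],  // STUN for NAT traversal
    });

    // The data channel rides on the same peer connection that ICE/STUN set up.
    const channel = pc.createDataChannel("ancillary");

    pc.onicecandidate = (event) => {
      if (event.candidate) sendToPeer({ candidate: event.candidate });
    };

    async function startCall(): Promise<void> {
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);
      sendToPeer({ offer });  // delivered via the application's signaling channel
    }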
But data and real-time media are not quite the same, so WebRTC
defines a separate channel for the non-multimedia content. While
audio/video streams are delivered over Real-time Transport Protocol
(RTP), an RTCDataChannel is delivered over Stream Control Transmission
Protocol (SCTP), tunneled over UDP using Datagram TLS (DTLS). Using SCTP
enables TCP-like reliability on top of UDP.
After all, if one video chat frame in an RTP conversation arrives too
late to be displayed, it is essentially worthless; arbitrary
binary data, however, might require retransmission or error handling.
That said, there is no reason that a web application must implement
a media stream in order to use an RTCDataChannel. The API is
deliberately similar to WebSockets, with the obvious difference being
that the remote end of an RTCDataChannel is another browser, rather
than a server. In addition, RTCDataChannels use DTLS to encrypt the
channel by default, and SCTP provides a mechanism for congestion control.
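In practice, that similarity means a data channel is driven with the same handful of event handlers a WebSocket would be, while the SCTP underpinnings surface as per-channel reliability options. A brief sketch follows; the channel labels and handler bodies are only illustrative.

    // WebSocket-like surface, plus SCTP's per-channel reliability options.
    const pc = new RTCPeerConnection();

    // Ordered, fully reliable delivery (TCP-like) is the default.
    const reliable = pc.createDataChannel("files", { ordered: true });

    // A channel can instead be made unordered and lossy, which suits data
    // that goes stale quickly (game state, live telemetry).
    const lossy = pc.createDataChannel("telemetry", {
      ordered: false,
      maxRetransmits: 0,
    });

    reliable.onopen = () => reliable.send("hello from the other browser");
    reliable.onmessage = (event: MessageEvent) => {
      console.log("received:", event.data);  // string, Blob, or ArrayBuffer
    };
    lossy.onopen = () => lossy.send(JSON.stringify({ x: 0, y: 0 }));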
Obviously there are other ways to send binary data from one browser
to another, but the WebRTC API should be more efficient, since the
clients can choose whatever (hopefully compact) format they wish, and
use low-overhead UDP to deliver it. At a WebRTC meetup in March,
developer Eric Zhang estimated
[SlideShare] that encoding data in
Base64 for exchange through JSON used about 37% more bandwidth than
sending straight binary data over RTCDataChannels. That figure is no
surprise: Base64 alone inflates binary data by a third, since every three
bytes become four characters, and JSON framing adds further overhead.
Browser support for RTCDataChannels is just now rolling out.
Chrome and Chromium support the feature as of version 26 (released in
late March); Firefox currently requires a nightly build.
Sharefest
Sharefest is a project started by a video streaming company called
Peer5. A blog post introducing
the tool says the impetus was two coworkers who needed to share a
large file while in a coffee shop, but Dropbox took ages to transfer
the file to the remote server and then back again. With RTCDataChannels, however, the transfer would be considerably faster since it
could take place entirely on the local wireless LAN. A few months
later, Sharefest was unveiled on GitHub.
The Sharefest server presents an auto-generated short URL for each
file shared by the "uploading" (so to speak) party. When a client
browser visits the URL, the server connects it to the uploader's
browser, which then transfers the file over an RTCDataChannel. If
multiple clients connect
from different locations, the server attempts to pair nearby clients
to each other to maximize the overall throughput; thus for a
persistent sharing session Sharefest does offer bandwidth savings for
the sharer.
Currently the size limit for shared files is
around 1GB, though there is an issue open to
expand this limit. A demo server is running at sharefest.me (continuing the
near-ubiquitous domination of the .me domain for web-app
demos). The server side is written in Node.js, and the entire work is
available under the Apache 2.0 license.
Naturally, only recent versions of Chrome/Chromium and Firefox
are supported, although the code detects the browser and supplies a
friendly message explaining the problem on unsupported browsers.
This is particularly important because the RTCDataChannel API is not yet
set in stone, and several pieces of WebRTC are
implemented differently in the two browsers—not differences significant
enough to break compatibility, mostly just varying default
configuration settings.
Sharefest is certainly simple to use; it does require that the uploading
browser keep the Sharefest page open, but it allows straightforward
"ephemeral" file sharing in a (largely) one-to-many fashion. From the
user's perspective, probably the closest analogy would be to Eduardo
Lima's FileTea, which
we covered in 2011. But FileTea is
entirely one-to-one; Sharefest does its best to maximize efficiency
by dividing each file into "slices" that the downloading peers exchange
with each other.
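A slicing scheme like that is straightforward to express with standard browser APIs. The sketch below is not Sharefest's actual code: the chunk size and the simple bufferedAmount backpressure check are arbitrary choices, and it sends slices to a single already-open data channel rather than coordinating a swarm.

    // Illustrative only: slice a file and push the chunks over a data channel.
    const CHUNK_SIZE = 16 * 1024;  // 16 KB slices

    async function sendFile(file: File, channel: RTCDataChannel): Promise<void> {
      for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
        // Back off if the channel's send buffer is filling up.
        while (channel.bufferedAmount > 1024 * 1024) {
          await new Promise((resolve) => setTimeout(resolve, 50));
        }
        const slice = file.slice(offset, offset + CHUNK_SIZE);
        channel.send(await slice.arrayBuffer());
      }
      // Tell the receiver the transfer is complete.
      channel.send(JSON.stringify({ done: true, name: file.name, size: file.size }));
    }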
This is useful, of course, but to many people the term
"peer-to-peer" effectively means BitTorrent, which is a considerably
more complex distribution model than Sharefest supports (supporting,
for example, super-seeding and distributed hash trackers). One could
implement the existing BitTorrent protocol on top of RTCDataChannels, perhaps, and pick up some nice features like DTLS encryption
along the way, but so far no one seems to have taken that challenge
and run with it.
But there are others pursuing RTCDataChannels as a file
distribution framework. Chris Ball, for example, has written a serverless
WebRTC file-sharing tool, although it still requires manually exchanging
the session details (including the STUN-derived public IP address of the
uploader's machine) out of band, in an email or instant message. The PeerJS library is a project that may
evolve in the longer term to implement more pieces of the file-sharing
puzzle.
In the meantime, for those curious to test out the new API, Mozilla
and Google have publicized several other RTCDataChannel examples
involving games. But as Zhang noted in his presentation, direct
client-to-client data transfer is likely to have more far-reaching
effects than that: people are likely to write libraries for file I/O,
database drivers, persistent memory, etc. The potential is there for
many assumptions about the web's client-server nature to be broken for
good.
By Nathan Willis
June 26, 2013
"Open government" is a relatively recent addition to the "open"
family; it encompasses a range of specific
topics—everything from transparency issues to open access to
government data sets. Some—although arguably not all—of
these topics are a natural fit with the open source software movement,
and in the past few years open source software has taken on a more
prominent role within open government. The non-profit Knight Foundation just
awarded a sizable set of grants to open government efforts, a number
of which are also fledgling open source software projects.
Like a lot of large charitable foundations, the Knight Foundation
has a set of topic areas in which it regularly issues grant money.
The open government grants just announced were chosen through the
foundation's "Knight
News Challenge," which deals specifically with news and media
projects, and runs several themed rounds of awards each year. The Open
Government round accepted entries in February and March, and the winning
projects were announced
on June 24.
Of the eight grant awardees, three are open source projects
already, with a fourth stating its intention to release its source
code. OpenCounter
is a web-based application that cities can deploy as a self-service
site for citizens starting up small businesses. The application walks
users through the process of registering and applying for the various
permits, filling out forms, and handling other red tape that can be both
time-consuming and confusing. The live version currently online is the site
developed specifically for the city of Santa Cruz, California as part
of a fellowship with the non-profit Code For America, but the source is available on
GitHub. GitMachines
is a project to build virtual machine images that conform to the
often-rigorous requirements of US government certification and
compliance policies (security certification, for example), requirements that
many off-the-shelf VM images evidently do not meet. The
project lead founded Sunlight Labs (another open
government software-development non-profit), and some of the source is already
online.
Plan
in a Box is not yet available online, because the proposal is an
effort to take several of the existing free software
tools developed by OpenPlans and
merge them into a turn-key solution suitable for smaller city and
regional governments without large IT staffs. OpenPlans's existing
projects address things like
crowdsourced mapping (of, for example, broken streetlights),
transportation planning, and community project management. Several of
OpenPlans's lead developers are Code For America alumni. Procure.io
is an acquisitions-and-procurement platform, also developed by a
Sunlight Labs veteran. At the moment, the Procure.io URL redirects to
a commercial service, although a "community version" is available on GitHub
and the project's grant proposal says the result will be open source.
There are also two "open data" projects among the grant winners,
although neither seems interested in releasing source code. The first
is Civic
Insight, a web-based tool to track urban issues like abandoned
houses. It originated as the "BlightStatus" site for New
Orleans, and was written during a Code For America
fellowship—although now its creators have started a for-profit company.
The Oyez Project at the
Chicago-Kent College of Law currently maintains a lossless,
high-bit-rate archive (in .wav format) of all of the US Supreme Court's audio
recordings; its grant
is supposed to fund expanding the archive to federal appellate courts
and state supreme courts.
The other grant winners include Open
Gov for the Rest of Us, which is a civic campaign to increase
local government engagement in low-income neighborhoods, and Outline.com,
a for-profit venture to develop a policy-modeling framework.
Outline.com is being funded as a start-up venture through a different
Knight Foundation fund. In addition to those eight top-tier winners,
several smaller grants were announced, some of which also went to open
source software projects, like the Open Source Digital Voting Foundation's TrustTheVote.
Free money; who can resist?
Looking at the winners, there are several interesting patterns to
note. The first is that some of the projects sharply demarcate the
differences between open source and open data. There is obviously a
great deal of overlap in the communities that value these two
principles, but they are not inherently the same, either. One can
build a valuable public resource on top of an open data set (in a
sense, "consuming" the open data, like BlightStatus), but if
the software to access it remains non-free, it is fair to question its
ultimate value. On the flip side, of course, an open source
application that relies on proprietary data sets might also be
susceptible to criticism that it is not quite free enough. In either
case, the closed component can impose restrictions that limit its
usefulness; worse yet, if the closed component were to disappear,
users would be out of luck.
Perhaps more fundamentally, it is also interesting to see just how
much the grants are dominated by former members of Sunlight Labs and
Code For America. Both groups are non-profits, with a special
emphasis on open source in government and civic software development.
But neither has a direct connection to the Knight Foundation; they
simply have built up a sustainable process for applying for and winning
grant money. And perhaps there is a lesson there for other, existing
free software projects; considering how difficult crowdfunding and
donation-soliciting can be in practice, maybe more large projects
ought to put serious effort into pursuing focused grants from
non-profit organizations.
To be sure, the Knight News Challenge for Open
Government is a niche, and might be well outside of the scope of many
open source projects—but there are certainly some existing open
source projects that would fit. It is also not the only avenue for
private grant awards. In fact, the Knight Foundation is just ramping
up its next
round of the Knight News Challenge grant contest; this one focuses on
healthcare
and medicine.