
LWN.net Weekly Edition for June 27, 2013

Mozilla versus the cookie monster

By Nathan Willis
June 26, 2013

Back in February, Mozilla implemented a new cookie policy for Firefox, designed to better protect users' privacy—particularly with respect to third-party user tracking. That policy was to automatically reject cookies that originate from domains that the user had not visited (thus catching advertisers and other nefarious entities trying to sneak a cookie into place). The Electronic Frontier Foundation (EFF) lauded the move as "standing up for its users in spite of powerful interests," but also observed that cross-site tracking cookies were low-hanging fruit compared to all of the other methods used to track user behavior.

But as it turns out, nipping third-party cookies in the bud is not as simple as Mozilla had hoped. The policy change was released in Firefox nightly builds, but Mozilla's Brendan Eich pointed out in May that the domain-based approach caused too many false positives and too many false negatives. In other words, organizations that deliver valid and desirable cookies from a different domain name—such as a content distribution network (CDN)—were being unfairly blocked by the policy, while sites that a user might visit only once—perhaps even accidentally—could set tracking cookies indefinitely. Consequently, the patch was held back from the Firefox beta channel. Eich commented that the project was looking for a more granular solution, and would post updates within six weeks about how it would proceed.

Six weeks later, Eich has come back with a description of the plan. Seeing how the naïve has-the-site-been-visited policy produced false positives and false negatives, he said, Mozilla concluded that an exception-management system was required. But the system could not rely solely on the user to manage exceptions—Eich noted that Apple's Safari browser has a similar visited-based blocking system that advises users to switch off the blocking functionality the first time they encounter a false positive. This leaves a blacklist/whitelist approach as the "only credible alternative," with a centralized service to moderate the lists' contents.

Perhaps coincidentally, on June 19 (the same day Eich posted his blog entry), Stanford University's Center for Internet and Society (CIS) unveiled just such a centralized cookie listing system, the Cookie Clearinghouse (CCH). Mozilla has committed to using the CCH for its Firefox cookie-blocking policy, and will be working with the CCH Advisory Board (a group that also includes a representative from Opera, and is evidently still sending out invitations to other prospective members). Eich likens the CCH exception mechanism to Firefox's anti-phishing features, which also use a whitelist and blacklist maintained remotely and periodically downloaded by the browser.

Codex cookius

As the CCH site describes it, the system will publish blacklists (or "block lists") and whitelists (a.k.a. "allow lists") based on "objective, predictable criteria"—although it is still in the process of developing those criteria.

The CCH does, however, already list four presumptions about how browsers will use the lists. The first two more or less describe the naïve "visited-based" approach already tried by Safari and Firefox: if a user has visited a site, allow that site to set cookies; do not set cookies originating from sites the user has not visited. Presumption three is that if a site attempts to save a Digital Advertising Alliance (DAA) "opt-out cookie," that opt-out cookie (but not others) will be set. Presumption four is that cookies should be set when the user consents to them. The site also notes that CCH is contemplating adding a fifth presumption to address sites honoring the Do Not Track (DNT) preference.
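
The CCH has not published an API or file format yet, so the following TypeScript sketch of how a browser might combine the four presumptions into a single decision is purely hypothetical; every name and field here is an assumption made for illustration.

    interface CookieRequest {
      origin: string;             // domain attempting to set the cookie
      userHasVisited: boolean;    // has the user visited that domain directly?
      isDaaOptOutCookie: boolean; // is this a DAA opt-out cookie?
      userConsented: boolean;     // did the user explicitly consent?
    }

    function shouldSetCookie(req: CookieRequest): boolean {
      if (req.userConsented) {      // presumption four: explicit consent wins
        return true;
      }
      if (req.isDaaOptOutCookie) {  // presumption three: allow the opt-out cookie only
        return true;
      }
      return req.userHasVisited;    // presumptions one and two: the visited-based rule
      // A CCH allow list or block list would override these defaults once the
      // acceptance and blocking criteria exist.
    }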

Obviously these presumptions are pretty simple on their own; the real work will be in deciding how to handle the myriad sites that fall into the false-match scenarios already encountered in the wild. To that end, CCH reports that it is in the initial phase of drawing up acceptance and blocking criteria, which will be driven by the Advisory Board members. The project will then hammer out a file format for the lists, and develop a process that sites and users can use to challenge and counter-challenge a listing. Subsequently, the lists will be published and CCH will oversee them and handle the challenge process. The site's FAQ page says the project hopes to complete drawing up the plans by Fall 2013 (presumably Stanford's Fall, that is).

The details, then, are pretty scarce at the moment. The Stanford Law School blog posted a news item with a bit more detail, noting that the idea for the CCH grew out of the CIS team's previous experience working on DNT. Indeed, that initial cookie-blocking patch to Firefox, currently disabled, was written by a CIS student affiliate.

Still, this is an initiative to watch. Unlike DNT, which places the onus for cooperation solely on the site owners (who clearly have reasons to ignore the preference), CCH enables the browser vendor to make the decision about setting the cookie whether the site likes it or not. The EFF is right to point out that cookies are not required to track users between sites, but there is no debating the fact that user-tracking via cookies is widespread. The EFF's Panopticlick illustrates other means of uniquely identifying a particular browser, but it is not clear that anyone (much less advertisers in particular) uses similar tactics.

To play devil's advocate, one might argue that widespread adoption of CCH would actually push advertisers and other user-tracking vendors to adopt more surreptitious means of surveillance, arms-race style. The counter-argument is that ad-blocking software—which is also purported to undermine online advertising—has been widely available for years, yet online advertising has shown little sign of disappearing. Then again, if Firefox or other browsers adopt CCH blacklists by default, they could instantly nix cookies on millions of machines—causing a bigger impact than user-installed ad blockers.

The other challenges in making CCH a plausible reality include the overhead of maintaining the lists themselves. The anti-phishing blacklists (as well as whitelists like the browser's root Certificate Authority store) certainly change, but the sheer number and variety of legitimate sites that employ user-tracking surely dwarfs the set of malware sites. Anti-spam blacklists, which might be a more apt comparison in terms of scale, have a decidedly mixed record.

It is also possible that the "input" sought from interested parties will complicate the system needlessly. For example, the third presumption of CCH centers on the DAA opt-out cookie, but there are many other advertising groups and data-collection alliances out there—a quick search for "opt out cookie" will turn up an interesting sample—all of which, no doubt, will have their own opinions about the merits of their own opt-out systems. And that does not even begin to consider the potential complexities of the planned challenge system—including the possibility of lawsuits from blacklisted parties who feel their bottom line has been unjustly harmed.

Privacy-conscious web users will no doubt benefit from any policy that restricts cookie setting, at least in the short term. But the arms-race analogy is apt; every entity with something to gain by tracking users through the web browser is already looking for the next way to do so more effectively. Browser vendors can indeed stomp out irritating behavior (pop-up ads spring to mind), but they cannot prevent site owners from trying something different tomorrow.


Sharefest, WebRTC, and file distribution

By Nathan Willis
June 26, 2013

The WebRTC standard is much hyped for its ability to facilitate browser-to-browser media connections—such as placing video calls without the need to install a separate client application. Relying solely on the browser is certainly one of WebRTC's strengths, but another is the fact that the media stream is delivered directly from one client to the other, without making a pit-stop at the server. But direct connections are available for arbitrary binary data streams, too, through WebRTC's RTCDataChannel. Now the first applications to take advantage of this feature are beginning to appear, such as the Sharefest peer-to-peer file transfer tool.

Changing the channel

WebRTC data channels are an extension to the original WebRTC specification, which focused first on enabling two-way audio/video content delivery in real time. The general WebRTC framework remains the same; sessions are set up using the WebRTC session layer, with ICE and STUN used to connect the endpoints across NAT. Likewise, the RTCPeerConnection is used as a session control channel, taking care of signaling, negotiating options, and other such miscellany. The original justification for RTCDataChannel was that multimedia applications also needed a conduit to simultaneously exchange ancillary information, anything from text chat messages to real-time status updates in a multiplayer game.
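
As a rough illustration of that division of labor, here is a minimal TypeScript sketch of the session-setup side, written against the current promise-based form of the API; the STUN server address and the sendToPeer() signaling helper are assumptions, since WebRTC leaves the signaling transport up to the application.

    // Hypothetical signaling transport (e.g., a WebSocket to a rendezvous server).
    declare function sendToPeer(message: unknown): void;

    const pc = new RTCPeerConnection({
      iceServers: [{ urls: "stun:stun.example.org" }],  // example STUN server
    });

    // ICE candidates are relayed to the other browser over the signaling channel.
    pc.onicecandidate = (event) => {
      if (event.candidate) {
        sendToPeer({ type: "candidate", candidate: event.candidate });
      }
    };

    // The offer/answer exchange that RTCPeerConnection manages for the session.
    async function startSession(): Promise<void> {
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);
      sendToPeer({ type: "offer", sdp: pc.localDescription });
    }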

But data and real-time media are not quite the same, so WebRTC defines a separate channel for the non-multimedia content. While audio/video streams are delivered over the Real-time Transport Protocol (RTP), an RTCDataChannel is delivered over the Stream Control Transmission Protocol (SCTP), tunneled over UDP using Datagram TLS (DTLS). Using SCTP enables TCP-like reliability on top of UDP. After all, if one video chat frame in an RTP conversation arrives too late to be displayed, it is essentially worthless; arbitrary binary data, however, might require retransmission or error handling.
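
The choice between TCP-like reliability and lossier, lower-latency delivery is exposed through the options passed to createDataChannel(); in the sketch below, the channel labels and settings are illustrative assumptions showing both configurations.

    const pc = new RTCPeerConnection();

    // Reliable, ordered delivery (the default): suitable for file contents.
    const fileChannel = pc.createDataChannel("file-transfer", {
      ordered: true,
    });

    // Unreliable, unordered delivery: stale game-state updates are simply dropped.
    const gameChannel = pc.createDataChannel("game-state", {
      ordered: false,
      maxRetransmits: 0,
    });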

That said, there is no reason that a web application must implement a media stream in order to use an RTCDataChannel. The API is deliberately similar to Web Sockets, with the obvious difference being that the remote end of an RTCDataChannel is another browser, rather than a server. In addition, RTCDataChannels use DTLS to encrypt the channel by default, and SCTP provides a mechanism for congestion control.
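
A minimal sketch of that WebSocket-like surface, with illustrative labels and log messages: the offering browser creates the channel explicitly, while the answering browser receives it through the ondatachannel event.

    // Offering side: create the channel explicitly.
    const pc = new RTCPeerConnection();
    const channel = pc.createDataChannel("chat");
    channel.onopen = () => channel.send("hello, peer");
    channel.onmessage = (event) => console.log("peer says:", event.data);
    channel.onclose = () => console.log("channel closed");

    // Answering side: the channel arrives on that browser's own connection.
    const remotePc = new RTCPeerConnection();
    remotePc.ondatachannel = (event) => {
      event.channel.onmessage = (e) => console.log("offerer says:", e.data);
    };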

Obviously there are other ways to send binary data from one browser to another, but the WebRTC API should be more efficient, since the clients can choose whatever (hopefully compact) format they wish, and use low-overhead UDP to deliver it. At a WebRTC meetup in March, developer Eric Zhang estimated [SlideShare] that encoding data in Base64 for exchange through JSON used about 37% more bandwidth than sending straight binary data over RTCDataChannels.
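
Most of that overhead is easy to account for: Base64 turns every three bytes of binary data into four ASCII characters, roughly a 33% expansion before any JSON field names and quoting are added. A small sketch, using an arbitrary 30 KB buffer as an assumption:

    const payload = new Uint8Array(30_000);   // 30 KB of example binary data

    // Build the Base64-in-JSON representation the hard way.
    let binaryString = "";
    for (const byte of payload) {
      binaryString += String.fromCharCode(byte);
    }
    const asJson = JSON.stringify({ data: btoa(binaryString) });

    console.log(payload.byteLength);  // 30000 bytes sent as raw binary
    console.log(asJson.length);       // ~40011 characters sent as Base64-in-JSON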

Browser support for RTCDataChannels is just now rolling out. Chrome and Chromium support the feature as of version 26 (released in late March); Firefox currently requires a nightly build.

Sharefest

Sharefest is a project started by a video streaming company called Peer5. A blog post introducing the tool says the impetus came when two coworkers needed to share a large file while in a coffee shop, and Dropbox took ages to transfer the file to the remote server and back again. With RTCDataChannels, however, the transfer would be considerably faster, since it could take place entirely on the local wireless LAN. A few months later, Sharefest was unveiled on GitHub.

The Sharefest server presents an auto-generated short URL for each file shared by the "uploading" (so to speak) party. When a client browser visits the URL, the server connects it to the uploader's browser, which then transfers the file over an RTCDataChannel. If multiple clients connect from different locations, the server attempts to pair nearby clients with each other to maximize the overall throughput; thus, for a persistent sharing session, Sharefest does offer bandwidth savings for the sharer.

Currently the size limit for shared files is around 1GB, though there is an issue open to expand this limit. A demo server is running at sharefest.me (continuing the near-ubiquitous domination of the .me domain for web-app demos). The server side is written in Node.js, and the entire work is available under the Apache 2.0 license.

Naturally, only recent versions of Chrome/Chromium and Firefox are supported, although the code detects the browser and supplies a friendly message explaining the problem for unsupported browsers. This is particularly important because the RTCDataChannel API is not yet set in stone, and several pieces of WebRTC are implemented differently in the two browsers—though the differences are not significant enough to break compatibility; they are mostly just varying default configuration settings.
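
As a sketch of what such detection can look like (this is not Sharefest's actual code), one common approach at the time was to check for the vendor-prefixed constructors that Chrome and Firefox then shipped:

    // Pick whichever constructor this browser exposes, prefixed or not.
    const PeerConnection =
      (window as any).RTCPeerConnection ||
      (window as any).webkitRTCPeerConnection ||  // Chrome/Chromium
      (window as any).mozRTCPeerConnection;       // Firefox

    if (!PeerConnection) {
      // No WebRTC support: explain the problem rather than failing silently.
      alert("This browser does not support WebRTC data channels.");
    }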

Sharefest is certainly simple to use; it does require that the uploading browser keep the Sharefest page open, but it allows straightforward "ephemeral" file sharing in a (largely) one-to-many fashion. From the user's perspective, probably the closest analogy would be to Eduardo Lima's FileTea, which we covered in 2011. But FileTea is entirely one-to-one; Sharefest does its best to maximize efficiency by dividing each file into "slices" that the downloading peers exchange with each other.
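
To give a feel for the slicing side, here is a hypothetical TypeScript sketch (not Sharefest's actual scheme) that cuts a File into fixed-size chunks and pushes them down a data channel; the 16 KB chunk size and the lack of any retransmission bookkeeping are simplifying assumptions.

    const CHUNK_SIZE = 16 * 1024;  // 16 KB per slice (an arbitrary choice)

    async function sendFile(file: File, channel: RTCDataChannel): Promise<void> {
      for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
        const slice = file.slice(offset, offset + CHUNK_SIZE);
        channel.send(await slice.arrayBuffer());
      }
      // Real code would also watch channel.bufferedAmount to avoid flooding the
      // peer, and let downloading peers re-share the slices they already hold.
    }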

This is useful, of course, but to many people the term "peer-to-peer" effectively means BitTorrent, a considerably more complex distribution model than Sharefest supports (one that includes, for example, super-seeding and distributed hash table (DHT) trackers). One could implement the existing BitTorrent protocol on top of RTCDataChannels, perhaps, and pick up some nice features like DTLS encryption along the way, but so far no one seems to have taken that challenge and run with it.

But there are others pursuing RTCDataChannels as a file distribution framework. Chris Ball, for example, has written a serverless WebRTC file-sharing tool, although it still requires sharing the STUN-derived public IP address of the uploader's machine, presumably off-channel in an email or instant message. The PeerJS library is a project that may evolve in the longer term to implement more pieces of the file-sharing puzzle.

In the meantime, for those curious to test out the new API, Mozilla and Google have publicized several other RTCDataChannel examples involving games. But as Zhang noted in his presentation, direct client-to-client data transfer is likely to have more far-reaching effects than that: people are likely to write libraries for file I/O, database drivers, persistent memory, etc. The potential is there for many assumptions about the web's client-server nature to be broken for good.


The Knight Foundation and open government

By Nathan Willis
June 26, 2013

"Open government" is a relatively recent addition to the "open" family; it encompasses a range of specific topics—everything from transparency issues to open access to government data sets. Some—although arguably not all—of these topics are a natural fit with the open source software movement, and in the past few years open source software has taken on a more prominent role within open government. The non-profit Knight Foundation just awarded a sizable set of grants to open government efforts, a number of which are also fledgling open source software projects.

Like a lot of large charitable foundations, the Knight Foundation has a set of topic areas in which it regularly issues grant money. The open government grants just announced were chosen through the foundation's "Knight News Challenge," which deals specifically with news and media projects, and runs several themed rounds of awards each year. The Open Government round accepted entries in February and March, and the winning projects were announced on June 24.

Of the eight grant awardees, three are open source projects already, with a fourth stating its intention to release its source code. OpenCounter is a web-based application that cities can deploy as a self-service site for citizens starting up small businesses. The application walks users through the process of registering and applying for the various permits, filling out forms, and handling other red tape that can be both time-consuming and confusing. The live version currently online is the site developed specifically for the city of Santa Cruz, California as part of a fellowship with the non-profit Code For America, but the source is available on GitHub. GitMachines is a project to build virtual machine images that conform to the often-rigorous requirements of US government certification and compliance policies (e.g., security certification), requirements which many off-the-shelf VMs evidently do not meet. The project lead is the former founder of Sunlight Labs (another open government software-development non-profit), and some of the source is already online.

Plan in a Box is not yet available online, because the proposal is an effort to take several of the existing free software tools developed by OpenPlans and merge them into a turn-key solution suitable for smaller city and regional governments without large IT staffs. OpenPlans's existing projects address things like crowdsourced mapping (of, for example, broken streetlights), transportation planning, and community project management. Several of OpenPlans's lead developers are Code For America alumni. Procure.io is an acquisitions-and-procurement platform also developed by a Sunlight Labs veteran. At the moment, the Procure.io URL redirects to a commercial service, although a "community version" is available on GitHub and the project's grant proposal says the result will be open source.

There are also two "open data" projects among the grant winners, although neither seems interested in releasing source code. The first is Civic Insight, a web-based tool to track urban issues like abandoned houses. It originated as the "BlightStatus" site for New Orleans, and was written during a Code For America fellowship—although now its creators have started a for-profit company. The Oyez Project at the Chicago-Kent College of Law currently maintains a lossless, high-bit-rate archive (in .wav format) of all of the US Supreme Court's audio recordings; its grant is supposed to fund expanding the archive to federal appellate courts and state supreme courts.

The other grant winners include Open Gov for the Rest of Us, which is a civic campaign to increase local government engagement in low-income neighborhoods, and Outline.com, a for-profit venture to develop a policy-modeling framework. Outline.com is being funded as a start-up venture through a different Knight Foundation fund. In addition to those eight top-tier winners, several other minor grants were announced, some of which are also open source software, like the Open Source Digital Voting Foundation's TrustTheVote.

Free money; who can resist?

Looking at the winners, there are several interesting patterns to note. The first is that some of the projects sharply demarcate the differences between open source and open data. There is obviously a great deal of overlap in the communities that value these two principles, but they are not inherently the same, either. One can build a valuable public resource on top of an open data set (in a sense, "consuming" the open data, like BlightStatus), but if the software to access it remains non-free, it is fair to question its ultimate value. On the flip side, of course, an open source application that relies on proprietary data sets might also be susceptible to criticism that it is not quite free enough. In either case, the closed component can impose restrictions that limit its usefulness; worse yet, if the closed component were to disappear, users would be out of luck.

Perhaps more fundamentally, it is also interesting to see just how much the grants are dominated by former members of Sunlight Labs and Code For America. Both groups are non-profits with a special emphasis on open source in government and civic software development. But neither has a direct connection to the Knight Foundation; they have simply built up a sustainable process for applying for and winning grant money. And perhaps there is a lesson there for other existing free software projects; considering how difficult crowdfunding and donation-soliciting can be in practice, maybe more large projects ought to put serious effort into pursuing focused grants from non-profit organizations.

To be sure, the Knight News Challenge for Open Government is a niche, and might be well outside of the scope of many open source projects—but there are certainly some existing open source projects that would fit. It is also not the only avenue for private grant awards. In fact, the Knight Foundation is just ramping up its next round of the Knight News Challenge grant contest; this one focuses on healthcare and medicine.


Page editor: Nathan Willis

Inside this week's LWN.net Weekly Edition

  • Security: Verifying the source code for binaries; New vulnerabilities in java, mozilla, python-swift, xen, ...
  • Kernel: 3.10 statistics; Tracing triggers; Polling block drivers
  • Distributions: Which bug tracker?; Xen4CentOS, ...
  • Development: Naming, shaming, and patch review; Daala; systemd and control groups; Firefox 22; ...
  • Announcements: FANTEC guilty of GPL infringement in Germany, KACST joins TDF advisory board, ...

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds