Thwarting internet censors with Collage

By Jake Edge
September 1, 2010

Steganography is an ancient method of hiding a message in plain sight. In the digital age, steganography is often associated with hiding data inside of a binary file, typically using the low bits of an image or audio file in such a way that the message makes very little difference in the output. The Collage project looks to use steganography in conjunction with sites that host lots of user-generated content to provide a communication channel that resists censorship.
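The "low bits" idea can be illustrated in a few lines. What follows is a minimal sketch, not the tool Collage itself uses: it assumes the cover data is a run of raw, uncompressed sample bytes (pixel or audio values), and the function names embed and extract are invented for this illustration. Each cover byte changes by at most one, which is why the output looks and sounds nearly the same.

```python
def embed(cover: bytes, message: bytes) -> bytearray:
    """Hide the message in the least significant bit of each cover byte."""
    # Flatten the message into bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for idx, bit in enumerate(bits):
        # Clear the low bit of the cover byte, then set it to the message bit.
        stego[idx] = (stego[idx] & 0xFE) | bit
    return stego


def extract(stego: bytes, length: int) -> bytes:
    """Read `length` message bytes back out of the low bits."""
    out = bytearray()
    for n in range(length):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (stego[n * 8 + i] & 1)
        out.append(byte)
    return bytes(out)
```

The receiver needs to know the message length (or find it in a header hidden the same way). Real image formats such as JPEG compress their data, so practical tools hide bits in the compressed coefficients instead of raw samples.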

As the slides [PDF] and paper [PDF] from a recent presentation on Collage describe, there are increasing attempts to censor internet communications. It is not just repressive regimes that are guilty of such censorship either, as various democratic governments are trying, and sometimes succeeding, to get into the game. Existing methods to route around things like the "great firewall of China" rely on using proxies (e.g. Tor) outside of the censorship wall. But proxies are relatively easy to identify and block. Worse yet, anyone attempting to use one of the proxies can be identified and punished.

By using sites that regular "law abiding" citizens use on a regular basis, Collage seeks to appear completely innocuous to the censoring devices. The specific example used is photo-sharing sites like Flickr. Many people legitimately browse the photos there, so it will be difficult to determine that a particular user may be browsing for photos that contain a steganographic message. In addition, the sheer number of photos stored on the site makes it difficult for the censors to catalog those that may contain a hidden message.

It is, essentially, a form of "security through obscurity", but one that can offer a level of deniability if used properly. If a censored user frequently visited Flickr for photo uploading and browsing, and only infrequently used it to pass messages, it would be difficult to detect by anything other than a targeted monitoring of that user's traffic. Unlike proxies, there is no need for anyone to maintain an infrastructure of hosts to handle the traffic; Flickr, YouTube, and others are already doing so.

The basic idea is that a simple message is encrypted (using some key agreed upon separately), then broken into pieces, with erasure coding added so that the entire message can be re-assembled from just a subset of the pieces. Those chunks then get steganographically inserted into multiple photos, which are uploaded to a photo-sharing site.
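To make the erasure-coding step concrete, here is a deliberately simple sketch. It is not Collage's actual coding scheme, which this article does not describe; it swaps in the most basic erasure code there is, a single XOR parity piece, so that any k of the k+1 pieces are enough to rebuild the message. The input is assumed to be ciphertext already, and the names encode and decode are invented for the example.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal chunks and append one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def decode(pieces: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the data from any k of the k+1 pieces, keyed by piece index."""
    missing = [i for i in range(k) if i not in pieces]
    if len(missing) > 1 or (missing and k not in pieces):
        raise ValueError("too many pieces lost")
    if missing:
        # XOR of the surviving data chunks and the parity chunk
        # yields the one lost data chunk.
        pieces = dict(pieces)
        pieces[missing[0]] = reduce(xor_bytes, pieces.values())
    return b"".join(pieces[i] for i in range(k))[:length]
```

Each piece would then be hidden in a different photo. If one photo is taken down or never fetched, the recipient can still recover the message. Production systems use stronger codes that tolerate the loss of many pieces, not just one.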

The project also used a text steganography technique to hide messages in the text of comments on blogs, YouTube, Twitter, and so on. In either case, the presence of steganography is likely to be detectable if the censoring agency looks for it. But with proper encryption, the actual message text will not be recoverable. The paper also discusses the use of watermarking to hide information that may be more easily detected but is hard to remove without disrupting the containing photo or file.

In order for a message to reach its recipient, though, there needs to be some way for them to know which of the billions of photos at Flickr actually contain bits of interest. In addition, the downloads made by the user must appear to be "normal" tasks that a Flickr user might perform. The paper outlines a rather elaborate protocol that could be used to map messages to "deniable tasks" that the recipient must perform. It's a tricky problem as is acknowledged in the paper:

The challenge, of course, is finding sets of tasks that are deniable, yet focused enough to allow a user to retrieve content in a reasonable amount of time.
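One way to picture the mapping from messages to tasks is with hashing: if sender and recipient share a message identifier and a common list of tasks, both can independently compute the same small set of tasks to perform. The sketch below is only an illustration of that idea under those assumptions, not the protocol from the paper; the task strings and the select_tasks function are hypothetical.

```python
import hashlib


def _h(text: str) -> int:
    """Hash a string to a large integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def select_tasks(message_id: str, tasks: list[str], count: int) -> list[str]:
    """Pick the `count` tasks whose hashes lie nearest the identifier's hash."""
    target = _h(message_id)
    return sorted(tasks, key=lambda t: abs(_h(t) - target))[:count]


# Hypothetical task list that both parties would have to share in advance.
TASKS = [
    "browse photos tagged 'sunset'",
    "browse photos tagged 'flowers'",
    "view the newest uploads in a public group",
    "search for photos of 'mountains'",
]

chosen = select_tasks("shared-message-id", TASKS, 2)
```

The sender uploads photos where those tasks will find them, and the recipient performs the same tasks to fetch them. Nothing in the selection itself makes the tasks deniable; that depends entirely on the task list containing only things an ordinary user would plausibly do.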

It is a clever technique, but there are, of course, some pitfalls. The complexity will make it challenging to use, and automated retrievals may be difficult to do in a non-suspicious manner. It could also end up pointing a finger at "innocent" users of a site like Flickr, who unwittingly just happen to perform the task associated with a Collage message. The paper notes that risk, but also points out that "organizations can already implicate users with little evidence".

Essentially Collage is a proof-of-concept that uses off-the-shelf free software to handle the encryption, encoding, and steganography pieces. So far, the code for a demonstration client, which downloads a message that the project stored in Flickr, is available. The web site does not specifically mention further code releases, but one hopes the code for the sending side will also become available. There are also some performance measurements in the paper that show "acceptable" overhead for sending small, textual messages.

The complexity is daunting, but for those who really need to communicate in a largely deniable fashion, the Collage technique certainly has some appeal. It doesn't suffer from some of the obvious "red flags" that arise when using Tor or normal encrypted traffic (e.g. SSL/TLS, ssh, GPG), so its traffic may disappear into the noise of normal network activity. Collage, or something like it, may find a place in the toolkit of those trying to evade internet censorship.

Thwarting internet censors with Collage

Posted Sep 6, 2010 13:24 UTC (Mon) by ortalo (guest, #4654)

What about using covert channels instead of steganography to evade such censorship? I wonder which technique is more vulnerable to detection (both have been well studied in the literature).

To illustrate the covert channel idea: if a black and white picture is stored, this is a "0"; if it's a colour picture, then it's a "1".
Of course, practical implementations should try to improve the bandwidth, possibly with more image types or other types of shared resources: for example (landscape, portrait, fish, animal) mapping to (00,01,10,11).

Interesting but limited in use

Posted Sep 9, 2010 8:46 UTC (Thu) by renox (guest, #23785)

IMHO the main weakness of steganography is that it's (currently) restricted to person-to-person communication, i.e. you cannot browse a website with Collage.

Sure, you could in theory ask someone to download a web page for you and send you the result hidden using Collage, but this doesn't seem very practical for 'day to day' usage.
It would be interesting to automate this.