Content-centric networking with CCNx

By Nathan Willis
October 24, 2012

Content-centric networking (CCN) is a novel approach to networking that abstracts away the specifics of the connection and focuses on disseminating the content efficiently. This is in contrast to the connection-oriented approach used in most IP applications, which requires establishing a channel between two nodes with known addresses. CCN excels at the comparatively common task of fetching static documents for multiple end users, a task that puts significant strain on the network when it is implemented over one-to-one, connection-oriented TCP. The concept has been discussed for decades, but Palo Alto Research Center (PARC, formerly a subsidiary of Xerox) is actively developing a real-life implementation called CCNx, which is usable on Linux and other UNIX-like systems today.

What centric what?

CCNx is the brainchild of PARC's Van Jacobson, and if anyone is qualified to rethink core Internet protocols, Jacobson is. Among other things, he fixed TCP congestion control and designed the IP multicast backbone. CCNx clearly draws on the lessons Jacobson has learned about network congestion over the years; in a 2006 talk at Google, he described how the NBC television network was slowed to a crawl during the Olympics by thousands of web users requesting copies of the same video clip. The data was identical and there was no secrecy required; if the backbone of the network could only recognize that the requests were identical, it could dispense with retransmitting the data from the originating server and make use of the existing copies closer to the final hop.

That said, CCNx (and CCN in general) is not a replacement for existing transport protocols; it is designed to run on top of them, and in fact to be oblivious as to which mechanisms are used underneath: TCP, UDP, IP multicast, link-level broadcast, or even point-to-point wireless. The goal is that a party sends a request for a document out into the open — with no destination address — and anyone who hears the request and has a copy of the document can respond to it. It is irrelevant whether the copy that is eventually returned originates from disk storage on a server, memory in a gateway router, or any other source. Naturally, making the network efficient means that the closest party who both hears the request and has the document should return it. In practice, CCN expects nodes to intelligently cache the documents that they route to the end-user nodes; doing so (and keeping popular documents close to the final hop of the route) is what prevents congestion.

For the scheme to work, of course, the authenticity of the content must be verifiable from the data itself. If that property holds, the most noticeable benefit is that, when popular content is requested by numerous end-users, there is far less congestion on the network — ideally no additional congestion, as routers at the edges of the network retransmit their existing copies of the content, without even needing to propagate the requests upstream. There are other benefits as well, such as the fact that participating nodes do not need static or globally-unique names. This allows low-power sensors to respond to requests (e.g., "what is the current temperature") without needing a complete multilayer network stack, and it allows clients to send such requests without knowing the topology of the network.

On the flip side, CCN does pack more information into the names and metadata of documents, incorporating things like versioning and timestamps. For example, once a server publishes a document over CCN, it no longer has control over it, because it propagates across the network. Consequently, all updates to a document must be issued as superseding publications that can be identified as updates referring to the original, and that can be verified as authentic.

CCNx specifics

CCNx tackles both the document-updating question and the authentication question in its messaging scheme. Nodes ask for content with an Interest message, in which the only required field is the name of the desired content (although time-limits, maximum number of hops, and other fields are available). Such sending nodes could be either end-user applications making the original request, or network infrastructure nodes passing along requests they cannot answer.

A Data message that can be authenticated as consistent with the original publisher is required to complete the puzzle; however, the original publisher never needs to be made aware of the request. The Data message includes the requested data plus a cryptographic signature. The signature is generated against the data and an information block that contains a timestamp and the digest of the publisher's public key (which is required for nodes to verify the signature), and that may optionally include other information such as the data type. Nodes are supposed to check the signatures and discard any content that fails verification; this "lazy" invalidation is intended to cut down on spoofing attacks without introducing significant overhead.
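To make the verify-and-discard step concrete, here is a minimal Python sketch. It is not CCNx's wire format or API: the class and function names are invented for illustration, and an HMAC with a shared key stands in for the public-key signature that CCNx actually uses, only because the Python standard library has no public-key signing.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedInfo:
    """Hypothetical stand-in for the information block covered by the signature."""
    timestamp: float
    publisher_key_digest: bytes
    data_type: str = "DATA"  # optional extra field


@dataclass(frozen=True)
class Data:
    """Hypothetical Data message: name, content, info block, signature."""
    name: str
    content: bytes
    info: SignedInfo
    signature: bytes


def _signed_bytes(content: bytes, info: SignedInfo) -> bytes:
    parts = [content, repr(info.timestamp).encode(),
             info.publisher_key_digest, info.data_type.encode()]
    # Length-prefix each part so that field boundaries are unambiguous.
    return b"".join(len(p).to_bytes(4, "big") + p for p in parts)


def publish(name: str, content: bytes, key: bytes, timestamp: float) -> Data:
    info = SignedInfo(timestamp, hashlib.sha256(key).digest())
    # ASSUMPTION: HMAC-SHA256 replaces a real public-key signature here.
    sig = hmac.new(key, _signed_bytes(content, info), hashlib.sha256).digest()
    return Data(name, content, info, sig)


def verify(data: Data, key: bytes) -> bool:
    """Return False for anything a node should discard."""
    if hashlib.sha256(key).digest() != data.info.publisher_key_digest:
        return False
    expected = hmac.new(key, _signed_bytes(data.content, data.info),
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, data.signature)


key = b"demo-publisher-key"
good = publish("ccnx:/example/temperature", b"21.5 C", key, timestamp=0.0)
forged = Data(good.name, b"99.9 C", good.info, good.signature)
print(verify(good, key), verify(forged, key))  # True False
```

The point of the sketch is that a node needs only the message itself and the publisher's key to decide whether to keep or discard the content; it never has to contact the publisher.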

That is essentially all there is to CCNx; there are just two message types. Additional features like encryption and application state management are left entirely up to the layer above CCNx. Participating nodes are allowed to shape traffic as they see fit. On the application side, that could mean interleaving requests for chunks of large file downloads with higher-priority requests to check mail. Because CCNx does not keep persistent connections open between nodes, quality of service (QoS) is in the hands of the end user.

Interestingly enough, CCNx does not impose any restrictions on the formatting of the actual document name, other than that it be a sequence of bytes and be hierarchical. The hierarchical dimension exists to allow publishers to publish related content using the same prefix. That could be interpreted as a given prefix representing a directory, or as a given prefix representing small chunks of a single file that needs to be reassembled further up in the application stack. The documentation describes a URL-like syntax for CCNx names of the form ccnx:/PARC/%00%01%02 and includes some recommended naming conventions, but they are advisory only. For example, it suggests using a DNS name for the first component in order to ease the transition, and it recommends encoding the timestamp as another component. Although optional, these conventions should allow nodes to perform efficient matching of content names by comparing the prefixes, without examining the data itself.
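Prefix matching on hierarchical names can be sketched in a few lines of Python. This is an illustration only: the helper names parse_name() and is_prefix() are made up here, and the sketch assumes that each slash-separated component of the URL-like form is percent-encoded bytes, as the ccnx:/PARC/%00%01%02 example suggests.

```python
from urllib.parse import unquote_to_bytes


def parse_name(uri: str) -> list:
    """Split a ccnx: URI into a list of byte-string components."""
    if not uri.startswith("ccnx:/"):
        raise ValueError("expected a ccnx:/ URI")
    path = uri[len("ccnx:/"):]
    # Empty parts (doubled or trailing slashes) are skipped in this sketch.
    return [unquote_to_bytes(part) for part in path.split("/") if part]


def is_prefix(prefix: list, name: list) -> bool:
    """True if every component of prefix matches the start of name."""
    return name[:len(prefix)] == prefix


print(parse_name("ccnx:/PARC/%00%01%02"))  # [b'PARC', b'\x00\x01\x02']
print(is_prefix(parse_name("ccnx:/ccnx.org"),
                parse_name("ccnx:/ccnx.org/docs/v1")))  # True
```

Comparing whole components rather than raw strings means that a prefix of ccnx:/ccnx.org does not accidentally match a name such as ccnx:/ccnx.organic/file.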

The strategy for running an efficient CCNx node is also left up to the implementer, although here again the project's documentation includes recommendations (under the "CCNx Node Model" sub-heading). The recommendation includes maintaining a content store (CS) indexed by document name, a table of unsatisfied Interest requests, and a table of outbound interfaces on which unsatisfied Interests have been forwarded. It is anticipated that a node will have multiple options at its disposal for forwarding Interest messages it cannot fulfill; choosing which links or routes are best at any given moment allows the node to be opportunistic.
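A toy version of such a node might look like the following Python sketch. It is a simplification under several assumptions, not the project's implementation: names are plain strings matched exactly in the content store, the forwarding table maps name prefixes to hypothetical interface labels, and data that nobody asked for is not cached.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    content_store: dict = field(default_factory=dict)  # name -> content
    pending: dict = field(default_factory=dict)        # name -> interfaces waiting
    forwarding: dict = field(default_factory=dict)     # prefix -> outbound interfaces

    def on_interest(self, name: str, from_face: str) -> tuple:
        """Decide what to do with an incoming Interest."""
        if name in self.content_store:
            # A cached copy exists: answer without going upstream.
            return ("data", self.content_store[name])
        if name in self.pending:
            # Already forwarded once: just remember the extra requester.
            self.pending[name].add(from_face)
            return ("aggregated", None)
        faces = self._lookup(name)
        if not faces:
            return ("drop", None)
        self.pending[name] = {from_face}
        return ("forward", faces)

    def on_data(self, name: str, content: bytes) -> set:
        """Cache solicited data and return the interfaces waiting for it."""
        waiting = self.pending.pop(name, set())
        if waiting:
            self.content_store[name] = content
        return waiting

    def _lookup(self, name: str) -> list:
        # Longest matching prefix wins.
        matches = [p for p in self.forwarding
                   if name == p or name.startswith(p.rstrip("/") + "/")]
        return self.forwarding[max(matches, key=len)] if matches else []


node = Node(forwarding={"ccnx:/ccnx.org": ["udp-multicast"]})
print(node.on_interest("ccnx:/ccnx.org/clip", "alice"))       # ('forward', ['udp-multicast'])
print(node.on_interest("ccnx:/ccnx.org/clip", "bob"))         # ('aggregated', None)
print(sorted(node.on_data("ccnx:/ccnx.org/clip", b"video")))  # ['alice', 'bob']
print(node.on_interest("ccnx:/ccnx.org/clip", "carol"))       # ('data', b'video')
```

In the usage above, the second request for the same clip is never sent upstream, and the third is answered straight from the node's own copy, which is the congestion-avoiding behavior the design is after.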

Running CCNx

The CCNx distribution contains a handful of utilities that allow one to test CCNx on a single machine or on the local network. The latest release is 0.6.2, from October 3. It includes C source for both a simple CCNx forwarder and a content repository, a simple CCNx chat application written in Java, CCNx plugins for the VLC media player and Wireshark packet sniffer, and Android versions of the repository and chat applications. Ubuntu is the only Linux distribution tested, but the dependencies are lightweight: libcrypto, expat, libpcap, and libxml2.

With the software built, the first step is to start the CCNx daemon with bin/ccndstart. This is a script that launches the ccnd daemon and directs output messages to the terminal, although you can also monitor its status from http://localhost:9695 in a web browser. The ccnd daemon is what passes CCNx messages to other nodes; how it does so depends on the network transports defined in its configuration. For testing on one machine, ccnd does not require any configuration; however, editing ~/.ccnx/ccnd.conf is required to forward CCNx requests between machines. The example configuration file is light on detail; its only example entry is the line

    add ccnx:/ccnx.org udp 224.0.23.170 59695

which tells ccnd to route all requests for ccnx: URLs that begin with ccnx.org to UDP port 59695 at the 224.0.23.170 multicast address. This address is registered with IANA for CCNx.
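For readers who want to see the fields of that entry spelled out, the following Python sketch splits the line apart. It assumes exactly five whitespace-separated fields, which is all the single example shows; the full grammar of ccnd.conf is not covered here, and the Route type and parse_add_line() function are hypothetical names, not part of CCNx.

```python
import ipaddress
from typing import NamedTuple


class Route(NamedTuple):
    prefix: str    # name prefix to match, e.g. ccnx:/ccnx.org
    protocol: str  # transport, e.g. udp
    host: str      # destination address
    port: int      # destination port


def parse_add_line(line: str) -> Route:
    """Parse one 'add' entry, assuming the five-field form shown above."""
    fields = line.split()
    if len(fields) != 5 or fields[0] != "add":
        raise ValueError("unsupported entry: " + line.strip())
    _, prefix, protocol, host, port = fields
    return Route(prefix, protocol, host, int(port))


route = parse_add_line("    add ccnx:/ccnx.org udp 224.0.23.170 59695")
print(route.prefix, route.protocol, route.port)
print(ipaddress.ip_address(route.host).is_multicast)  # True
```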

The content repository can be started with the bin/ccnr binary. It defaults to running the repository on the current directory, but another location can be specified by setting the CCNR_DIRECTORY environment variable. Similarly, a name prefix for the available files can be set using the CCNR_GLOBAL_PREFIX variable. The repository's other key settings are configured in the data/policy.xml file, the most important setting being which prefixes the repository should answer for. By default, however, this prefix is empty, so the repository will answer all requests — good for testing, but not terribly practical for deployment.

The file utilities include the command-line tools ccnls, ccnputfile, and ccngetfile, as well as the graphical file browser ccnexplore. Dropping files in and rearranging them gets old after a few hours, but the chat application and VLC plugin offer more amusement. Both make it clearer how CCNx's network abstraction simplifies things from the user's perspective. To join a chat room, for example, one needs only the name of the room (e.g., ccnchat ccnx:/testroom1); the underlying transport and the network addresses of the participants never factor in.

In that sense, working in CCN is reminiscent of Zeroconf service discovery, except that there is no discrete discovery step involved. The long hierarchical document names suggest the route-embedding features of IPv6 addresses as well; similarly, the ability to retrieve a valid chunk of data from any source reminds one of BitTorrent. Of course, it is difficult to assess the congestion-prevention capabilities of CCN with just one or two machines, but the same would be true for most traffic-shaping or QoS techniques.

There are still aspects of CCNx that have yet to be finalized, such as how to avoid content naming collisions or spoofing. Perhaps the advisory naming conventions will be formalized, or perhaps if CCNx becomes an IETF standard, other techniques will arise. Also, CCN offers better aggregate throughput on the network by answering content requests with a nearby copy of the document, rather than fetching the original again. The downside is that publishers generally want to know page view statistics, so some form of reporting may need to be devised.

In his Google talk, Jacobson described CCN as a different perspective on how to use the network, rather than as a new suite of protocols. He compared it to the difference between telephone companies' circuit-switched networks and the first packet-switching data networks. The wires and the nodes were the same — the difference is in how the conversations and connections are expressed. Pessimists are understandably unhappy with the glacial pace of the IETF or of widespread IPv6 adoption, and the same people might argue that CCN will never replace the entrenched protocols like HTTP that dominate today. Perhaps it will not; it is still intriguing to experiment with, however, and one should certainly never discount the commercial Internet players' drive to adopt a new technology when it offers the prospect of saving them money — which CCN certainly could.


Content-centric networking with CCNx

Posted Oct 25, 2012 2:28 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

I've been following CCNx development for a few years. I don't think it's a really good solution, as it requires an additional routing layer with unclear scalability.

It's been suggested as a replacement even for low-level IP for real-time services ( http://conferences.sigcomm.org/co-next/2009/workshops/rea... ) but even in toy samples it requires excessive CPU time to do simple routing. That simply won't fly for something like an Internet-scale layer.

Generalised BitTorrent

Posted Oct 25, 2012 7:39 UTC (Thu) by SimonO (subscriber, #56318) [Link]

It looks to me like a generalised form of bittorrent.

I wonder if it can do a hierarchical request as well, say I want the latest $distro image. I send out my interest in it on CCNx and it returns with a list of more CCNx chunks that I can send out interest for. This way I can receive the chunks from multiple sources simultaneously. Obviously, the CCNx "layer" could abstract this out for me.

/Simon

Generalised BitTorrent

Posted Oct 25, 2012 8:48 UTC (Thu) by etienne (subscriber, #25256) [Link]

Maybe distribution updates can be done through CCNx, so that the first PC downloads new packages over ADSL and nearby PCs would not need to touch ADSL (whichever PC first notices that it has to update, with no apt-cache config).
I would not mind if my neighbour uses my wireless to download those local files either... or if my ARM Linux TV downloads its update from my ARM Linux phone.

What about "IP" protections?

Posted Oct 25, 2012 9:06 UTC (Thu) by zuki (subscriber, #41808) [Link]

Are redistributors (the caching nodes) responsible for the content they are sending? If so, how do they know what they are sending and protect against being sued for content infringement?

What about "IP" protections?

Posted Oct 25, 2012 13:29 UTC (Thu) by n8willis (editor, #43041) [Link]

Bearing in mind that I in no way speak for the CCNx project, I would say that the protocol used is irrelevant to the legality of the content. I.e., a node has identical responsibilities, regardless of whether it is caching CCNx or TCP.

Nate

What about "IP" protections?

Posted Oct 27, 2012 17:58 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

There are lots of laws establishing immunity for blind redistributors of information, so as not to discourage people from building distribution networks.

Of course, there are also plenty of court cases to resolve the question of whether someone is redistributing innocently or deliberately facilitating illicit distribution. So someone operating a CCNx node would want to stay clearly on the innocent side of that.

Incidentally, being sued is not the issue. Being liable is.

Content-centric networking with CCNx

Posted Oct 25, 2012 9:37 UTC (Thu) by rswarbrick (subscriber, #47560) [Link]

I'm struggling to understand this. What, fundamentally, is different between a box on the network to make CCNx work and a caching proxy? Have I missed the point?

Content-centric networking with CCNx

Posted Oct 25, 2012 11:24 UTC (Thu) by etienne (subscriber, #25256) [Link]

> What, fundamentally, is different between a box on the network to make CCNx work and a caching proxy?

It seems that you say "what you want" louder and "where you want it from" more quietly (or not at all).
It seems that PCs doing CCNx would most of the time be clients *and* servers.
So if a remote X connection needs a font to display some text, it would ask for the font file over CCNx, not ask the (maybe very distant) X server to provide the font file.
If a PC needs a file (package update, desktop background photo, ...), it asks for the file over CCNx (giving its name, its version, and some more info that uniquely identifies what it wants, maybe the original reference server) and the nearest PC that has that file replies.

Content-centric networking with CCNx

Posted Oct 25, 2012 12:40 UTC (Thu) by rswarbrick (subscriber, #47560) [Link]

Ok, but if you want this to scale, you need some sort of unique ID for your document. There's the GUID approach, but that's not human-writable in the slightest. The other option is something hierarchical, where different content producers take bits of the total namespace. Of course, this is what people call a URL... (Or at least a URI)

I'm still massively not convinced that there's anything nontrivial here. I should probably go and have a careful look at ccnx.org and see if I can work out what they're doing. (and why)

Content-centric networking with CCNx

Posted Oct 25, 2012 14:13 UTC (Thu) by n8willis (editor, #43041) [Link]

Caching is predicated on the notion that there is one canonical version of the file, and the various cached versions are copies. In CCN, all versions of the file are equivalent (assuming the SHA-256 hash validates, that is). In HTTP, an end node could always request the original directly from the server if it doesn't trust the cache, but in CCN that gains you nothing since you can verify the correctness of the data yourself with the signature; consequently there is no mechanism to do so. Perhaps more importantly, intermediate nodes do not need to perform age calculations or any of the other cache-control mechanics.

Nate

Content-centric networking with CCNx

Posted Oct 25, 2012 14:46 UTC (Thu) by rswarbrick (subscriber, #47560) [Link]

Ah, I think I understand you. So the equivalent with an HTTP cache would require that (a) The contents at a given URL never changed and (b) The URL somehow also gave a hash of the contents so that you could check you'd got the right thing.

Yep, I agree that this is different from a caching proxy :-)

Network filesystems

Posted Oct 25, 2012 11:04 UTC (Thu) by epa (subscriber, #39769) [Link]

Back in the day I think Acorn's Econet network, often used in school classrooms, had a feature called the Broadcast Loader. When several computers requested the same file from the server, as often happens at the start of a lesson, it would notice and broadcast the file on the network for them all to receive simultaneously.

At first glance this looked like a similar thing, but on reading the article there isn't mention of IP multicasting or another way to avoid sending the same packet to several different recipients. Instead, it looks like a kind of distributed HTTP cache or BitTorrent. The load on the publishing server may be reduced, but global network traffic is just the same or greater. Surely the missing piece of the puzzle is a way for the response to be multicast to all those requesting it.

Streaming video seems like a bad example application since it is quite unsuited to the send-request-get-response model. The video stream is continuous; you don't request a frame at a time. And it is real-time, so you want as much predictability as possible in the route between you and the server so that the latency can be low and consistent. On the other hand, here we are talking about a YouTube-style recorded clip, which can be treated as a single file to download and then play.

Network filesystems

Posted Nov 3, 2012 15:57 UTC (Sat) by dmag (subscriber, #17775) [Link]

> Streaming video seems like a bad example application

Actually, that's a good example. In the past, people set up proprietary "video streaming servers" for this use case. Turns out, they don't scale, they are expensive, and they have lots of problems.

So people decided "what if we just used HTTP?" Apple made HLS (HTTP Live Streaming), and Adobe/Microsoft made similar knockoffs.

The idea was that any video (even live streams) can be encoded in chunks (usually 5s of video). Instead of a farm of specialized (and usually badly optimized) video streamers, you just use a farm of HTTP servers, copying in the segments as you go. The clients can benefit from HTTP caches, etc.

The same thing could work for CCNx: An intermediate node could forward one request for the next segment of the live stream, and satisfy multiple requesters. This may use more bandwidth at the edge (where it's usually not scarce) but the intermediate nodes will keep much less state. (I.e. once they satisfy the current segment, they keep no state until clients start requesting the next segment.)

replacement for CDNs????

Posted Oct 25, 2012 13:29 UTC (Thu) by faramir (subscriber, #2327) [Link]

It seems to me that large web sites tend to solve this problem by using a commercial CDN (content delivery network). They have been around for over a decade now and would probably have solved NBC's problem even in 2006.

I will admit that this effort is interesting as it would appear to be non-proprietary. On the other hand, if I had a large commercial web site to manage, I'm not sure that I would want to depend on the kindness of strangers to get my content out there reliably. Of course, companies like Netflix are depending on other people's networks to route their packets (which hasn't always happened) so maybe that isn't such a big deal. Perhaps if this gets sufficiently well entrenched due to use by non-profits like Wikipedia, it might fly.

One big difference I see

Posted Oct 31, 2012 17:43 UTC (Wed) by jmorris42 (subscriber, #2203) [Link]

Ok, looks like one big difference I can see right up front. This would put piracy for the masses back to being semi-anonymous.

If you ask for "hitmovie.avi" from BitTorrent, your IP address is instantly visible in the torrent cloud, and your ISP gets a C&D within hours these days. This scheme would have you hit your ISP's CCNx cache for it, and only that cache would have a record of who asked for it. Even that wouldn't be conclusive, since any node can and would also be a cache; the only difference is the size and connectivity of the cache.

This moves things back towards the more broadcast model of UseNet. Imagine a UseNet server that could instantly pull a group when any user or downstream server read it.

Or another (weak) comparison to existing things would be a BitTorrent client that added torrents in the background. So you start it up to download debian.iso and, as it peers with people with that torrent, it also pitches in with any other torrents that any of the peers on that file are working on.

What I'm confused by is the insistence on avoiding what will eventually be required to make this work in the real world: a canonical URL field indicating where the content can be obtained if all attempts to retrieve a cached copy fail. That way, at some point a boss-level CCNx server can realize it has nowhere else to look and can just go there and retrieve a copy. Otherwise the first request made for something is going to incur some 'latency'. Likewise for rarely accessed content.

Content-centric networking with CCNx

Posted Nov 1, 2012 12:07 UTC (Thu) by Imroy (guest, #62286) [Link]

I'm surprised no one has mentioned Freenet yet. Minus the encryption and obfuscation of both what has been transferred and stored, CCNx sounds very similar to Freenet.

Content-centric networking with CCNx

Posted Nov 2, 2012 17:49 UTC (Fri) by cboursinos (guest, #87601) [Link]

Hello, I want to ask if someone knows how MAC addresses are used in CCNx, if they are used at all. Thanks.

Content-centric networking with CCNx

Posted Nov 3, 2012 15:58 UTC (Sat) by dmag (subscriber, #17775) [Link]

It sounds like MAC addresses aren't used at all. This all happens at higher levels (TCP/IP/HTTP/etc.)

CCNx WSN

Posted Apr 2, 2013 20:27 UTC (Tue) by Vinicius (guest, #90206) [Link]

I don't completely understand CCN yet. For example, the Interest/Data interaction is more for requesting data, right? What about posting data somewhere; how does that work?

In a scenario like a WSN (wireless sensor network), there can be a lot of sensors collecting data (humidity, light, temperature, etc.) but not storing it; they simply send this information somewhere using IP. How would CCN fit? Does anyone see any benefit of CCN in this kind of topology, which is basically data-oriented?

Vinicius

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds