
The Tahoe secure filesystem

By Jake Edge
April 30, 2008

The Tahoe filesystem is a secure, distributed filesystem that is available as free software. Tahoe is also designed for fault tolerance, so that data remains available even in the presence of missing or malicious peers. In March, the project released version 1.0, which makes this a good time to take a peek.

The basics of Tahoe are somewhat similar to those of GNUnet or Freenet, in that the data is encrypted and spread around to multiple nodes in the network. Unlike those projects, though, Tahoe does not seek to provide anonymity. The nodes making up a Tahoe filesystem are collectively called a "grid". A grid consists of some number of peers acting as storage server nodes, along with an "introducer" that knows all of the other nodes and is the central point of contact for the grid.
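To make the two roles concrete, here is a toy Python model of the introducer as a simple registry. The class and method names (`Introducer`, `announce`, `list_servers`) and the node addresses are hypothetical; the real introducer is a networked service with its own protocol. The sketch only shows the division of labor: storage servers announce themselves to one well-known node, and clients learn about the rest of the grid from it.

```python
from dataclasses import dataclass, field


@dataclass
class Introducer:
    """Toy model of a grid's central point of contact (hypothetical API).

    Storage servers announce themselves here; clients ask it which
    servers exist. The introducer itself holds no file data.
    """
    servers: dict[str, str] = field(default_factory=dict)  # node id -> address

    def announce(self, node_id: str, address: str) -> None:
        # A storage server joining the grid registers itself.
        self.servers[node_id] = address

    def list_servers(self) -> list[tuple[str, str]]:
        # A client learns about every known storage server from one place.
        return sorted(self.servers.items())


# Made-up node names and addresses, for illustration only.
grid = Introducer()
grid.announce("node-b", "storage-b.example")
grid.announce("node-a", "storage-a.example")
known = grid.list_servers()
```

Keeping only membership information in the introducer means it is a single point of contact, but the stored data itself never passes through it.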

Files are stored in Tahoe by first being encrypted on the local machine using AES. The encrypted file is then encoded and broken into "shares", ten by default, which are distributed to different servers in the grid. The encoding is done in such a way that the whole file can be recovered even if only a subset of the shares can be retrieved.

This encoding, known as "erasure coding", is the key to the fault tolerance of the Tahoe system. By default, Tahoe encodes the shares such that retrieving any three of the ten is sufficient to recover the entire file. As expected, this increases the amount of data stored by a factor of 10/3, or roughly 3.3 times the size of the file.
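The "any three of ten" property can be illustrated with a small Reed-Solomon-style code. This sketch is not zfec's algorithm or API: it works over the prime field GF(257) for simplicity rather than the byte-oriented Galois field zfec uses, and `encode`/`decode` are hypothetical names. Each group of k bytes defines the unique polynomial of degree less than k that takes those byte values at x = 1..k; share number x holds that polynomial's value at x, for x = 1..n (so the first k shares are just the data). Any k shares pin the polynomial down again through Lagrange interpolation.

```python
PRIME = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate, at x, the unique polynomial of degree < len(points)
    passing through `points`, working modulo PRIME (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Fermat's little theorem: den ** (PRIME - 2) is den's inverse mod PRIME.
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total


def encode(data: bytes, k: int = 3, n: int = 10) -> dict[int, list[int]]:
    """Turn data into n shares, keyed by evaluation point 1..n."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    shares: dict[int, list[int]] = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), k):
        # Each group of k bytes fixes a polynomial: its values at x = 1..k.
        points = [(i + 1, padded[start + i]) for i in range(k)]
        for x in range(1, n + 1):
            shares[x].append(_interpolate(points, x))
    return shares


def decode(shares: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k of the shares."""
    if len(shares) < k:
        raise ValueError(f"need at least {k} shares, got {len(shares)}")
    chosen = list(shares.items())[:k]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(x, values[pos]) for x, values in chosen]
        # Re-evaluating at x = 1..k recovers the k original bytes.
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])  # drop the padding


data = b"hello tahoe"
shares = encode(data)                            # ten shares by default
survivors = {x: shares[x] for x in (2, 7, 9)}    # seven shares "lost"
recovered = decode(survivors, 3, len(data))
```

Because each share holds one symbol for every k bytes of input, the n shares together are about n/k times the size of the data, which is where the 10/3 expansion comes from. A symbol here can take the value 256, which is why shares are kept as lists of integers rather than bytes; a byte-oriented field avoids that wrinkle.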

The suggested use case for Tahoe is a "friendnet" where some group of friends share their storage with each other in a way that reduces or eliminates the need for backups. Tahoe also has ways to share data in either read-only or read-write (immutable or mutable in Tahoe-speak) modes. Tahoe is used as a commercial backup system by Allmydata, sponsor of the Tahoe project.

Tahoe is designed to be secure, which means that it protects the integrity and confidentiality of the data stored in it. SHA-256 is used extensively to check the consistency of the plaintext, ciphertext, and shares. Files stored in the system are identified by long identifiers called capabilities, which look something like:

URI:CHK:yeyur23dw7cg3mxmsl2kiqvtt4:sdtrgczwtntzyfg2uapbfytxvyqsn45j4jpgrhcey7ebzpaoznya:3:10:107833344
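The trailing fields of the capability above appear to record the encoding parameters (three needed, ten total) and the file size. The sketch below splits the string apart and applies the 10/3 arithmetic to it. The names `ChkCap`, `parse_chk`, and `storage_cost` are hypothetical, and the reading of the two base32 fields (an encryption key and a hash used to check the shares) is an assumption based on Tahoe's documentation rather than something stated here. The cost estimate ignores padding and hash-tree overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChkCap:
    key: str        # first base32 field (assumed: the file's encryption key)
    ueb_hash: str   # second base32 field (assumed: hash for verifying shares)
    needed: int     # shares needed to recover the file
    total: int      # shares produced
    size: int       # file size in bytes


def parse_chk(cap: str) -> ChkCap:
    """Split an immutable-file capability into its colon-separated fields."""
    parts = cap.split(":")
    if len(parts) != 7 or parts[:2] != ["URI", "CHK"]:
        raise ValueError("not an immutable URI:CHK capability")
    key, ueb_hash, needed, total, size = parts[2:]
    return ChkCap(key, ueb_hash, int(needed), int(total), int(size))


def storage_cost(cap: ChkCap) -> tuple[int, int]:
    """Return (bytes per share, bytes across all shares), ignoring
    padding and hash-tree overhead."""
    per_share = -(-cap.size // cap.needed)  # ceiling division
    return per_share, per_share * cap.total


cap = parse_chk("URI:CHK:yeyur23dw7cg3mxmsl2kiqvtt4:"
                "sdtrgczwtntzyfg2uapbfytxvyqsn45j4jpgrhcey7ebzpaoznya:3:10:107833344")
print(cap.needed, cap.total, cap.size)  # 3 10 107833344
print(storage_cost(cap))                # (35944448, 359444480)
```

For the 107,833,344-byte file in the example, three-of-ten encoding works out to roughly 36MB on each of ten servers, about 359MB in total.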
For mutable files, there are two versions of the capability: one that allows only reading and another that also allows writing. Anyone who does not have a capability string for a particular file cannot access it at all.
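One way to make a read-only capability derivable from a read-write one, but not the reverse, is to compute it with a one-way hash. The sketch below assumes that scheme; Tahoe's documentation describes a similar hash chain for mutable files, but the tag string, the key size, and the `diminish` function here are made up for illustration, and Tahoe's actual construction may differ.

```python
import hashlib
import os

# Hypothetical domain-separation tag; Tahoe's real construction may differ.
READKEY_TAG = b"example-readkey-tag:"


def diminish(write_key: bytes) -> bytes:
    """Derive a read-only key from a read-write key.

    SHA-256 is one-way, so holders of the result cannot work back to
    write_key, while anyone holding write_key can always recompute it.
    """
    return hashlib.sha256(READKEY_TAG + write_key).digest()


write_key = os.urandom(16)       # kept by people allowed to modify the file
read_key = diminish(write_key)   # can be handed out for read-only access
```

Deriving keys this way only controls who can compute which key; a real system also needs a way to reject updates from people who hold only the read key, which this sketch leaves out.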

Multiple user interfaces are available for Tahoe, including a web interface, a command-line interface, a FUSE extension and a web API. Tahoe is written in Python, using some C extensions for efficiency. It uses the Twisted framework for event handling, pycryptopp (a Python interface to the Crypto++ library) for its encryption needs, and zfec for the erasure coding. All of the Tahoe code is available under the GPL.

Installing Tahoe using the installation guide was fairly straightforward; there were a few hiccups, which have since been resolved. Joining the test grid was as easy as putting an introducer identifier into a file and starting Tahoe from the command line. In some basic testing, it worked quite well overall, though it did not seem to use the available bandwidth as efficiently as it might.
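The setup step for joining a grid can be sketched as follows, assuming the 1.0-era layout in which the node reads the introducer's identifier (a FURL string published by the grid's operators) from a one-line file named `introducer.furl` in its base directory. Later releases configure this in `tahoe.cfg` instead, so the installation guide for the version in use is the authority. The FURL below is a placeholder, not a real grid address, and `set_introducer` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Placeholder only: substitute the FURL published for the grid being joined.
INTRODUCER_FURL = "pb://EXAMPLE-PLACEHOLDER@introducer.example:1234/introducer"


def set_introducer(node_dir: Path, furl: str) -> Path:
    """Write the introducer identifier where the node is assumed to look.

    Assumes a 1.0-era node directory that reads a one-line file named
    'introducer.furl'; this file name is an assumption, not verified here.
    """
    node_dir.mkdir(parents=True, exist_ok=True)
    target = node_dir / "introducer.furl"
    target.write_text(furl.strip() + "\n")
    return target


# Demonstrate against a throwaway directory rather than a real node.
with tempfile.TemporaryDirectory() as tmp:
    written = set_introducer(Path(tmp) / "tahoe-node", INTRODUCER_FURL)
    name = written.name
    contents = written.read_text()
```

After this, starting the node from the command line is what makes it contact the introducer and join the grid.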

This brief overview only scratches the surface of the information available about Tahoe; there is much more on the documentation page. For anyone interested in distributed, secure, and/or fault-tolerant filesystems, Tahoe is definitely worth a look.




Posted May 1, 2008 4:36 UTC (Thu) by louie (guest, #3285)

Whoa. I (and some friends) have been looking for something like this for a while; this is
very, very interesting. Thanks for the pointer.

Posted May 2, 2008 3:11 UTC (Fri) by zooko (guest, #2589)

Here are some links to nicer-looking pages for the Tahoe-related spin-off projects (trac pages instead of the source tree root directories linked in the article):

And here's another useful library that is related to Tahoe (although it isn't actually a "spin-off" since it predates Tahoe -- more of a "spin-on" I guess):

Posted May 7, 2008 13:33 UTC (Wed) by DRBaldock (guest, #30881)

So, what are the legal ramifications if someone has copyrighted material (music or movies, for
instance) on their system? Doesn't that "share" it with others who are part of the grid?

Just Wondering,
David Baldock

Posted May 8, 2008 10:45 UTC (Thu) by renox (guest, #23785)

Well, given that you receive only encrypted data, I don't see how a sensible person could hold
you responsible for any copyright violation in this case.

What could be more problematic is the following: I think that in the UK the cops can ask you to
give them the key for any encrypted data on your computer, and in this case you don't
even have the key!


From a (more sensible) technical perspective, sharing with ten servers means that you regularly
have to poll the servers to check that the copies are still OK. I wonder how this is done?


Copyright © 2008, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds