The Tahoe secure filesystem
The Tahoe filesystem is designed as a secure, distributed filesystem that is available as free software. Tahoe is also designed for fault tolerance so that data remains available even in the presence of missing or malicious peers. In March, the project released a 1.0 version which makes this a good time to take a peek.
The basics of Tahoe are somewhat similar to GNUnet or Freenet in that the data is encrypted and spread around to multiple nodes in the network. Unlike those, though, Tahoe does not seek to provide anonymity. The nodes making up a Tahoe filesystem are called a "grid". Grids consist of some number of peers acting as storage server nodes along with an "introducer" that knows all of the other nodes and is the central point of contact for the grid.
Files are stored in Tahoe by first being encrypted on the local machine using AES. They are then broken into "shares", ten by default, that are distributed to different servers in the grid. Before that happens, though, the encrypted file is encoded in such a way that the whole file can be recovered even if only a subset of the shares can be retrieved.
This encoding, known as "erasure coding", is the key to the fault-tolerance of the Tahoe system. By default, Tahoe encodes the shares such that retrieving three of the ten is sufficient to recover the entire file. It also increases the size of the file by the expected 10/3 ratio.
The suggested use case for Tahoe is a "friendnet" where some group of friends share their storage with each other in a way that reduces or eliminates the need for backups. Tahoe also has ways to share data in either read-only or read-write (immutable or mutable in Tahoe-speak) modes. Tahoe is used as a commercial backup system by Allmydata, sponsor of the Tahoe project.
Tahoe is designed to be secure, which means that it protects the integrity and confidentiality of the data stored in it. SHA-256 is used extensively to ensure consistency of the plaintext, ciphertext, and shares. Files stored in the system are identified by long identifiers called capabilities, that look something like:
URI:CHK:yeyur23dw7cg3mxmsl2kiqvtt4:sdtrgczwtntzyfg2uapbfytxvyqsn45j4jpgrhcey7ebzpaoznya:3:10:107833344For mutable files, there are two versions of the capability, one that allows only reading, while the other allows writing as well. Anyone who does not have a capability string for a particular file cannot access it at all.
Multiple user interfaces are available for Tahoe, including a web interface, a command-line interface, a FUSE extension and a web API. Tahoe is written in Python, using some C extensions for efficiency. It uses the Twisted framework for event handling, pycryptopp (a Python interface to the Crypto++ library) for its encryption needs, and zfec for the erasure coding. All of the Tahoe code is available under the GPL.
Installing Tahoe was fairly straightforward—there were a few hiccups which have since been resolved—using the installation guide. Joining the test grid was as easy as putting an introducer identifier into a file and starting Tahoe from the command line. In some basic testing, it seems to work quite well, overall, though it did not seem to use available bandwidth as efficiently as it might.
This brief overview only scratches the surface of the information available about Tahoe; there is much more on the documentation page. For anyone interested in distributed, secure, and/or fault-tolerant filesystems, Tahoe is definitely worth a look.
| Index entries for this article | |
|---|---|
| Security | Encryption/Filesystems |
| Security | Networking/Filesystems |
