
Giving Upspin a spin

March 8, 2017

This article was contributed by Nur Hussein

In late February 2017, Google announced the Upspin project on its security blog. It is an open-source project from Google researchers that was released under the 3-clause BSD license. Upspin is a set of protocols and services that allow users to share files with a degree of privacy and security. Upspin is not a filesystem but instead provides a global namespace to securely address and retrieve files across networks along with access control to limit who can read and write those files. Users' identities are defined via keys; access is allowed based on the ownership of the file and for other users as specified by the owner. You can find the project's web page at upspin.io.

The objectives of the project, as described in the overview document, are universal naming, data sharing, and security. With Upspin, users can share their data securely; all of the data that users share will use a naming mechanism that globally identifies each object and the user it belongs to. End-to-end encryption ensures that data remains private and cannot be seen without the user's consent, even by the service provider hosting it.

The targeted use case for Upspin is personal, not corporate use, so the focus is on individual sharing while maintaining user privacy. A typical example is friends and family sharing photos among themselves on the internet, but with privacy settings to not allow the pictures to be viewed by unauthorized users. This is motivated by the fact that most users upload data onto web sites and apps such as Facebook, Twitter, and Instagram; it is hard to extract and share this data with other applications and services, or to apply access-control mechanisms to the data once it's uploaded. Upspin was devised as a way to name, store, and share data without resorting to uploading it onto a multitude of internet silos, and ensuring that no one (not even the service provider) can view or tamper with the data without authorization from the user.

Structure

Upspin is implemented as a set of servers and a client written in Go. The code itself is open source and hosted on GitHub; that mirrors a repository on Google's Git infrastructure. The code is a reference implementation, and the developers "expect and hope that many other implementations will arise" as stated in the overview document.

An Upspin object identifier is composed of a unique user ID in the form of an email address, and a path to the object like a filesystem path. For example:

    alice@domain.com/dir/datafile

A validated email address prefix guarantees a unique global identifier for every user; the string that follows it is the virtual path name of the object in question. Files are organized in a directory tree starting from a "/" root directory, much like a Unix filesystem. Given an object identifier, Upspin will read or write a file to the network store if the user has permission to do so.
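
To make the naming scheme concrete, here is a minimal Go sketch that splits an identifier into its user and path parts. The helper splitName is hypothetical and is not taken from the Upspin code base; treating a bare user name as that user's root directory is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitName splits a name such as "alice@domain.com/dir/datafile" into
// the user part and the path part. It is a hypothetical helper written
// for illustration, not part of Upspin.
func splitName(name string) (user, path string, err error) {
	slash := strings.IndexByte(name, '/')
	if slash < 0 {
		// Assumption: a bare user name means that user's root directory.
		user, path = name, "/"
	} else {
		user, path = name[:slash], name[slash:]
	}
	// Only a rough shape check; this is not real email validation.
	at := strings.IndexByte(user, '@')
	if at <= 0 || at == len(user)-1 {
		return "", "", errors.New("user part must look like an email address")
	}
	return user, path, nil
}

func main() {
	user, path, err := splitName("alice@domain.com/dir/datafile")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("user:", user)
	fmt.Println("path:", path)
}
```

The first slash is the only separator that matters here: everything before it names the user, and everything from it onward is the path within that user's tree.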

Upspin comprises the following components:

  • A key server that holds user identification keys and a pointer to a directory server.
  • One or more directory servers that store the users' file directory structure and pointers to the storage servers where they can be found.
  • One or more storage servers.
  • Optional caching servers.

While the directory structure is maintained on the directory server, it does not map to the actual layout of the physical filesystem the files are stored on. Rather the leaf nodes on the directory tree are pointers to objects stored on the storage servers. Those objects, in turn, are stored not by their filenames but referenced with a SHA-256 hash of their contents. The decoupling of the directory description and actual physical storage of the files creates flexibility for implementing any number of caching and data-distribution techniques.
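
The idea of referring to stored objects by a hash of their contents can be sketched in a few lines of Go. The store type below is a toy, in-memory stand-in invented for this example; it is not how the Upspin storage server is implemented, and in Upspin the stored bytes would normally be the encrypted form of the file.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a toy in-memory content-addressed store (hypothetical type).
type store map[string][]byte

// put saves data under the hex-encoded SHA-256 of its contents and
// returns that reference.
func (s store) put(data []byte) string {
	sum := sha256.Sum256(data)
	ref := hex.EncodeToString(sum[:])
	s[ref] = append([]byte(nil), data...) // keep a private copy
	return ref
}

// get returns the bytes stored under ref. The second result is false if
// the reference is unknown or the bytes no longer hash to ref.
func (s store) get(ref string) ([]byte, bool) {
	data, ok := s[ref]
	if !ok {
		return nil, false
	}
	sum := sha256.Sum256(data)
	return data, hex.EncodeToString(sum[:]) == ref
}

func main() {
	s := store{}
	ref := s.put([]byte("hello"))
	data, ok := s.get(ref)
	fmt.Println("reference:", ref)
	fmt.Println("data:", string(data), "intact:", ok)
}
```

Because the reference is derived from the bytes themselves, the same object can be cached or replicated anywhere and still be checked against its reference on retrieval.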

When given a universal identifier such as alice@domain.com/dir/datafile, an Upspin client will send the request to its configured key server. The key server will look up the user alice@domain.com in its records and return the location of the directory server that describes the user's files. The client will then query the directory server, which knows where all of Alice's files are. The directory server will evaluate the path name /dir/datafile to locate the file datafile and return the storage server it is kept on. The client then retrieves the file from the storage server.

[Upspin file lookup]
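
The three-hop lookup described above can be modeled with plain Go maps. Everything in this sketch (the network and location types, the server addresses, and the "ref-1" reference) is hypothetical; it only mirrors the order of the requests and ignores authentication, encryption, and the real wire protocol.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// location says where an object's bytes live (hypothetical type).
type location struct {
	store string // address of the storage server
	ref   string // reference of the object on that server
}

// network bundles toy in-memory stand-ins for the three kinds of servers.
type network struct {
	keys   map[string]string              // user -> directory server address
	dirs   map[string]map[string]location // directory server -> path -> location
	stores map[string]map[string][]byte   // storage server -> reference -> bytes
}

// lookup follows the three hops: key server, directory server, storage server.
func (n network) lookup(name string) ([]byte, error) {
	user, rest, found := strings.Cut(name, "/")
	if !found {
		return nil, errors.New("name has no path")
	}
	dirAddr, ok := n.keys[user]
	if !ok {
		return nil, errors.New("key server: unknown user " + user)
	}
	loc, ok := n.dirs[dirAddr]["/"+rest]
	if !ok {
		return nil, errors.New("directory server: no such file /" + rest)
	}
	data, ok := n.stores[loc.store][loc.ref]
	if !ok {
		return nil, errors.New("storage server: missing object " + loc.ref)
	}
	return data, nil
}

// demoNetwork builds a tiny example world with made-up addresses.
func demoNetwork() network {
	return network{
		keys: map[string]string{"alice@domain.com": "dir.example.com"},
		dirs: map[string]map[string]location{
			"dir.example.com": {"/dir/datafile": {store: "store.example.com", ref: "ref-1"}},
		},
		stores: map[string]map[string][]byte{
			"store.example.com": {"ref-1": []byte("hello from Alice")},
		},
	}
}

func main() {
	data, err := demoNetwork().lookup("alice@domain.com/dir/datafile")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(string(data))
}
```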

The concept is reminiscent of other efforts to provide a solution in this problem space, such as AFS, IPFS, Tahoe-LAFS, and Plan 9's namespaces. On the Upspin mailing list, Brian Van Klaveren asked if AFS was an influence on Upspin. Developer Andrew Gerrand replied that the team was aware of AFS, but it was not a direct influence. The Plan 9 influences are perhaps the most prominent on the design of Upspin, as Dave Presotto and Rob Pike were involved in the development of Upspin. Namespaces, and running a service as a combination of different servers, strongly resemble Plan 9's design. Pike said that Upspin is unique in its combination of universality, security, and ability to share with "fine-grained" security options.

The global namespace key server is run by Google. This server keeps the public keys of all Upspin users; each is the public half of a key pair generated on the P-256 elliptic curve. The server authenticates a user using their public key; a valid user will be able to use their private key to prove their identity and create a session. It is possible to run your own key server and form a private Upspin installation, but the design goal is for global participation in the Upspin ecosystem to prevent the fragmentation of the global namespace. As developer Eduardo Pinheiro explained:

That could fragment the namespace if not done carefully. One would need to know the authoritative key server for a user and then reconcile/reject discrepancies. But it's something to consider. We're just not there yet as it increases complexity.

One frequently asked question is whether the key server is always going to be a single point of failure; Pike replied:

While I share your concern about a single point of failure, it is important to the project that there be only one space of user names. A distributed user name set introduces technical difficulties but more important the possibility of name conflicts, which would be fatal to the project.

In time it may become important to build a distributed, federated key server that shares the load, but it should always form a single name space and guarantee its correctness. That project is for the future, though.

The idea is that every user in the global Upspin ecosystem should be universally identifiable and addressable, which is currently implemented with the central key server run by Google at key.upspin.io.
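
To illustrate how a registered public key lets a server check that a user holds the matching private key, here is a minimal challenge-and-signature sketch using Go's standard ECDSA support on the P-256 curve. The challenge-response shape and the function names are assumptions made for this example; the article does not describe Upspin's actual authentication exchange.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// newKey generates a P-256 key pair. The private half stays with the
// user; the public half is what a key server would store.
func newKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// signChallenge signs the SHA-256 digest of a server-chosen challenge.
func signChallenge(priv *ecdsa.PrivateKey, challenge []byte) ([]byte, error) {
	digest := sha256.Sum256(challenge)
	return ecdsa.SignASN1(rand.Reader, priv, digest[:])
}

// verifyChallenge reports whether sig is a valid signature of challenge
// under the given public key.
func verifyChallenge(pub *ecdsa.PublicKey, challenge, sig []byte) bool {
	digest := sha256.Sum256(challenge)
	return ecdsa.VerifyASN1(pub, digest[:], sig)
}

func main() {
	priv, err := newKey()
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	challenge := []byte("hypothetical server nonce")
	sig, err := signChallenge(priv, challenge)
	if err != nil {
		fmt.Println("signing failed:", err)
		return
	}
	fmt.Println("signature accepted:", verifyChallenge(&priv.PublicKey, challenge, sig))
}
```

The private key never leaves the client in this scheme; the server only needs the public key it already holds for that user name.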

Access control

Access control, security, and privacy are enforced by Upspin via end-to-end encryption. All files stored are encrypted and decrypted on the client side, so user file contents cannot be read by hosting providers. By default, all of a user's files are encrypted and only accessible by the owner.

To share files, a user can place a simple, signed text file named "Access" with an access-control list in the directory that contains the files the user wants to share. The Access file contains a list of users (identified by their Upspin email addresses) with whom the owner wishes to share the contents of the directory; it specifies whether those users can read or write to the directories or files contained in the directory. File permissions are either read-only or read/write, while directories have options to have their contents listed ("listability") or to allow objects (including subdirectories) to be added to or deleted from them. A directory that is not listable does not prevent users from accessing a file in that directory if they know its name; a user with read or read/write permissions to a file can still access it, even if they cannot list the rest of the files in the hierarchy.

The permissions apply to all files in the directory containing the Access file and all of its subdirectories unless there is an Access file deeper in the hierarchy that overrides them. A user may also define user groups in a special directory called Group, which needs to reside at the top-level directory. Groups are then defined in named files in the Group directory. An Access file may either contain users or groups.
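
The override rule, where the closest Access file up the tree decides, can be sketched as a walk from a file's directory toward the root. The accessLists type below is a deliberately reduced, hypothetical model: it tracks read permission only, ignores groups, does not parse the real Access file syntax, and assumes the owner can always read their own files.

```go
package main

import (
	"fmt"
	"path"
)

// accessLists maps a directory to the users that its Access file allows
// to read. This is a hypothetical model, not Upspin's file format.
type accessLists map[string][]string

// governing returns the closest directory, at or above the file's own
// directory, that has an Access file.
func (a accessLists) governing(file string) (string, bool) {
	for dir := path.Dir(file); ; dir = path.Dir(dir) {
		if _, ok := a[dir]; ok {
			return dir, true
		}
		if dir == "/" || dir == "." {
			return "", false // reached the top without finding one
		}
	}
}

// canRead reports whether user may read file in owner's tree.
func (a accessLists) canRead(owner, user, file string) bool {
	if user == owner {
		return true // assumption of this sketch: the owner can always read
	}
	dir, ok := a.governing(file)
	if !ok {
		return false // no Access file anywhere: owner only
	}
	for _, u := range a[dir] {
		if u == user {
			return true
		}
	}
	return false
}

func main() {
	acl := accessLists{
		"/photos":         {"bob@example.com"},
		"/photos/private": {}, // a deeper Access file that grants nothing
	}
	owner := "alice@domain.com"
	fmt.Println(acl.canRead(owner, "bob@example.com", "/photos/2017/beach.jpg"))
	fmt.Println(acl.canRead(owner, "bob@example.com", "/photos/private/id.jpg"))
}
```

In this model the empty list under /photos/private shadows the grant made in /photos, which is the overriding behavior the article describes.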

By default, all stored files are encrypted with a random symmetric key using the AES-256 cipher, and the random symmetric key itself is encrypted with the file owner's public key. When a file is shared to other users, the random key is extracted and re-encrypted with each of the public keys of all the users the file is shared with. This is handled by the client, which will request all of the users' public keys from the key server. Integrity checks are done using the SHA-256 algorithm.
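
The pattern described here, one random file key that is then wrapped separately for each reader, can be sketched with Go's standard library (Go 1.20 or later for crypto/ecdh). The details are assumptions chosen for this example and are not Upspin's actual packing format: the sketch uses AES-256 in GCM mode, wraps the file key through an ephemeral ECDH exchange on P-256, and derives the wrapping key with a bare SHA-256 where a real design would use a proper key-derivation function.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// wrapped is a file key encrypted for one reader (hypothetical layout).
type wrapped struct{ ephemeral, sealedKey []byte }

// seal encrypts data with AES-GCM under key and prepends the random nonce.
func seal(key, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, data, nil), nil
}

// unseal reverses seal and fails if the key is wrong or the data was altered.
func unseal(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, fmt.Errorf("sealed data too short: %d bytes", len(sealed))
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

// wrapKey encrypts fileKey so only the matching private key can recover it.
func wrapKey(reader *ecdh.PublicKey, fileKey []byte) (wrapped, error) {
	eph, err := ecdh.P256().GenerateKey(rand.Reader) // one-time key pair
	if err != nil {
		return wrapped{}, err
	}
	shared, err := eph.ECDH(reader)
	if err != nil {
		return wrapped{}, err
	}
	kek := sha256.Sum256(shared) // key-encryption key derived from the secret
	sealedKey, err := seal(kek[:], fileKey)
	return wrapped{eph.PublicKey().Bytes(), sealedKey}, err
}

// unwrapKey recovers the file key using the reader's private key.
func unwrapKey(reader *ecdh.PrivateKey, w wrapped) ([]byte, error) {
	pub, err := ecdh.P256().NewPublicKey(w.ephemeral)
	if err != nil {
		return nil, err
	}
	shared, err := reader.ECDH(pub)
	if err != nil {
		return nil, err
	}
	kek := sha256.Sum256(shared)
	return unseal(kek[:], w.sealedKey)
}

// newReaderKey generates a P-256 key pair for one user.
func newReaderKey() (*ecdh.PrivateKey, error) {
	return ecdh.P256().GenerateKey(rand.Reader)
}

// must panics on error; it only keeps main short.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	friend := must(newReaderKey())
	fileKey := make([]byte, 32)
	must(rand.Read(fileKey))
	ciphertext := must(seal(fileKey, []byte("holiday photo")))
	w := must(wrapKey(friend.PublicKey(), fileKey))
	recovered := must(unwrapKey(friend, w))
	fmt.Printf("%s\n", must(unseal(recovered, ciphertext)))
}
```

The file contents are encrypted once; sharing with another reader only means producing one more small wrapped key, which is why the client needs each reader's public key from the key server.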

This access control model is simple, and resembles the access-control lists of Windows shared folders. However, because of the dependence on special files or directories named with common words like Access and Group, there is a likelihood that there will be a name collision with a user's personal files, as pointed out by Brian Swetland on the Upspin mailing list. Pike noted simplicity as the reason for that in his reply:

Regarding the access control files, all those issues were thought about but we decided to keep it simple. Our target is regular users and funny characters are not friendly to regular users. Still, we are certainly aware of the possibilities for collisions. We also worry about the parochial nature of English words here.

If the key server, directory server, or storage server were compromised, an attacker would be able to observe metadata retrieval, but not the actual data, thanks to the end-to-end encryption. Other security scenarios and encryption information can be found in the Upspin security document.

Trying it out

Both the Upspin clients and servers are available as an open source download. Go version 1.8 or later is needed for them to work. Go's go get mechanism can be used to fetch and install upspin:

    $ go get -u upspin.io/...

Once it is installed, there is a sign-up script to run; it expects to be given the user's email address and the location of the directory and storage servers the user wants to use. The script will then generate a private and public key pair, which are used for encryption and signing of the user's files. The default encryption algorithm uses the P-256 elliptic curve, but other options may arise as time goes on. The private key is kept locally, while the public key is sent to key.upspin.io for registration. The user then gets an email with a confirmation link, which finishes the two-step sign-up process.

The sign-up script will also create a configuration file at $HOME/upspin/config. It contains various options for Upspin in an easy-to-read, YAML-formatted text file. The configuration knobs allow the user to set the location of the three servers (key, directory, and storage) and to choose the encryption mode. A user can disable encryption (but leave integrity-check signing on), enable both encryption and integrity checking (the default), or disable both.

Setting up the server side is also relatively simple. The server needs a host that accepts connections on port 443 and a domain name for which the administrator can add DNS records. A set-up script will generate a pair of keys on the server. The keys belong to a "server user", which will also register itself with the key server. The server set-up script generates a signature using the private key, which needs to be added as a DNS TXT record for the domain to prove to the key server that the administrator does indeed have authorization to add users for that domain. Once that is done, the server administrator can then add authorized users (called Writers) to the domain. The users are identified by the email addresses they used to sign up on the key server.

The server need not (and should not) be run as root. An upspin user should be created to be used for running the server. The upspinserver binary listens on port 443 (with the appropriate capability to do this set with setcap), and gets TLS certificates from Let's Encrypt.

Adding, retrieving and deleting files from an Upspin account can be done via commands to the upspin client, or via a FUSE interface that mounts the installation as a filesystem.

Conclusion

Upspin is currently experimental software. As such, it is rather rough around the edges. When I tried to set up my own installation with the client running on my workstation in Malaysia and the servers on Amazon Web Services on the west coast of the US, RPC timeouts occurred when storing files larger than a few megabytes using the Upspin command-line interface. The FUSE-based filesystem that the installation comes with also did not work for me, as it ended up hanging whenever the Upspin filesystem was mounted. An strace of the operation indicated that it was waiting for a read operation, most likely from the network, which would be in line with the RPC timeouts experienced using the Upspin commands.

In addition to needing a host to try out Upspin, setting up a server also requires a domain and server administration. Casual users will not want to invest time and resources to do this, so at this stage Upspin is still very much for developers and advanced users.

The Upspin project is trying to address a problem (sharing files) that already has a few solutions. After all, a universal naming mechanism for objects already exists in the form of the W3C's Uniform Resource Identifiers (URIs). One can imagine an Upspin-like service built entirely with existing protocols such as HTTP, using any of the existing key-server technologies. We shall have to see where Upspin fits in as time goes on.

It is still unclear how Upspin will ultimately be provided to a critical mass of users; it is still in its early stages and the design issues are being worked out. It is likely that it will catch on first among cloud providers and special purpose wide-area networks. If that happens, the biggest selling point might be the simple access-control list mechanism and the automatic encryption that Upspin provides.


Index entries for this article
Security: Encryption/Filesystems
Guest articles: Hussein, Nur



Giving Upspin a spin

Posted Mar 9, 2017 9:08 UTC (Thu) by xav (guest, #18536)

Really nice, but the centralized key server is a weakness point IMHO. Not from a reliability point-of-view (I trust Google can run a server) but from a privacy one: there's just one point where someone can plug and observe all requests for keys from one user to another.
See https://github.com/upspin/upspin/issues/216

Giving Upspin a spin

Posted Mar 9, 2017 15:40 UTC (Thu) by droundy (subscriber, #4559)

I will add to this that the central key server is a place where a bad person can break the end-to-end encryption. If an attacker lies about my mother's public key, they can read anything I share with her. Perhaps I'll notice when she can't see the files I share with her, but if I'm sharing with a large group, maybe I'll not recognize this as an attack versus a problem on her computer.

Giving Upspin a spin

Posted Mar 12, 2017 11:03 UTC (Sun) by HIGHGuY (subscriber, #62277)

That keyserver should be replaced by a blockchain.
All transactions can surely be mapped to an equivalent operation, giving more control, privacy and security.

Giving Upspin a spin

Posted Mar 12, 2017 16:41 UTC (Sun) by nix (subscriber, #2304)

Also *far* higher resource consumption (CPU due to proof-of-work, disk space and network bandwidth due to storing the thing) and probably a very low system-wide maximum transaction rate, at least if blockchains are implemented anything like the Bitcoin one.

Blockchains are not a good fit for this problem. (They look like a terrible fit, actually. Why do you need nonrepudiability of fs operations? Why do you need it so much that you're willing to pay blockchain's enormous costs?)

Giving Upspin a spin

Posted Mar 12, 2017 18:58 UTC (Sun) by HIGHGuY (subscriber, #62277)

I'm only talking about the key servers, everything else is fine as is. It also doesn't exclude central servers doing some form of caching.

Regular users can't edit text files

Posted Mar 9, 2017 15:45 UTC (Thu) by droundy (subscriber, #4559)

"Our target is regular users and funny characters are not friendly to regular users."

This is a naive argument. Regular users can't edit text files, and probably don't know what a text file is. Any user interface needs to provide a means of editing the access control, and these files Access and Groups should therefore be hidden, and their names don't matter any more to a regular user.

Regular users can't edit text files

Posted Mar 10, 2017 2:41 UTC (Fri) by foom (subscriber, #14868)

Not to mention it doesn't even have a file extension.

If the file was named Access.txt, at least users could click on it and have an editor open.

Regular users can't edit text files

Posted Mar 12, 2017 22:43 UTC (Sun) by lsl (subscriber, #86508)

That's exactly what happens anyway regardless of the file's name. At least as long as the first couple bytes of the file don't confuse your file manager into taking it for something else.

Giving Upspin a spin

Posted Mar 9, 2017 21:43 UTC (Thu) by pspinler (subscriber, #2922)

A couple of comments, all impacting on project lifespan:

* Google has a history of deciding to end of life projects. If they were to do so with upspin, what would happen to the global keyserver?

* Is there thought given to replacing the SHA-256 hash in the future? See the current debates about replacing SHA1 in git, for instance. How about the key algorithm, similarly?

* Also, with regard to replacing a person's key, what if a key should get compromised? For instance, if someone is compelled to give it up crossing a border?

-- Pat


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds