Offline applications with Earthstar

By Daroc Alden
January 29, 2025

Earthstar is a privacy-oriented, offline-first, LGPL-licensed database intended to support distributed applications. Unlike other distributed storage libraries, it focuses on providing mutable data with human-meaningful names and modification times, which gives it an interface similar to many non-distributed key-value databases. Now, the developers are looking at switching to a new synchronization protocol — one that is general enough that it might see wider adoption.

The library

Earthstar is written in TypeScript (making it part of the JavaScript ecosystem), and designed to run both on native clients and in the browser. It has ten contributors, although the project's founder, Sam Gwilym, is certainly the most active. It has also gone through ten major releases since the project was started in 2020, and is now entering the final steps of preparing for the eleventh. Despite the library's rapid releases, the interface it presents has remained largely the same during that short time — at least, for a JavaScript library.

The Earthstar user guide explains the API for applications to build on. In short, after creating a key pair and establishing one or more connections to other peers, an application can write a binary blob to a particular key, and that data will be synchronized to all the other online peers that have permission to read that data. In this way, the total set of all values held by participants in the system forms a distributed key-value database. The database is offline-first and eventually consistent, so it doesn't make any guarantees about how long it will take for an update to be propagated to interested peers, but in practice the synchronization is fast enough to use for real-time chat applications when all of the involved peers are online.

The library does not yet support discovering new peers over the internet. There's no technical reason that this can't be supported in the future, but for now applications will need to implement their own peer discovery (such as by having peers put their public IP addresses into the application's Earthstar database). It does support automatically discovering peers on the local network, however. The project's documentation also covers how to create a relay server for an application.

There are a number of small examples that use the library, both in the documentation and generally available on the internet. Two of the most useful existing programs that use Earthstar are buntimer, a timer application that allows you to keep a todo list of timers in sync across multiple devices, and famstar, a simple photo-sharing and messaging application.

The protocol

Earthstar's new protocol is called Willow — a joint project of Gwilym and Aljoscha Meyer. Willow is a meta-protocol: it specifies most of what one needs for a distributed application, but leaves the choice of several key parameters, including encryption, up to the application. The new version of Earthstar will therefore be only one possible use of Willow, and other projects may change how the protocol is instantiated to fit their needs. Willow is split up into a handful of separate specifications for different pieces of the protocol. The central one is Willow's data model, which describes the kinds of data Willow can sync and how it's organized.

Willow stores arbitrary sequences of bytes organized by four different attributes: a path, made up of arbitrary human-readable path components; a timestamp, used to provide human-understandable modification times; a "subspace" that has a certain set of associated read and write permissions; and a "namespace" that separates data from different applications. Applications can query for values by any of those factors, with a few complications.

The first complication is timestamps. Paths in Willow are made up of a series of path components, analogous to directories in a filesystem. Writing a newer value to a prefix of an existing value's path (e.g., a new value for /a when there is an existing value for /a/b) overwrites the previous value. This is used to implement both updates and deletion; because a write to a prefix also replaces everything beneath it, a single write can delete an entire directory. While applications can query for values newer than a given timestamp, only the most recent version of a document will be returned.
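The overwrite rule can be sketched with a small in-memory store. This is only an illustration under stated assumptions: the Entry type, the Store class, and their methods are hypothetical names, not Earthstar's or Willow's actual API, and timestamp ties are simply rejected rather than resolved.

```typescript
// Hypothetical types for illustration only; not Earthstar's or Willow's API.
type Entry = {
  namespace: string;
  subspace: string;
  path: string[]; // path components, e.g. ["a", "b"] for /a/b
  timestamp: number;
  payload: string; // an empty payload stands in for a deletion
};

// True when `prefix` equals `path` or is one of its ancestors.
function isPrefix(prefix: string[], path: string[]): boolean {
  return prefix.length <= path.length &&
    prefix.every((component, i) => component === path[i]);
}

class Store {
  private entries: Entry[] = [];

  // Returns false when the write is rejected as stale.
  write(incoming: Entry): boolean {
    const sameArea = (e: Entry) =>
      e.namespace === incoming.namespace && e.subspace === incoming.subspace;

    // An existing entry at the same path or an ancestor wins if its
    // timestamp is equal or newer (tie-breaking is omitted in this sketch).
    const stale = this.entries.some((e) =>
      sameArea(e) && isPrefix(e.path, incoming.path) &&
      e.timestamp >= incoming.timestamp);
    if (stale) return false;

    // Otherwise drop every older entry at the same path or beneath it.
    this.entries = this.entries.filter((e) =>
      !(sameArea(e) && isPrefix(incoming.path, e.path) &&
        e.timestamp < incoming.timestamp));
    this.entries.push(incoming);
    return true;
  }

  // Entries in one subspace with a timestamp greater than `since`.
  query(namespace: string, subspace: string, since = 0): Entry[] {
    return this.entries.filter((e) =>
      e.namespace === namespace && e.subspace === subspace &&
      e.timestamp > since);
  }
}
```

In this sketch, writing /a/b at time 1 and then an empty /a at time 2 leaves only the entry for /a, and a late write to /a/c at time 1 is rejected because the newer entry at /a covers it.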

The decision to use timestamps in a distributed system is an unusual one; most comparable protocols use vector clocks or systems like append-only logs. The Willow developers have therefore published a justification for the decision. In short, the protocol doesn't actually rely on timestamps for correctness: even if all of the timestamps were zero, the protocol would still be able to pick a single, consistent document to call the "most recent" version. It might not be the version a human would have chosen, however.
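To see why correctness does not depend on the clock, consider a total order that falls back to properties of the data itself when timestamps are equal. The sketch below assumes ties are broken by comparing a payload digest and then the payload length; the Version type and both function names are illustrative, and the exact tie-break fields should be checked against the Willow data model.

```typescript
// Hypothetical shape for illustration; field names are assumptions.
type Version = {
  timestamp: number;
  digest: string; // fixed-length lowercase hex digest of the payload
  length: number; // payload length in bytes
};

// A total order over versions: timestamp first, then digest, then length.
function isNewer(a: Version, b: Version): boolean {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp;
  if (a.digest !== b.digest) return a.digest > b.digest;
  return a.length > b.length;
}

// Picks the same winner regardless of the order in which versions arrive.
// Assumes `versions` is non-empty.
function pickNewest(versions: Version[]): Version {
  return versions.reduce((best, v) => (isNewer(v, best) ? v : best));
}
```

Because the comparison never depends on arrival order, two peers that receive the same set of versions will agree on the winner even if every timestamp is zero; the winner is just arbitrary from a human point of view.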

Ultimately, the purpose of timestamps in the protocol is to make the idea of "the newest version" of a mutable resource in a distributed system more human-meaningful. In complex systems, understanding why a particular version is deemed newer than another can be difficult — it is much less so when the program can refer to the notion of time.

The next complication is read and write permissions. Many protocols allow a participant to discover what data a peer has available; so even if a peer didn't have permission to read some data, it could still tell whether the data existed. Willow doesn't work like that: it uses a simple zero-knowledge proof (detailed in the Willow documentation) to ensure that peers can only exchange information about subspaces that they both have permission to read. In a typical use of Willow, each subspace will be controlled by a separate user (although users can have more than one subspace), who then grants read permission to whoever they desire.

The exact choice of authorization scheme is one of the things that Willow leaves up to the implementation; all that it requires is that there is a function for determining whether an identity is permitted to read from or write to a subspace. Earthstar uses a capability system called Meadowcap to determine this. Some applications will want to allow a user to create their own subspace with no coordination; others will want identities to be managed by a centralized authority. Meadowcap supports both use cases in different ways. In particular, it supports granting not only read or write permissions, but also the ability to let other users grant permissions in turn for a particular set of keys. Because delegation does not require contacting a central server, this makes the system more suitable for offline use.
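The delegation idea can be sketched as a chain of capabilities in which each link may only narrow what the previous one granted. This is a simplified model under stated assumptions, not Meadowcap's real data structures: the Capability type and every function name here are hypothetical, and the signatures that a real system would verify at each link are omitted entirely.

```typescript
type Access = "read" | "write";

// Hypothetical capability shape; a real system would also carry signatures.
type Capability = {
  subspace: string;
  access: Access;
  pathPrefix: string[]; // the capability covers this path and everything beneath it
  holder: string;
  parent?: Capability; // the capability this one was delegated from
};

// True when `prefix` equals `path` or is one of its ancestors.
function isPrefix(prefix: string[], path: string[]): boolean {
  return prefix.length <= path.length &&
    prefix.every((component, i) => component === path[i]);
}

// The owner of a subspace starts with full access to it.
function rootCapability(owner: string, access: Access): Capability {
  return { subspace: owner, access, pathPrefix: [], holder: owner };
}

// Hands a capability to someone else, restricted to `pathPrefix`.
function delegate(cap: Capability, to: string, pathPrefix: string[]): Capability {
  if (!isPrefix(cap.pathPrefix, pathPrefix)) {
    throw new Error("delegation may only narrow the granted area");
  }
  return { ...cap, pathPrefix, holder: to, parent: cap };
}

// The kind of yes/no function the protocol needs from its authorization scheme.
function isAuthorized(
  cap: Capability, identity: string, access: Access,
  subspace: string, path: string[],
): boolean {
  return cap.holder === identity && cap.access === access &&
    cap.subspace === subspace && isPrefix(cap.pathPrefix, path);
}
```

Here the owner of a subspace could delegate write access under /blog to a second user, who could in turn delegate /blog/comments to a third, all without any of them being online at the same time as a central authority.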

Meadowcap also supports having credentials expire at a given time. Unfortunately, this is more complicated than it might initially seem, because Willow is designed to operate offline; a client can use an expired credential by just lying about the time, and pretending that an update was only now added to the network. So properly using expiring credentials in an application demands some care. The easiest way to handle it is to update the timestamp on a document to just after the expiry time of a credential when it expires — then, even if a client lies and pretends to have an update constructed in the past, it will be discarded in favor of the newer entry.
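A small model makes the backdating problem and the suggested fix concrete. It is a sketch under simplifying assumptions: the Doc and Credential types and both functions are hypothetical, a single document is tracked, and acceptance is reduced to a timestamp comparison.

```typescript
// Hypothetical shapes for illustration; not Meadowcap's or Earthstar's API.
type Doc = { path: string; timestamp: number; author: string; payload: string };
type Credential = { holder: string; expiresAt: number };

// Would a peer accept `incoming` over `current`, given the author's credential?
// The peer can only check the claimed timestamp, not when the entry was really made.
function accepts(current: Doc | undefined, incoming: Doc, cred: Credential): boolean {
  if (incoming.author !== cred.holder) return false;
  if (incoming.timestamp > cred.expiresAt) return false; // outside the credential's lifetime
  return current === undefined || incoming.timestamp > current.timestamp;
}

// The owner re-issues the document with a timestamp just past the expiry time,
// so that nothing the expired credential can sign will ever be newer.
function sealAfterExpiry(current: Doc, cred: Credential, owner: string): Doc {
  return {
    ...current,
    author: owner,
    timestamp: Math.max(current.timestamp, cred.expiresAt) + 1,
  };
}
```

With a credential that expires at time 100, an entry backdated to 95 is accepted over a document last written at 90, but rejected once the owner has sealed that document at 101.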

In any case, Willow presents a simple, flexible base for distributed storage. While the protocol doesn't handle conflict resolution at all (documents are always "last write wins"), leaving that up to the application, it does handle synchronization. Willow supports both active exchanges of information between peers, and the creation of "drops" of data that can be used to implement a sneakernet. Unlike online synchronization, using drops requires transmitting a whole portion of the database, not just the updates; on the other hand, portable storage is usually sufficiently dense to mitigate that problem. Compared to other protocols for distributed storage, Willow offers more flexibility and privacy, along with true mutability and deletion.

The downsides

But Willow's design decisions do come with disadvantages. For one thing, the fact that peers need read permissions to a subspace in order to even learn about the existence of it has huge privacy benefits — but it also means that peers without those read permissions can't act as relays for those that do. In order to synchronize the documents in a subspace with other users, the users need to be online simultaneously, or use the space-inefficient sneakernet method.

Another problem is with the choice to leave conflict resolution up to the application. While this makes sense for a generic protocol that aims to be used across many different areas, it does mean that there is still a decent amount to do to build an application on top of Willow. The most straightforward approach is to assign each user a separate subspace, and rely on them to not introduce conflicts in their own files, but that's not a robust or general solution.
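As an illustration of how much is left to the application, here is one possible merge policy for a shared to-do list in which each user writes only to their own subspace. Everything in it is an assumption made for the example: the data layout, the mergeTodo name, and the rule that an item counts as done if any user has marked it done.

```typescript
// Each user's subspace holds that user's view: item id -> done flag.
type TodoView = Record<string, boolean>;

// Merge per-user views into one list. An item is done if anyone marked it done.
// The result does not depend on the order in which the subspaces are visited.
function mergeTodo(subspaces: Record<string, TodoView>): TodoView {
  const merged: TodoView = {};
  for (const view of Object.values(subspaces)) {
    for (const [id, done] of Object.entries(view)) {
      merged[id] = (merged[id] ?? false) || done;
    }
  }
  return merged;
}
```

This avoids write conflicts because no two users ever write to the same key, but it also shows the limitation: the policy has no way to un-check an item that someone else marked done, so a different application would need a different rule.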

Finally, while Willow is designed to make true deletion of data possible (something that is difficult in systems based around append-only logs), this is in some respects an impossible promise to keep. Deletion of data only works if all of the participating peers that have the data stored actually receive and honor the update. While Willow's eventual consistency guarantees mean that this will happen if all of the peers are eventually online again, in the real world that might not happen. Also, nothing stops a peer from keeping copies of deleted data on its own. So the inclusion of provisions for deleting data in the protocol is useful for making moderation and removal of illegal content possible in situations where all of the protocol participants are honest, but it isn't a solid guarantee that the application developer can rely on.

Comparison

Earthstar's documentation has a comparison to a few different methods of synchronizing data across the internet. Its use of relay servers for long-distance connection and direct connections for local peers is similar to Scuttlebutt, the decentralized social media platform. Its use of paths to identify opaque blobs of data is more similar to the InterPlanetary File System (IPFS). I think the library that provides the closest comparison, however, is gun. Both Earthstar and gun are part of the JavaScript ecosystem, are designed for offline-first use, and present similar interfaces.

Earthstar is not a large project. Compared to existing solutions for distributed applications, it's still missing some important features. Its data model is both more simplistic and more restrictive than other tools. But in exchange for those tradeoffs, it presents a compelling, human-friendly solution for distributed data. It will not be the right solution for every use case, but it may be the right solution for some of them.

Deletion?

Posted Mar 5, 2025 10:30 UTC (Wed) by smurf (subscriber, #17840)

> [deletion as] an impossible promise to keep

Well. Willow shares that property with about every other protocol. You can't force-delete data from somebody who doesn't want to honor that request no matter which underlying protocol you use.