The Cr-48 and Chrome OS: Google's vision of the net

Posted Jan 23, 2011 2:24 UTC (Sun) by njs (guest, #40338)
In reply to: The Cr-48 and Chrome OS: Google's vision of the net by jhs
Parent article: The Cr-48 and Chrome OS: Google's vision of the net

> So yes, if you pull again and you see B3, no problem. Just put it on the revision tree. B3 and B2_Deleted are siblings with parent B2.

Sure, that makes sense.

> The application is still probably more interested in the difference between the *data* rather than the ancestral history. [...] For example, what if B3 stores "delete_account: true" which means in this application that the user wishes to completely remove his account and delete all of his data. Clearly that fact is dominant in the merge/conflict-resolution strategy and it hardly matters what the history graph looks like.

Hrm, so what I'm basically hearing is that CouchDB's data/synchronization model in practice is:
-- You have a bunch of records which must be independent (there's no way that synchronization can respect any kind of referential integrity)
-- When synchronizing, CouchDB will automatically identify the "latest" version of any given record (via "fast-forward merge"); if anything more complicated happens, then the app is on its own. And if the app *wanted* to use a proper merge algorithm (say, to notice that the user added a phone number to this contact on their phone and also added a user picture on their computer, and that those edits can easily be combined), then it's sort of doomed, because the needed history information just isn't recorded. It's expected that apps will mostly do the equivalent of two-way merge, or that divergence will be rare enough that even a crippled three-way merge will be good enough.
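
To make the phone-number/picture example concrete, here is a minimal sketch of a field-level three-way merge over flat records. It is not anything CouchDB provides; `three_way_merge` and the sample contact fields are made up for illustration, and it assumes the common ancestor (`base`) is still available, which is exactly the information in question.

```python
_MISSING = object()  # sentinel for "field not present in this version"


def three_way_merge(base, ours, theirs):
    """Merge two edited copies of a flat record against their common ancestor.

    Returns (merged, conflicts). `conflicts` maps each field that both sides
    changed in different ways to its (ours, theirs) pair; such fields are
    left out of `merged` for the application to resolve.
    """
    merged, conflicts = {}, {}
    for key in set(base) | set(ours) | set(theirs):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            value = o          # both sides agree (including "both deleted")
        elif o == b:
            value = t          # only their side touched this field
        elif t == b:
            value = o          # only our side touched this field
        else:
            conflicts[key] = (ours.get(key), theirs.get(key))
            continue
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


# Hypothetical contact record edited on two devices.
base = {"name": "Ann"}
on_phone = {"name": "Ann", "phone": "555-0100"}
on_desktop = {"name": "Ann", "picture": "ann.png"}

merged, conflicts = three_way_merge(base, on_phone, on_desktop)
print(merged, conflicts)
```

Without `base`, a two-way comparison of `on_phone` and `on_desktop` cannot tell "phone was added on one side" apart from "phone was deleted on the other".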

Does that sound right?

My wild guess is that in practice this architecture works fine for data that's inherently loosely coupled, and where edits are rare relative to the size of the data store and the synchronization frequency. This probably covers all the main data people are replicating these days (bookmarks, contact lists, and mail synced with their phones), but I'm not sure how far you can stretch it. And it's possible to do *much* better, without adding much -- if any -- complexity.

Monotone, for instance, implements a distributed data store with complex structure (a tree of files/directories, each of which can have an arbitrary set of attributes), referential integrity constraints (can't have two files with the same names, no directory loops, etc.), and a very cheap, fully history-sensitive automated merger with a rigorous mathematical foundation ("mark merge").

> There is nothing stopping an application-level `git-rerere` implementation--perhaps as a library which all developers could use.

Sure, but git-rerere is a total hack whose purpose is to prevent Linus from seeing merge nodes. I don't imagine your end-users really have any aesthetic preferences about the shape of the graph buried inside CouchDB :-). Wouldn't it make even *more* sense for CouchDB to just store the relevant information out of the box?

Your point about not wanting to confuse developers is well-taken, but I feel like they'd be better off overall if they had better tools, instead of having to implement these things themselves.

So, hmm. Thanks for giving me something to think about!



Posted Jan 23, 2011 2:45 UTC (Sun) by jhs (guest, #12429)

I think your assessment is close enough that further nitpicking of the minutiae wouldn't be productive. Two final thoughts:
  1. If an improvement to the architecture is possible, the community would likely be very open to that. That would be a big change, requiring good justification, but the community is very open and flexible. I think the main problem right now is that tooling ("merge" libraries, development/debugging tools) is so much more primitive than the core database that it's somewhat moot. The wiki's description of conflict resolution is a piece of pseudocode. In the future, it will say "If you use C, use this library; if you use Ruby, use this other." When that happens, the pain point of the history graph may become dominant.
  2. Tracking true "merges" is possible in "user space" if you will. Like a shadow government, the client can simply track its own history graph using its own mechanism. (In this case, it's just like Couch except merges are recorded as such.) The data is simply a normal key/val part of the record. If the algorithm proves to be superior, it could be baked into Couch. (The advantage of Couch's revision tree is, like Unix dotfiles, it is not transmitted to the client unless explicitly requested. Otherwise it's a normal key/val datum called IIRC `revs_info`.)
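
A rough sketch of what point 2 could look like, purely as an illustration: the field names `app_rev` and `app_history` and the helpers `commit`, `ancestors`, and `is_fast_forward` are all hypothetical, and nothing here touches Couch's own revision tree. The record carries its own graph as ordinary data, and a merge is simply a revision with two parents.

```python
import hashlib
import json

HEAD = "app_rev"         # hypothetical field: id of this record's revision
HISTORY = "app_history"  # hypothetical field: {revision id: [parent ids]}


def _rev_id(body, parents):
    """Derive a short revision id from the content and its parent ids."""
    payload = json.dumps([body, parents], sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]


def commit(body, parent_docs=()):
    """Return a new record holding `body` plus an app-level history graph.

    `body` must not itself use the HEAD or HISTORY keys. Passing two parent
    records creates a true merge node with two parents.
    """
    history = {}
    for parent in parent_docs:
        history.update(parent[HISTORY])   # inherit everything the parents know
    parents = sorted(p[HEAD] for p in parent_docs)
    rev = _rev_id(body, parents)
    history[rev] = parents
    doc = dict(body)
    doc[HEAD] = rev
    doc[HISTORY] = history
    return doc


def ancestors(doc):
    """All revision ids reachable from the record's head, head included."""
    seen, stack = set(), [doc[HEAD]]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(doc[HISTORY][rev])
    return seen


def is_fast_forward(old, new):
    """True if `new` descends from `old`, so `new` can simply replace it."""
    return old[HEAD] in ancestors(new)


base = commit({"name": "Ann"})
phone = commit({"name": "Ann", "phone": "555-0100"}, [base])
desktop = commit({"name": "Ann", "picture": "ann.png"}, [base])
merged = commit({"name": "Ann", "phone": "555-0100", "picture": "ann.png"},
                [phone, desktop])

print(is_fast_forward(phone, desktop))  # diverged siblings
print(is_fast_forward(phone, merged))   # merge descends from both
```

The obvious cost of this design is that the whole graph travels with every record; a real implementation would presumably prune or store it separately.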

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds