> So yes, if you pull again and you see B3, no problem. Just put it on the revision tree. B3 and B2_Deleted are siblings with parent B2.
Sure, that makes sense.
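Just to check that I'm picturing the same thing, here's a minimal sketch of that revision tree in Python. All the names (`parents`, `add_revision`, `siblings`, `leaves`) are made up for illustration and are not CouchDB's actual data structures or API; the root revision "B1" is also my own assumption.

```python
# Hypothetical sketch of a revision tree; not CouchDB's real representation.
parents = {}  # revision id -> parent revision id (None for the root)

def add_revision(rev, parent):
    """Attach rev under parent; re-adding a known revision is a no-op."""
    parents.setdefault(rev, parent)

def siblings(rev):
    """Other revisions that share rev's parent."""
    p = parents[rev]
    return sorted(r for r, q in parents.items() if q == p and r != rev)

def leaves():
    """Revisions with no children; more than one leaf means divergence."""
    has_child = set(parents.values())
    return sorted(r for r in parents if r not in has_child)

add_revision("B1", None)          # assumed root, not from the discussion
add_revision("B2", "B1")
add_revision("B2_Deleted", "B2")  # the local deletion
add_revision("B3", "B2")          # arrives on a later pull
```

Here `siblings("B3")` gives `["B2_Deleted"]`, and `leaves()` returns both of them, which is the "two heads" situation the application would then have to resolve.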
> The application is still probably more interested in the difference between the *data* rather than the ancestral history. [...] For example, what if B3 stores "delete_account: true" which means in this application that the user wishes to completely remove his account and delete all of his data. Clearly that fact is dominant in the merge/conflict-resolution strategy and it hardly matters what the history graph looks like.
Hrm, so what I'm basically hearing is that CouchDB's data/synchronization model in practice is:
-- You have a bunch of records which must be independent (there's no way that synchronization can respect any kind of referential integrity)
-- When synchronizing, CouchDB will automatically identify the "latest" version of any given record (via "fast-forward merge"); if anything more complicated happens, then the app is on its own. And if the app *wanted* to use a proper merge algorithm (say, to notice that the user added a phone number to this contact on their phone and also added a user picture on their computer, and that those edits can easily be combined), then it's sort of doomed, because the needed history information just isn't recorded. Instead, it's expected that apps will mostly do the equivalent of two-way merge, or that divergence will be rare enough that even a crippled three-way merge will be good enough.
Does that sound right?
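To make the contact example concrete, here's a rough sketch of the kind of field-level three-way merge I have in mind. It assumes records are flat dicts and that the common ancestor (`base`) is available, which is exactly the piece of history I'm worried about. The function name and field names are hypothetical.

```python
_MISSING = object()  # sentinel for "field not present in this version"

def three_way_merge(base, ours, theirs):
    """Merge two edited copies of a flat record against their common ancestor.

    Returns (merged, conflicts), where conflicts lists the fields that
    both sides changed in different ways.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            value = o      # both sides agree (or neither touched it)
        elif o == b:
            value = t      # only "theirs" changed this field
        elif t == b:
            value = o      # only "ours" changed this field
        else:
            conflicts.append(key)  # both changed it differently
            continue
        if value is not _MISSING:  # a missing result means the field was deleted
            merged[key] = value
    return merged, conflicts

base = {"name": "Alice"}
phone = {"name": "Alice", "phone": "555-0100"}        # edited on the phone
computer = {"name": "Alice", "picture": "alice.png"}  # edited on the computer
merged, conflicts = three_way_merge(base, phone, computer)
```

With `base` in hand, `merged` ends up with the name, the phone number, and the picture, and `conflicts` is empty. Without it, a two-way comparison can't tell whether the phone number was added on one side or deleted on the other.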
My wild guess is that in practice this architecture works fine for data that's inherently loosely coupled, and where edits are rare relative to the size of the data store and the synchronization frequency. This probably covers all the main data people are replicating with their phones these days -- bookmarks, contact lists, mail -- but I'm not sure how far you can stretch it. And it's possible to do *much* better, without adding much -- if any -- complexity.
Monotone, for instance, implements a distributed data store with complex structure (a tree of files/directories, each of which can have an arbitrary set of attributes), referential integrity constraints (can't have two files with the same name, no directory loops, etc.), and a very cheap, fully history-sensitive automated merger with a rigorous mathematical foundation ("mark merge").
> There is nothing stopping an application-level `git-rerere` implementation--perhaps as a library which all developers could use.
Sure, but git-rerere is a total hack whose purpose is to prevent Linus from seeing merge nodes. I don't imagine your end-users really have any aesthetic preferences about the shape of the graph buried inside CouchDB :-). Wouldn't it make even *more* sense for CouchDB to just store the relevant information out of the box?
Your point about not wanting to confuse developers is well-taken, but I feel like they'd be better off overall if they had better tools, instead of having to implement these things themselves.
So, hmm. Thanks for giving me something to think about!