Containers and persistent data

Posted May 29, 2015 23:37 UTC (Fri) by jberkus (guest, #55561)
Parent article: Containers and persistent data

Some updates based on feedback from Diego Ongaro. Apparently learning Raft in 1/2 hour doesn't make me an expert. ;-)

First, the quote about Paxos was apparently him quoting someone else, a paper reviewer. The full quote is as follows:

This came from an anonymous NSDI reviewer in a
paper rejection: "I'm really glad that the authors have created a more
understandable algorithm for distributed consensus. The dirty little
secret of the NSDI community is that at most five people really, truly
understand every part of Paxos ;-)."

Second, he requested a link to his full list of publications: https://raftconsensus.github.io/#pubs

Third: Raft is designed for *immediate* consistency, not *eventual* consistency.

Fourth, explanation for the Paxos validation issues which I mention tersely above. Ongaro provided me with a quote from the paper Paxos Made Live which explains it:

"There are significant gaps between
the description of the Paxos algorithm and the needs of a real-world
system. In order to build a real-world system, an expert needs to use
numerous ideas scattered in the literature and make several relatively
small protocol extensions. The cumulative effort will be substantial
and the final system will be based on an unproven protocol."

Containers and persistent data

Posted May 30, 2015 9:23 UTC (Sat) by ms (subscriber, #41272) [Link] (1 responses)

> "There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system. In order to build a real-world system, an expert needs to use numerous ideas scattered in the literature and make several relatively small protocol extensions. The cumulative effort will be substantial and the final system will be based on an unproven protocol."

Yes, completely agree. Irritatingly, the "Paxos Made Live" paper also misses out all sorts of concerns and issues too. My own write up on Paxos can be found at http://wellquite.org/blog/paxos_notes.html

Now in terms of proofs and so forth, if you're working in this area then IMO, you really need to be model checking your implementation at a minimum. I.e. exhaustively searching every possible permutation of events and checking the right thing happens. Learning TLA+ and building a Paxos model in it takes at most a couple of days. The model you're left with is reasonably code-like in that you could get your TLA+ model and your code looking pretty similar and then convince yourself line-by-line that it's the same thing. Or, the alternative is you build your own simulation environment for your programming language of choice and then you can drive you real production-ready code through your simulator and demonstrate that that exact code does the right thing in every scenario you've generated and tested.

Obviously, model checking is not the same as a proof. TLA+ does support some proof extensions but otherwise you're looking at the usual candidates for proof assistants (https://en.wikipedia.org/wiki/Proof_assistant) or pencil and paper. I would suggest though that even if an algorithm has been proved correct, that does not really help prove that your implementation matches the algorithm. The typical extensions to Paxos (Fast Paxos, Cheap Paxos) are papers by Lamport and I would expect proofs or TLA+ models to be available online (certainly Fast Paxos has a TLA+ implementation at the back of the paper). Finally, there is an argument to me made that if you're writing distributed systems software and you are unable to reason mentally about how and why your software works then you might not wish to continue working in this area. You only need to look at aphyr's work at https://aphyr.com/tags/Jepsen to see how hopelessly awful almost all software is in this area. It mainly always fails to work as claimed.

Containers and persistent data

Posted May 31, 2015 8:00 UTC (Sun) by paulj (subscriber, #341) [Link]

On designing distributed network protocols, I'd say that if you can't show how your protocol is self-stabilising, then it is broken. (Even just for a weaker, "easier" form of self-stabilisation that doesn't have to worry about arbitrary bit-flips in stored state within nodes - just partitions, heals, {dropped,re-ordered,delayed} messages at any arbitrary time / any length of time / any point).