Why Uber dropped PostgreSQL
Posted Aug 5, 2016 11:21 UTC (Fri) by niner (subscriber, #26151) In reply to: Why Uber dropped PostgreSQL by brong
Parent article: Why Uber dropped PostgreSQL
Posted Aug 5, 2016 12:10 UTC (Fri)
by brong (guest, #87268)
[Link] (4 responses)
Can you please enumerate the sorts of corruption that occur with statement-based replication?
The only sorts I can think of are cases where transactions get re-ordered in the statement log compared to the order in which they were actually applied on the master due to concurrency, and hence the replica falls out of sync.
Or cases where you flat-out allow the two ends to be out of sync by manually fiddling with the replication log position so that you skip transactions. You can't really call that a bug in statement-based replication, though.
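To make the re-ordering case concrete, here is a minimal toy sketch. It assumes a "database" is just a dict and a "statement" is a function that mutates it; the names add_ten, double and replay are made up for illustration and have nothing to do with how MySQL actually stores or applies its log.

```python
# Toy model only: a "database" is a dict, a "statement" is a function.
# All names below are hypothetical, not part of any real replication system.

def add_ten(db):
    db["balance"] += 10      # think: UPDATE acct SET balance = balance + 10

def double(db):
    db["balance"] *= 2       # think: UPDATE acct SET balance = balance * 2

def replay(statements, start):
    """Apply statements in order to a copy of the starting state."""
    db = dict(start)
    for stmt in statements:
        stmt(db)
    return db

start = {"balance": 100}
master = replay([add_ten, double], start)    # order actually applied on the master
replica = replay([double, add_ten], start)   # order as it ended up in the statement log

print(master, replica)
```

Because the two statements do not commute, the two ends disagree after replaying the same set of statements, and nothing in the log itself reveals that.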
Posted Aug 5, 2016 15:11 UTC (Fri)
by paulj (subscriber, #341)
[Link] (1 responses)
With the low-level binary log replication, bugs that lead to corruption can replicate.
With the logical level replication, bugs that lead to logical level corruption can also cause inconsistent state. E.g., an update doesn't get applied to slaves because it isn't accepted, which could affect application consistency. Bugs at the binary log level may not replicate of themselves, but could cause a logical level replication to fail to replicate and cause inconsistent state.
Isn't it the case that the logical layer replication system has _two_ layers at which bugs can strike and cause significant problems? You now have two layers that need to be robust? And bugs in the lower layer can still take down the upper layer?
Posted Aug 5, 2016 21:58 UTC (Fri)
by brong (guest, #87268)
[Link]
If your low-level data structures are corrupted, you had better have a good fsck and/or good backups, because you have no replica with consistent state any more.
Posted Aug 7, 2016 16:43 UTC (Sun)
by krakensden (subscriber, #72039)
[Link]
Posted Aug 11, 2016 7:50 UTC (Thu)
by ringerc (subscriber, #3071)
[Link]
MySQL works around this somewhat by special-casing some functions, like now(). It evaluates them on the master and stores the results in the binlog, then ensures the invocations on the replica(s) return the same results as the master.
PgPool-II for PostgreSQL does something similar in statement based replication mode.
Clever, but it solves only narrow cases. For example, in MySQL SYSDATE() still doesn't replicate safely, so you have to code very carefully to avoid breakage. (See https://dev.mysql.com/doc/refman/5.7/en/replication-featu...)
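A rough sketch of the difference between the two functions, as a toy model rather than anything resembling MySQL's implementation: the log entry carries a timestamp taken on the master, a NOW()-style call reuses it on the replica, and a SYSDATE()-style call reads whatever clock is local when it runs. FakeClock, run_on_master and run_on_replica are hypothetical names, and the clock is a deterministic counter so the result is reproducible.

```python
import itertools

class FakeClock:
    """Deterministic stand-in for a wall clock: each read advances one tick."""
    def __init__(self, start):
        self._ticks = itertools.count(start)

    def read(self):
        return next(self._ticks)

def run_on_master(clock, table):
    pinned = clock.read()             # timestamp that gets written to the log entry
    table["created"] = pinned         # NOW()-style: uses the statement's timestamp
    table["audited"] = clock.read()   # SYSDATE()-style: reads the clock when evaluated
    return pinned

def run_on_replica(clock, table, pinned):
    table["created"] = pinned         # NOW()-style: replays the logged timestamp
    table["audited"] = clock.read()   # SYSDATE()-style: reads the replica's own clock

master, replica = {}, {}
pinned = run_on_master(FakeClock(1000), master)
run_on_replica(FakeClock(5000), replica, pinned)

print(master, replica)
```

The "created" column matches on both ends, while "audited" silently differs, which is the kind of breakage you have to code around.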
By contrast, PostgreSQL's block-level replication leaves the replica an identical copy.
That's why, in practice, the most workable MySQL replication option is row-based replication or hybrid row/statement-based replication. Many people who are talking about "statement based" replication here are really thinking of row-based replication, or of the MIXED replication mode that MySQL can use to hybridize the two, rather cleverly, I must say. (https://dev.mysql.com/doc/refman/5.7/en/replication-forma..., https://dev.mysql.com/doc/refman/5.7/en/binary-log-mixed....)
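As a hedged illustration of why row-based logging sidesteps the problem, here is a toy comparison. It assumes a table is a dict keyed by row id and uses Python's random module as a stand-in for a non-deterministic SQL function; statement_update and apply_row_image are invented names, not MySQL internals.

```python
import random

def statement_update(db, rng):
    # think: UPDATE sessions SET token = RAND() WHERE id = 1 (non-deterministic)
    db[1] = rng.random()

def apply_row_image(db, row_image):
    # Row-based: the log carries the resulting row, not the statement.
    key, value = row_image
    db[key] = value

master_rng = random.Random(1)    # master and replica have independent RNG state
replica_rng = random.Random(2)

master = {}
statement_update(master, master_rng)

# Statement-based: the replica re-executes the statement with its own RNG.
stmt_replica = {}
statement_update(stmt_replica, replica_rng)

# Row-based: the replica copies the row the master actually produced.
row_replica = {}
apply_row_image(row_replica, (1, master[1]))

print(stmt_replica == master, row_replica == master)
```

A mixed mode would, roughly speaking, log the statement when it is known to be safe and fall back to the row image otherwise.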
That's what I'm involved in working on for PostgreSQL too, at 2ndQuadrant, in the form of BDR and pglogical. There's ongoing work to get this into PostgreSQL core. Though we're not planning on any sort of mixed replication mode at this point.
Posted Aug 7, 2016 3:54 UTC (Sun)
by giraffedata (guest, #1954)
[Link]
But are corruptions of that class as dangerous?
I take the complaint to be that with WAL-based replication, a single trigger of a bug can cost you the whole cluster. But with logical replication, for all its opportunities to fail, the most you will lose is one replica; at worst you'll have to blow away that replica and replace it.
Is there a class of bug specific to MySQL that corrupts the entire cluster at once?
At the same time, logical replication like MySQL's does bring a whole class of corruptions that are simply not possible in the same way with Postgres' WAL-based replication.