The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

Posted Feb 9, 2012 20:36 UTC (Thu) by dlang (guest, #313)
In reply to: The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided. by Wol
Parent article: XFS: the filesystem of the future?

Atomicity: your scheme won't work if you need to make changes to two records (the ever-popular "subtract $10 from account A, add $10 to account B" example).
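Here is a minimal sketch of the atomicity point, using Python's standard-library `sqlite3` module. This is not Pick and not anything posted in this thread. The `accounts` table, the `CHECK` constraint and the `transfer` helper are hypothetical. The credit to B is applied first on purpose. When the debit from A then fails the constraint, the rollback has to undo a change that has already happened:

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """In-memory demo database; the CHECK constraint forbids overdrafts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("A", 5), ("B", 0)]
        )
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one all-or-nothing unit.

    The credit is applied first; if the debit then violates the CHECK
    constraint, leaving the `with conn:` block rolls the credit back too.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn: sqlite3.Connection) -> dict:
    """Current balance of every account, keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

With A holding $5, `transfer(conn, "A", "B", 10)` returns `False` and both balances stay unchanged. Without the transaction, B would keep a $10 credit that A never paid.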

Consistency: what if part of your updates get to disk and other parts don't? What if the OS (or the drive) reorders your updates so that the write to the person record happens before the write to the building record?

As far as durability goes, if you don't tell the OS to flush its buffers (which is what fsync does), then in a crash you have no idea what made it to disk and what didn't.
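This is a minimal sketch of the ordering and durability points. It assumes POSIX-style `fsync` semantics and files that already exist. For a newly created file, the containing directory generally needs an fsync as well. The file names and record formats are hypothetical. Whether the drive itself honours the flush depends on write barriers and on its cache behaving as advertised:

```python
import os
import tempfile

def durable_append(path: str, record: str) -> None:
    """Append one record and ask the OS to force it to stable storage."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()             # push Python's user-space buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write its buffers to the device

def write_building_then_person(dirpath: str, building: str, person: str) -> None:
    """Write the building record durably before the person record that
    refers to it, so the OS cannot reorder the two writes."""
    durable_append(os.path.join(dirpath, "building.dat"), building)
    # Only after the fsync above has returned is the dependent record written.
    durable_append(os.path.join(dirpath, "person.dat"), person)
```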

Posted Feb 10, 2012 16:17 UTC (Fri) by Wol (subscriber, #4433)

The ever popular "subtract $10, add $10" ...

Well, if you define the transaction as an entity, then it gets written to its own FILE. If the system crashes then you get a discrepancy that will show up in an audit. It makes sense to define it as an entity: it has its own "primary key", i.e. "time X at teller Y". Okay, you'll argue that I have to run an integrity check after a crash (true) while you don't, but I can probably integrity-check the entire database in the time it takes you to scan one big table :-)
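One way to read this scheme is sketched below in Python, under assumptions. The `(timestamp, teller)` key, the `applied` sets and the `post`/`audit` helpers are all hypothetical, and dicts stand in for Pick files. The transaction record is written before either account is touched. An audit can therefore find any transaction whose two legs were not both applied:

```python
from dataclasses import dataclass, field

# Hypothetical transaction key: (timestamp, teller), i.e. "time X at teller Y".
TxKey = tuple[str, str]

@dataclass
class Account:
    balance: int = 0
    applied: set[TxKey] = field(default_factory=set)  # transactions posted here

def post(txns: dict, accounts: dict, key: TxKey, src: str, dst: str,
         amount: int, crash_after_debit: bool = False) -> None:
    """Write the transaction record first, then apply each leg to its account.

    `crash_after_debit` simulates a crash between the two account updates.
    """
    txns[key] = (src, dst, amount)  # the transaction is an entity of its own
    accounts[src].balance -= amount
    accounts[src].applied.add(key)
    if crash_after_debit:
        return
    accounts[dst].balance += amount
    accounts[dst].applied.add(key)

def audit(txns: dict, accounts: dict) -> list:
    """Return the keys of transactions whose two legs were not both applied."""
    return [
        key
        for key, (src, dst, _amount) in sorted(txns.items())
        if not (key in accounts[src].applied and key in accounts[dst].applied)
    ]
```

After a simulated crash between the debit and the credit, `audit` reports exactly that transaction's key. It detects the damage after the fact rather than preventing it, which is the trade-off being argued for here.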

Consistency? Journalling a transaction? Easily done.
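"Journalling a transaction" is commonly done as a write-ahead (redo) log. Below is a minimal sketch. A Python list stands in for an append-only journal file, which in practice would be fsync'd after each append. The entry format and helper names are hypothetical; this is not Pick's actual mechanism:

```python
def apply_with_journal(journal: list, store: dict, txid: int, updates: dict,
                       crash_before_commit: bool = False) -> None:
    """Record the intended changes, then a commit marker, then apply them.

    `crash_before_commit` simulates a crash before the commit marker is written.
    """
    for key, value in updates.items():
        journal.append(("set", txid, key, value))
    if crash_before_commit:
        return
    journal.append(("commit", txid))
    # A crash anywhere after the commit marker is safe: recovery redoes it.
    store.update(updates)

def recover(journal: list, store: dict) -> dict:
    """Redo every committed transaction; skip ones without a commit marker."""
    committed = {entry[1] for entry in journal if entry[0] == "commit"}
    for entry in journal:
        if entry[0] == "set" and entry[1] in committed:
            _, _txid, key, value = entry
            store[key] = value
    return store
```

Recovery replays only transactions that have a commit marker. A crash before the marker therefore leaves the old values in place, and replaying twice gives the same result.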

And yes, your point about flushing buffers is good, but that really should be the OS's problem, not the app (database) sitting on top. Yes I know, I used the word *should* ...

Look at it from an economic standpoint :-) If my database (on equivalent hardware) is ten times faster than yours, and I can run an integrity check after a crash without impinging on my users, and I can guarantee to repair my database in hours, which is the economic choice?

Marketing 101: proudly announce your weaknesses as a strength. The chances of a crash occurring at the "wrong moment" and corrupting your database are much higher with SQL, because any given task will typically require tens to hundreds more transactions between the db and the OS than it does in Pick. So SQL needs ACID. With Pick, the chances of a crash happening at the wrong moment and corrupting data are much, much lower, so expensive strong ACID actually has a prohibitive cost. Especially if you can get 90% of the benefits for 10% of the effort.

I'm not saying ACID isn't a good thing. It's just that the cost/benefit equation for Pick says strong ACID isn't worth it, because the benefits are just SO much less. (Like query optimisers. Pick doesn't have an optimiser because it's pretty much a dead cert the optimiser will save less than it costs!)

Cheers,
Wol

Posted Feb 10, 2012 18:43 UTC (Fri) by dlang (guest, #313)

so that means that you don't have any value anywhere in your database that says "this is the amount of money in account A", instead you have to search all transactions by all tellers to find out how much money is in account A

that doesn't sound like a performance win to me.

Posted Feb 11, 2012 2:30 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

Well, git works exactly the same way. Is it fast enough for you?

Posted Feb 11, 2012 5:48 UTC (Sat) by dlang (guest, #313)

what gives you reasonable performance for a version control system with a few updates per minute is nowhere close to being reasonable for something that measures its transaction rate in thousands per second.

besides, git tends to keep the most recent version of a file uncompressed, it's only when the files are combined into packs that things need to be reconstructed, and even there git only lets the chains get so long.

Posted Feb 11, 2012 13:44 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

git/svn/... store intermediate versions of the source code, so that applying all patches becomes O(log N) instead of O(N). But that's just an optimization.

NoSQL systems work in a similar way - they can store the 'tip' of the data, so that they don't have to reapply all the patches all the time. However, the latest data view can be rebuilt if required.
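The "tip plus log" idea can be sketched as an append-only list of changes with a cached current view. This is a hypothetical illustration. It does not model the intermediate snapshots that keep git's or svn's reconstruction cost down:

```python
class Ledger:
    """Append-only log of balance changes plus a cached 'tip' per account."""

    def __init__(self) -> None:
        self.log: list[tuple[str, int]] = []  # every change ever made
        self.tip: dict[str, int] = {}         # cached current balance

    def apply(self, account: str, delta: int) -> None:
        """Record the change in the log and update the cached tip."""
        self.log.append((account, delta))
        self.tip[account] = self.tip.get(account, 0) + delta

    def balance(self, account: str) -> int:
        """O(1): read the cached tip instead of replaying the log."""
        return self.tip.get(account, 0)

    def rebuild(self) -> dict[str, int]:
        """O(N): recompute every balance from the full log."""
        view: dict[str, int] = {}
        for account, delta in self.log:
            view[account] = view.get(account, 0) + delta
        return view
```

`balance` answers "how much money is in account A" in constant time from the cached tip. `rebuild` shows that the tip can always be recomputed from the log if it is ever lost or suspect.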

Posted Feb 12, 2012 15:57 UTC (Sun) by nix (subscriber, #2304)

Actually, even the most recent stuff is compressed. It just might not be deltified in terms of other blobs (which is what you meant, I know).

Posted Feb 12, 2012 18:29 UTC (Sun) by dlang (guest, #313)

yes, everything stored in git is compressed, but it only gets deltified when it gets packed.

and it's frequently faster to read a compressed file and uncompress it than it is to read the uncompressed equivalent (especially for highly compressible text like code or logs). I've done benchmarks on this within the last year or so.
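Reading compressed data can win because highly repetitive text shrinks a lot, so far fewer bytes have to come off the disk. The sketch below only measures the size ratio with Python's `zlib` on made-up log lines. It does not reproduce the benchmarks mentioned above. Whether decompression time beats the saved I/O depends on the hardware:

```python
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))

# Hypothetical log-like input: many near-identical lines.
log_text = "".join(
    f"2012-02-12 18:29:{i % 60:02d} INFO request served in {i % 7} ms\n"
    for i in range(10_000)
).encode("ascii")
```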

Posted Feb 12, 2012 13:38 UTC (Sun) by Wol (subscriber, #4433)

Okay, it would need a little bit of coding, but I'd do the following ...

Each month, when you run end-of-month statements, you save that info. When you update an account you keep a running total.

If the system crashes you then do "set corruptaccount = true where last-month plus transactions-this-month does not equal running balance". At which point you can do a brute force integrity check on those accounts.

(If I've got a 3rd state of that flag, undefined, I can even bring my database back on line immediately I've run a "set corruptaccount to undefined" command!)
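Here is a minimal sketch of this post-crash check, including the third "undefined" state. Accounts are plain dicts with hypothetical fields: `last_month` is the month-end statement balance, `this_month` holds the transactions since then, and `running` is the running balance. In Pick this would be a query over files rather than Python:

```python
from enum import Enum

class Flag(Enum):
    OK = "ok"
    CORRUPT = "corrupt"
    UNDEFINED = "undefined"  # not yet checked since the crash

def mark_all_undefined(accounts: dict) -> None:
    """Right after a crash: every account is suspect until checked."""
    for acct in accounts.values():
        acct["flag"] = Flag.UNDEFINED

def check_account(acct: dict) -> Flag:
    """Month-end balance plus this month's transactions must equal the
    running balance; otherwise flag the account for a brute-force check."""
    expected = acct["last_month"] + sum(acct["this_month"])
    acct["flag"] = Flag.OK if expected == acct["running"] else Flag.CORRUPT
    return acct["flag"]

def open_for_service(acct: dict) -> bool:
    """Check an unchecked account lazily on first use after the crash."""
    if acct["flag"] is Flag.UNDEFINED:
        check_account(acct)
    return acct["flag"] is Flag.OK
```

Marking everything `UNDEFINED` is cheap, so the system can come back up at once and each account is checked the first time it is touched. Only accounts that fail the check need the brute-force repair.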

And in Pick, that query will FLY! If I've got a massive terabyte database that's crashed, it's quite likely going to take a couple of hours to reboot the OS (I just rebooted our server at work: 15-20 mins to come up including disk checks etc). What's another hour running an integrity check on the data? And I can bring my database back on line immediately that query (and others like it) have completed. Tough luck on the customer whose account has been locked ... but 99% of my customers can have normal service resume quickly.

Thing is, I now *know* after a crash that my data is safe, I'm not trusting the database company and the hardware. And if my system is so much faster than yours, once the system is back I can clear the backlog faster than you can. Plus, even if ACID saves your data, I've got so much less data in flight and at risk.

But this seems to be mirroring the other debate :-) the moan about "fsync and rename" was that fsync was guaranteeing (at major cost) far more than necessary. The programmer wanted consistency, but the only way he could get it was to use fsync, which charged a high price for durability. If I really need ACID I can use BEGIN/END TRANSACTION in Pick. But 99% of the time I don't need it, and can get 90% of its benefits with 10% of its cost, just by being careful about how I program. At the end of the day, Pick gives me moderate ACID pretty much by default. Why should I have to pay the (high) price for strong ACID when 90% of the time, it is of no benefit whatsoever? (And how many SQL programmers actually use BEGIN/END TRANSACTION, even when they should?)
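For reference, the "fsync and rename" pattern from the other debate looks roughly like this in Python. It assumes POSIX rename semantics, where `os.replace` is atomic when source and target are on the same filesystem. It also assumes the directory can be opened and fsync'd, which works on Linux but not on Windows. The `fsync` before the rename is the expensive step that the programmer only wanted for ordering:

```python
import os
import tempfile

def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` so readers see either the old or the new contents."""
    dirpath = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirpath)  # same filesystem as the target
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # new contents on disk before the rename
        os.replace(tmp, path)     # atomic rename over the old file (POSIX)
    except BaseException:
        os.unlink(tmp)
        raise
    # Make the rename itself durable (POSIX; not supported on Windows).
    dir_fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```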

Cheers,
Wol