LWN.net Logo

The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

Posted Feb 10, 2012 18:43 UTC (Fri) by dlang (✭ supporter ✭, #313)
In reply to: The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided. by Wol
Parent article: XFS: the filesystem of the future?

so that means that you don't have any value anywhere in your database that says "this is the amount of money in account A", instead you have to search all transactions by all tellers to find out how much money is in account A

that doesn't sound like a performance win to me.


(Log in to post comments)

The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

Posted Feb 11, 2012 2:30 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, git works exactly the same way. Is it fast enough for you?

The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

Posted Feb 11, 2012 5:48 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

what gives you reasonable performance for a version control system with a few updates per minute is nowhere close to being reasonable for something that measures it's transaction rate in thousands per second.

besides, git tends to keep the most recent version of a file uncompressed, it's only when the files are combined into packs that things need to be reconstructed, and even there git only lets the chains get so long.

The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

Posted Feb 11, 2012 13:44 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

git/svn/... use store intermediate versions of the source code, so that applying all patches becomes O(log N) instead of O(N). But that's just an optimization.

NoSQL systems work in a similar way - they can store the 'tip' of the data, so that they don't have to reapply all the patches all the time. However, the latest data view can be rebuilt if required.

The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

Posted Feb 12, 2012 15:57 UTC (Sun) by nix (subscriber, #2304) [Link]

Actually, even the most recent stuff is compressed. It just might not be deltified in terms of other blobs (which is what you meant, I know).

The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

Posted Feb 12, 2012 18:29 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

yes, everything stored in git is compressed, but it only gets deltafied when it gets packed.

and it's frequently faster to read a compressed file and uncompress it than it is to read the uncompressed equivalent (especially for highly compressible text like code or logs), I've done benchmarks on this within the last year or so

The entire noSQL family of servers is based on relaxing the reliability constraints of the classic ACID protections that SQL databases provided.

Posted Feb 12, 2012 13:38 UTC (Sun) by Wol (guest, #4433) [Link]

Okay, it would need a little bit of coding, but I'd do the following ...

Each month, when you run end-of-month statements, you save that info. When you upate an account you keep a running total.

If the system crashes you then do "set corruptaccout = true where last-month plus transactions-this-month does not equal running balance". At which point you can do a brute force integrity check on those accunts.

(If I've got a 3rd state of that flag, undefined, I can even bring my database back on line immediately I've run a "set corruptaccount to undefined" command!)

And in Pick, that query will FLY! If I've got a massive terabyte database that's crashed, it's quite likely going to take a couple of hours to reboot the OS (I just rebooted our server at work - 15-20 mins to come up including disk checks etc). What's another hour running an integrity check on the data? And I can bring my database back on line immediately that query (and others like it) have completed. Tough luck on the customer who's account has been locked ... but 99% of my customers can have normal service resume quickly.

Thing is, I now *know* after a crash that my data is safe, I'm not trusting the database company and the hardware. And if my system is so much faster than yours, once the system is back I can clear the backlog faster than you can. Plus, even if ACID saves your data, I've got so much less data in flight and at risk.

But this seems to be mirroring the other debate :-) the moan about "fsync and rename" was that fsync was guaranteeing (at major cost) far more than necessary. The programmer wanted consistency, but the only way he could get it was to use fsync, which charged a high price for durability. If I really need ACID I can use BEGIN/END TRANSACTION in Pick. But 99% of the time I don't need it, and can get 90% of its benefits with 10% of its cost, just by being careful about how I program. At the end of the day, Pick gives me moderate ACID pretty much by default. Why should I have to pay the (high) price for strong ACID when 90% of the time, it is of no benefit whatsoever? (And how many SQL programmers actually use BEGIN/END TRANSACTION, even when they should?)

Cheers,
Wol

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds