Cool new Free software

Posted Dec 20, 2012 13:43 UTC (Thu) by paulj (subscriber, #341)
In reply to: Cool new Free software by man_ls
Parent article: Status.net service to phase out, replaced by pump.io

Speaking for myself, I'm not at all an expert in how to do transactions. If I wanted to store data and have it remain consistent with some schema, wouldn't it be easiest to start with a DB that already provides consistency features? It could be fast enough for my purposes, no? Wouldn't it be premature optimisation to go with a system that gains performance by not providing consistency? While a good developer who has been around databases and relational data storage for a good while may know how to implement consistency themselves, what are the chances of arbitrary developers implementing consistency better than the database developers do?
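A minimal sketch of what "a DB that already provides consistency features" buys you, assuming SQLite through Python's standard `sqlite3` module as a stand-in for any ACID store. The `accounts` table and the `transfer` helper are hypothetical names made up for the illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )

ok = transfer(conn, "alice", "bob", 30)         # fits within alice's balance
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The application code never checks the balance itself: the schema rejects the overdraft and the transaction undoes the half-finished transfer.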

Unless of course the argument is that the performance-impacting consistency guarantees are often not required. But then, aren't ACID DBs tweakable? You can make, e.g., PostgreSQL play a lot looser with data than it does by default, I gather, and gain a lot of performance.

?


Cool new Free software

Posted Dec 24, 2012 9:58 UTC (Mon) by man_ls (subscriber, #15091)

There is a subtle difference between premature optimization and sensible design. Premature optimization tends to be low level, while sensible design is more of a big-picture thing. You cannot commit all your data and your code to live in PostgreSQL and one day migrate magically to Riak, or go back and forth as you need it. Even migrating between similar databases is an error-prone task, and JDBC and similar libraries that try to abstract the underlying database are just an excuse for managers. So changing databases is not an "optimization", and therefore it cannot be premature.

NoSQL stores usually allow you to use different schemas on the same table. Isn't it better to enforce a single schema? It depends; after all you will have to make sure that your data have the correct format before writing to the store (or after reading from it), so having the database reject your data is not a substitute for thorough testing. Also, having a single schema goes directly against reversible DevOps, as it entails offline data migration. Not everyone can afford downtime to migrate data between schemas.
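To make the point about checking the format yourself concrete, here is a rough sketch, assuming a plain Python dict as a stand-in for a schemaless document store. The `validate_post` and `put` helpers and the field names are hypothetical, not any real store's API.

```python
def validate_post(doc):
    """Return a list of problems with `doc`; an empty list means it is acceptable."""
    problems = []
    required = {"author": str, "text": str}
    for field, expected in required.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"wrong type for {field}")
    # 'tags' is optional: older documents lack it, newer ones carry a list.
    if "tags" in doc and not isinstance(doc["tags"], list):
        problems.append("wrong type for tags")
    return problems

store = {}  # stand-in for a schemaless document store

def put(key, doc):
    """Write `doc` only if it passes application-side validation."""
    problems = validate_post(doc)
    if problems:
        raise ValueError("; ".join(problems))
    store[key] = doc

# Two schema versions coexist in the same "table" without a migration.
put("p1", {"author": "ann", "text": "hello"})
put("p2", {"author": "bob", "text": "hi", "tags": ["greeting"]})

bad = validate_post({"author": 42})
```

Both document shapes live side by side, which is what avoids the offline migration; the cost is that the validation code now has to be written and tested by the application.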

As to consistency, the advantage of NoSQL stores is that they allow you to choose the degree of consistency that you need. You can either read everything from many different "tables" or you can just store every piece of data multiple times. With relational databases you can also denormalize data, but they are usually less flexible as to how it is stored. (It is harder, e.g., to store an array inside a table; you can just store the first n items.) If you need total consistency, then by all means go to a relational store since it will give you better guarantees. But consistency is again not magic pixie dust you can sprinkle on your data; it has to be there from the start.
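The trade-off between reading from several "tables" and storing each piece of data several times can be sketched as follows, assuming plain Python dicts in place of a real store. All names here are hypothetical.

```python
# Normalized: the author's name is stored once and looked up at read time.
users = {"u1": {"name": "Ann"}}
posts_normalized = {"p1": {"author_id": "u1", "text": "hello"}}

def read_normalized(post_id):
    """Two lookups per read, but always the current author name."""
    post = posts_normalized[post_id]
    return {"author": users[post["author_id"]]["name"], "text": post["text"]}

# Denormalized: the name is copied into every post, so one lookup per read.
posts_denormalized = {"p1": {"author": "Ann", "text": "hello"}}

def read_denormalized(post_id):
    return dict(posts_denormalized[post_id])

# A rename that only touches the users "table" leaves the copies stale.
users["u1"]["name"] = "Anna"
fresh = read_normalized("p1")
stale = read_denormalized("p1")
```

Whether the stale copy matters is exactly the "degree of consistency" decision: either every write propagates to all copies, or readers accept out-of-date values.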

Same with transactions: if you need them, go to a transactional store. But first think about whether you need them, and if you do, then design them properly. Do not just trust your store to do the right thing, because there are 100 ways to mess it up.

You can try to use PostgreSQL as a NoSQL store, but you will be swimming upstream for the rest of your career. How do you share the load, or replicate between nodes? How do you deal with consistency if you need it in a non-relational table? How do you optimize a single database for both consistency and lack of it?

Cool new Free software

Posted Dec 25, 2012 5:26 UTC (Tue) by raven667 (subscriber, #5198)

> You can try to use PostgreSQL as a NoSQL store, but you will be swimming upstream for the rest of your career. How do you share the load, or replicate between nodes? How do you deal with consistency if you need it in a non-relational table? How do you optimize a single database for both consistency and lack of it?

There are certainly specific use cases that the various NoSQL databases were designed and optimized for. If you have one of those cases, then by all means use the right tool for the job. But there seem to be a lot of inexperienced developers who think that traditional SQL databases are slow, creaking dinosaurs unsuitable for any purpose, when in fact they contain highly optimized data stores that have been developed over decades by people competent in the relevant math and computer science. Different does not always mean better.

Cool new Free software

Posted Dec 25, 2012 17:38 UTC (Tue) by man_ls (subscriber, #15091)

Well, designing a relational database is not trivial: modeling n-m relationships can be challenging for beginners. On the other hand, using a document store like MongoDB is trivial until you get to the serious stuff -- or to transactions. No wonder it is the most popular of the NoSQL bunch.

Other NoSQL families, like key-value (Dynamo-like) or graph databases, are more specialized and need more effort to keep going. But I would argue that none are near the level of sophistication of a normalized database. Most of you probably don't feel it because you have worked with SQL for many years, but it is a contorted language that uses a highly unnatural data model. Yes, relational databases are highly optimized, but you pay the price every time you read or write anything to them.

Cool new Free software

Posted Dec 25, 2012 17:56 UTC (Tue) by Wol (guest, #4433)

Which is why it sounds like Pick is a good fit for people who understand both relational and NoSQL.

Your "contorted language" is spot on. Basically, in relational you cannot STORE that most natural of data concepts, the list. You have to MODEL it. At which point your database becomes unnecessarily complex and complicated.

Referring back to the comment in your earlier post about "You can either read everything from many different "tables" or you can just store every piece of data multiple times. With relational databases you can also denormalize data, but they are usually less flexible as to how it is stored."

But that's exactly what Pick does! NFNF is what relational purists would call "denormalised". The fact that it can be mechanically normalised by the DB seems to have passed them by. So in Pick, I don't "read from many tables OR store the data many times", I just store the data ONCE in ONE table. IMHO it is *relational* that needs to "store the data many times in many tables": just try to store a list, then ask yourself how many times you have to store the name of the list (or its id, which is the same thing). As a system that requires (as part of its definition) that you don't store duplicate data, the relational system is very poor at living up to its own definition!
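The "try to store a list" exercise can be sketched like this, assuming SQLite through Python's `sqlite3` module for the relational side and a plain Python dict for the nested side. The dict is only a loose stand-in for a nested record and does not use Pick's actual interface; table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE list_items (
        list_id  INTEGER NOT NULL REFERENCES lists(id),
        position INTEGER NOT NULL,
        value    TEXT NOT NULL,
        PRIMARY KEY (list_id, position)
    );
""")

items = ["milk", "eggs", "bread"]

# Relational model: one parent row, plus one child row per element,
# each carrying the list's id and an explicit position.
with conn:
    list_id = conn.execute(
        "INSERT INTO lists (name) VALUES (?)", ("shopping",)
    ).lastrowid
    conn.executemany(
        "INSERT INTO list_items VALUES (?, ?, ?)",
        [(list_id, pos, value) for pos, value in enumerate(items)],
    )

# How many times the list's id was stored in the child table.
id_copies = conn.execute(
    "SELECT COUNT(*) FROM list_items WHERE list_id = ?", (list_id,)
).fetchone()[0]

# Reading the list back means selecting and ordering the child rows.
rebuilt = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM list_items WHERE list_id = ? ORDER BY position",
        (list_id,),
    )
]

# Nested model: the list is stored once, inside the record it belongs to.
nested = {"name": "shopping", "items": items}
```

In this sketch the list's id appears once per element in `list_items`, whereas the nested record holds the list as a single value.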

Cheers,
Wol

Cool new Free software

Posted Jan 4, 2013 16:56 UTC (Fri) by nix (subscriber, #2304)

I might note that you can in fact store lists in PostgreSQL. It's had arrays since before it was called PostgreSQL. :)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds