SCALE 8x: Relational vs. non-relational
Posted Mar 4, 2010 5:35 UTC (Thu) by ringerc
Parent article: SCALE 8x: Relational vs. non-relational
Also: Whichever database system(s) you choose, you need to learn about them and understand them.
This is particularly true for more powerful (and complex) systems like relational databases, but it still applies to even simple key-value stores. Understanding the system itself isn't enough, though; you also need to learn the established techniques and ideas for making the best use of the system(s) you've chosen.
I see this a lot on the PostgreSQL mailing lists. For example, people will ask "why is my bulk data load so slow?" and it'll turn out they're doing a million individual INSERT statements, one per record, over a high-latency network - and they're running each as a standalone transaction. Each sequentially executed statement requires at least one network round trip, plus an fsync() to make sure the data has hit disk before the server can reply with a completion notice, so of course it's slow. Get them to use COPY, or to batch the inserts into large multi-VALUEd blocks inside one transaction, and suddenly it's a thousand times as fast.
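To make the difference concrete, here's a rough sketch of the two loading patterns. It's only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL (so there's no network latency, and no COPY), and the table name "items" and both function names are made up for the example. The shape of the problem is the same, though: the first function commits once per record, the second commits once for the whole load.

```python
import sqlite3

INSERT_SQL = "INSERT INTO items (id, name) VALUES (?, ?)"

def make_db():
    # In-memory SQLite database standing in for a real server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

def load_one_by_one(conn, rows):
    # Slow pattern: every record is its own transaction, so every
    # record pays the full cost of a commit.
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()

def load_batched(conn, rows):
    # Faster pattern: all rows go in inside a single transaction.
    # "with conn" commits once on success and rolls back on error.
    with conn:
        conn.executemany(INSERT_SQL, rows)

rows = [(i, f"item-{i}") for i in range(1000)]

slow = make_db()
load_one_by_one(slow, rows)

fast = make_db()
load_batched(fast, rows)
```

Both calls end up with the same 1000 rows in the table; only the number of commits differs. On a real PostgreSQL server you'd get the same effect by wrapping the inserts in one BEGIN/COMMIT, and COPY would usually be quicker again.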
In addition to that sort of thing - learning about how to best use the technology you've chosen - you also need to understand how to manage the data. Some simpler non-relational databases may not require formal schema definitions, but you still need to understand how data will be stored, retrieved, pruned, etc. Failing to plan for that will lead to a nightmare down the track.