> Transactions are tricky anyway you look at them. Developers not thinking about integrity will end messing up their transactions and leaving things in mid-air. Automatic transactions in a middleware layer (e.g. web services) help, but then those web services have to be designed to encompass atomic transactions. I have seen this situation far too often even in large banks, where system failures resulted in thousands of unfinished transactions that had to be repaired by hand.
Transactions in the usual database system only persist across session end if they are started as two phase commit transactions. Which you only use if *have* to because you need to coordinate critical changes between distributed systems where divergence is not an option (NB: 2PC != 2PL).
> Also, modern databases force everything to be in a transaction, which is horrible to performance. Why force developers to lock tables when sometimes there is no need to?
Uhm. a) Most RDBMS's allow you to have transactions that automatically only span a single statement b) Why do transactions, unless used together with synchronous commits, imply being horribly slow. c) What does that have to do with locking tables? No sensible database forces you to exlusively lock a table to make modifications to the contents of said table. Most require some lock that prevents it from being ALTERed or DROPed, but thats something different.
Just some trivial number towards b), on my workstation from 2009, with other stuff going on, in an absolutely unoptimized configuration, PG does 320k readonly transactions per sec and 140k trivial writing ones. With a data-loss window of 1/3s. Now, I'll freely admit that the latter number only is that high because I didn't create an index but even after that its still 90k.
Now, I know from experience that there are scenarios where RDBMS's don't work very well, but - sorry to be harsh here - I don't see too many people making those 'RDBMS/ACID/whatever sucks' arguments here have actual experience with that kind of workloads. Also note that most shops using fancy KV stores and such *also* use relational databases for parts of their data. Its simply a question of the right tool for the right job.
Disclaimer: I do work with/on/arround PG, so I am certainly biased. But some other RDBMS's are certainly able to produce similar numbers.
Disclaimer-2: Yes, those numbers are not realistic for any real-world workload.
Posted Dec 20, 2012 15:15 UTC (Thu) by nix (subscriber, #2304)
[Link]
on my workstation from 2009, with other stuff going on, in an absolutely unoptimized configuration, PG does 320k readonly transactions per sec and 140k trivial writing ones.
How? With an absolutely unoptimized default configuration, each commit involves a sync, which involves waiting for the disk. There's no way you can do more than a few hundred such transactions per second unless you have a battery-backed RAID array so you don't need to wait for on-the-average half a disk rotation to bring the data under the head (and then possibly another avg half rotation for associated metadata). Doing 320,000 transactions per second with PostgreSQL is hard to imagine.
I am obviously missing something. (Again. It's not my week...)
Cool new Free software
Posted Dec 20, 2012 15:52 UTC (Thu) by andresfreund (subscriber, #69562)
[Link]
> How? With an absolutely unoptimized default configuration, each commit involves a sync, which involves waiting for the disk.
unoptimized != default. It was just the system wide installation I had lying around, it sure has some adjustments, just not optimized for anything. I don't know from the top of my head which ones exactly, I can look it up though if you want.
> There's no way you can do more than a few hundred such transactions per second unless you have a battery-backed RAID array so you don't need to wait for on-the-average half a disk rotation to bring the data under the head
Yes, I had synchronous_commit turned off (I do that on all my development systems) thats why I wrote that I had a 1/3 second window of data-loss. Thats ok enough in many scenarios.
You btw. can get into the lower thousands on a single spindle these days (since 9.2) with synchronous commit as well, due to the way WAL flushes are batched/grouped across different sessions. Requires quite a bit of concurrency though.
> (and then possibly another avg half rotation for associated metadata)
PG writes the WAL sequentially into pre-allocated files to avoid that...
> Doing 320,000 transactions per second with PostgreSQL is hard to imagine.
Those were the read only ones. 320k writing transactions to logged (i.e crash-safe) tables is probably impossible independent from the hardware right now due to the locking around the WAL.
Sorry, should have given a bit more context, but it really just was 5 minutes of benchmarking without any seriousness to it.
Cool new Free software
Posted Dec 20, 2012 23:23 UTC (Thu) by man_ls (subscriber, #15091)
[Link]
So there is your "completely unoptimized" database: a heavily customized installation by an expert. Oh, and don't use INSERTs like you were told, just COPYs. And a schemaless specific datatype. And hope that performance is still good... At that point you might as well use a data store which at least has been designed with that scenario in mind, and which any idiot (e.g. me) can use to do thousands of writes per second -- out of the box.
Cool new Free software
Posted Dec 21, 2012 0:15 UTC (Fri) by andresfreund (subscriber, #69562)
[Link]
Wait what? "Heavily customized"? I now checked and I changed 3 performance critical parameters (and loads of logging/debugging ones, but those don't increase performance).
1) synchronous_commit = off. Is it ok to loose the newest (0.2 - 0.6s) transactions in a crash. Older transactions are still guaranteed to be safe. Thats a choice *YOU* have to make, it really depends the data youre storing (and can be toggled per session & transaction). Obviously things are going to be slower if you require synchronous commits.
2) shared_buffers = 512MB. How much memory can be used to buffer reads/writes. Would I have optimized I would have probably set it to 4-6GB.
3) checkpoint_segments = 40. How much disk can we use for the transaction log. Would I have optimized for benchmarking/write heavy it would have been set up to 300.
I don't know how those could be determined automatically. They depend on what the machine is used for.
And I used plain INSERT & SELECT, no COPY.
Cool new Free software
Posted Dec 20, 2012 19:21 UTC (Thu) by raven667 (subscriber, #5198)
[Link]
> Now, I know from experience that there are scenarios where RDBMS's don't work very well, but - sorry to be harsh here - I don't see too many people making those 'RDBMS/ACID/whatever sucks' arguments here have actual experience with that kind of workloads. Also note that most shops using fancy KV stores and such *also* use relational databases for parts of their data. Its simply a question of the right tool for the right job.
One thing I see from people who do a lot of work with Postgres and understand it very well is that there is a lot of flexibility on the data structures used and methods to access those data structures that have been optimized by developers competent in the math over the course of decades. So whatever algorithms are used to drive these NoSQL databases and make them fast are probably already implemented as good or better in Postgres, but it does take a lot of familiarity with how the data is stored to make use of this fact. You also have the ability to make plugins to add new data types and storage methods for whatever use case you need.
I guess to distill the point is that many of the techniques which make NoSQL databases fast can also be used with SQL databases if you know how, and you get the benefit of the decades of research and development that has gone into the traditional SQL engines.
a lot of familiarity with how the data is stored
Posted Dec 20, 2012 22:23 UTC (Thu) by Wol (guest, #4433)
[Link]
:-)
Which is EXPLICITLY FORBIDDEN by the relational definition!
Exactly my point about Postgres taking a lot of ideas from the NoSQL crowd.
In Pick, I know *exactly* how my data is stored, because I told the database how to do it. Which is why Pick databases can be a mess because it's not properly normalised :-( The programmer needs to normalise his FILEs properly or you can lose the benefits. I've seen too many badly normalised FILEs that are a pain to work around.