
So how does PostGreSQL compare to MySQL?

Posted Jan 1, 2010 12:17 UTC (Fri) by ringerc (subscriber, #3071)
In reply to: So how does PostGreSQL compare to MySQL? by rvfh
Parent article: The ongoing MySQL campaign

"My main question so far is: is the syntax 100% compatible?"

No, it isn't. It's probably not even 90% compatible. If you've stuck to the SQL standard you won't have many things to change - if any. However, there are some very commonly used MySQL extensions (like AUTO_INCREMENT) that you must do a different way in PostgreSQL, so be prepared for a few changes even to fairly simple code.

(Note that there are good reasons why PostgreSQL uses sequences and the SERIAL pseudo-type instead of using AUTO_INCREMENT. It's one of those things that MySQL has but PostgreSQL doesn't implement because it's actually pretty broken in transactional systems.)
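To illustrate the difference (table and column names here are made up for the example):

```sql
-- MySQL:
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100)
);

-- PostgreSQL: SERIAL is a pseudo-type, shorthand for an integer column
-- whose default comes from a dedicated sequence:
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

-- ...which expands to roughly:
CREATE SEQUENCE users_id_seq;
CREATE TABLE users (
    id INT NOT NULL DEFAULT nextval('users_id_seq') PRIMARY KEY,
    name VARCHAR(100)
);
```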

The PostgreSQL documentation is extremely complete and discusses the relationship of its syntax with the standard in the documentation for each command/function. So you can learn a lot from there. There's also a lot of information about MySQL to PostgreSQL migrations all around the 'net.

"Is it just some corner cases that are not compatible (those we probably do not use anyway)?"

If you're using MyISAM tables, then you're not asking much from your database - you don't need it to be reliable or to support basic transactional features, for example. So it's fairly likely you won't have too many issues, though you'd need to handle AUTO_INCREMENT and probably move to SQL-standard datatype names for a few things.

"Is it comparable performance-wise?"

Usually I'd just say "yes", because MySQL and PostgreSQL have comparable performance for "serious database" uses. For some workloads and queries one is faster, for others the other, but there's no clear and obvious winner. However, that assumes you're using InnoDB tables on MySQL and are relying on transactional behaviour.

You're using MySQL in quick-and-dirty MyISAM mode, where it's not even crash-safe, let alone transactional. If you don't need reasonable safety, you could get comparable performance from PostgreSQL by setting fsync = off in postgresql.conf, but this is extremely unsafe and will damage your database if you have an unsafe server shutdown. Just as with MyISAM, there's no guarantee the database can be repaired in such a case, so you may have to restore from backups.
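For reference, that setting is a one-line change in postgresql.conf - shown here only so nobody mistakes it for a safe tuning knob:

```
# postgresql.conf -- DANGEROUS: trades away crash safety for speed.
# An OS crash or power loss can corrupt the cluster beyond repair.
fsync = off
```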

That's really not the right approach, though. If you want that sort of thing, stick with MySQL. It's designed for it, whereas PostgreSQL is "safety first". If you actually want your data to be safe, then you'll want to (a) Move to InnoDB tables and adapt your app so that it does all database-modifying work in transactions, then (b) port over to PostgreSQL if you want that as an option.
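As a sketch of step (a): group related modifications into a transaction so they succeed or fail as a unit. The same pattern works on InnoDB and PostgreSQL (the tables here are hypothetical):

```sql
BEGIN;
INSERT INTO orders (customer_id, total) VALUES (42, 99.95);
UPDATE stock SET quantity = quantity - 1 WHERE item_id = 7;
-- If anything above failed, issue ROLLBACK instead and neither
-- change becomes visible.
COMMIT;
```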

You'll never get the raw queries-per-second performance out of PostgreSQL that you do from MyISAM tables in MySQL. Of course, your data will never be that unsafe, either ;-) . If you're prepared to do the work to make your app smarter - letting the database intelligently do work you're currently doing by reading data into your app and processing it app-side - then you might find that you get better performance out of Pg, because it's very good at more complex and difficult queries and can often do things much faster than you can with the data in your app.
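A toy example of the kind of push-down meant above - instead of fetching every row and aggregating in application code, let the database do it (the schema is invented for illustration):

```sql
-- Instead of:  SELECT amount FROM payments;  -- then summing in an app-side loop
SELECT customer_id, sum(amount) AS total
FROM payments
GROUP BY customer_id
HAVING sum(amount) > 1000;
```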



So how does PostGreSQL compare to MySQL?

Posted Jan 1, 2010 20:28 UTC (Fri) by nevyn (guest, #33129)

"However, there are some very commonly used MySQL extensions (like AUTO_INCREMENT) which you must do a different way in PostgreSQL"

I've only ever used PostgreSQL, so I'm wondering how AUTO_INCREMENT is different. Looking at the docs, the only "interesting" property I could see was its use within a multi-column primary key (but I'd assume/hope that isn't common).

So how does PostGreSQL compare to MySQL?

Posted Jan 2, 2010 2:13 UTC (Sat) by ringerc (subscriber, #3071)

Transactional behaviour.

PostgreSQL's sequences are lock-free and don't block other transactions. They allow high INSERT concurrency, but don't guarantee gapless sequences (i.e., on ROLLBACK or statement failure you might have a "missing" number, which shouldn't matter for a primary key).
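A quick demonstration of how those gaps arise, in PostgreSQL syntax:

```sql
CREATE TABLE t (id SERIAL PRIMARY KEY, val TEXT);

BEGIN;
INSERT INTO t (val) VALUES ('a');   -- consumes id 1 from the sequence
ROLLBACK;                           -- id 1 is NOT returned to the sequence

INSERT INTO t (val) VALUES ('b');   -- gets id 2; id 1 is a permanent gap
```

This is exactly the trade-off that makes sequences lock-free: no transaction ever has to wait to find out whether another transaction's number will be committed or rolled back.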

Looking at some MySQL documentation, though, it appears that in recent versions of InnoDB, AUTO_INCREMENT works much like PostgreSQL's sequences - particularly with the configuration parameter innodb_autoinc_lock_mode = 2 set. I don't know enough about the MySQL side to go into depth on the differences, though.

http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increm...

(PostgreSQL doesn't support statement-level replication, so the statement-replication concerns about auto-increment don't apply.)

So how does PostGreSQL compare to MySQL?

Posted Jan 3, 2010 7:44 UTC (Sun) by nicku (guest, #777)

The other issue with PostgreSQL is that applications need to be designed so that full VACUUMs, which require an outage, are never needed. All our PostgreSQL databases seem to eventually require one. Avoiding full VACUUMs is a matter of application design, but we don't seem to have the hang of that yet :-)

So how does PostGreSQL compare to MySQL?

Posted Jan 3, 2010 9:34 UTC (Sun) by ringerc (subscriber, #3071)

VACUUM or something like it is pretty much required in any MVCC design, and should occur in the background without interrupting anything. It's a cost of operation that should be amortized as smoothly as possible across time.

If you're having to use VACUUM FULL rather than plain VACUUM to control table bloat, though, then something is wrong with your configuration. Are you using an ancient version of Pg without built-in autovacuum? Is your autovacuum set to be so conservative that it's useless? Are you running out of free_space_map room so VACUUM is losing track of where it needs to do work?

Some people configure autovacuum to run less, thinking that by doing so they'll somehow reduce the amount of work to be done. If you've done so, there's your problem. In fact, autovacuum should run MORE if you want your database faster, so the work is done promptly (avoiding table and index bloat) and spread evenly rather than occurring in big user-annoying chunks. It's like garbage collection in that regard.

See VACUUM FULL on the Pg wiki.

Note that vacuum and autovacuum improve with every version. 8.4, for example, eliminates the free_space_map parameters, making the free-space map automatically (and better) managed by the server. If you're on an old version, upgrade. Unfortunately Pg isn't yet clever enough to fully automatically manage the autovacuum parameters, so you may still need to tweak autovacuum params on particularly UPDATE-heavy tables and/or tell autovacuum to run more frequently in general.
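For example - the values below are purely illustrative, and the per-table form requires 8.4 or later:

```sql
-- postgresql.conf: wake the autovacuum launcher more often than the
-- default of one minute, so work is done in smaller, more frequent bites:
--   autovacuum_naptime = 15s

-- 8.4+: make autovacuum far more aggressive on one UPDATE-heavy table,
-- triggering a vacuum at roughly 2% dead rows instead of the default ~20%:
ALTER TABLE hot_counters SET (autovacuum_vacuum_scale_factor = 0.02);
```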

MVCC implementation

Posted Jan 3, 2010 20:21 UTC (Sun) by butlerm (subscriber, #13312)

"VACUUM or something like it is pretty much required in any MVCC design"

Not quite. Oracle's MVCC, for example, works something like the following
(I believe InnoDB uses much the same scheme):

1. The latest version of any row is stored in the primary data block, along with some transaction-related metadata.
2. The information necessary to recover prior versions of any row is stored in separate rollback segments.
3. Changes to both are written to the redo log.
4. When a query is processed, if the version of the row currently stored in a data block isn't the one needed, a prior version is recovered by referring to the appropriate rollback segment.
5. When a transaction commits, or soon thereafter, the rollback data for that transaction is discarded. No disk I/O is required.
6. Alternatively, if a transaction is rolled back, prior versions of modified rows are recovered from the rollback segment and restored to their original positions in the primary data blocks.

The main advantage of this scheme is that data blocks generally only get
modified once per transaction, rather than once initially and once some
time later, during the vacuum process. However, redo log overhead is
probably twice as high, and if a transaction is large or long enough, the
rollback entries will physically hit the disk before they are discarded
as well.

MVCC implementation

Posted Jan 4, 2010 4:12 UTC (Mon) by ringerc (subscriber, #3071)

Good point.

A redo log does have a large disk I/O cost if you have a high update/delete volume and/or any long-running SERIALIZABLE transactions, but it doesn't require the same sort of work as Pg's in-place storage of superseded rows. On the other hand, it has a HUGE cost to queries - in the form of random disk I/O - if they have to hit the redo log on disk.

I don't know enough to comment on which is "better". I suspect that like most things, it's probably a trade-off where some workloads benefit from one and others from the other.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds