
So how does PostGreSQL compare to MySQL?

Posted Jan 3, 2010 7:44 UTC (Sun) by nicku (subscriber, #777)
In reply to: So how does PostGreSQL compare to MySQL? by ringerc
Parent article: The ongoing MySQL campaign

The other issue with PostgreSQL is that applications need to be designed so that a full VACUUM, which requires an outage, is never needed. All our PostgreSQL databases seem to eventually require one. Avoiding full VACUUMs is a matter of application design, but we don't seem to have the hang of that yet :-)



So how does PostGreSQL compare to MySQL?

Posted Jan 3, 2010 9:34 UTC (Sun) by ringerc (subscriber, #3071) [Link] (2 responses)

VACUUM or something like it is pretty much required in any MVCC design, and should occur in the background without interrupting anything. It's a cost of operation that should be amortized as smoothly as possible across time.

If you're having to use VACUUM FULL rather than plain VACUUM to control table bloat, though, then something is wrong with your configuration. Are you using an ancient version of Pg without built-in autovacuum? Is your autovacuum set to be so conservative that it's useless? Are you running out of free_space_map room so VACUUM is losing track of where it needs to do work?
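One way to answer those questions is to ask the statistics collector. A hedged sketch (pg_stat_user_tables is a standard statistics view; which tables show up will of course depend on your schema):

```sql
-- List the tables carrying the most dead tuples, plus when they were
-- last vacuumed, to see whether autovacuum is reaching them at all.
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

Tables with large n_dead_tup counts and a NULL or ancient last_autovacuum are the ones whose bloat will eventually push you toward VACUUM FULL.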

Some people configure autovacuum to run less, thinking that by doing so they'll somehow reduce the amount of work to be done. If you've done so, there's your problem. In fact, autovacuum should run MORE if you want your database faster, so the work is done promptly (avoiding table and index bloat) and spread evenly rather than occurring in big user-annoying chunks. It's like garbage collection in that regard.
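As a sketch of what "run MORE" means in practice, in postgresql.conf (the parameter names are real settings; the numbers are illustrative starting points, not recommendations):

```ini
# Make autovacuum wake up and act sooner than the defaults.
autovacuum = on
autovacuum_naptime = 15s               # poll for work more often (default 1min)
autovacuum_vacuum_scale_factor = 0.1   # vacuum at ~10% dead rows (default 0.2)
autovacuum_vacuum_cost_delay = 10ms    # smaller delay = less worker throttling
```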

See VACUUM FULL on the Pg wiki.

Note that vacuum and autovacuum improve with every version. 8.4, for example, eliminates the free_space_map parameters entirely, handing that bookkeeping to the server, which manages it automatically (and better). If you're on an old version, upgrade. Unfortunately Pg isn't yet clever enough to manage the autovacuum parameters fully automatically, so you may still need to tweak them on particularly UPDATE-heavy tables and/or tell autovacuum to run more frequently in general.
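For an UPDATE-heavy table, the per-table knobs (8.4-era storage-parameter syntax; hot_table is a made-up name) look something like:

```sql
-- Vacuum this one table when ~2% of its rows are dead, instead of the
-- global default, without making autovacuum aggressive everywhere.
ALTER TABLE hot_table SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.02
);
```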

MVCC implementation

Posted Jan 3, 2010 20:21 UTC (Sun) by butlerm (subscriber, #13312) [Link] (1 responses)

"VACUUM or something like it is pretty much required in any MVCC design"

Not quite. Oracle's MVCC, for example, works something like the following
(I believe InnoDB uses much the same scheme):

1. The latest version of any row is stored in the primary data block, along
with some transaction related meta data.
2. The information necessary to recover prior versions of any row is stored
in separate rollback segments
3. Changes to both are written to the redo log
4. When a query is processed, if the version of the row currently stored in
a data block isn't the one needed, a prior version is recovered by
referring to the appropriate rollback segment.
5. When a transaction commits, or soon thereafter, the rollback data for
that transaction is discarded. No disk I/O required.
6. Alternatively, if a transaction is rolled back, prior versions of
modified rows are recovered from the rollback segment and restored to their
original positions in the primary data blocks.
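Steps 1-6 above can be sketched as a toy model. This is my own simplification for illustration, not Oracle or InnoDB internals; all the names (RollbackMVCC, undo, and so on) are invented, and a "snapshot" here is just the set of transactions a reader is allowed to see:

```python
# Toy model of rollback-segment MVCC: the latest row version lives in the
# primary store, and undo records hold what is needed to rebuild older ones.

class RollbackMVCC:
    def __init__(self):
        self.rows = {}   # "primary data blocks": key -> (value, writer_txn)
        self.undo = {}   # "rollback segments": txn -> [(key, prior_entry)]

    def update(self, txn, key, value):
        # Steps 1-2: stash the prior version in the writer's rollback
        # segment, then overwrite the row in place.
        self.undo.setdefault(txn, []).append((key, self.rows.get(key)))
        self.rows[key] = (value, txn)

    def read(self, snapshot, key):
        # Step 4: if the current version was written by a transaction this
        # reader can't see, walk back through the writer's undo records.
        entry = self.rows.get(key)
        while entry is not None:
            value, writer = entry
            if writer in snapshot:
                return value
            entry = next((old for k, old in self.undo.get(writer, [])
                          if k == key), None)
        return None  # undo already discarded: Oracle's "snapshot too old"

    def commit(self, txn):
        # Step 5: just drop the undo records; no primary-block rewrite and
        # no deferred cleanup pass (the part VACUUM does in Pg).
        self.undo.pop(txn, None)

    def rollback(self, txn):
        # Step 6: restore prior versions into the primary blocks,
        # newest undo record first.
        for key, old in reversed(self.undo.pop(txn, [])):
            if old is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = old
```

A reader whose snapshot predates an in-flight writer reconstructs the old value from undo; once the writer commits, its undo is simply dropped, which is why old-enough readers can fail rather than forcing a later cleanup of the data blocks.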

The main advantage of this scheme is that data blocks generally only get
modified once per transaction, rather than once initially and once some
time later, during the vacuum process. However, redo log overhead is
probably twice as high, and if a transaction is large or long enough, the
rollback entries will physically hit the disk before they are discarded
as well.

MVCC implementation

Posted Jan 4, 2010 4:12 UTC (Mon) by ringerc (subscriber, #3071) [Link]

Good point.

A redo log does have a large disk I/O cost if you have a high update/delete volume and/or any long-running SERIALIZABLE transactions, but it doesn't require the same sort of work as Pg's in-place storage of superseded rows. On the other hand, it has a HUGE cost to queries - in the form of random disk I/O - if they have to hit the redo log on disk.

I don't know enough to comment on which is "better". I suspect that like most things, it's probably a trade-off where some workloads benefit from one and others from the other.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds