PostgreSQL: the good, the bad, and the ugly
Posted May 21, 2015 22:43 UTC (Thu) by jberkus (guest, #55561)
In reply to: PostgreSQL: the good, the bad, and the ugly by kleptog
Parent article: PostgreSQL: the good, the bad, and the ugly
The issues with upgrading are:
1. Even 5 minutes of downtime is a lot for someone's production-critical database.
2. Applications have to be extensively tested for compatibility with the new versions.
Point 2 means that upgrading Postgres is a lot like upgrading the major version of your main programming language, something which people *also* put off doing for years. For example, RHEL still ships Python 2.6, as (I believe) does OSX. So any development plan which doesn't take into account that users will stay on the same version of Postgres for 3-5 years at a time is doomed to fail.
(* There's also the fact that pg_upgrade isn't trusted by many users, due to bugs in its past.)
Posted May 21, 2015 22:46 UTC (Thu)
by andresfreund (subscriber, #69562)
[Link]
I'd personally consider the catalogs a part of the on-disk format. It's just that we have an offline migration tool for it (pg_dump in binary upgrade mode).
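Roughly, the catalog migration that pg_upgrade drives looks like the following (a simplified sketch, not meant to be run by hand; the database name and the new cluster's port are invented for illustration):

    # globals (roles, tablespaces) from the old cluster
    pg_dumpall --globals-only > globals.sql
    # per-database schema/catalog definitions, in the special mode
    # pg_upgrade uses internally (preserves OIDs and the like)
    pg_dump --schema-only --binary-upgrade --create mydb > mydb-schema.sql
    # replay both into the freshly initdb'd new-version cluster,
    # here assumed to listen on port 5433
    psql -p 5433 -f globals.sql postgres
    psql -p 5433 -f mydb-schema.sql postgres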
Posted May 21, 2015 22:58 UTC (Thu)
by Lennie (subscriber, #49641)
[Link]
So I guess having a good low-overhead replication system between versions helps with setting up test environments as well, which is easier than using pg_dump and pg_restore.
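For comparison, the dump-and-restore route to a fresh test copy is roughly this (hostnames, database name, and job count are only examples):

    # custom-format dump of the production database
    pg_dump -Fc -h prod-db -f mydb.dump mydb
    # create the target database on the test host and restore
    # with four parallel jobs
    createdb -h test-db mydb
    pg_restore -h test-db -d mydb -j 4 mydb.dump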
Posted May 21, 2015 23:12 UTC (Thu)
by pizza (subscriber, #46)
[Link] (5 responses)
...then WTF are you upgrading a production-critical database at all?
Never, never, never do live upgrades of critical stuff.
Posted May 22, 2015 0:10 UTC (Fri)
by flussence (guest, #85566)
[Link]
Working under people who won't budget enough to allow the developers to do their job sanely?
Posted May 22, 2015 19:04 UTC (Fri)
by dlang (guest, #313)
[Link] (3 responses)
you don't always have the storage to be able to replicate everything to a new copy for the upgrade.
Even if you can, how do you make the new copy have everything the old copy has if the old copy is continually being updated?
At some point you have to stop updates to the old copy so that you can be sure the new copy has everything before you cut over to it. If you have real-time replication to a copy that's got the same performance/redundancy as your primary (the Slony approach that's mentioned several times here), then you can make the outage very short.
But if you aren't set up that way, you have to either never convert, or convert in place.
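As a very rough illustration of that short-outage cutover (commands, host, and path are invented; a cross-version setup needs trigger-based or logical replication such as Slony, whose switchover goes through its own MOVE SET machinery rather than these physical-replication commands):

    # 1. stop application writes to the old primary
    # 2. confirm the replica has replayed everything it received
    psql -h replica -c "SELECT pg_last_xlog_replay_location();"
    # 3. promote the replica and repoint the applications at it
    pg_ctl promote -D /var/lib/pgsql/9.4/data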
Posted May 22, 2015 22:14 UTC (Fri)
by pizza (subscriber, #46)
[Link] (2 responses)
You mean to tell me that you don't have any sort of backup for your system, at all? "pg_dump | xz > dump.xz" takes less space than the live PG database.
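(The matching restore for a compressed plain-format dump like that is roughly "xz -dc dump.xz | psql -d mydb"; the database name is just an example.)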
Again, I stand by my assertion that performing live upgrades of mission-critical stuff is a horrendously bad idea. Justifying it with excuses about lacking sufficient resources to do this properly is even worse, because that tells me you're one failure away from being put out of business entirely.
> But if you aren't setup that way, you have to either never convert, or convert in place.
I'm not saying converting in place is necessarily the wrong thing to do... just that doing it on a live system with no fallback option is insane -- what if there turns out to be an application bug with the new DB?
Posted May 22, 2015 22:18 UTC (Fri)
by andresfreund (subscriber, #69562)
[Link]
I think that's a statement pretty far away from reality. If downtimes cost you in some form or another, and dump/restore type upgrades take a long while due to the amount of data, in-place isn't a bad idea.
> just that doing it on a live system with no fallback option is insane -- what if there turns out to be an application bug with the new DB?
Why would in-place upgrades imply not having a fallback?
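For instance, pg_upgrade in its default copy mode (i.e. without --link) leaves the old cluster's data directory untouched, so the old version stays available as a fallback. A sketch only, with invented paths and versions:

    # copies data files into the new cluster rather than hard-linking,
    # so /var/lib/pgsql/9.3/data remains usable if 9.4 misbehaves
    pg_upgrade \
      --old-datadir /var/lib/pgsql/9.3/data \
      --new-datadir /var/lib/pgsql/9.4/data \
      --old-bindir  /usr/pgsql-9.3/bin \
      --new-bindir  /usr/pgsql-9.4/bin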
Posted May 23, 2015 0:14 UTC (Sat)
by dlang (guest, #313)
[Link]
A backup can be on much slower storage, and can be compressed (or stored without the indexes, which get recreated at reload time, etc.).
There are lots of ways that a large database could be backed up so that it could be restored in a disaster, but that don't give you the ability to create a replacement without taking the production system down.
Posted May 22, 2015 10:43 UTC (Fri)
by niner (subscriber, #26151)
[Link] (3 responses)
Oh it can certainly take hours, if it finishes at all. Ever tried to upgrade a database containing > 90000 tables and about the same number of views and indexes? To my knowledge the only feasible way is to run multiple instances of pg_dump and then psql in parallel with xargs (in our case the tables are neatly divided into ~ 1200 schemas with no interdependencies). pg_upgrade would do a schema dump with pg_dump. This process cannot be parallelized and may run into all kinds of memory limits. And it takes hours or longer.
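A rough sketch of that per-schema parallelization (the schema filter, job count, database name, and target host are invented for illustration):

    # list the application schemas, then dump and restore eight at a time
    psql -At -c "SELECT nspname FROM pg_namespace
                 WHERE nspname NOT LIKE 'pg\_%'
                   AND nspname <> 'information_schema'" mydb |
      xargs -P8 -I{} sh -c 'pg_dump -n {} mydb | psql -h newhost -d mydb'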
Database upgrades are a pain and the only occasion where we have planned downtime of our cluster :/
Posted May 22, 2015 14:58 UTC (Fri)
by andresfreund (subscriber, #69562)
[Link] (2 responses)
Posted May 22, 2015 15:35 UTC (Fri)
by niner (subscriber, #26151)
[Link] (1 responses)
I would certainly like to help improve pg_upgrade to a point where it becomes useful for us :)
The fixed known inefficiencies -- are they fixed in 9.4 or 9.5?
Posted May 22, 2015 15:52 UTC (Fri)
by andresfreund (subscriber, #69562)
[Link]
Where's the appropriate place to continue this discussion?