
New features in PostgreSQL 9.1

May 2, 2011

This article was contributed by Josh Berkus

The first beta release of PostgreSQL 9.1, the next annual release of the well-known open source relational database, came out on May 2nd. The new version contains many new features sure to please the hearts of database geeks around the world. There are enough new features, in fact, that we can't cover them all in this article. Here are four of the more interesting ones:

Transaction-controlled synchronous replication

Last year, PostgreSQL 9.0 introduced asynchronous binary replication in PostgreSQL for the first time. Building on that foundation, version 9.1 includes an option for synchronous replication as well. Synchronous replication means that transactions do not return to the application until they have been received both on the master server and on a replica server. This ensures that data from any committed transaction cannot be lost even if the master server goes offline permanently.

This feature culminates a three-year development effort by NTT Open Source and 2ndQuadrant, which will allow Japanese telecommunications giant NTT to finally replace most of its Oracle servers with PostgreSQL. The PostgreSQL clustering project Postgres-XC is the remaining part of this replacement effort.

Synchronous replication carries a substantial penalty in response time, since each transaction must be written to disk on two servers before the commit returns. To alleviate this, PostgreSQL goes beyond the synchronous replication offered by other database systems and offers the ability to control whether commits are synchronously or asynchronously replicated on a per-transaction basis. This allows application developers to distinguish between critical data which must not be lost (like financial transactions) and less critical data for which response time is more important, in order to optimize the performance of the system.

Each replica (or "standby") gives itself a name in its recovery.conf file when it connects to the master:

    primary_conninfo = 'host=master01 user=replicator application_name=replica1'

Then the master sets a priority list of synchronous replicas in its postgresql.conf file:

    synchronous_standby_names = 'replica1'

Then you can commit a transaction as synchronous or asynchronous:

    -- commit synchronously to the standby
    SET synchronous_commit = 'on';
    UPDATE user_balance SET balance = 573.29 WHERE user_id = 40021;
    -- messages are not important, send asynchronously to the standby
    SET synchronous_commit = 'local';
    INSERT INTO user_messages VALUES ( 40021, 57843, 'HI!' );

PostgreSQL also allows a priority list of synchronous replicas, enabling higher-availability configurations. The new pg_stat_replication database view allows database administrators (DBAs) to monitor the status of database replicas, and other administrative settings control what to do if the standby goes offline. In future versions, PostgreSQL plans to support different synchronization modes, as well as "quorum" synchronization. The latter means "synchronize to two out of these five servers" replication, for improved response time without lowering data integrity.
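For example, a higher-availability sketch might list two standbys in priority order, with the second taking over as the synchronous standby if the first disconnects, and then watch the view from the master (the standby names here are illustrative):

    synchronous_standby_names = 'replica1, replica2'

    -- on the master: which standby is currently synchronous?
    SELECT application_name, state, sync_state
      FROM pg_stat_replication;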

Unlogged tables

The burgeoning popularity of non-durable "NoSQL" databases like Redis and MongoDB, along with the well-established Memcached, has demonstrated that there is a class of data for which response time matters more than surviving a crash. EnterpriseDB developer Robert Haas has added a feature to PostgreSQL to handle this kind of data: unlogged tables.

The word "unlogged" refers to the transaction log, which guarantees the durability of writes. With an unlogged table, if the database server crashes, on restart the table will be empty. However, because writes to unlogged tables bypass the transaction log, they can be up to 20 times as fast as writes to durable tables. Unlogged tables can also be thought of as global temporary tables or "in memory" tables.
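Creating one is ordinary DDL with a single extra keyword; a minimal sketch, with an invented table for website session data:

    CREATE UNLOGGED TABLE user_sessions (
        session_id  text PRIMARY KEY,
        user_id     integer,
        last_seen   timestamptz
    );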

Unlogged tables are intended to hold data which is high-volume but not very valuable. For example, many PostgreSQL users currently log website user session activity to Memcached. Unlogged tables are also useful for holding raw bulk load data during batch processing in data warehouses. Both of these uses will simplify life for PostgreSQL users by allowing them to reduce the number of single-purpose databases in their application stack.

SELinux integration

Another multi-year development effort that is being first released in PostgreSQL 9.1 is the unification of data access control in PostgreSQL and SELinux. NEC programmer Kaigai Kohei has spent the last three years developing SE-PostgreSQL, which integrates the concept of label-based data access rules into PostgreSQL. PostgreSQL is the first SQL database system to have this level of integration.

This feature is intended for high-security environments where even the DBA may have limited access to data, such as national security databases. Security policies established at the system or network level through SELinux may now be applied to table, view, function, and column permissions. This enables the establishment of a single consistent security policy regardless of whether data is in files or inside the database.
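Labels are attached to database objects with the new SECURITY LABEL command; a minimal sketch, assuming the sepgsql module is loaded, with an illustrative table name and security context:

    SECURITY LABEL FOR selinux
        ON TABLE customer
        IS 'system_u:object_r:sepgsql_table_t:s0';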

Regular readers of LWN will be aware that SE-PostgreSQL had a lot of trouble getting integrated into mainstream PostgreSQL. Indeed, the SE-PostgreSQL which appears in the 9.1 beta is almost completely rewritten from the original version. First, with the help of other hackers, Kohei rewrote all of the calls to SELinux permissions lookups as hooks which can be enabled or disabled at PostgreSQL compile time, in order to remove any impact on non-SE-enabled installations. Second, all of the management and tools for SE-PostgreSQL were moved to an optional loadable module.

Finally, certain actions had to be exempted from Mandatory Access Control because of difficulty of implementation. Chief among these is row-level access control, since there are a number of unresolved theoretical issues about how to implement it. This means that, of course, Kohei still has work to do.

Extensions and PGXN

Speaking of optional modules, one of the primary strengths of PostgreSQL has always been its extensibility. Users and developers have defined data types, operators, functions, and aggregates to support a variety of types of specialty data, including genomic, inventory, mathematical, astronomical, time series, and geological data. Other plug-ins support integration with queues, compatibility with other database systems, and experimental new features. The most famous plug-in to PostgreSQL is probably the PostGIS geographic database.

However, most of these "plug-ins" to PostgreSQL have been difficult to find, install, back up, and upgrade, limiting their use or even causing users to reinvent them several times. That's changing with version 9.1.

First, Dimitri Fontaine of 2ndQuadrant has created a packaging concept in PostgreSQL 9.1 called Extensions. By packaging a collection of database objects as an Extension, the administration of adding and removing plug-ins, and most importantly upgrading them, becomes easily scriptable. Users can even create their own extensions for in-house database customizations which need to be versioned.
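For instance, installing, upgrading, and removing a packaged extension (here the contrib hstore module) each become a single short command:

    CREATE EXTENSION hstore;
    ALTER EXTENSION hstore UPDATE;
    DROP EXTENSION hstore;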

Second, David Wheeler of PostgreSQL Experts created PGXN, "The PostgreSQL EXtension Network". PGXN is patterned on online repositories for programming languages like Perl's CPAN and RubyGems. PGXN will give PostgreSQL users a central place to find and download PostgreSQL plug-ins not distributed with the core database. Currently in development is a unified download and installation tool with dependency tracking, like CPAN or apt-get.

PostgreSQL development

The PostgreSQL project has been experimenting for the last couple of years with changes to its development cycle, patch review, and release processes. The biggest problem the project struggles with is the constant torrent of patches and feature ideas being submitted, which has been growing at a much faster rate than the pool of reviewers and committers has. 9.1, for example, will contain more new major features than 9.0 did. 9.0, in turn, had more features than 8.4.

As a result, contributors to the project frequently discuss tinkering with the development cycle in order to optimize reviewer, contributor, and developer time. A recent discussion grappled with those topics.

Should the project branch off version 9.2 right after beta or wait until the release candidate? Historically, PostgreSQL has refused to begin development on a new version before development on the prior version was completed, and that seems likely to continue to be the policy. PostgreSQL developers have found that working on more than one version at a time causes a great deal of backporting, as well as lack of attention to fixing bugs.

Should PostgreSQL continue to have CommitFests (periods of intensive focus on patch review and merging rather than development) every other month or move to a one-week-a-month cycle? Doing a week-a-month would give committers more of their own time as well as giving developers faster feedback. However, there are some serious doubts as to whether it's even possible to implement given the distributed, part-time nature of most of PostgreSQL's contributor base.

What should the schedule be for 9.2 development? This depends, of course, rather strongly on answers to both of the above. 9.0 and 9.1 both began development in July and released 12 to 14 months later. Some contributors feel that a regular annual schedule is a benefit while others don't find it important. This discussion is far from over, and may even result in changes which are tried and then reverted if they don't work.

In related news, the PostgreSQL code base has passed 1 million lines of code with version 9.1. And long-time project contributor Magnus Hagander has been elected to the PostgreSQL Core Team.

There are, as mentioned, many other features in this beta release, which you can read about in the release notes. Several of these features are innovations which will appear in PostgreSQL before any other relational database. Since this is a beta version, it is considered not yet ready for production use. The PostgreSQL developers expect the final version to be out in two to four months, depending on testing and bug-fixing.


Index entries for this article
GuestArticles: Berkus, Josh



Development process issues

Posted May 2, 2011 18:37 UTC (Mon) by tchernobog (guest, #73595) [Link] (6 responses)

> The biggest problem the project struggles with is the constant torrent of patches and feature ideas being submitted, which has been growing at a much faster rate than the pool of reviewers and committers has.

I know they are really used to it, but having a public bug-tracker would make development *much* easier in my opinion. Handling bugs in mailing lists always looked like a... er... "strange" idea to me. Dependencies, severity, and history tracking, not to mention things like automatic commit monitoring and closing of bugs (for example, Redmine does it), are not easy to do with plain-text e-mails. Also, everyone subscribed to the MLs receives the whole corpus of all bugs, and searching for dupes is considerably harder.

Development process issues

Posted May 2, 2011 21:41 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

While I don't disagree with the notion that a "real" bugtracker might be useful, I doubt that it would solve the current shortage of experienced developers' time.

Development process issues

Posted May 3, 2011 1:20 UTC (Tue) by jberkus (guest, #55561) [Link] (1 responses)

There are other reasons why having a public bugtracker would be a good idea. Speeding up reviewing is definitely not one of them though. New features are not bugs, and vice-versa.

We *do* have a web tool for reviewing: http://commitfest.postgresql.org

FWIW, the bugtracker thing is a longstanding PostgreSQL project issue. Unfortunately, half the contributors don't want one, and the other half can't agree on which bugtracker to use. We'll get one eventually, it's just not going to be soon.

Development process issues

Posted May 17, 2011 21:01 UTC (Tue) by oak (guest, #2786) [Link]

> New features are not bugs, and vice-versa.

That's semantics. One person's bug is another person's feature, and vice-versa. Also, fixing bugs properly sometimes requires implementing new features.

Btw. AFAIK Mozilla uses Bugzilla for tracking features / roadmaps and Bugzilla has extension(s) for task time estimation & tracking if one wants to add some project management on top of it...

Development process issues

Posted May 4, 2011 16:10 UTC (Wed) by josh (subscriber, #17465) [Link] (2 responses)

The Git project doesn't use a bug tracker either, and they claim it as a feature. To summarize their opinions:

Benefit of a bug tracker: bugs and feature requests don't get lost when people stop working on them. Downside of a bug tracker: bugs and feature requests don't get lost when people stop working on them.

Bug trackers do incur a certain management overhead, and bugs can rot in a bug tracker if that management doesn't occur.

Development process issues

Posted May 5, 2011 4:16 UTC (Thu) by pabs (subscriber, #43278) [Link] (1 responses)

I think the usefulness of a bug tracker really depends on how active a project is. For Linux, Git, and PostgreSQL they are less useful than for that random game that was abandoned a number of years ago and that you want to revive. In the latter situation a pre-existing roadmap (bug list) can be useful.

Development process issues

Posted May 6, 2011 16:00 UTC (Fri) by giraffedata (guest, #1954) [Link]

There's another benefit to a bug tracker: it relieves some of the need for developers to work on the bug at all.

When I report bugs in bug trackers, even with fixes, I rarely expect to see a fix in the distribution. I just want my work to be available to other users so they don't have to duplicate it. Users can see "it's not just me, the code/documentation is broken" and find workarounds and patches to fix it locally.

In fact, often the bug I reported is already in the database, but by using different words to describe it, I make it easier for the next user not to miss it.

In a project where developers keep up with bug reports (and actually respond rather than just dismissing them), a bug tracker is probably a net loss to the community compared to email.

New features in PostgreSQL 9.1

Posted May 2, 2011 19:33 UTC (Mon) by rleigh (guest, #14622) [Link] (1 responses)

The PGXN work looks very interesting; I'll certainly be looking at getting my postgresql-debversion datatype to use it.

Will the extension code be backported to 9.0 (or earlier) so that it can be used to aid upgrades to 9.1?

New features in PostgreSQL 9.1

Posted May 2, 2011 21:35 UTC (Mon) by andresfreund (subscriber, #69562) [Link]

> Will the extension code be backported to 9.0 (or earlier) so that it can be used to aid upgrades to 9.1?
No, that would be way too invasive. But you can run CREATE EXTENSION ... FROM unpackaged; in the 9.1 install after restoring, to convert an unpackaged extension into a packaged one.

http://developer.postgresql.org/pgdocs/postgres/sql-creat...
and
http://developer.postgresql.org/pgdocs/postgres/extend-ex...

Synchronous replication! Yay!

Posted May 2, 2011 22:17 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Synchronous replication! Yay! At last!!!

I think I'm in love with PostgreSQL. I can finally ditch the last of our DB2 servers.

Synchronous replication! Yay!

Posted May 5, 2011 17:27 UTC (Thu) by intgr (subscriber, #39733) [Link] (2 responses)

Not sure what kind of synchronous replication you're looking for, but note that PostgreSQL 9.1's synchronous replication only guarantees durability of replicated data, not query consistency between the master and slaves.

In simpler terms, it guarantees that no committed data will be lost when the master crashes, but slave nodes can still return stale data after rows are modified/deleted on the master. Replication lag still exists (though to a lesser degree).

Synchronous replication! Yay!

Posted May 5, 2011 20:25 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Ok, I'm confused.

What happens when I commit a transaction? I understand that by the time COMMIT completes, the data is replicated to all the slaves. Right?

Synchronous replication! Yay!

Posted May 6, 2011 7:30 UTC (Fri) by intgr (subscriber, #39733) [Link]

Yes, by the end of COMMIT, the WAL data from the transaction has been replicated to the slaves, but the changes from the WAL haven't been applied to the slave databases yet -- so they are not yet visible to queries. Applying/recovering the WAL on the slaves still takes place asynchronously.

In the event of a failover/promotion, PostgreSQL applies all outstanding WAL changes, so no data is lost.


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds