September 21, 2010
This article was contributed by Josh Berkus
Version 9.0 of the PostgreSQL database management system was released on September
20, with considerably more "buzz" than a PostgreSQL release has had in a
while. The PostgreSQL Project does a major release every year, but this one
is special. It has more than a dozen major features and nearly 200 minor
improvements, so the release has more new goodies in it for database geeks than any release before it, hence the "9.0" version number.
It includes some things which users have been requesting for years, such as built-in replication. Given this large number of new features, I'm not even going to try to cover them all; instead, I'll pick five very different features as examples and you can read about the rest on PostgreSQL.org.
Binary replication
PostgreSQL 9.0 includes a new mechanism for replicating entire databases which uses binary logs to replicate between servers. Since this is very similar to Oracle's Hot Standby Databases, it is often also called "hot standby". While PostgreSQL already has several other replication tools, including Slony-I, pgPool2, and Bucardo, binary replication supports greater scalability and availability for the majority of PostgreSQL users. It is also simpler to configure and initialize for the most basic configuration: two identical databases with failover.
This feature is also expected to help push PostgreSQL in cloud hosting and software as a service (SaaS) environments. Overhead for each standby database is very low, allowing a single master to support tens or even hundreds of standby databases. Binary replication also works quite well together with virtual machine and filesystem cloning tools.
Binary replication works by copying the binary change logs, called "transaction logs", from one server to another. This builds on the mechanism used for "warm standby" in PostgreSQL for several years, with two important differences: you can now run queries on the standby server(s), and logs can now be shipped continuously over a database port connection. The latter also decreases latency between servers, allowing the standby server to be only a fraction of a second behind.
The new replication was a large, community-wide effort which took two years. Most of the work was done by developers working for 2nd Quadrant and NTT Open Source, but also includes contributions by developers for EnterpriseDB and Red Hat. Plans for 9.1 include administration tools, standby promotion, and synchronous replication.
New triggers
PostgreSQL applications frequently put a lot of programming into the database itself. This includes triggers, which are scripts that execute whenever data changes. PostgreSQL 9.0 expands this capability with two new kinds of triggers: column triggers and conditional triggers.
Column triggers are defined in the ANSI SQL Standard. These triggers execute only when data in that specific column changes. Imagine a database where you want to log a message every time a user changes their password, but not for other kinds of changes. You could define a trigger like so:
CREATE TRIGGER password_changed
AFTER UPDATE OF password ON users
FOR EACH ROW
EXECUTE PROCEDURE password_changed_log();
Conditional triggers are a PostgreSQL-only feature. A conditional trigger executes whenever a specific condition is met, but not otherwise. For example, imagine that you have a database of e-mail users and you want to freeze accounts whenever total storage goes over 5MB:
CREATE TRIGGER freeze_account
AFTER UPDATE OF mail_accounts
WHEN ( total_storage > 5000000 )
FOR EACH ROW
EXECUTE PROCEDURE freeze_account_full();
New security features
Given the widespread concerns about database security in the industry, it's no surprise that PostgreSQL is adding new features to control access to data. First, PostgreSQL now supports RADIUS authentication, in addition to password, SSL, LDAP, PAM, and GSSAPI authentication. Second, large objects in PostgreSQL now have settable access permissions.
But the biggest new security feature is new commands to set blanket access permissions across an entire database or schema. This will make it much easier for web developers to make effective use of database permissions and prevent SQL injection attacks and other exploits. The first of these commands is GRANT ON ALL, which allows you to set permissions for a specific type of object over the whole database:
GRANT SELECT,INSERT,UPDATE ON ALL TABLES IN cmsdata TO webcms;
The second command is SET DEFAULT PRIVILEGES, which allows you to define a set of permissions for each new database object created by users:
ALTER DEFAULT PRIVILEGES FOR ROLE webcms IN SCHEMA cmsdata
GRANT SELECT,INSERT,UPDATE ON TABLES;
Either of these tasks would previously required writing a script to iterate over each object in the database.
HStore: key-value PostgreSQL
The renaissance of non-relational databases, or "NoSQL movement", has had an influence on PostgreSQL as it has on everyone else. One such influence is the HStore optional module included in version 9.0, which implements a key-value store inside PostgreSQL.
HStore has been available in previous versions of PostgreSQL, but built-in limitations made it only useful for a trivial set of tasks. Those limitations are now gone and there are new features of HStore, like key manipulation functions, as well. Application developers can now use HStore for most key-value tasks instead of adding an additional key-value database to their application infrastructure.
For example, say you wanted to store user profile data as a set of key-value data. First, you'd load HStore, since it's an optional module. Next, you'd create a table:
CREATE TABLE user_profile (
user_id INT NOT NULL PRIMARY KEY REFERENCES users(user_id),
profile HSTORE
);
Then you can store key-value data in that table:
INSERT INTO user_profile
VALUES ( 5, hstore('"Home City"=>"San Francisco","Occupation"=>"Sculptor"');
Notice that the format for the HStore strings is a lot like hashes in Perl, but you can also use an array format, and simple JSON objects will probably be supported in 9.1. You probably want to index the keys in the HStore for fast lookup:
CREATE INDEX user_profile_hstore ON user_profile USING GIN (profile);
Now you can see what keys you have:
SELECT akeys(profile) FROM user_profile WHERE user_id = 5;
Look up individual keys:
SELECT profile -> 'Occupation' FROM user_profile;
Or even delete specific keys:
SELECT profile - 'Occupation' FROM user_profile WHERE user_id = 5;
UPDATE user_profile SET profile = profile - 'Occupation'
WHERE user_id = 5;
JSON & XML explain plans
Like other enterprise SQL databases, PostgreSQL allows users to examine how each query is to be executed in order to optimize database access. The query plan is called an "EXPLAIN plan", and previously was available only as an idiosyncratic text format. For example, a simple single-table sort query might be displayed like so:
Sort (cost=1.05..1.06 rows=305 width=11)
Sort Key: forum_name
-> Seq Scan on forums (cost=0.00..1.03 rows=305 width=11)
While the above text is fine for a database administrator who is troubleshooting specific queries, it is terrible for any form of automated processing, especially since the text changes slightly for each version of PostgreSQL. So developers Robert Haas and Greg Sabino-Mullaine added some new formats for EXPLAIN plans, including XML:
<explain xmlns="http://www.postgresql.org/2009/explain">
<Query>
<Plan>
<Node-Type>Sort</Node-Type>
<Startup-Cost>1.05</Startup-Cost>
<Total-Cost>1.06</Total-Cost>
<Plan-Rows>3</Plan-Rows>
<Plan-Width>11</Plan-Width>
<Sort-Key>
<Item>forum_name</Item>
</Sort-Key>
...
and JSON:
QUERY PLAN
[
{
"Plan": {
"Node Type": "Sort",
"Startup Cost": 1.05,
"Total Cost": 1.06,
"Plan Rows": 3,
"Plan Width": 11,
"Sort Key": ["forum_name"],
"Plans": [
...
This feature is designed to encourage the development of new tools to analyze explain plans and suggest database improvements.
Changes in the PostgreSQL project
It's not just the PostgreSQL database which is changing rapidly; there have been changes in the project and community as well.
During September, the PostgreSQL project is finally moving to the git version control system from CVS. This move was planned before the 9.0 release, but technical difficulties have prevented it from being completed yet; this may be the topic of another article. The PostgreSQL home page is also migrating from an ah-hoc web framework written in PHP to the popular Django web framework.
Most of all, though, what has changed is project processes and attitudes,
which have allowed the PostgreSQL project to add dozens of new contributors
over the last two years. Selena Deckelmann, PostgreSQL major contributor,
described the changes this way:
We started the process toward 9.0 last year when we added new committers and invited many new people into the CommitFest process (our way of getting lots of patches reviewed, approved and committed every two months). What we've found is that we can engage new developers by providing a clear way for them to help in small, well-defined ways.
As a group, we work really hard to recruit and maintain long-term
relationships with developers, and that investment in people has paid off
really well in 9.0. We have long term commitments from volunteers and
independent businesses to implement features that take multiple years to
see through to completion. The binary replication is a clear example of
that, and we have many other projects underway that are only possible
because developers trust our core development team to see them through.
The future of PostgreSQL
The PostgreSQL project's developers are already on to version 9.1; in fact, the second CommitFest for 9.1 has already started. Features currently under development include synchronous replication, SQL/MED federated tables, security label support, per-column collations, predicate locking, benchmarking tools, and new XML functions. Additionally, Google Summer of Code students created draft patches for JSON support and the MERGE operation.
"We have seen incredible increase in the volume of activity within the PostgreSQL community, in user base, development, and infrastructure." commented core team member Bruce Momjian.
The future certainly seems bright for what was once the "other open source database".
(
Log in to post comments)