April 15, 2009
This article was contributed by Bruce Byfield
The PostgreSQL project has released the beta of
its 8.4 version, with the final release expected in late June or July. Like
previous PostgreSQL releases, 8.4 features dozens of enhancements
throughout the code, but Bruce Momjian, a member of PostgreSQL's core team
since the project was founded in 1996, described it as a "more surgical
release" with improvements tending to cluster in select areas. In
particular, that area seems to be administrative features. However, almost
as interesting as the features in the beta are the ones that failed to meet
the cut this time, and the project's struggle to control the release
process as it undergoes major growth.
Momjian explained the release's more targeted approach as the result of the
project's advanced state. "You see a real consolidation in this
release," he said, "and I think that's because you're seeing a
much more complete feature set. You're seeing a real maturity of the code
base, which is kind of surprising, because, if you looked a few years ago,
you'd see changes all over the map."
At the same time, Momjian characterized the PostgreSQL code as being
constantly revised, so that the project is unlikely to need any time soon the kind of major revision that KDE underwent last year.
That's
always been a fear of ours, but we've actually never had to do it. And I
think the reason is that, in general, we're always restructuring our source
code. So we're always having to re-engineer things and clean them up.
For example, by the end of the Windows port [in 2003], the code was
cleaner than when we started. You'd think that after adding a Windows port,
the code would be just — you know — spaghetti code central,
right? But we end up abstracting a lot of the Unix-specific behavior into a
portability library. so now, a lot of the assumptions that you make about
Unix are now codified in a separate place. And then you put the Windows
pieces in there, and it works really well, too.
Nor is the project
hesitant about altering behavior or deprecating legacy code in the name of
what Momjian called "high standards and the promise of
reliability."
New features
PostgreSQL releases tend to be one to two years apart, with far too many
features to mention in any detail. Many of these features are highlighted
in a PDF presentation by Momjian entitled, "Upcoming
PostgreSQL Performance Features [PDF]", including Column Level Permissions
and Per-Database Locales. However, if you ask active contributors what the
major enhancements are in the 8.4 release, their answers emphasize
automated administrative features.
Selena Deckelmann, user groups leader in the project, emphasized changes to
Free
Space Map, which maps unused space in a database. In previous releases,
PostgreSQL could only detect newly freed space by an administrator manually
running the Vacuum
utility. By contrast, in the 8.4 release, freed space is re-mapped
automatically, saving both time and effort.
Deckelmann also called attention to a new feature called Visibility
Map. While in previous releases, Vacuum had to re-map all rows in a
table, regardless of whether they had changed, Visibility Map improves
performance by allowing the utility to skip rows that have not changed.
Yet another automated feature mentioned by Deckelmann is Auto
Explain, which captures the explain plan for a
query — that is, how the query finds results — and saves the
results, information that can be used to improve system load and the
efficiency of queries, and, ultimately, to reduce the costs of a database
operation.
"It's something that a lot of admins end up doing anyway,"
Deckelman observed, "They write a script that looks for long-running
queries, and then they go in and manually figure with each one what's
going on. It's kind of a neat feature that came from the Japanese
PostgreSQL team."
Josh Berkus, another core team member, noted that Oracle's Statspack, a set
of tools to generate statistics useful to database administrators, inspired
pgstat. As Berkus
explained the situation, users migrating from Oracle expect to see
comparable tools in PostgreSQL, and Statspack:
...allows you to see
exactly what's going on with your server internally in terms of how much
memory it's using, what queries it is running, and all those other things
that you need to know so that, when the load on the server starts going up,
you know what to do about it. We've had some hackish tools for a long time,
and have had some sophisticated activity logging, but activity logging is
not very interactive. So we're adding some new interactive tools. We're
trying to achieve an analog of what Statspack provides.
Berkus noted, though, that pgstat will not be stable enough to be installed
by default in 8.4. Instead, it will be shipped in postgresql-contrib, the
repository for tools that, for one reason or the other, are not part of the
regular installation. Some tools in contrib may be too specialized for most
installations, or illegal to ship under American restrictions on the
exporting of cryptographic tools, while others, like pgstat, are still in
development.
According to Deckelmann, statistics in PostgreSQL receive another boost in
the 8.4 release with increased control over them. To reduce system
overhead, statistics are no longer automatically collected and written to
a file, operations that can have significant system overhead if done
regularly. Instead, statistics can now be configured to run from a RAM disk
to improve performance.
However, if the latest release has a single outstanding feature, it is
parallel restore. As the name suggests, parallel restore allows admins to
restore a database with multiple processes, rather than a single thread.
Berkus, who runs his own PostgreSQL consulting business, said,
"[Parallel restore] is the feature i've been making the most use
of. I've been using it in beta already — pretty heavily. Because if
you have 300 gigabyte databases, upgrading them single-threaded is lethally
slow. It's a real issue."
He went on to describe parallel restore as a particularly difficult feature
to implement, all the more so because it had to accommodate past changes in
the PostgreSQL file format. While employed as a supervisor at Sun
Microsystems, Berkus said, he had two employees working on a similar
feature for a year and a half "without coming up with more than a
rough prototype." By contrast, PostgreSQL developed its version in
three weeks, followed by three months of debugging.
Features for the next release
As mentioned earlier on LWN,
PostgreSQL's core team had hopes of other major features being in the 8.4
release, particularly Hot Standby and SE-PostgreSQL.
Hot Standby is a feature that is a major step in improving PostgreSQL's
replication. Replication is an area in which PostgreSQL lags behind MySQL;
it has been identified by the
core team as one of the major priorities for the project, according to
Berkus. PostgreSQL does have some replication, Berkus said, but it is
needlessly complicated and "not for someone who can't make a large
time investment in finding out how it works. And that's a real problem for
the sort of average-case web developer that has two servers and just wants
to make sure that PostgreSQL is mirrored."
Hot Standby is an important improvement in replication that allows
administrators to run queries on a database that is being recovered from an
archive. The module allows replication of the database logs in order to
create read-only duplicates of the database. Unfortunately, funding for
development of the feature only came through in August 2008, leaving only a
few months before the November code freeze. Nor could Hot Standby be
finalized in the extended testing period that followed.
SE-PostgreSQL is an even greater innovation, which will add the SE Linux
security model, making PostgreSQL the first database to use the same
security model as many distributions, such as Fedora. The problem is,
Berkus explained
Because we are the trendsetters, it is very hard
to advance in this area. There's no standard defined syntax, and all the
papers on the topic are highly academic and speculative, so it means that
we really have to spend a lot of time implementing things down to the API
level and spending long periods of time arguing over details of which
security features should be implemented in any context.
Having failed to be ready for the 8.4 release, both Hot Standby and
SE-PostgreSQL are high on the priority list for the next release. In fact,
according to Momjian, 8.4 is partly designed to ease their later
implementation. Momjian suggested that, given the maturity of PostgreSQL's
code, these might be simply the first of many new features that are too
large to be implemented in a single release. Meanwhile, he planned on
promoting SE-PostgreSQL in the release notes in the hopes of
encouraging interest in it and perhaps attracting a few developers with SE
Linux experience.
"To continue to grow and continue to have the sort of reputation we
have, you have to make some hard decisions," Momjian said. Like
Berkus and Deckelmann, Momjian clearly regretted the necessity for omitting
these features. However, having faced long delays because of the refusal to
jettison unfinished features before — notably in the implementation
of the Windows port in version 7.4 and more generally in the sheer number
of features in version 8.3 — the PostgreSQL core team has learned the
hard way not to insist that a release include all the features they hoped
it would have.
Lessons from the release
Besides the features, the 8.4 release has been important to the PostgreSQL
team in its effort to regain control of its development processes. The project has
always had an extensive review process, in which veteran developers were
counted on for the final review of contributions. However, ever since the
8.3 release, Momjian explained:
We're getting really major, large,
complex patches almost every week. And obviously that's very hard for the
veterans to digest. One of the challenges is that we've not grown our
veterans' group as quickly as we have our submitters. It's like a snake
swallowing a mouse — it takes a while to go through. You have this
bulge of activity, and it's really struggling just to digest so many
complex patches.
To make matters worse, the review process can be daunting, especially to
new contributors. Consequently, many were working in private and submitting
their code only at the last moment before a freeze.
To reduce these problems, PostgreSQL implemented a series of what it calls
CommitFests — alternate months in which no contributions are allowed,
and project members concentrate on reviewing existing contributions. This
solution helped reduce the problems, but did not eliminate them.
In addition, Berkus said that CommitFests did not address complaints he had
heard about people who would review a patch, only to find that a veteran
had reviewed and submitted the patch before they had finished. Another
challenge was how to enlist the aid of those who said that they were
willing to help review out of public-spiritedness, but did not want the
trouble of picking out patches to review by themselves. Solving such issues
seemed central to increasing the efficiency of reviewing and in keeping
CommitFests from extending beyond their originally allotted time.
Realizing that greater efforts were needed, the core team is now using a
wiki to coordinate each CommitFest. In addition, over the last half year or
so, Berkus created a team of what he calls "Round-Robin
Reviewers" — those who wanted their reviews assigned. He
also implemented reviews for routine matters such as patch structure by
less experienced contributors. The core team and veterans still have to do
the majority of the work, but the process has reduced the work load, and
added 25 reviewers to the previous pool of 15.
The changes can still make for a slow process, but Berkus apparently judges
them a qualified success, noting that the 8.4 release will likely take
several months less than the 8.3 release.
Conclusion
Overall, Momjian seems satisfied with the general direction of
PostgreSQL. He even suggested that the project is starting to gain
increased recognition.
In the past year or so, I'm been seeing PostgreSQL put up as —
I won't say a model open source project, but one that's getting the kind of
respect for office disruption that OpenOffice is getting. It is starting to
be seen as a valid competitor to Oracle. I've heard people say, 'What is
the business case for buying Oracle when 80% of its functionality can be
found in PostgreSQL, and in some cases it's easier to use and easier to
administer, and, in most cases, cheaper?'
Momjian speculated that
the recession may be driving this increased respect.
Berkus pointed, too, to the increased sponsorship of the program, which now
means that more developers are being paid to work on the project full-time,
and to the increased number of PostgreSQL conferences and user groups world
wide.
"Compared to other open source databases, we are still developing
very quickly," Berkus said. "We still have developer
momentum." As you look over the feature list for 8.4 as well as the
features that were dropped, it is hard to disagree. While stopping short of
being revolutionary, the new release suggests sustained, steady progress in
development.
(
Log in to post comments)