
Development

Removing the PostgreSQL contrib tree

By Jake Edge
June 10, 2015

The PostgreSQL contrib tree contains a number of useful tools and other features that are not part of the database core for various reasons. In some ways, the contrib tree can act somewhat like the kernel's staging tree, providing a way to try out and improve features before they get "promoted" into the PostgreSQL core. But the tree is shipped in parallel with the core, which adds to the maintenance burden for the core developers, so a reorganization of the contents of contrib has been proposed, with the goal of eliminating the contrib tree entirely.

Joshua D. Drake raised the idea of eliminating contrib in a post to the pgsql-hackers mailing list. He prefaced his proposal by quoting the documentation's explanation of why the contents of the contrib directory are not part of the core: "mainly because they address a limited audience or are too experimental to be part of the main source tree". That, however, "does not preclude their usefulness", the documentation continues. In addition, Drake said, the contrib tree has long been considered the place to keep features that are eventually destined for the core.

The discussion of an auditing extension (pg_audit) that was committed to contrib by Stephen Frost in mid-May (which has since been reverted) was the proximate cause of Drake's proposal. The discussion started in the pgsql-committers mailing list but soon moved to pgsql-hackers. Some developers were not pleased with the code and documentation quality of pg_audit, were concerned about potential security holes it introduced, and were unhappy with how it got committed, which they thought circumvented the normal PostgreSQL community development process.

If pg_audit (and, by extension, other similar efforts) were not committed into the contrib tree in parallel with the core, some of these problems would go away, Drake argued. The code and documentation quality questions, along with any circumvention of the normal development process, would be non-issues if these new features were put into some other repository. That would also reduce the amount of code that the core developers need to maintain and test. He suggested going through the list of 45 modules to determine which should move into the core and which should move either to a new project elsewhere (perhaps called "contrib") or, if they are extensions, to the PostgreSQL Extension Network (PGXN). The criteria for where each module ends up, he suggested, should come down to whether it has a visible community (so narrowly focused features would not make the cut) and whether it has been in contrib for at least two releases.

Frost agreed that some clarification of the mission of contrib is probably in order, but he noted that there is no location in the existing core tree where extensions could be placed. An "extensions" directory could be created, of course. He also wondered how the new contrib project would differ from PGXN. In addition, Peter Eisentraut concurred with the general idea, but suspected it would be a big undertaking.

But Drake didn't think the problem was all that large and claimed that simply including many of the contrib modules in the core should be "obvious". Another option might be to freeze contrib: "What is in there now, is all there will ever be in there and the goal is to slowly reduce it to the point that it doesn't matter."

Frost was not in favor of the freezing approach. But he did go through the list making recommendations on which modules belonged in the core. For his part, Drake largely agreed with Frost's suggestions. But others are not so sure. Fabien Coelho wondered if there was enough benefit to go through the exercise: "Reaching a consensus about what to move here or there will consume valuable time that could be spent on more important tasks... Is it worth it?" Jeff Janes was also concerned about how users would decide which modules and extensions to trust.

But Drake would be willing to see all of contrib move into the core and be installed by default as part of the standard installation. He simply doesn't think that the contrib distinction in the main tree is useful:

I care about this idea that contrib exists. It isn't needed and leads to a discussion like this one (or the pg_audit), almost every release.

Contrib made sense years ago. It does not any longer. Let's put the old horse down and raise a new herd of ponies on a new pasture.

On the other hand, though, Robert Haas thinks that it doesn't make sense to talk about getting rid of contrib when each new PostgreSQL release adds new modules to it. Those features are generally "pretty good stuff". There needs to be a place for that code: "We wouldn't have been better off rejecting it, and we wouldn't have been better off putting it into the main tree." Contrib has already been cleaned up along the way, he said, and simply renaming contrib is not particularly productive either. Even just categorizing the modules may not be as straightforward as it seems:

One thing that may be worth doing yet is separating the code that is just intended as a POC [proof of concept] (like worker_spi, auth_delay and test_decoding) from the stuff that you are really intended to run in production (like tcn and hstore). But that distinction is fuzzier than you might think, because while auth_delay was intended as a POC, I've subsequently heard rumors of it being used in production with satisfactory results. It's very easy to get the idea that you know "what PostgreSQL users use" but usually that tends to mean "what I use" and the community is broad enough that those things are Not The Same.

Haas sees contrib as a place for "things we want to include in the core distribution without baking them irrevocably into the server". Drake tried again, restating his position and attempting to clarify some of the points that Haas and others had disagreed with. But neither Andres Freund nor Haas agreed with Drake's reasoning.

One of the main reasons Drake cites for the change is the disagreement about pg_audit, which is, he said, the same argument that has come up frequently over the last fifteen years. But both Freund and Haas see that issue differently. Moving things into core will just move the argument from what goes into contrib to what goes into core, Freund said. Haas was more specific:

The argument about pg_audit has little to do with contrib. It is primarily about code quality, and secondarily about whether one committer can go do something [unilaterally] when a long list of other committers and contributors have expressed doubts about it.

As might be guessed, Frost did not agree with that characterization. By then, however, he had already reverted the change.

In another sub-thread, Jim Nasby elaborated on the concern raised by Janes: users (and distribution packagers) tend to trust the contrib tree because it comes with PostgreSQL. But that could all be made more explicit, Nasby said:

I think the real problem here that we're skirting around is this idea of 'blessed extensions', because that's really the only user benefit contrib brings: the idea that this stuff is formally blessed by the community. If that's really what we're after then we should just be explicit about that. Then we can decide if the best way to approach that is keeping it in the main repo (as opposed to say, publishing a list of [explicit] PGXN package versions and their checksums).

Personally, I'd rather we publish a list of formally vetted and approved versions of PGXN modules. There are many benefits to that, and the downside of not having that stuff as part of make check would be overcome by the explicit testing we would need to have for approved modules.
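Nasby's alternative, publishing vetted PGXN package versions along with their checksums, boils down to a lookup followed by a digest comparison. The sketch below assumes the list maps an (extension name, version) pair to a SHA-256 hex digest; that format, the choice of hash, and the `is_blessed()` helper are all hypothetical, since the thread did not define any such list.

```python
import hashlib


def sha256_hex(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def is_blessed(name, version, stream, blessed):
    """Check a downloaded archive against a hypothetical vetted list.

    `blessed` maps (name, version) -> expected SHA-256 hex digest.
    A module that is not on the list, or whose bytes do not match
    the published digest, is treated as not blessed.
    """
    expected = blessed.get((name, version))
    if expected is None:
        return False  # this exact version was never vetted
    return sha256_hex(stream) == expected  # catches altered or wrong archives
```

In practice the stream would be an archive opened with `open(path, "rb")`. Pinning exact versions is what lets such a list say more than "this extension exists": it says "these particular bytes were reviewed and tested."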

There was general agreement with Nasby's idea; a vetted list of extensions and other modules would be highly useful. In fact, Neil Tiffin said that he would never install anything from PGXN because it is hard to tell "what is actively maintained and tested, and what is an abandoned proof-of-concept or idea". Furthermore, it is not clear what version of PostgreSQL a PGXN module runs on or has been tested on, he said. That is a problem for open-source software in general, David E. Wheeler pointed out. Beyond that, the PGXN Tester does provide some of the information Tiffin is seeking.

Overall, it would seem that there aren't too many other core developers who see the problems the way that Drake does. The changes he suggested may not even address the main problem he sees. But there clearly are some changes that could be made, with Nasby's suggestion chief among them. Whether a list of vetted and "blessed" extensions becomes a reality is unclear; no one seemed to volunteer for that particular mission. On the flip side, though, it is clear that not everyone is on the same page about the purpose of the contrib tree; that is probably worth clarifying one way or another.


Brief items

Quotes of the week

Demos, the great God of demonstrations and examples. He preferred to teach other Gods, as opposed to humans. Usually he could be found skulking nearby or with API, the God of documentation.

Conf, the lesser known God of configurations and setups. He made sure that everything was well defined in a single place in a concise format, such as acceleration due to gravity, etc.

"hnyc" (Thanks to Paul Wise)

I guess that's why the GNU autoconf/configure system has always advised testing for particular wanted features, instead of looking at versions and then relying on carnal knowledge to know what those versions imply.
Neil Jerram, on reinventing build-system wheels (Thanks to James Bottomley)


GNU Octave 4.0.0 Released

GNU Octave, which is a high-level programming language for numerical computations that is largely compatible with MATLAB, has made its 4.0 release. There are lots of new features in this major release, which are described in the release notes. Some of those features include defaulting to the graphical user interface instead of the command-line interface, OpenGL graphics and Qt widgets by default, a new syntax for object-oriented programming using classdef, audio functions, better MATLAB compatibility, and more.


Git v2.4.3 available

Git 2.4.3 has been released. Although primarily a bugfix release, this update does roll in several new minor features and a host of documentation updates. The fixes also include some user-visible changes, such as correcting the broken "verbosity" behavior of git clone in 2.4 and fixing the SSH transport-initiation code.


Discourse 1.3 released

Version 1.3 of the Discourse web-commenting framework is now available. The new release supports desktop-notification integration, anonymous commenting, and a "civilized" mute function for ignoring specific commenters. Whether the latter two features are related is left up to the reader to decide.


Newsletters and articles

Development newsletters from the past week


As open source code, Apple's Swift language could take flight (ITWorld)

ITWorld reports that Apple will release its Swift programming language under an open source license. "When Swift becomes open source later this year, programmers will be able to compile Swift programs to run on Linux as well as on OS X and iOS, said Craig Federighi, Apple’s head of software engineering, during the opening keynote of Apple’s Worldwide Developers Conference Monday in San Francisco. The source code will include the Swift compiler and standard library, and community contributions will be “accepted—and encouraged,” Apple said."


Inside NGINX: How We Designed for Performance & Scale

The folks behind the NGINX web server have put up a highly self-congratulatory article on how the system was designed. "NGINX scales very well to support hundreds of thousands of connections per worker process. Each new connection creates another file descriptor and consumes a small amount of additional memory in the worker process. There is very little additional overhead per connection. NGINX processes can remain pinned to CPUs. Context switches are relatively infrequent and occur when there is no work to be done."
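The design described there, a single worker process that multiplexes many connections and sleeps only when nothing is ready, can be illustrated with a minimal sketch built on Python's standard `selectors` module. This is not NGINX code; the `serve_once()` helper and its uppercase-echo behavior are hypothetical, chosen only to show that each connection costs one file descriptor plus a little bookkeeping, and that one thread services all of them without a context switch per connection.

```python
import selectors


def serve_once(conns):
    """Echo data back in uppercase on every connection until each peer closes.

    Hypothetical helper: one thread, one selector, many sockets. Each
    connection's only state is its socket and a list of received chunks.
    """
    sel = selectors.DefaultSelector()
    received = {}
    for conn in conns:
        conn.setblocking(False)  # never let one slow peer stall the loop
        received[conn] = []
        sel.register(conn, selectors.EVENT_READ)

    remaining = len(conns)
    while remaining:
        # Block until at least one socket is readable; no busy-waiting.
        for key, _events in sel.select():
            sock = key.fileobj
            data = sock.recv(4096)
            if data:
                received[sock].append(data)
                sock.sendall(data.upper())  # adequate for small replies in a sketch
            else:
                # An empty read means the peer shut down its write side.
                sel.unregister(sock)
                remaining -= 1
    sel.close()
    return {sock: b"".join(chunks) for sock, chunks in received.items()}
```

To try it, create a few connected pairs with `socket.socketpair()`, write to one end of each and call `shutdown(socket.SHUT_WR)` on it, then pass the other ends to `serve_once()`. A production server would also need non-blocking writes with per-connection output buffers, which this sketch omits.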


Page editor: Nathan Willis


Copyright © 2015, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds