By Jonathan Corbet
February 17, 2010
As some LWN readers will know, this site is implemented with a combination
of free technologies, including the Python language and the PostgreSQL
relational database management system. Anybody who has tried to combine
those two tools will have encountered the variety of modules out there
intended to serve as the glue between them. It's the sort of variety that
nobody wants, though: lots of options, none of which has the full support
of the community or works as well as one might like. The good news is that
this situation may not last a whole lot longer.
The conversation started when Bruce Momjian noted that the state of Python/PostgreSQL
support was not as good as it could be. The PostgreSQL Python page
and the Python PostgreSQL
page agree on one thing: there are several adapters available, none of
which is truly dominant, and many of which are seemingly unmaintained.
How, he asked, is a developer to choose between them? Your editor, who has
tried a number of them, could only nod in sympathy. Bruce requested:
What is really needed is for someone to take charge of one of these
projects and make a best-of-breed Python driver that can gain
general acceptance as our preferred driver. I feel Python is too
important a language to be left in this state.
The purpose of a database adapter module is to make the capabilities of the
database available to Python applications. To that end, it accepts SQL
queries from the application, passes them to the database, and hands the
results (if any) back to the application. Application writers like the
idea of database independence; it holds the promise of being able to move
easily to a different database should the need arise. To enable this
independence, language development communities define standards for how
database adapters should operate. The Python version of this standard is
the Python Database API
Specification, often called DB-API.
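The query-and-results cycle described above can be sketched in a few lines.
This is a minimal illustration, not any particular adapter's recommended
usage: it uses the standard library's sqlite3 module as a stand-in DB-API
module so that it runs without a database server, and the articles table and
the fetch_titles() helper are invented for the example. With a PostgreSQL
adapter, the connect() arguments and the placeholder syntax would differ.

```python
import sqlite3  # stand-in DB-API 2.0 module; chosen only because it needs no server


def fetch_titles(conn, min_id):
    """Hypothetical helper: run one query and hand the results back."""
    cur = conn.cursor()
    # The application supplies SQL plus parameters; the adapter passes
    # them to the database.
    cur.execute("SELECT title FROM articles WHERE id >= ? ORDER BY id", (min_id,))
    rows = cur.fetchall()  # results come back as a list of row tuples
    cur.close()
    return [row[0] for row in rows]


# Set up an illustrative in-memory table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
cur.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)
conn.commit()

titles = fetch_titles(conn, 2)
conn.close()
```

The connect/cursor/execute/fetch sequence is the part the DB-API
standardizes; nearly everything else is left to the individual adapter.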
One of the problems, as identified by Greg
Smith, is that the DB-API fails to cover much of the needed functionality,
meaning that each adapter ends up making its own (incompatible, naturally)
extensions. One of your editor's favorite quirks is the specification of
five different "styles" by which parameters can be passed into queries; the
application is expected to support all five and use whichever one the
adapter is prepared to accept that day. The end result of all this is that
adapters tend to diverge from each other, portability between them is
problematic, and none becomes the standard.
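To make the parameter-style problem concrete, here is a rough sketch of the
sort of shim an application needs if it wants to work with any DB-API
module. The five style names (qmark, numeric, named, format, pyformat) come
from the DB-API specification, and each module advertises its choice in a
module-level paramstyle attribute. The placeholders() and bind() helpers are
hypothetical names made up for this example, and sqlite3 again stands in for
a real PostgreSQL adapter.

```python
import sqlite3  # stand-in DB-API module; sqlite3.paramstyle is "qmark"


def placeholders(paramstyle, names):
    """Hypothetical helper: one placeholder per column for the given style."""
    if paramstyle == "qmark":
        return ["?" for _ in names]
    if paramstyle == "numeric":
        return [":%d" % (i + 1) for i in range(len(names))]
    if paramstyle == "named":
        return [":" + name for name in names]
    if paramstyle == "format":
        return ["%s" for _ in names]
    if paramstyle == "pyformat":
        return ["%(" + name + ")s" for name in names]
    raise ValueError("unknown paramstyle: %r" % (paramstyle,))


def bind(paramstyle, names, values):
    """Hypothetical helper: named styles take a mapping, the rest a sequence."""
    if paramstyle in ("named", "pyformat"):
        return dict(zip(names, values))
    return tuple(values)


names = ["id", "title"]
values = [1, "hello"]
style = sqlite3.paramstyle

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")

# Build the INSERT statement in whatever style the module asks for.
sql = "INSERT INTO articles (id, title) VALUES (%s)" % ", ".join(
    placeholders(style, names)
)
cur.execute(sql, bind(style, names, values))

cur.execute("SELECT id, title FROM articles")
row = cur.fetchone()
conn.close()
```

Every application that hopes to be adapter-independent ends up carrying
something like this, which is much of why portability between adapters
tends to be better in theory than in practice.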
That said, there currently seem to be two serious competitors in this area:
- Psycopg almost certainly has
the widest support among Python applications. It is reasonably solid
and performs well, but some potential users may have been daunted by
the fact that its web page took the form of an anti-Trac rant for some
time (it's still in
the Google cache as of this writing).
- PyGreSQL has been around for a
long time; it predates the DB-API and still does not implement it
completely. Development on the code has been slow for some time, and its
performance is not as good as Psycopg's.
One might think that Psycopg would be the clear leader of the two, and
it would have been, except for one little problem: Psycopg is GPL-licensed with a
bunch of exceptions. The PostgreSQL community feels fairly strongly about
permissive licenses, to the point that a GPL-licensed adapter is seen as a
deal breaker. So Greg lamented:
If everybody who decided they were going to write their own from
scratch had decided to work on carefully and painfully refactoring
and improving PyGreSQL instead, in an incremental way that grew its
existing community along the way, we might have a BSD driver with
enough features and users to be a valid competitor to psycopg2.
But writing something shiny new from scratch is fun, while worrying
about backward compatibility and implementing all the messy parts
you need to really be complete on a project like this isn't, so
instead we have a stack of not quite right drivers without any
broad application support.
As a way toward a solution, Greg put together a
Python PostgreSQL driver TODO page describing the issues with both
Psycopg and PyGreSQL. For Psycopg, wishlist items included some testing,
refactoring, and documentation work. The list for PyGreSQL is longer and
more daunting. The conclusion found on that page is:
A relicensed Psycopg seems like the closest match to the overall
goals of the PostgreSQL project, in terms of coding work needed
both in the driver and on the application side (because so many
apps already use this driver).
Authors of GPL-licensed code often tend not to react well to requests for
relicensing. That can be true even in cases like a database adapter, which
is normally a relatively small component in a much larger application. In
this case, though, Psycopg hacker Federico Di Gregorio acknowledged that,
perhaps, GPL wasn't the best license for this module. So, he has announced, the next Psycopg release will carry
the LGPLv3 license (plus the obligatory exceptions involved in using
libssl) instead. That is probably enough to tip the scales in that
direction and, finally, lead to a situation where there is an obvious
default choice for developers.
There will be, beyond doubt, no end of lessons from this episode on how to
run a successful free software project. There is one which stands out,
though: until well into this discussion, there had been little or no
communication between the PostgreSQL development community and the people
working on Python adapters. Given how tightly coupled the two efforts are,
a lack of communication for years can only make the creation of top-quality adapters
harder. Once the relevant developers started talking to each other, it
only took a few days to find a path toward a satisfactory conclusion.