|
|
Subscribe / Log in / New account

Python and PostgreSQL

By Jonathan Corbet
February 17, 2010
As some LWN readers will know, this site is implemented with a combination of free technologies, including the Python language and the PostgreSQL relational database management system. Anybody who has tried to combine those two tools will have encountered the variety of modules out there intended to serve as the glue between them. It's the sort of variety that nobody wants, though: lots of options, none of which has the full support of the community or works as well as one might like. The good news is that this situation may not last a whole lot longer.

The conversation started when Bruce Momjian noted that the state of Python/PostgreSQL support was not as good as it could be. The PostgreSQL Python page and the Python PostgreSQL page agree on one thing: there are several adapters available, none of which is truly dominant, but many of which are seemingly unmaintained. How, he asked, is a developer to choose between them? Your editor, who has tried a number of them, could only nod in sympathy. Bruce requested:

What is really needed is for someone to take charge of one of these projects and make a best-of-breed Python driver that can gain general acceptance as our preferred driver. I feel Python is too important a language to be left in this state.

The purpose of a database adapter module is to make the capabilities of the database available to Python applications. To that end, it accepts SQL queries from the application, passes them to the database, and hands the results (if any) back to the application. Application writers like the idea of database independence; it holds the promise of being able to move easily to a different database should the need arise. To enable this independence, language development communities define standards for how database adapters should operate. The Python version of this standard is the Python Database API Specification, often called DB-API.

One of the problems, as identified by Greg Smith, is that the DB-API fails to cover much of the needed functionality, meaning that each adapter ends up making its own (incompatible, naturally) extensions. One of your editor's favorite quirks is the specification of five different "styles" by which parameters can be passed into queries; the application is expected to support all five and use whichever one the adapter is prepared to accept that day. The end result of all this is that adapters tend to diverge from each other, portability between them is problematic, and none becomes the standard.

That said, there currently seem to be two serious competitors in this area:

  • Psycopg almost certainly has the widest support among Python applications. It is reasonably solid and performs well, but some potential users may have been daunted by the fact that its web page took the form of an anti-Trac rant for some time (it's still in the Google cache as of this writing).

  • PyGreSQL has been around for a long time; it predates the DB-API and still does not implement it completely. Development on the code has been slow for some time, and its performance is not as good as Psycopg.

One might think that Psycopg would be the clear leader among these two, and it would have been, except for one little problem: Psycopg is GPL with a bunch of exceptions. The PostgreSQL community feels fairly strongly about permissive licenses, to the point that a GPL-licensed adapter is seen as a deal breaker. So Greg lamented:

If everybody who decided they were going to write their own from scratch had decided to work on carefully and painfully refactoring and improving PyGreSQL instead, in an incremental way that grew its existing community along the way, we might have a BSD driver with enough features and users to be a valid competitor to psycopg2. But writing something shiny new from scratch is fun, while worrying about backward compatibility and implementing all the messy parts you need to really be complete on a project like this isn't, so instead we have a stack of not quite right drivers without any broad application support.

As a way toward a solution, Greg put together a Python PostgreSQL driver TODO page describing the issues with both Psycopg and PyGreSQL. For Psycopg, wishlist items included some testing, refactoring, and documentation work. The list for PyGreSQL is longer and more daunting. The conclusion found on that page is:

A relicensed Psycopg seems like the closest match to the overall goals of the PostgreSQL project, in terms of coding work needed both in the driver and on the application side (because so many apps already use this driver).

Authors of GPL-licensed code often tend not to react well to requests for relicensing. That can be true even in cases like a database adapter, which is normally a relatively small component in a much larger application. In this case, though, Psycopg hacker Federico Di Gregorio acknowledged that, perhaps, GPL wasn't the best license for this module. So, he has announced, the next Psycopg release will carry the LGPLv3 license (plus the obligatory exceptions involved in using libssl) instead. That is probably enough to tip the scales in that direction and, finally, lead to a situation where there is an obvious default choice for developers.

There will be, beyond doubt, no end of lessons from this episode on how to run a successful free software project. There is one which stands out, though: until well into this discussion, there had been little or no communication between the PostgreSQL development community and the people working on Python adapters. Given how tightly coupled the two efforts are, a lack of communication for years can only make the creation of top-quality adapters harder. Once the relevant developers started talking to each other, it only took a few days to find a path toward a satisfactory conclusion.


to post comments

Python and PostgreSQL

Posted Feb 18, 2010 4:38 UTC (Thu) by malefic (guest, #37306) [Link] (1 responses)

There is an excellent driver created by James William Pye: py-postgresql. It implements both DB-API and a "native" PostgreSQL API which is much more powerful and elegant. py-postgresql is written in pure Python (with optional chunks in C to speed things up in certain bottlenecks). AND it is BSD! It is for Python 3.x only, though. But if that's not a problem, I highly recommend py-postgresql.

Python 3

Posted Feb 18, 2010 14:11 UTC (Thu) by corbet (editor, #1) [Link]

I didn't mention it in the article, but Python 2.x support is considered to be crucial at this point, with 3.x being (still) a future thing. FWIW, Psycopg is evidently not very far away from supporting Python 3.

Python and PostgreSQL

Posted Feb 18, 2010 7:35 UTC (Thu) by cry_regarder (subscriber, #50545) [Link]

> As some LWN readers will know, this site is
> implemented with a combination of free technologies

That's like calling in fire on your self... :-)

Python and PostgreSQL

Posted Feb 18, 2010 9:56 UTC (Thu) by multani (guest, #56193) [Link]

> Psycopg almost certainly has the widest support among Python applications. It is reasonably solid and performs well, but some potential users may have been daunted by the fact that its web page took the form of an anti-Trac rant for some time (it's still in the Google cache as of this writing).

Since then, Daniele Varrazzo did an awesome work of documenting how to use Psycopg : there is now a great documentation of using it as any DB-API driver, as well as its extra functionalities (dict-like cursors, COPY TO/FROM support, etc.). Federico Di Gregorio, the main author of Psycopg, started to add a FAQ some days ago

So finally, this driver is getting the love it would have needed since a very long time!

Python and PostgreSQL

Posted Feb 18, 2010 11:06 UTC (Thu) by gouyou (guest, #30290) [Link]

On a side note, a new driver will have to work with existing libraries and frameworks. Django is only supporting psycopg, SQLAlchemy is supporting psycopg and pg8000 (a pure python and BSD driver), SQLObject is supporting psycopg and pygresql.

Python and PostgreSQL

Posted Feb 18, 2010 11:28 UTC (Thu) by kpfleming (subscriber, #23250) [Link]

> will carry the LGPLv3 license (plus the obligatory exceptions involved in using libssl)

We (the Asterisk project) license our code under GPLv2 but allow people to distribute it linked against OpenSSL, and we call this "additional permissions", rather than "exceptions". It seems to more clearly indicate the intent.

Python and PostgreSQL

Posted Feb 18, 2010 12:47 UTC (Thu) by ikm (guest, #493) [Link] (8 responses)

> this site is implemented with a combination of free technologies, including the Python language and the PostgreSQL relational database management system.

Why did you choose PostgreSQL over MySQL? The latter seem to have much more streamlined bindings and support, and also looks quite sufficient feature-wise.

Python and PostgreSQL

Posted Feb 18, 2010 14:27 UTC (Thu) by vonbrand (subscriber, #4458) [Link] (4 responses)

Back in the timeframe when LWN started, a friend did a undergraduate thesis comparing MySQL and Postgres (among others). MySQL was faster, but only for very simple queries... and developing more complex software than simple queries for/with Postgres was much easier. His conclusions were that MySQL was easier for starting up, but once the applications grew and got more complex (including not simple lookups but also data additions) it became much harder to add functionality with MySQL.

Python and PostgreSQL

Posted Feb 18, 2010 14:48 UTC (Thu) by ikm (guest, #493) [Link] (3 responses)

My understanding is that LWN was retooled circa 2002, so I guess that's when the comparison should've been made. Still, yes, mysql was different back then.

Python and PostgreSQL

Posted Feb 18, 2010 15:14 UTC (Thu) by maney (subscriber, #12630) [Link] (2 responses)

Yes, I was choosing a database and platform back around that time - I remember seeing Our Editor occasionally in the Quixote mailing list. My recollection is that this was not so very long after some highly placed person (developer?) from MySql had written an aggressive apologia that said, in effect, that MySQL didn't need transactions or any of that other stuff, and neither did you.

Like Our Editor, I agreed to disagree and chose PostgreSQL. I eventually drifted away from Quixote, however...

Long-lasting damage

Posted Feb 19, 2010 20:58 UTC (Fri) by man_ls (guest, #15091) [Link] (1 responses)

I remember that rant -- essentially he said that you had to implement transactions yourself, and if you wanted a database which implemented them it was because you were stupid. I have steered clear of MySQL ever since. I know I did need transactions. Actually, transactions can be pretty cool.

Long-lasting damage

Posted Mar 16, 2010 22:39 UTC (Tue) by rodgerd (guest, #58896) [Link]

That rant probably did more to drive PostgreSQL uptake than anything else either dev team had done.

Python and PostgreSQL

Posted Feb 18, 2010 23:09 UTC (Thu) by Tet (guest, #5433) [Link]

also looks quite sufficient feature-wise

I haven't looked at MySQL for a while, and I understand it's improved. But certainly a few years ago, it was very much insufficient in terms of features for anything but the least demanding tasks.

Python and PostgreSQL

Posted Feb 26, 2010 16:07 UTC (Fri) by abadidea (guest, #62082) [Link] (1 responses)

Considering the present situation with MySQL, perhaps the choice to go with Postgre will have some unexpected benefits in the future.

Python and PostgreSQL

Posted Feb 26, 2010 20:04 UTC (Fri) by ikm (guest, #493) [Link]

MySQL has too much of an installation count for something bad to happen to it.

Python and PostgreSQL

Posted Feb 18, 2010 13:59 UTC (Thu) by mitchskin (subscriber, #32405) [Link]

What? Consensus around one existing project? Obviously that won't work. We're talking about the python community, here: writing another new implementation is the way things are done.

Python and PostgreSQL

Posted Feb 19, 2010 9:08 UTC (Fri) by Frej (guest, #4165) [Link]

In theory, when you develop against the python DB-API, isn't your software a seperate work from the
actual adapter?
Actually, that's a question.. so..

Q: If a python project just used the db-api would it actually put dependent works in GPL? Does it
matter which license the adapter has?

Say you start out your project against a bsd licensed adapter, and the change to the GPL one, isn't
that the same as the nvidia linux drivers?

Ofcourse an LGPL license would be clearer, but does it actually matter in this specific case? There
seems to be quite a few non GPL web/db frameworks out there for python ;)


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds