LWN.net Logo

Cracks in the Foundation (PHP Advent)

Cracks in the Foundation (PHP Advent)

Posted Dec 19, 2011 7:52 UTC (Mon) by smurf (subscriber, #17840)
In reply to: Cracks in the Foundation (PHP Advent) by sxpert
Parent article: Cracks in the Foundation (PHP Advent)

An alternate solution would be to use a programming language whose authors (and users) cared enough about security that they'd implement the thing sometime during the last N years.

A reasonable database layer allows one to switch database backends as needed. It allows me to change the backend driver (a one-line edit), and everything should still work afterwards.

A global replace of s/mysql_/pg_/ is not reasonable.


(Log in to post comments)

PHP includes such "reasonable database layer"...

Posted Dec 19, 2011 8:17 UTC (Mon) by khim (subscriber, #9252) [Link]

A reasonable database layer allows one to switch database backends as needed.

It's called PDO and PHP had it for years.

It allows me to change the backend driver (a one-line edit), and everything should still work afterwards.

It allows you to change the backend driver… and this starts long arduous task of fixing the code. Even trivial tasks (like addition of one row) is done differently with different servers. For example in mysql you usually add row using insert and then call PDO::lastInsertId to find the ID, but in Postgresql it does not work and the best alternative is to use “INSERT… RETURNING foo_id”. Or you can use PDO::lastInsertId, but this is slower and requires changes anyway (you must specify name of a sequence object with Postgresql).

Sorry, but magic wand does not work with SQL servers: they all are so different that "multiple backends" paradigm works only if you test your program with all of them. Or you can use some kind of ORM library, but this creates it's own problems.

PHP includes such "reasonable database layer"...

Posted Dec 19, 2011 11:36 UTC (Mon) by alankila (subscriber, #47141) [Link]

I agree that the requirement that *everything works afterwards* is probably too steep. But having to change a few selects to return the last insert id, or just use currval('foo_id_seq') in a few queries is the sort of price that seems acceptable. Much better, anyway, than using database-specific functions.

PHP includes such "reasonable database layer"...

Posted Dec 19, 2011 13:28 UTC (Mon) by nix (subscriber, #2304) [Link]

Well, the example you gave (RETURNING versus ::lastInsertId) looks like the sort of trivial-yet-important thing that any reasonable RDBMS-independent database driver really should abstract over. If it doesn't bother, that says volumes...

Heh

Posted Dec 19, 2011 14:28 UTC (Mon) by khim (subscriber, #9252) [Link]

Well, the example you gave (RETURNING versus ::lastInsertId) looks like the sort of trivial-yet-important thing that any reasonable RDBMS-independent database driver really should abstract over.

s/reasonable/unreasonable/. If it's database driver and not ORM then it's good idea to give you access to raw functions of SQL database. For example ODBC just does not give access to these capabilities so you need to use two SQL requests instead of one. Hardly an improvement.

If it doesn't bother, that says volumes...

Sure. It spokes volumes about unreasonable expectations.

Duh?

Posted Dec 19, 2011 14:46 UTC (Mon) by smurf (subscriber, #17840) [Link]

Returning the last_insert_id is something every reasonable SQL database must do. It therefore makes sense to have a common interface for that, just as you have a common interface for opening a cursor and emitting a SELECT statement -- no matter how cursors are implemented internally or whether the database is a file or a network connection.

I have no problem with getting access to the raw functions. But that kind of implementation detail wants to be abstracted. It's not something a Web programmer should have to think about.

Wrong again...

Posted Dec 19, 2011 15:05 UTC (Mon) by khim (subscriber, #9252) [Link]

Returning the last_insert_id is something every reasonable SQL database must do.

Puhlease. Oracle offers totally different way of doing the same thing. You never pull last_insert_id to your program at all: instead you use “currval” in the next use “insert” (“update”, etc).

I have no problem with getting access to the raw functions. But that kind of implementation detail wants to be abstracted.

Well, it only makes sense if you don't care about resource usage. Because practially the only way to abstract these kind of things is to create interface which works awfully slow for all supported databases. Perhaps some wizard can invent something which will support all common databases in the same fashion and still will work fast - but on practice I'm yet to see such implementation.

Wrong again...

Posted Dec 19, 2011 16:27 UTC (Mon) by nix (subscriber, #2304) [Link]

Your belief that RETURNING doesn't work in Oracle is untrue. It does (though you have to RETURNING a .nextval, indeed), and is the recommended approach rather than using .currval, because it saves a roundtrip to the database.

Writing an abstraction layer which abstracts over all SQL differences is of course over the top... but abstracting over trivial, always-needed stuff like 'what is the sequence number of the row I just inserted' is, well, trivial. I've written such an abstraction layer myself more than once, and that was a little one-man thing, and it was not very much effort. That PHP couldn't do it says volumes.

This is funny...

Posted Dec 20, 2011 0:04 UTC (Tue) by khim (subscriber, #9252) [Link]

Your belief that RETURNING doesn't work in Oracle is untrue. It does (though you have to RETURNING a .nextval, indeed), and is the recommended approach rather than using .currval, because it saves a roundtrip to the database.

Of course “RETURNING” works, but... how exactly will it save the roundtrip? When you use “nextval”/“currval” idiom you can just batch all your operation in one statement using “BEGIN”/“END” - thus using one single roudtrip. How exactly last_insert_id emulation can beat this is mystery to me.

Sure, sometimes you want to do some manipulations in your application before continuing - and in this case it's natural to use last_insert_id with MySQL (but not with Oracle).

Writing an abstraction layer which abstracts over all SQL differences is of course over the top... but abstracting over trivial, always-needed stuff like 'what is the sequence number of the row I just inserted' is, well, trivial.

Actually it's not trivial. In MySQL LAST_INSERT_ID only gives you one number but in Postresql or Oracle you can generate few different numbers. And since one roundtrip can include quite a few elemental operations it's not easy or simple to transform one to another.

I've written such an abstraction layer myself more than once, and that was a little one-man thing, and it was not very much effort.

WOW! Cool.

How exactly have you processed Oracle's “insert” statements which used two distinct sequences in MySQL?

That PHP couldn't do it says volumes.

About your unreasonable expectations? Sure. About SQL driver? Not much. DBI example which produces suboptimal results does not expire confidence: I'd rather handle such problems at the application level rather then in intense debugging session which will show that that my perfectly tuned code on some other database uses “insert” over unindexed table "because driver had no other choice". Thnks... but no, thnks.

This is funny...

Posted Dec 20, 2011 11:38 UTC (Tue) by nix (subscriber, #2304) [Link]

As for last_insert_id et al, well, every abstraction layer has costs. One cost of this one was that you had to use a not-completely-disgusting data model, i.e., that having more than one sequence tracking a single table was not supported. I never felt the lack: compound PKs are all very well, but if you use a sequence (mysql: autoincrement) as a PK it means that nothing else would do. Why on earth would you want to use a compound PK consisting of multiple sequences? That smacks of very bad table design to me. I thought of various more or less ugly ways to fix this but never implemented them because of a total lack of need.

(It was not a very pleasant abstraction layer to write, I'll grant you that. But, still, abstracting over this sort of thing *is* the job of an abstraction layer.)

Heh

Posted Dec 19, 2011 14:54 UTC (Mon) by niner (subscriber, #26151) [Link]

Perl's DBI offers a last_insert_id Method which works with pretty much all database drivers. So while it is available and in practice works, the interface still has to live with the complete lack of standardisation among DBMS. A quote from the DBI man page:

There are several caveats to be aware of with this method if you want to use it for portable applications:

* For some drivers the value may only available immediately after the insert statement has executed (e.g., mysql, Informix).

* For some drivers the $catalog, $schema, $table, and $field parameters are required, for others they are ignored (e.g., mysql).

* Drivers may return an indeterminate value if no insert has been performed yet.

* For some drivers the value may only be available if placeholders have not been used (e.g., Sybase, MS SQL). In this case the value returned
would be from the last non-placeholder insert statement.

* Some drivers may need driver-specific hints about how to get the value. For example, being told the name of the database 'sequence' object
that holds the value. Any such hints are passed as driver-specific attributes in the \%attr parameter.

* If the underlying database offers nothing better, then some drivers may attempt to implement this method by executing ""select max($field)
from $table"". Drivers using any approach like this should issue a warning if "AutoCommit" is true because it is generally unsafe - another
process may have modified the table between your insert and the select. For situations where you know it is safe, such as when you have
locked the table, you can silence the warning by passing "Warn" => 0 in \%attr.

* If no insert has been performed yet, or the last insert failed, then the value is implementation defined.

Given all the caveats above, it's clear that this method must be used with care.

Yup

Posted Dec 19, 2011 16:19 UTC (Mon) by khim (subscriber, #9252) [Link]

Given all the caveats above, it's clear that this method must be used with care.

It must only be used with MySQL, really. If you want to use Oracle and Postgresql then it's much better to use sequences and their “nextval” and “currval” directly (not sure about other databases, but they probably have their own ways to do that).

So while it is available and in practice works, the interface still has to live with the complete lack of standardisation among DBMS.

Right. This just shows that SQL driver is completely wrong level of abstraction to try to unify different databases. It's [relatively] easy for ORM layer or application-specific layer to deal with these differences. If you try to do that on SQL driver layer you'll waste huge amount of resources and and the end result will sometimes still not usable (If the underlying database offers nothing better, then some drivers may attempt to implement this method by executing ""select max($field) from $table"").

NOT a way to go. PHP has many problematic warts, but PDO is not one of them.

Yup

Posted Dec 19, 2011 19:35 UTC (Mon) by fatrat (subscriber, #1518) [Link]

Have you seen Python's SQLAlchemy? It provides an ORM layer but also a really good SQL level layer.

Good try...

Posted Dec 20, 2011 0:35 UTC (Tue) by khim (subscriber, #9252) [Link]

No, I've never used SQL from python so I never used SQLAlchemy. Looks like fine ORM toolkit, though. As for SQL… as any good SQL driver it includes dialects. This is where you can specify sequences in Oracle, disable autoincrement in MySQL, etc.

Even so: I can not see how exactly you can chain multiple “INSERT”s in one “BEGIN”/“END” Oracle statement or execute multiquery in MySQL (like you can do with PHP).

Yup

Posted Dec 20, 2011 7:38 UTC (Tue) by niner (subscriber, #26151) [Link]

"It must only be used with MySQL, really. If you want to use Oracle and Postgresql then it's much better to use sequences and their “nextval” and “currval” directly."

Not really. It covers the simple case quite well which is the vast majority in our applications. If I want to use PostgreSQL features, I'd use returning to get the new ID back to the application without an additional roundtrip.

Cracks in the Foundation (PHP Advent)

Posted Dec 19, 2011 22:21 UTC (Mon) by adamg (subscriber, #42260) [Link]

A reasonable database layer allows one to switch database backends as needed. It allows me to change the backend driver (a one-line edit), and everything should still work afterwards.
That always makes me wonder, what kind of argument is that anyway? Do you really find yourself so often in situations where in the middle of the project you decide to switch database backend?

If a project developers write SQL queries that are so portable that it as easy to switch to another database backend as doing the mentioned s/my_/pg_/g (or equivalent), my guess is that they are not using much of the initial database features.

.. or maybe there is an easy way (which I don't know) to write stored procedures in a portable way.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds