I also have to admire how the guy lists up quite a number of royal fuckups in the PHP language (lack of unicode support, idiotic syntax and others), yet later on claims that PHP "withstood the test of time better than any other language of its kind".
Posted Dec 18, 2011 17:22 UTC (Sun) by smurf (subscriber, #17840)
[Link]
Not to mention the absence of core security features. Even where they exist, it's simpler to ignore them.
Take SQL Injection, for instance. The database interface is one of many areas where it's FAR easier to write broken than good code in PHP. Quote
The very next article on the site aptly demonstrates this by calling a shell command like "command $var1 $var2". No quoting whatsoever. Fun.
PHP may still be the most-used programming language on the web. So what?
Quantity != quality.
(Speaking of Unicode: LWN still uses ISO-8859-1 encoding I tried to use ≠ here, but no luck.)
Unicode != UTF-8
Posted Dec 18, 2011 18:32 UTC (Sun) by khim (subscriber, #9252)
[Link]
(Speaking of Unicode: LWN still uses ISO-8859-1 encoding I tried to use ≠ here, but no luck.)
You mean this one: ≠ ? Works fine for me. What I'm doing wrong?
Unicode != UTF-8
Posted Dec 18, 2011 19:57 UTC (Sun) by tetromino (subscriber, #33846)
[Link]
> You mean this one: ≠ ? Works fine for me. What I'm doing wrong?
In "plain text" comment format, Unicode input and character entities both fail for me (LWN displays them as escaped character entities).
In "HTML" comment format, Unicode input (e.g. Ctrl+Alt+u2260 in Firefox → ≠) as well as named (≠ → ≠) and decimal (≠ → ≠) character entities work correctly.
However, the presence of a hex character entity (e.g. ≠) in a comment breaks all Unicode parsing in that comment, leaving all Unicode characters as escaped character entities, like in plain text format. This seems to be a bug in the LWN comment engine.
Cracks in the Foundation (PHP Advent)
Posted Dec 18, 2011 21:29 UTC (Sun) by oldtomas (guest, #72579)
[Link]
Take SQL Injection, for instance. The database interface is one of many areas where it's FAR easier to write broken than good code in PHP
I think it's at least as much a cultural as it is a language design problem. Nowadays, PHP has prepared statements. But you don't see that used often "out in the wild". The best you get is some variations on "sql escape" which, given its interface, just can't get it right (on a short PHP stint I was horrified to see that even big frameworks like Joomla do it that way).
Then, beginning hackers copy that.
So, to "fix" PHP it would be necessary to "fix" all the code out there first, just to give beginners a chance to pick up good idioms.
Cracks in the Foundation (PHP Advent)
Posted Dec 19, 2011 7:18 UTC (Mon) by sxpert (subscriber, #19738)
[Link]
imho, the problem comes from the fact that the mysql binding doesn't have a "mysql_query_params" function.
the Postgresql binding has "pg_query_params", which does the right thing...
the solution is pretty simple... stop using mysql !
Cracks in the Foundation (PHP Advent)
Posted Dec 19, 2011 7:52 UTC (Mon) by smurf (subscriber, #17840)
[Link]
An alternate solution would be to use a programming language whose authors (and users) cared enough about security that they'd implement the thing sometime during the last N years.
A reasonable database layer allows one to switch database backends as needed. It allows me to change the backend driver (a one-line edit), and everything should still work afterwards.
A global replace of s/mysql_/pg_/ is not reasonable.
PHP includes such "reasonable database layer"...
Posted Dec 19, 2011 8:17 UTC (Mon) by khim (subscriber, #9252)
[Link]
A reasonable database layer allows one to switch database backends as needed.
It allows me to change the backend driver (a one-line edit), and everything should still work afterwards.
It allows you to change the backend driver and this starts long arduous task of fixing the code. Even trivial tasks (like addition of one row) is done differently with different servers. For example in mysql you usually add row using insert and then call PDO::lastInsertId to find the ID, but in Postgresql it does not work and the best alternative is to use INSERT RETURNING foo_id. Or you can use PDO::lastInsertId, but this is slower and requires changes anyway (you must specify name of a sequence object with Postgresql).
Sorry, but magic wand does not work with SQL servers: they all are so different that "multiple backends" paradigm works only if you test your program with all of them. Or you can use some kind of ORM library, but this creates it's own problems.
PHP includes such "reasonable database layer"...
Posted Dec 19, 2011 11:36 UTC (Mon) by alankila (subscriber, #47141)
[Link]
I agree that the requirement that *everything works afterwards* is probably too steep. But having to change a few selects to return the last insert id, or just use currval('foo_id_seq') in a few queries is the sort of price that seems acceptable. Much better, anyway, than using database-specific functions.
PHP includes such "reasonable database layer"...
Posted Dec 19, 2011 13:28 UTC (Mon) by nix (subscriber, #2304)
[Link]
Well, the example you gave (RETURNING versus ::lastInsertId) looks like the sort of trivial-yet-important thing that any reasonable RDBMS-independent database driver really should abstract over. If it doesn't bother, that says volumes...
Heh
Posted Dec 19, 2011 14:28 UTC (Mon) by khim (subscriber, #9252)
[Link]
Well, the example you gave (RETURNING versus ::lastInsertId) looks like the sort of trivial-yet-important thing that any reasonable RDBMS-independent database driver really should abstract over.
s/reasonable/unreasonable/. If it's database driver and not ORM then it's good idea to give you access to raw functions of SQL database. For example ODBC just does not give access to these capabilities so you need to use two SQL requests instead of one. Hardly an improvement.
If it doesn't bother, that says volumes...
Sure. It spokes volumes about unreasonable expectations.
Duh?
Posted Dec 19, 2011 14:46 UTC (Mon) by smurf (subscriber, #17840)
[Link]
Returning the last_insert_id is something every reasonable SQL database must do. It therefore makes sense to have a common interface for that, just as you have a common interface for opening a cursor and emitting a SELECT statement -- no matter how cursors are implemented internally or whether the database is a file or a network connection.
I have no problem with getting access to the raw functions. But that kind of implementation detail wants to be abstracted. It's not something a Web programmer should have to think about.
Wrong again...
Posted Dec 19, 2011 15:05 UTC (Mon) by khim (subscriber, #9252)
[Link]
Returning the last_insert_id is something every reasonable SQL database must do.
Puhlease. Oracle offers totally different way of doing the same thing. You never pull last_insert_id to your program at all: instead you use currval in the next use insert (update, etc).
I have no problem with getting access to the raw functions. But that kind of implementation detail wants to be abstracted.
Well, it only makes sense if you don't care about resource usage. Because practially the only way to abstract these kind of things is to create interface which works awfully slow for all supported databases. Perhaps some wizard can invent something which will support all common databases in the same fashion and still will work fast - but on practice I'm yet to see such implementation.
Wrong again...
Posted Dec 19, 2011 16:27 UTC (Mon) by nix (subscriber, #2304)
[Link]
Your belief that RETURNING doesn't work in Oracle is untrue. It does (though you have to RETURNING a .nextval, indeed), and is the recommended approach rather than using .currval, because it saves a roundtrip to the database.
Writing an abstraction layer which abstracts over all SQL differences is of course over the top... but abstracting over trivial, always-needed stuff like 'what is the sequence number of the row I just inserted' is, well, trivial. I've written such an abstraction layer myself more than once, and that was a little one-man thing, and it was not very much effort. That PHP couldn't do it says volumes.
This is funny...
Posted Dec 20, 2011 0:04 UTC (Tue) by khim (subscriber, #9252)
[Link]
Your belief that RETURNING doesn't work in Oracle is untrue. It does (though you have to RETURNING a .nextval, indeed), and is the recommended approach rather than using .currval, because it saves a roundtrip to the database.
Of course RETURNING works, but... how exactly will it save the roundtrip? When you use nextval/currval idiom you can just batch all your operation in one statement using BEGIN/END - thus using one single roudtrip. How exactly last_insert_id emulation can beat this is mystery to me.
Sure, sometimes you want to do some manipulations in your application before continuing - and in this case it's natural to use last_insert_id with MySQL (but not with Oracle).
Writing an abstraction layer which abstracts over all SQL differences is of course over the top... but abstracting over trivial, always-needed stuff like 'what is the sequence number of the row I just inserted' is, well, trivial.
Actually it's not trivial. In MySQL LAST_INSERT_ID only gives you one number but in Postresql or Oracle you can generate few different numbers. And since one roundtrip can include quite a few elemental operations it's not easy or simple to transform one to another.
I've written such an abstraction layer myself more than once, and that was a little one-man thing, and it was not very much effort.
WOW! Cool.
How exactly have you processed Oracle's insert statements which used two distinct sequences in MySQL?
That PHP couldn't do it says volumes.
About your unreasonable expectations? Sure. About SQL driver? Not much. DBI example which produces suboptimal results does not expire confidence: I'd rather handle such problems at the application level rather then in intense debugging session which will show that that my perfectly tuned code on some other database uses insert over unindexed table "because driver had no other choice". Thnks... but no, thnks.
This is funny...
Posted Dec 20, 2011 11:38 UTC (Tue) by nix (subscriber, #2304)
[Link]
As for last_insert_id et al, well, every abstraction layer has costs. One cost of this one was that you had to use a not-completely-disgusting data model, i.e., that having more than one sequence tracking a single table was not supported. I never felt the lack: compound PKs are all very well, but if you use a sequence (mysql: autoincrement) as a PK it means that nothing else would do. Why on earth would you want to use a compound PK consisting of multiple sequences? That smacks of very bad table design to me. I thought of various more or less ugly ways to fix this but never implemented them because of a total lack of need.
(It was not a very pleasant abstraction layer to write, I'll grant you that. But, still, abstracting over this sort of thing *is* the job of an abstraction layer.)
Heh
Posted Dec 19, 2011 14:54 UTC (Mon) by niner (subscriber, #26151)
[Link]
Perl's DBI offers a last_insert_id Method which works with pretty much all database drivers. So while it is available and in practice works, the interface still has to live with the complete lack of standardisation among DBMS. A quote from the DBI man page:
There are several caveats to be aware of with this method if you want to use it for portable applications:
* For some drivers the value may only available immediately after the insert statement has executed (e.g., mysql, Informix).
* For some drivers the $catalog, $schema, $table, and $field parameters are required, for others they are ignored (e.g., mysql).
* Drivers may return an indeterminate value if no insert has been performed yet.
* For some drivers the value may only be available if placeholders have not been used (e.g., Sybase, MS SQL). In this case the value returned
would be from the last non-placeholder insert statement.
* Some drivers may need driver-specific hints about how to get the value. For example, being told the name of the database 'sequence' object
that holds the value. Any such hints are passed as driver-specific attributes in the \%attr parameter.
* If the underlying database offers nothing better, then some drivers may attempt to implement this method by executing ""select max($field)
from $table"". Drivers using any approach like this should issue a warning if "AutoCommit" is true because it is generally unsafe - another
process may have modified the table between your insert and the select. For situations where you know it is safe, such as when you have
locked the table, you can silence the warning by passing "Warn" => 0 in \%attr.
* If no insert has been performed yet, or the last insert failed, then the value is implementation defined.
Given all the caveats above, it's clear that this method must be used with care.
Yup
Posted Dec 19, 2011 16:19 UTC (Mon) by khim (subscriber, #9252)
[Link]
Given all the caveats above, it's clear that this method must be used with care.
It must only be used with MySQL, really. If you want to use Oracle and Postgresql then it's much better to use sequences and their nextval and currval directly (not sure about other databases, but they probably have their own ways to do that).
So while it is available and in practice works, the interface still has to live with the complete lack of standardisation among DBMS.
Right. This just shows that SQL driver is completely wrong level of abstraction to try to unify different databases. It's [relatively] easy for ORM layer or application-specific layer to deal with these differences. If you try to do that on SQL driver layer you'll waste huge amount of resources and and the end result will sometimes still not usable (If the underlying database offers nothing better, then some drivers may attempt to implement this method by executing ""select max($field) from $table"").
NOT a way to go. PHP has many problematic warts, but PDO is not one of them.
Yup
Posted Dec 19, 2011 19:35 UTC (Mon) by fatrat (subscriber, #1518)
[Link]
Have you seen Python's SQLAlchemy? It provides an ORM layer but also a really good SQL level layer.
Good try...
Posted Dec 20, 2011 0:35 UTC (Tue) by khim (subscriber, #9252)
[Link]
Even so: I can not see how exactly you can chain multiple INSERTs in one BEGIN/END Oracle statement or execute multiquery in MySQL (like you can do with PHP).
Yup
Posted Dec 20, 2011 7:38 UTC (Tue) by niner (subscriber, #26151)
[Link]
"It must only be used with MySQL, really. If you want to use Oracle and Postgresql then it's much better to use sequences and their nextval and currval directly."
Not really. It covers the simple case quite well which is the vast majority in our applications. If I want to use PostgreSQL features, I'd use returning to get the new ID back to the application without an additional roundtrip.
Cracks in the Foundation (PHP Advent)
Posted Dec 19, 2011 22:21 UTC (Mon) by adamg (subscriber, #42260)
[Link]
A reasonable database layer allows one to switch database backends as needed. It allows me to change the backend driver (a one-line edit), and everything should still work afterwards.
That always makes me wonder, what kind of argument is that anyway? Do you really find yourself so often in situations where in the middle of the project you decide to switch database backend?
If a project developers write SQL queries that are so portable that it as easy to switch to another database backend as doing the mentioned s/my_/pg_/g (or equivalent), my guess is that they are not using much of the initial database features.
.. or maybe there is an easy way (which I don't know) to write stored procedures in a portable way.
This is the most ridiculous commentary...
Posted Dec 19, 2011 8:01 UTC (Mon) by khim (subscriber, #9252)
[Link]
There is one thing PHP did right: it was ridiculously easy to add mod_php to the Apache server from the very beginning - and overhead was negligible when it was not in use thus a lot of ISPs offered mod_php and very few offered mod_perl or mod_python. This is what made PHP popular and this is what keeps it alive.
This is the most ridiculous commentary...
Posted Dec 19, 2011 10:06 UTC (Mon) by drag (subscriber, #31333)
[Link]
It helps.
Also it's trivial to make a PHP script by taking some HTML code and putting PHP statements in it.
So it suckers developers in by trivial hello world type demos. More people come from a HTML editing background then a programming background so this sort of thing makes it easier to approach.
This is the most ridiculous commentary...
Posted Dec 19, 2011 21:58 UTC (Mon) by job (guest, #670)
[Link]
As a side note, I believe what was important about mod_php was safe_mode. It is a rudimentary file i/o sandbox that only allows you to open files with the same UID your script was created with. That enabled shared script hosting, you don't have to use one process per user which was important for the nascent "web hotel" businesses.
Performance was worse than Perl (which was the dominant language at the time, and had decent template frameworks too), security was worse, the language itself was a stripped down Perl 4, but deployment was so easy you just had to install it and you had yourself a shared web hosting environment.
I see now that safe_mode is removed from PHP 5.4 so perhaps we're full circle now. I wonder what shared hosts are supposed to replace it with?
This is the most ridiculous commentary...
Posted Dec 20, 2011 10:17 UTC (Tue) by smurf (subscriber, #17840)
[Link]
FastCGI and separate PHP processes per user, of course.
It doesn't make any sense at all to bloat your Apache with mod_php. If you have 200 Apache processes, of which 10 are needed for PHP, the (non-trivial, these days) memory space of 190 PHP interpreters sits idle. It can't even be swapped out because the set of Apache processes working on PHP requests constantly changes.
Using FastCGI, you have 10 busy PHP interpreters and 190 "normal" Apache (or lighttp or ) processors, and you get to run each customer's PHP stuff under their own UID instead of having global access to anything that's readable by www_data when (not if) the next security hole is discovered.
This is the most ridiculous commentary...
Posted Dec 20, 2011 11:09 UTC (Tue) by job (guest, #670)
[Link]
For large web apps this is (and has always been) true. But the issue here is altogether different.
A large shared webhost has many thousands of web sites and only a few of them are active at any time. It is unreasonable to keep thousands of idle PHP processes around (and that is without room for concorrency which would be required in production).
That is what the web hotel business is all about, very cheap hosting, and this is what made PHP popular.
This is the most ridiculous commentary...
Posted Dec 20, 2011 12:23 UTC (Tue) by smurf (subscriber, #17840)
[Link]
Sure it's unreasonable to keep them around. So let them die when they're no longer useful.
Large sites increase the likelihood that somebody, somewhere on your system, is running code with a security hole or two. So it's even more important to shield individual customers from each other.
Your decision, of course. But the web servers I am responsible for will never load mod_interpreter, much less mod_php.
This is the most ridiculous commentary...
Posted Dec 20, 2011 16:14 UTC (Tue) by job (guest, #670)
[Link]
When the number of idle sites are order of magnitudes larger than the number of active sites at any given moment that solution is equivalent to running PHP as CGI. That would price you out of the market. (And spare me the morals please, as much as I'd like to see mod_php dead it is still an important market which drove PHP's success.)
PHP won because of mod_php, but mod_php won the low end market because of safe_mode. My question still stands, what should replace it? Maybe SELinux can?
This is the most ridiculous commentary...
Posted Dec 21, 2011 12:02 UTC (Wed) by andresfreund (subscriber, #69562)
[Link]
> When the number of idle sites are order of magnitudes larger than the number of active sites at any given moment that solution is equivalent to running PHP as CGI. That would price you out of the market. (And spare me the morals please, as much as I'd like to see mod_php dead it is still an important market which drove PHP's success.)
Right, because idle sites cause that much load that cgi is actually a problem... Its the website that actually use their allotted bandwith/load/whatever that are a problem not the thousand with 3 visitors a day.
This is the most ridiculous commentary...
Posted Dec 22, 2011 0:22 UTC (Thu) by job (guest, #670)
[Link]
On a large shared web host, a very small percentage of the sites are have traffic at any given time, but it's still a lot in absolute numbers.
The only possible way to cram as many sites as possible on your host is to run them with a common interpreter, which can pose security problems. That's where mod_php succeeded.
(By the way, I got the answer to my specific question below, which is that other restrictions still apply.)
This is the most ridiculous commentary...
Posted Dec 20, 2011 22:58 UTC (Tue) by hholzgra (subscriber, #11737)
[Link]
Being able to define memory and cpu time limits, and maybe open-basedir restrictions, were (AFAIR) more important to shared hosters than safe-mode (which was never that safe anyway), and all of these still exist.
This is the most ridiculous commentary...
Posted Dec 21, 2011 8:48 UTC (Wed) by job (guest, #670)
[Link]
That explains it. I thought basedir restrictions was part of safe_mode.
Cracks in the Foundation (PHP Advent)
Posted Dec 20, 2011 23:02 UTC (Tue) by hholzgra (subscriber, #11737)
[Link]
The old ext/mysql extension did indeed not support prepared statements (just as the MySQL client protocol did not support them either at the time when ext/mysql created)
The newer ext/mysqli extension fixed that around 2003 already when prepared statement support became in MySQL, but somehow lots of people prefer to stay with the "classic" extensions API ...