By Jonathan Corbet
March 24, 2010
Rightly or wrongly, many in our community see Perl 6 as the definitive
example of vaporware. But what about PHP 6? This release was
first discussed by the PHP core
developers back in 2005.
There have been books on the shelves purporting to cover PHP 6 since at
least 2008. But, in March 2010, the PHP 6 release is not out - in fact, it
is not even close to out. Recent events suggest that PHP 6 will not be
released before 2011 - if, indeed, it is released at all.
PHP 6 was, as befits a major release, meant to bring some serious changes to
the language. To begin with, the safe_mode feature, which is the
whipping boy for PHP security - or the lack thereof - was to be consigned to
an unloved oblivion; the "register_globals" feature would be gone as well.
The proposed traits
feature would bring "horizontal reuse" to the language; think of traits as
a PHPish answer to multiple inheritance or Java's interfaces. A new 64-bit
integer type was planned. PHP was slated to gain a goto keyword
(though the plan was to avoid the scary goto name and add target
labels to break instead). Some basic static typing
features were under consideration. There was even talk of adding
namespaces to the language and making function and class names
case-sensitive.
The really big change in PHP 6, though, was the shift to Unicode
throughout. Anybody who is running a web site which does not use Unicode
is almost certainly wishing that things were otherwise - trust your editor
on this one. It is possible to support Unicode to an extent even if the language in
use is not aware of Unicode, but it is a painful and error-prone affair;
proper Unicode support requires a language which understands Unicode
strings. The PHP 6 plan
was to support Unicode all the way:
PHP6 will have Unicode support everywhere; in the engine, in
extensions, in the API. It's going to be native and complete; no
hacks, no external libraries, no language bias. English is just
another language, it's not the primary language.
Unicode, however, appears to be the rock upon which the PHP 6 ship ran
aground. Despite claims back
in 2006 that the development process was "going pretty well," it seems
that few people are happy with the state of Unicode support in PHP. Memory
usage is high, performance is poor, and broken scripts are common. The
project has been struggling for some time to find a solution to this
problem.
From your editor's reading of the discussion, the fatal mistake would
appear to be the decision to use the UTF-16 encoding (at least two bytes per character) for all
strings within PHP. According to PHP creator Rasmus
Lerdorf, this decision was made to ease compatibility with the International Components for Unicode
(ICU) library:
Well, the obvious original reason is that ICU uses UTF-16
internally and the logic was that we would be going in and out of
ICU to do all the various Unicode operations many more times than
we would be interfacing with external things like MySQL or files on
disk. You generally only read or write a string once from an
external source, but you may perform multiple Unicode operations on
that same string so avoiding a conversion for each operation seems
logical.
But a lot of strings simply pass through PHP programs; in the end, the
conversion turned out to be more expensive and less convenient than had
been hoped. Johannes Schlüter describes
the problem this way:
By using UTF-16 as default encoding we'd have to convert the script
code and all data passed from or to the script (request data,
database results, output, ...) from another encoding, usually
UTF-8, to UTF-16 or back. The need for conversion doesn't only
require CPU time and more memory (a UTF-16 string takes double
memory of a UTF-8 string in many cases) but makes the
implementation rather complex as we always have to figure out which
encoding was the right one for a given situation. From the
userspace point of view the implementation brought some backwards
compatibility breaks which would require manual review of the
code.
These all are pains for a very small gain for many users
where many would be happy about a tighter integration of some
mbstring-like
functionality. This all led to a situation for many
contributors not willing to use "trunk" as their main development
tree but either develop using the stable 5.2/5.3 trees or refuse to
do development at all.
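The memory side of that complaint is easy to check. What follows is a minimal sketch in Python, not PHP, and it says nothing about how the PHP 6 engine actually stored its strings; the encoded_sizes() helper and the sample strings are made up for illustration. It simply compares how many bytes the same text occupies in each encoding:

```python
# Illustrative only: this measures encoded sizes in Python and makes no
# claim about PHP internals. encoded_sizes() is a hypothetical helper.
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text, with no byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Sample strings chosen for illustration; escapes keep the source ASCII-only.
samples = {
    "ASCII markup": "<p>Hello, world</p>",
    "accented Latin": "Schl\u00fcter",        # "Schlüter"
    "Japanese": "\u65e5\u672c\u8a9e",          # three CJK characters
}

for label, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{label}: {utf8} bytes as UTF-8, {utf16} bytes as UTF-16")
```

For the mostly-ASCII text that makes up the bulk of a typical web page, the UTF-16 form is twice the size; only for text dominated by characters like the Japanese sample does UTF-16 come out smaller.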
The end result of all this is that PHP 6 development eventually stalled.
The Unicode problems made a release impossible while blocking other
features from showing up in any PHP release at all. Eventually some work
was backported to 5.3, but that is always a problematic solution; it brings
back memories of the 2.5 kernel development series.
Developer frustration, it seems, grew for some time. Last November, Kalle
Sommer Nielsen tried to kickstart the
process, saying:
I've been thinking for a while what we should do about PHP6 and its
future, because right now it seems like there isn't much future in
it.
Things came to a head on March 11, when Jani Taskinen, fed up with being
unable to push things forward, (1) committed some disruptive changes
to the stable 5.3 branch, and (2) created a new PHP_5_4 branch which
looked like it was meant to be a new development tree. That is when Rasmus
stepped in:
The real decision is not whether to have a version 5.4 or not, it
is all about solving the Unicode problem. The current effort has
obviously stalled. We need to figure out how to get development
back on track in a way that people can get on board. We knew the
Unicode effort was hugely ambitious the way we approached it.
There are other ways.
So I think Lukas and others are right, let's move the PHP 6 trunk
to a branch since we are still going to need a bunch of code from
it and move development to trunk and start exploring lighter and
more approachable ways to attack Unicode.
And that is where it stands. The whole development series which was meant
to be PHP 6 has been pushed aside to a branch, and development is starting
anew based on the 5.3 release. Anything of value in the old PHP 6 branch
can be cherry-picked from there as need be, but the process of deciding what
goes into the next release is beginning from scratch, and one assumes that
proposals will be looked at closely. There are no timelines or plans for
the next release at this point; as Rasmus
explains, that's not what the project needs now:
We don't need timelines right now. What we need is some hacking
time and to bring some fun back into PHP development. It hasn't
been fun for quite a while. Once we have a body of new interesting
stuff, we can start pondering releases...
So timing and features for the next PHP release are completely unknown at
this point. Even the name is unknown; Jani's 5.4 branch has been renamed
to THE_5_4_THAT_ISNT_5_4. There has been some concern about all of those
PHP 6 books out there; it has been suggested that a release which doesn't
conform to expectations for PHP 6 should be called something else - PHP 7,
even. There's little sympathy for the authors and publishers of those
books, but those who bought them may merit a little more care. But that
will be a discussion for another day. Meanwhile, the PHP hackers are
refocusing on getting things done and having some fun too.