
PHP and P++

By Jonathan Corbet
August 15, 2019
PHP is the Fortran of the world-wide web: it demonstrated the power of code embedded in web pages, but has since been superseded in many developers' minds by more contemporary technologies. Even so, as with Fortran, there is far more PHP code out there than one might think, and PHP is still chosen for new projects. There is a certain amount of tension in the PHP development community between the need to maintain compatibility for large amounts of ancient code and the need to evolve the language to keep it relevant for current developers. That tension has now come into the open with a proposal to split PHP into two languages.

PHP has been around for a long time; a previous version of the LWN site was implemented in PHP/FI in 1998. For most of its 25 years of existence, PHP has been criticized from multiple directions. Its development community has done a lot of work to address many of those criticisms while resisting others that, it was felt, went against the values of the language. Often these changes have forced code written in PHP to change as well; such changes tend to be the most controversial.

For example, consider the current controversy over "short open tags". PHP code is embedded within an HTML page with a sequence like this:

    <?php /* PHP code */ ?>

Back in the early days, though, the language also understood an abbreviated version:

    <? /* PHP code */ ?>

The latter form is, among other things, not properly XML compliant and has been deprecated for years. If the short_open_tag setting is enabled, though, these tags will still be recognized. The PHP community decided some time ago that it wanted to remove all settings that affect the language globally, and it seems that short_open_tag is the only one remaining. But a proposal to remove it has been through multiple iterations, has elicited strong opposition, and has inspired lengthy discussion threads on the project's mailing lists. The PHP project resolves such issues through voting; as of this writing, there are 28 votes in favor of removal and 24 opposed. Unless things change before the vote closes on August 20, this particular measure will fail to reach the 2/3 majority required to pass.
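The arithmetic behind that prediction is easy to check. The following is only a sketch: it assumes the threshold is computed over yes and no votes alone, and the function name is invented for illustration.

```python
def reaches_two_thirds(in_favor: int, opposed: int) -> bool:
    """Return True if in_favor is at least 2/3 of the votes cast.

    Assumption: only yes/no votes count toward the total.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    total = in_favor + opposed
    return total > 0 and 3 * in_favor >= 2 * total


# Tally quoted in the article: 28 in favor, 24 opposed.
print(reaches_two_thirds(28, 24))   # 3 * 28 = 84 is less than 2 * 52 = 104
print(reaches_two_thirds(35, 17))   # 3 * 35 = 105 clears 104
```

With 52 votes cast, 35 would have to be in favor; 28 is well short of that.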

Bringing peace

This vote highlights the sort of division that can be found in the PHP community. Referring to "a growing sense of polarization", PHP developer Zeev Suraski tried to improve the situation with a proposal titled "Bringing Peace to the Galaxy"; so far, it would appear to have failed to do so, at least in the way that was intended.

Suraski described the disagreement as being between those who want to push the language forward and those who value backward compatibility above almost all else. "To a large degree, these views are diametrically opposed. This made many internals@ discussions turn into literally zero sum games - where when one side 'wins', the other side 'loses'". The answer that he came up with was to give both sides what they want by splitting the language in two:

  • "Classic" PHP would continue to be developed with a strong emphasis on keeping millions of lines of existing code working. The language would not be frozen, but it would be highly resistant to changes that break compatibility.
  • A new language, called "P++" for now, would take the opposite approach, breaking compatibility and adding features (such as strict typing, greater consistency across the language, new types and, naturally, killing off short open tags).

On its surface, this looks like a fork of PHP or, at a minimum, a split like that seen between Python versions 2 and 3. Suraski envisions something a bit different, though. Unlike Python, the PHP community would continue to develop (and add features to) its old version, with no plans for leaving it behind at some point. He also envisions supporting both languages from the same code base and a common runtime system. A single binary would implement both PHP and P++. So, rather than creating a fork, Suraski aims to create a single project with two faces.

Suraski's post (along with the subsequently posted P++ FAQ) was intended to provoke discussion; as one might imagine, it succeeded. Some developers like the idea, but many more seem to be concerned about it, for a number of reasons. One of those was expressed by Dan Ackroyd among others:

PHP internals is already lacking programming resources to do everything we want to be doing.

Maintaining two versions at once would be more work, so this idea is not feasible without a dramatic increase in the number of people working on PHP core.

Suraski optimistically responded that his proposal "will take no additional resources", mostly as the result of the use of a single code base. It is not clear that others in the community find this argument persuasive, though.

The idea of "rebranding" PHP with the new language is appealing to Suraski, who sees it as a way of getting away from PHP's not-entirely-positive reputation. Others, such as Nikita Popov, worried that rebranding will leave a valuable name behind without any corresponding benefit. Rebranding is also something that can only be done once, Popov said; it won't be an option five years down the road when the desire to add yet another set of incompatible features arises. Suraski responded that rebranding can bring new life to a project by attracting interest and getting developers who have written PHP off to take another look. The backward-compatibility break in P++ was described as a one-time thing, where all of the changes could be made at once, so Suraski was not worried about having to do it again anytime soon.

To some in the conversation, P++ was reminiscent of the Hack language, which is a fork of PHP created at Facebook. This concern is addressed in the FAQ, where Suraski essentially said that things will work out differently because it's the PHP community doing the work. Hack is a single-company project, the FAQ reads, and is not as widely distributed as PHP is. The P++ language, instead, would come automatically with a future version of PHP, so it would be there, waiting, whenever new developers wanted to try it.

What polarization?

What may well turn out to be the majority view, though, was well expressed by Arvids Godjuks, who seems to feel that the entire conception of the problem is wrong. The division described by Suraski does not really exist, Godjuks said; instead, almost all developers are interested in both compatibility and language evolution. The right thing to do is to continue, as the community has done so far, to find a balance between those two requirements:

Right now PHP does have somewhat of a plan and direction it is going, it is going at a decent pace - not too slow, not too fast. The community is able to adopt the new features and changes in a timely manner and gracefully introduce their support or requirement without everyone running like headless chickens. So maybe solidify the plan, make it into an actual roadmap? That will allow people to make long term plans and decisions and make [backward compatibility] less of an issue.

Instead, Godjuks said, splitting the language would force developers to choose between two versions of PHP, neither of which has the flexibility found in current PHP. One of those two versions would probably wither and die. This point of view was supported in the only post by Rasmus Lerdorf in the thread; Lerdorf, of course, is the creator of PHP. He said:

Forcing a balance, even if sometimes the arguments get rather heated (and they were just as heated, if not more so 20+ years ago), keeps everyone on the same page and working on the same code-base without the us vs. them situation that is bound to creep in.

The discussion is far from resolved at this point, but perhaps some indication of the community's feeling can be found in this poll asking whether the P++ idea is worth pursuing. As of this writing, there are zero votes in favor and 28 opposed (including Suraski, who described the poll as "a false choice").

All told, the P++ idea would appear to be fighting a strong headwind in the community; unless something changes, it seems that it will be hard to build a critical mass of developers interested in making it happen. The discussion will not be wasted, though, if it helps to focus the community's collective mind on how it wants to see the language develop in the coming years. Supporting existing code and keeping the language relevant into the future are both important goals; if the PHP community can find a way to balance those priorities, the language may well continue to thrive for a long time.



PHP and P++

Posted Aug 15, 2019 14:28 UTC (Thu) by ju3Ceemi (subscriber, #102464) [Link] (23 responses)

This looks like a complicated way of implementing feature flags.
The issue with short_open_tag and the like is the requirement to support the associated code: every legacy feature has to be supported and maintained.

This is the issue, which is not resolved by this p++ idea.

If you want those old features, but also a stricter mode, you should use feature flags, which could be "strict" by default: people who want the "legacy-compliant" PHP simply need to change the configuration.
Exactly what is done today with short_open_tag (it is enabled by default, though).

PHP and P++

Posted Aug 15, 2019 14:55 UTC (Thu) by burki99 (subscriber, #17149) [Link] (19 responses)

I'm also sceptical, seeing how a seemingly innocent change like brackets around print in Python 3 took years to resolve. The only chance I see to make this happen is a per-file option that lets you freely intermix PHP and P++ in any project, maybe along the lines of the already existing declare(strict_types=1) (e.g. declare(strict_syntax=1)).
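To make the per-file idea concrete, here is a toy sketch of how a front end might choose a dialect for each file. Everything in it is hypothetical: PHP has no strict_syntax directive today, the function name is invented, and a real implementation would look at the parsed declare statement rather than run a regular expression over the source text.

```python
import re

# Hypothetical directive taken from the suggestion above; it does not
# exist in PHP. A regex match is naive: it would also fire on the same
# text inside a comment or a string literal.
_DIRECTIVE = re.compile(r"declare\s*\(\s*strict_syntax\s*=\s*1\s*\)\s*;")


def dialect_for(source: str) -> str:
    """Pick a dialect for one file based on an opt-in directive."""
    return "p++" if _DIRECTIVE.search(source) else "php"


print(dialect_for("<?php declare(strict_syntax=1); echo 1;"))
print(dialect_for("<?php echo 1;"))
```

Because the choice is made file by file, old and new code could sit side by side in one project, which is the property being asked for.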

PHP and P++

Posted Aug 15, 2019 17:20 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (18 responses)

The brackets around print() were the *least* of Python 3's problems. If that had been the entire change, then both 2to3 and 3to2 would have been completely trivial programs, everyone would have transformed their code once, and then it would have been over and done with. Once every now and then, some ancient code would spit out a "SyntaxError: missing parentheses in call to print," you'd Google it, and StackOverflow would tell you "run it through 2to3," and that would be it.

The real problem was Unicode support. It's basically impossible to determine by static analysis how to transform a string-manipulation program written in Python 2 into Python 3, because you don't know the language-level types of anything, and you also don't know whether any given 8-bit string (Python 2 str) is semantically text, bytes, or a Unix filesystem path (which is neither text nor bytes but an unholy amalgamation of both).
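A small Python 3 sketch of the path case, assuming a file name that was written in Latin-1 and is being read as UTF-8. The surrogateescape error handler used here is the one Python 3 itself relies on for file names on POSIX systems; the sample name is made up.

```python
raw_name = b"caf\xe9.txt"  # Latin-1 bytes; not valid UTF-8

# Strict decoding refuses the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate, so the
# original bytes can be recovered exactly.
as_text = raw_name.decode("utf-8", "surrogateescape")
round_trip = as_text.encode("utf-8", "surrogateescape")

print(strict_ok)                # False
print(ascii(as_text))           # 'caf\udce9.txt'
print(round_trip == raw_name)   # True
```

The result is a str that is not really text: it survives a round trip, but it cannot be printed or encoded strictly, which is the "amalgamation" in practice.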

PHP and P++

Posted Aug 15, 2019 20:41 UTC (Thu) by juliank (guest, #45896) [Link] (4 responses)

Yet in practice, translating was fairly trivial, and a lot of people were simply too lazy and did not bother.

PHP and P++

Posted Aug 16, 2019 10:39 UTC (Fri) by h2g2bob (subscriber, #130451) [Link] (3 responses)

As NYKevin said, the problem is that comparing bytes and unicode will return False. So you'll find this code the hard way:

    enable_foo = b'true'
    if enable_foo == u'true':
        ...  # never reached in Python 3: bytes and str never compare equal

Obviously enable_foo is from one or more read() or recv() in a different module. Or from ctypes. Or from users of your library code.
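One common way to defuse this, sketched under the assumption that the incoming value is UTF-8 text: decode once at the boundary and compare as str from then on. The helper name is invented for illustration.

```python
def is_enabled(value) -> bool:
    """Hypothetical helper: accept bytes or str and compare as text."""
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return value == "true"


enable_foo = b"true"
print(enable_foo == "true")    # False in Python 3: bytes never equal str
print(is_enabled(enable_foo))  # True once decoded at the boundary
```

This only helps where you control the boundary; values arriving from other modules or from library users still have to be tracked down one by one.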

PHP and P++

Posted Aug 16, 2019 11:03 UTC (Fri) by juliank (guest, #45896) [Link] (1 responses)

Yeah, it would be easier if comparison were strictly typed.

PHP and P++

Posted Aug 18, 2019 3:00 UTC (Sun) by k8to (guest, #15413) [Link]

Do you mean an exception on type mismatch?

That sounds probably useful for most code I write, and it would cause huge explosions in most code I have to work on that other people write. Probably a good idea all around.

PHP and P++

Posted Aug 18, 2019 23:25 UTC (Sun) by mjblenner (subscriber, #53463) [Link]

You could try the -b (or -bb) switch. e.g:

    python3 -bb

    >>> b'true' == 'true'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BytesWarning: Comparison between bytes and string

PHP and P++

Posted Aug 15, 2019 20:56 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link] (12 responses)

> or a Unix filesystem path (which is neither text nor bytes but an unholy amalgamation of both)

A UNIX filesystem name is a sequence of bytes whose values are neither 0 nor 47. A UNIX filesystem path is a sequence of UNIX filesystem names separated by non-empty sequences of bytes with value 47 ('/'). The unholy idea that there's one character set to rule them all (which - coincidentally - makes everyone bend over backwards to get support for the characters his language is written in except people from the USA) and that The Character Set Encoding is as dictated to users as The Character Set by some entity selling operating systems is decades newer than this.
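That definition translates almost word for word into code. A minimal sketch, with two assumptions that go beyond the comment: names are required to be non-empty, and no length limit is checked. Both function names are invented.

```python
def is_valid_name(name: bytes) -> bool:
    """A name is a non-empty byte string containing neither 0 nor 47 ('/')."""
    return len(name) > 0 and 0 not in name and 47 not in name


def path_components(path: bytes) -> list:
    """Split a path on runs of '/' bytes, dropping the empty pieces."""
    return [part for part in path.split(b"/") if part]


print(path_components(b"/usr//local/bin"))  # [b'usr', b'local', b'bin']
print(is_valid_name(b"caf\xe9"))            # True: no encoding is implied
print(is_valid_name(b"a/b"))                # False: contains byte 47
```

Nothing here ever decodes the bytes, which is the point being made: the kernel-level definition says nothing about characters.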

PHP and P++

Posted Aug 15, 2019 21:02 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

> which - coincidentally - makes everyone bend over backwards to get support for the characters his language is written in except people from the USA
How does UTF-8 make everybody bend over backwards?

At this point mandating UTF-8 for file names is pretty much the only sane way.

PHP and P++

Posted Aug 16, 2019 1:01 UTC (Fri) by flussence (guest, #85566) [Link] (8 responses)

NFC, NFD or broken UTF-8?

PHP and P++

Posted Aug 16, 2019 1:14 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Any of them would be better than the status quo.

PHP and P++

Posted Aug 16, 2019 11:14 UTC (Fri) by ale2018 (guest, #128727) [Link]

That is kind of irrelevant. A system choice. Even with ASCII it has always been possible to create files whose names begin with a minus (-), or contain backspaces (x08), spaces ( ), or other characters that may confuse human and machine interpreters alike. To paraphrase the POTUS, it's not the gun that shoots you in the foot.

PHP and P++

Posted Aug 16, 2019 16:11 UTC (Fri) by Deleted user 129183 (guest, #129183) [Link] (4 responses)

> NFC, NFD or broken UTF-8?

Since in Unicode, precomposed characters exist only for compatibility with pre-Unicode encodings, NFD should be probably the way to go.

PHP and P++

Posted Aug 16, 2019 16:17 UTC (Fri) by rweikusat2 (subscriber, #117920) [Link] (3 responses)

Please wake me when the unicode consortium start to consider S + combining vertical bar aka $ a precomposed character ...

PHP and P++

Posted Aug 16, 2019 20:14 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

"$" sign is not an "S-with-a-bar". It can be written as "S" with two smaller bars on top and bottom (like in the font I'm using right now).

But what does this have to do with the mess that are the file names?

PHP and P++

Posted Aug 18, 2019 16:22 UTC (Sun) by rweikusat2 (subscriber, #117920) [Link] (1 responses)

Why don't you just repeat the original statement without using a pointless aside sharing a couple of characters with a text of mine to pseudo-connect the repetition to this text?

PHP and P++

Posted Aug 18, 2019 17:13 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

What? I have no idea what you're saying.

PHP and P++

Posted Aug 16, 2019 19:17 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

For me, NFKC is the obviously-right way to normalize the names of filesystem entities.
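For anyone who has not run into these forms, a short sketch using Python's standard unicodedata module shows what is at stake; the sample strings are arbitrary.

```python
import unicodedata

composed = "\u00e9"      # e-acute as one code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# Same rendering, different code points, so a plain comparison fails.
print(composed == decomposed)                                   # False
print(unicodedata.normalize("NFC", decomposed) == composed)     # True
print(unicodedata.normalize("NFD", composed) == decomposed)     # True

# The compatibility forms also fold look-alikes such as the "fi" ligature.
ligature_name = "\ufb01le"
print(unicodedata.normalize("NFKC", ligature_name))                   # file
print(unicodedata.normalize("NFC", ligature_name) == ligature_name)   # True
```

Two names that look identical on screen can therefore be distinct directory entries unless the filesystem, or every tool above it, picks one form and sticks to it.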

PHP and P++

Posted Aug 15, 2019 22:09 UTC (Thu) by roc (subscriber, #30627) [Link]

Treating everything as bytes is fine for filesystem APIs, but a big problem arises when you want to print path names; if you don't know the encoding, and the path name is not ASCII, you can't print them correctly. A slightly lesser problem is the reverse: when you receive a path name that happens to be in Unicode (because it comes from user input in Unicode, for example), and is non-ASCII.

If you care about those problems then you need to define the encoding of path names, and decide how to handle path names that aren't valid in the encoding.

PHP and P++

Posted Aug 17, 2019 12:44 UTC (Sat) by dvdeug (guest, #10998) [Link]

Is it 47, or is it '/'? The latter smacks of "one character set to rule them all", because 47 is 'å' in certain dialects of EBCDIC and can be part of a multibyte character in SJIS.

> which - coincidentally - makes everyone bend over backwards to get support for the characters his language is written in except people from the USA

To the extent that's true, it's less true than any of the systems that preceded it, and one character set to rule them all seems to be the best way to reduce that problem. UNIX basically assumes that whatever character set is being used, it's a superset of ASCII, which can hardly be the fault of Unicode that was created 20 years later. Heck, in 1998, simply supporting 8-bit characters was a release goal for Debian Hamm, because many Un*x utilities didn't out of the box. That is, you could have any character set you want, as long as it's ASCII.

On any of the pre-Unicode European solutions, an Estonian named Tõnisson would be out of luck in adding his correct name to a document that French and Germans had already added their names to; one byte worked for Western Europe, and who wanted to waste more space for Estonians with names like Tõnisson? If you were lucky enough to be using something that supported ISO-2022 (i.e. someone from East Asia was probably involved), the Estonian could type his name, but not actually search safely for names, as Päts could be encoded various ways, depending on whether a German or an Estonian entered the name.

And - coincidentally - Unicode was the first and usually only character set for hundreds of languages around the world. Speakers of small, less powerful, languages like Lakota or Greenlandic or Xhosa had to resort to font hacks to get any support for the language at all, whereas now it comes free with a decent-sized Unicode font.

PHP and P++

Posted Aug 15, 2019 16:32 UTC (Thu) by iabervon (subscriber, #722) [Link] (2 responses)

The real problem with feature flags is that you actually have to support (and test) 2^N combinations of flags. Having a linear number of profiles that choose the combinations that are allowed is still not constant effort, but it's (literally) exponentially better.

This is effectively what GCC does with -std=c99 and such. There are ~7 C standards that GCC supports, rather than being able to mix and match paragraphs from different standards.
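The difference in test burden is easy to see by counting. In this sketch the flag names other than short_open_tag and strict_types are placeholders, and the two profiles are made up.

```python
from itertools import product

# Independent on/off flags: every combination is a configuration to test.
flags = ["short_open_tag", "strict_types", "flag_c", "flag_d", "flag_e"]
combinations = list(product([False, True], repeat=len(flags)))
print(len(combinations))  # 32, i.e. 2 ** 5

# Named profiles: each one pins all flags at once, so the number of
# configurations grows with the number of profiles, not exponentially.
profiles = {
    "classic": dict.fromkeys(flags, False),
    "strict": dict.fromkeys(flags, True),
}
print(len(profiles))      # 2
```

Adding a sixth flag doubles the first number to 64 but leaves the second unchanged until someone defines a new profile.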

PHP and P++

Posted Aug 15, 2019 17:29 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Not all feature flags actually interact with each other. I'm also not sure that "feature flag" is the correct term here. Instead, something like CMake's policy system might be better (AFAIK, it is similar to Perl's `use 5.20;` statement, but also allows some fine-grained control). Code declares its minimum version (which sets policy settings). CMake then notices when a policy would be triggered by some code and says "hey, newer versions of CMake interpret this code differently, but the old behavior will be used right now". Most policies are orthogonal to each other, but when they do interact, the newer one usually just assumes the new behavior of the old policy (e.g., the one which rewrites the variable expansion code assumes another policy, so it is documented as "policy 10 is not relevant under policy 53; post-10 behavior is used"). Setting a policy setting to use the old behavior is a pretty big code smell and basically indicates that something isn't covered by the new behavior (warranting an issue).

If the PHP VM can warn when it sees code which changes behavior under the new policies, fixing them is much easier because usages get called out. When code is OK with the new behavior it says "I'm ready" and the VM just does the new thing instead of warning and doing the old behavior. CMake has been able to keep very strong backwards compatibility using this pattern and I think that PHP would be able to do so as well if it went down a similar route.
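A rough sketch of that pattern, to show the moving parts. The policy names and version numbers are invented, and this is neither CMake's implementation nor anything PHP has; it only illustrates "declare a minimum version, get new behavior for the policies you already know about, get a warning for the rest".

```python
import warnings

# Hypothetical policy table: name -> version that introduced the new behavior.
POLICIES = {"no_short_open_tag": (8, 0), "strict_comparison": (9, 0)}


def policy_settings(declared_minimum):
    """Map each policy to 'new' if the code's declared minimum version is
    at least the version that introduced it, else 'old'."""
    return {
        name: "new" if declared_minimum >= introduced else "old"
        for name, introduced in POLICIES.items()
    }


def check(policy, settings):
    """Return True if the new behavior applies; warn when falling back."""
    if settings[policy] == "new":
        return True
    warnings.warn(
        f"policy {policy}: newer versions behave differently; "
        "using the old behavior"
    )
    return False


settings = policy_settings((8, 0))  # code written against version 8.0
print(settings)
```

Code that declares (8, 0) gets the new tag handling silently, and is warned each time it reaches something governed by the later policy.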

PHP and P++

Posted Aug 17, 2019 16:50 UTC (Sat) by felix.s (guest, #104710) [Link]

> There are ~7 C standards that GCC supports, rather than being able to mix and match paragraphs from different standards.

You mean, like you can with -fno-delete-null-pointer-checks, -fstrict-aliasing, -f{w,t}rapv, -fgnu89-inline, -fms-extensions, -f{un,}signed-char, -fno-asm, and so on? And let's not forget __attribute__((optimize))

Sure, these aren't strictly the same thing; ISO C standards are designed to be largely forward-compatible (portable C89 code free of UB can usually be compiled as C11 with no changes), so you'll have a hard time finding cases where the different versions of the standard flat-out contradict each other, creating incompatible dialects. But these options do change how the compiler handles certain specific situations that are implementation-defined or UB according to the standard, and some code does depend on one dialect or the other.

PHP and P++

Posted Aug 15, 2019 14:35 UTC (Thu) by dskoll (subscriber, #1630) [Link] (3 responses)

Extension/module authors would have to either pick one language and stick with it, or maintain two versions of their extensions or modules. This would be a disaster.

PHP and P++

Posted Aug 15, 2019 14:36 UTC (Thu) by dskoll (subscriber, #1630) [Link] (2 responses)

Ah, I see the FAQ says you can mix PHP and P++, but color me skeptical on how well that would work in practice for extensions and library code.

PHP and P++

Posted Aug 15, 2019 22:22 UTC (Thu) by roc (subscriber, #30627) [Link] (1 responses)

Rust's edition system is working pretty well so far. The key points are:
1) have modules explicitly state which edition they use and make sure all your tools respect that.
2) have tools available from day 1 of the new edition that automatically update code to the new edition as much as possible, and emit clear messages where non-automatic changes need to be made.
3) ensure modules from different editions can be used together, and make that as seamless as possible. For example, Rust introduced "raw identifiers" that let you write identifiers that are reserved words, so if a module API uses an identifier that's a reserved word in a later edition, code in the later edition can still use it.

This constrains the kinds of changes you can make between editions, but it does allow you to make a lot of significant backwards-incompatible changes. It works better when your language is strongly statically-typed like Rust.

Python completely failed at 1, 2 and 3, for various reasons.

PHP and P++

Posted Aug 15, 2019 22:35 UTC (Thu) by roc (subscriber, #30627) [Link]

Oh, you also kind of want

4) have a tool that reformats source code automatically (especially line breaks) and encourage a culture of using it routinely

so that you can run that tool after applying the automatic updates from point 2. Not a big deal, though you want this for other reasons too.

PHP and P++

Posted Aug 15, 2019 16:24 UTC (Thu) by juliank (guest, #45896) [Link] (8 responses)

Just switch to Go instead. Seriously, why bother with all that crap?

PHP and P++

Posted Aug 15, 2019 16:39 UTC (Thu) by xnox (guest, #63320) [Link]

Indeed. Lots of people migrated from python2 to golang, instead of python3.

Ideally, I would wish to not have PHP at all, but I also don't see it going away any time soon in practice. I wonder how many generations of developers it will take =/

It is a bit of an existential crisis because if it freezes and doesn't evolve anymore it might die too. However, it seems to work out great for LaTeX2e.

PHP and P++

Posted Aug 16, 2019 5:49 UTC (Fri) by da4089 (subscriber, #1195) [Link] (4 responses)

go1 or go2?

PHP and P++

Posted Aug 16, 2019 6:37 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

There's no go2, but even when it's finally here it's going to be backwards compatible with go1.

You might dislike Go, but they make great efforts to preserve backwards compatibility.

PHP and P++

Posted Aug 16, 2019 7:43 UTC (Fri) by amacater (subscriber, #790) [Link] (2 responses)

Surely: Go2 considered harmful from the outset

PHP and P++

Posted Aug 16, 2019 7:45 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

LOL. Walked right into it.

PHP and P++

Posted Aug 16, 2019 12:33 UTC (Fri) by remicardona (guest, #99141) [Link]

well played sir, well played [tips hat]

PHP and P++

Posted Aug 17, 2019 0:09 UTC (Sat) by Freeaqingme (subscriber, #103259) [Link] (1 responses)

I've used PHP for a very long time and have been using Go for the past 4 years or so. Though I really like Go for being statically typed and allowing low-level machine access, I think there are still plenty of use cases for PHP. I haven't done any kind of formal analysis, but my gut says that for certain templating or 'simple' web development projects, PHP may be the better choice in terms of implementation speed.

Also, using an IDE that provides static analysis, using OOP, and declaring strict_types=1 in every file, PHP is a pretty decent programming language. Most of the crap it receive(s|d) is based on PHP4 and people who were able to get something to compile, but in no way could be considered programmers or software engineers.

PHP and P++

Posted Aug 22, 2019 11:36 UTC (Thu) by mina86 (guest, #68442) [Link]

Just use Python with type hints if you like static typing but find PHP to have its use cases.

PHP and P++

Posted Aug 19, 2019 18:11 UTC (Mon) by acomjean (guest, #117735) [Link]

I use lots of languages at work (python, java, R), but for websites, I still reach for php. With Symfony framework, twig templating, I really enjoy it.

I haven't seen <? /* PHP code */ ?> style in years. Certainly not in anything I've developed in the past decade. Though we've been spoiled as php has maintained easy backward compatibility.

But PHP is now on a much more aggressive upgrade path. I get why nobody wants to test their old PHP code and make sure it works with the new version, but that's what we're looking at now. The PHP 7 series is so much faster than the 5.x series that I'm surprised there are still laggards. Nobody likes maintaining old versions, but splitting off P++, I feel, just splits the limited development manpower (PHP is not the new hotness JS is...). This was tried before with the language "Hack", by Facebook. It never caught on.

https://docs.hhvm.com

https://www.php.net/supported-versions.php


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds