PHP and P++
PHP has been around for a long time; a previous version of the LWN site was implemented in PHP/FI in 1998. For most of its 25 years of existence, PHP has been criticized from multiple directions. Its development community has done a lot of work to address many of those criticisms while resisting others that, it was felt, went against the values of the language. Often these changes have forced code written in PHP to change as well; such changes tend to be the most controversial.
For example, consider the current controversy over "short open tags". PHP code is embedded within an HTML page with a sequence like this:
<?php /* PHP code */ ?>
Back in the early days, though, the language also understood an abbreviated version:
<? /* PHP code */ ?>
The latter form is, among other things, not properly XML compliant and has been deprecated for years. If the short_open_tag setting is enabled, though, these tags will still be recognized. The PHP community decided some time ago that it wanted to remove all settings that affect the language globally, and it seems that short_open_tag is the only one remaining. But a proposal to remove it has been through multiple iterations, has elicited strong opposition, and has inspired lengthy discussion threads on the project's mailing lists. The PHP project resolves such issues through voting; as of this writing, there are 28 votes in favor of removal and 24 opposed. Unless things change before the vote closes on August 20, this particular measure will fail to reach the 2/3 majority required to pass.
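To see why the short form causes trouble for XML, here is a simplified Python model of the opening-tag rule. It is not PHP's actual lexer; the function name `find_php_blocks` and the regular expressions are assumptions made for illustration. With short tags enabled, an XML declaration such as `<?xml version="1.0"?>` is mistaken for the start of PHP code:

```python
import re


def find_php_blocks(source: str, short_open_tag: bool) -> list[str]:
    """Return the code inside each PHP block in `source`.

    Simplified model, not PHP's real lexer: a block opens with '<?php'
    followed by whitespace or, only when short_open_tag is enabled, with a
    bare '<?'; it ends at the next '?>'.
    """
    opener = r"<\?(?:php\s)?" if short_open_tag else r"<\?php\s"
    pattern = re.compile(opener + r"(.*?)\?>", re.DOTALL)
    return [match.group(1).strip() for match in pattern.finditer(source)]


page = '<?xml version="1.0"?>\n<p><?php echo 1; ?></p>\n<p><? echo 2; ?></p>'
print(find_php_blocks(page, short_open_tag=False))  # ['echo 1;']
print(find_php_blocks(page, short_open_tag=True))
# ['xml version="1.0"', 'echo 1;', 'echo 2;']
```

Because the setting changes how every file is tokenized, it is exactly the kind of global, language-altering switch the PHP community wants to eliminate.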
Bringing peace
This vote highlights the sort of division that can be found in the PHP community. Referring to "a growing sense of polarization", PHP developer Zeev Suraski tried to improve the situation with a proposal titled "Bringing Peace to the Galaxy"; so far, it would appear to have failed to do so, at least in the way that was intended.
Suraski described the disagreement as being between those who want to push the language forward and those who value backward compatibility above almost all else. "To a large degree, these views are diametrically opposed. This made many internals@ discussions turn into literally zero sum games - where when one side 'wins', the other side 'loses'". The answer that he came up with was to give both sides what they want by splitting the language in two:
- "Classic" PHP would continue to be developed with a strong emphasis on keeping millions of lines of existing code working. The language would not be frozen, but it would be highly resistant to changes that break compatibility.
- A new language, called "P++" for now, would take the opposite approach, breaking compatibility and adding features (such as strict typing, greater consistency across the language, new types and, naturally, killing off short open tags).
On its surface, this looks like a fork of PHP or, at a minimum, a split like that seen between Python versions 2 and 3. Suraski envisions something a bit different, though. Unlike Python, the PHP community would continue to develop (and add features to) its old version, with no plans for leaving it behind at some point. He also envisions supporting both languages from the same code base and a common runtime system. A single binary would implement both PHP and P++. So, rather than creating a fork, Suraski aims to create a single project with two faces.
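Suraski did not publish an implementation, so the following is only a hypothetical Python sketch of what "one code base, two faces" could mean: a single front-end function shared by both languages and parameterized by a rule set. The `Dialect` class, the `CLASSIC_PHP` and `P_PLUS_PLUS` names, and the choice of `short_open_tag` as the differing rule are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialect:
    """A hypothetical rule set; the P++ proposal never specified one."""
    name: str
    short_open_tag: bool  # does the legacy '<?' opener start a code block?


CLASSIC_PHP = Dialect("PHP", short_open_tag=True)
P_PLUS_PLUS = Dialect("P++", short_open_tag=False)


def opens_code_block(text: str, dialect: Dialect) -> bool:
    """Shared front end used by both dialects; only the rule set differs.

    Simplification: the '<?=' echo tag is ignored here.
    """
    if text.startswith("<?php"):
        return True
    return dialect.short_open_tag and text.startswith("<?")


for dialect in (CLASSIC_PHP, P_PLUS_PLUS):
    print(dialect.name, opens_code_block("<? echo 1; ?>", dialect))
# PHP True
# P++ False
```

The design point is that there is one function to maintain, not two forks; the open question in the thread is whether enough rules could be isolated this cleanly.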
Suraski's post (along with the subsequently posted P++ FAQ) was intended to provoke discussion; as one might imagine, it succeeded. Some developers like the idea, but many more seem to be concerned about it, for a number of reasons. One of those was expressed by Dan Ackroyd among others:
Maintaining two versions at once would be more work, so this idea is not feasible without a dramatic increase in the number of people working on PHP core.
Suraski optimistically responded that his proposal "will take no additional resources", mostly as the result of the use of a single code base. It is not clear that others in the community find this argument persuasive, though.
The idea of "rebranding" PHP with the new language is appealing to Suraski, who sees it as a way of getting away from PHP's not-entirely-positive reputation. Others, such as Nikita Popov, worried that rebranding will leave a valuable name behind without any corresponding benefit. Rebranding is also something that can only be done once, Popov said; it won't be an option five years down the road when the desire to add yet another set of incompatible features arises. Suraski responded that rebranding can bring new life to a project by attracting interest and getting developers who have written PHP off to take another look. The backward-compatibility break in P++ was described as a one-time thing, where all of the changes could be made at once, so Suraski was not worried about having to do it again anytime soon.
To some in the conversation, P++ was reminiscent of the Hack language, which is a fork of PHP created at Facebook. This concern is addressed in the FAQ, where Suraski essentially said that things will work out differently because it's the PHP community doing the work. Hack is a single-company project, the FAQ reads, and is not as widely distributed as PHP is. The P++ language, instead, would come automatically with a future version of PHP, so it would be there, waiting, whenever new developers wanted to try it.
What polarization?
What may well turn out to be the majority view, though, was well expressed by Arvids Godjuks, who seems to feel that the entire conception of the problem is wrong. The division described by Suraski does not really exist, Godjuks said; instead, almost all developers are interested in both compatibility and language evolution. The right thing to do is to continue, as the community has done so far, to find a balance between those two requirements:
Instead, Godjuks said, splitting the language would force developers to choose between two versions of PHP, neither of which has the flexibility found in current PHP. One of those two versions would probably wither and die. This point of view was supported in the only post by Rasmus Lerdorf in the thread; Lerdorf, of course, is the creator of PHP. He said:
The discussion is far from resolved at this point, but perhaps some indication of the community's feeling can be found in this poll asking whether the P++ idea is worth pursuing. As of this writing, there are zero votes in favor and 28 opposed (including Suraski, who described the poll as "a false choice").
All told, the P++ idea would appear to be fighting a strong headwind in the community; unless something changes, it seems that it will be hard to build a critical mass of developers interested in making it happen. The discussion will not be wasted, though, if it helps to focus the community's collective mind on how it wants to see the language develop in the coming years. Supporting existing code and keeping the language relevant into the future are both important goals; if the PHP community can find a way to balance those priorities, the language may well continue to thrive for a long time.
Posted Aug 15, 2019 14:28 UTC (Thu)
by ju3Ceemi (subscriber, #102464)
This is the issue, which is not resolved by this p++ idea.
If you want those old features, but also a stricter mode, you should use feature flags, which could be "strict" by default: people who want "legacy-compliant" PHP simply need to change the configuration.
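The feature-flag approach suggested here can be sketched in a few lines of Python. The flag names and the `load_settings` helper are hypothetical (`strict_types` is a per-file `declare` directive in real PHP, not an ini setting); the point is only that strict behavior is the default and legacy behavior is an explicit opt-in:

```python
# Hypothetical feature flags; strict behavior is the default.
STRICT_DEFAULTS = {
    "short_open_tag": False,
    "strict_types": True,
}


def load_settings(overrides: dict[str, bool]) -> dict[str, bool]:
    """Merge per-site overrides onto strict defaults, rejecting unknown flags."""
    unknown = set(overrides) - set(STRICT_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown feature flags: {sorted(unknown)}")
    return {**STRICT_DEFAULTS, **overrides}


# A site running old code opts back into the legacy behavior it needs.
legacy_site = load_settings({"short_open_tag": True})
print(legacy_site)  # {'short_open_tag': True, 'strict_types': True}
```

The cost named in the comment remains: every flag keeps its legacy code path alive in the interpreter.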
Posted Aug 15, 2019 14:55 UTC (Thu)
by burki99 (subscriber, #17149)
Posted Aug 15, 2019 17:20 UTC (Thu)
by NYKevin (subscriber, #129325)
The real problem was Unicode support. It's basically impossible to determine by static analysis how to transform a string-manipulation program written in Python 2 into Python 3, because you don't know the language-level types of anything, and you also don't know whether any given 8-bit string (Python 2 str) is semantically text, bytes, or a Unix filesystem path (which is neither text nor bytes but an unholy amalgamation of both).
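A short Python illustration of why a converter cannot infer intent: the same 8-bit string decodes successfully under two different encodings and yields two different texts, and nothing in the bytes says which one the author meant.

```python
raw = b"caf\xc3\xa9"  # a Python 2 style 8-bit string: no type information

as_utf8 = raw.decode("utf-8")      # 'café'
as_latin1 = raw.decode("latin-1")  # 'cafÃ©' (Latin-1 accepts any byte)

print(as_utf8 == as_latin1)  # False
```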
Posted Aug 15, 2019 20:41 UTC (Thu)
by juliank (guest, #45896)
Posted Aug 16, 2019 10:39 UTC (Fri)
by h2g2bob (subscriber, #130451)
enable_foo = b'true'
Obviously enable_foo is from one or more read() or recv() in a different module. Or from ctypes. Or from users of your library code.
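The trap described here is that Python 3 never treats bytes as equal to str, so a comparison against a text literal silently evaluates to False instead of failing. A minimal sketch; the `is_enabled` helper is hypothetical and assumes the flag value is plain ASCII:

```python
enable_foo = b"true"  # e.g. the result of read() or recv()

# Python 3 never considers bytes equal to str, so this is silently False.
print(enable_foo == "true")  # False


def is_enabled(raw) -> bool:
    """Compare a flag value that may arrive as bytes or str."""
    if isinstance(raw, bytes):
        raw = raw.decode("ascii")  # assumption: the flag is plain ASCII
    return raw == "true"


print(is_enabled(enable_foo), is_enabled("true"))  # True True
```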
Posted Aug 16, 2019 11:03 UTC (Fri)
by juliank (guest, #45896)
Posted Aug 18, 2019 3:00 UTC (Sun)
by k8to (guest, #15413)
That sounds useful for most code I write, and it would cause huge explosions in most code I have to work on that other people write. Probably a good idea all around.
Posted Aug 18, 2019 23:25 UTC (Sun)
by mjblenner (subscriber, #53463)
python3 -bb
>>> b'true' == 'true'
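The effect of `-bb` can be checked from a script by starting a child interpreter with and without the flag. This sketch relies only on `subprocess.run` and `sys.executable`; with `-bb`, the comparison raises `BytesWarning` as an error and the child exits with a non-zero status:

```python
import subprocess
import sys

# With -bb, BytesWarning is promoted to an error instead of returning False.
strict = subprocess.run(
    [sys.executable, "-bb", "-c", "b'true' == 'true'"],
    capture_output=True,
    text=True,
)
# Without the flag, the same comparison quietly evaluates to False.
plain = subprocess.run(
    [sys.executable, "-c", "b'true' == 'true'"],
    capture_output=True,
    text=True,
)

print(strict.returncode != 0, "BytesWarning" in strict.stderr)  # True True
print(plain.returncode)                                          # 0
```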
Posted Aug 15, 2019 20:56 UTC (Thu)
by rweikusat2 (subscriber, #117920)
A UNIX filesystem name is a sequence of bytes whose values are neither 0 nor 47. A UNIX filesystem path is a sequence of UNIX filesystem names separated by non-empty sequences of bytes with value 47 ('/'). The unholy idea that there's one character set to rule them all (which - coincidentally - makes everyone bend over backwards to get support for the characters his language is written in except people from the USA) and that The Character Set Encoding is as dictated to users as The Character Set by some entity selling operating systems is decades newer than this.
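The byte-level definition above translates directly into Python. This is a sketch under one added assumption (components are non-empty); the helper names are hypothetical, and `split_unix_path` deliberately ignores whether the path was absolute:

```python
SLASH = 0x2F  # byte value 47, '/'
NUL = 0x00    # byte value 0


def is_valid_unix_name(name: bytes) -> bool:
    """A path component: non-empty bytes containing neither NUL nor '/'."""
    return len(name) > 0 and NUL not in name and SLASH not in name


def split_unix_path(path: bytes) -> list[bytes]:
    """Split on runs of '/', dropping the empty pieces they produce."""
    return [part for part in path.split(b"/") if part]


print(is_valid_unix_name(b"caf\xe9"))        # True: any encoding, or none
print(split_unix_path(b"/usr//local/bin/"))  # [b'usr', b'local', b'bin']
```

Nothing here consults a character set, which is the commenter's point: the kernel only cares about two byte values.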
Posted Aug 15, 2019 21:02 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
At this point mandating UTF-8 for file names is pretty much the only sane way.
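Mandating UTF-8 would mean that a name's bytes must decode cleanly. A minimal Python check of that rule (the `is_utf8_name` helper is hypothetical) shows that a Latin-1 encoded name, which is perfectly legal at the byte level, fails it:

```python
def is_utf8_name(name: bytes) -> bool:
    """True if a file name's bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_utf8_name("café".encode("utf-8")))  # True
print(is_utf8_name(b"caf\xe9"))              # False: Latin-1 'é', not UTF-8
```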
Posted Aug 16, 2019 1:01 UTC (Fri)
by flussence (guest, #85566)
Posted Aug 16, 2019 1:14 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
Posted Aug 16, 2019 11:14 UTC (Fri)
by ale2018 (guest, #128727)
Posted Aug 16, 2019 16:11 UTC (Fri)
by Deleted user 129183 (guest, #129183)
Since precomposed characters in Unicode exist only for compatibility with pre-Unicode encodings, NFD should probably be the way to go.
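The normalization issue is easy to demonstrate with Python's standard `unicodedata` module: the precomposed and decomposed forms of "é" look identical, compare unequal as strings, and have different UTF-8 byte lengths, so two visually identical file names can be distinct byte sequences until one normal form is chosen.

```python
import unicodedata

precomposed = "\u00e9"  # 'é' as a single code point (NFC form)
decomposed = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT (NFD form)

print(precomposed == decomposed)  # False
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```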
Posted Aug 16, 2019 16:17 UTC (Fri)
by rweikusat2 (subscriber, #117920)
Posted Aug 16, 2019 20:14 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
But what does this have to do with the mess that are the file names?
Posted Aug 18, 2019 16:22 UTC (Sun)
by rweikusat2 (subscriber, #117920)
Posted Aug 18, 2019 17:13 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
Posted Aug 16, 2019 19:17 UTC (Fri)
by mpr22 (subscriber, #60784)
Posted Aug 15, 2019 22:09 UTC (Thu)
by roc (subscriber, #30627)
If you care about those problems then you need to define the encoding of path names, and decide how to handle path names that aren't valid in the encoding.
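Once an encoding is defined, there are three common policies for names that are not valid in it. Python's codec error handlers happen to express all three: reject (`strict`), replace lossily (`replace`), or escape the bad bytes so they round-trip (`surrogateescape`, the approach Python adopted for POSIX file names in PEP 383). A short sketch:

```python
raw_name = b"caf\xe9"  # a Latin-1 file name on a system that expects UTF-8

# Policy 1: reject names that are not valid UTF-8.
try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)

# Policy 2: replace bad bytes; readable, but the original name is lost.
print(ascii(raw_name.decode("utf-8", errors="replace")))  # 'caf\ufffd'

# Policy 3: carry bad bytes through as lone surrogates so they round-trip.
escaped = raw_name.decode("utf-8", errors="surrogateescape")
print(ascii(escaped))  # 'caf\udce9'
print(escaped.encode("utf-8", errors="surrogateescape") == raw_name)  # True
```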
Posted Aug 17, 2019 12:44 UTC (Sat)
by dvdeug (guest, #10998)
> which - coincidentally - makes everyone bend over backwards to get support for the characters his language is written in except people from the USA
To the extent that's true, it's less true than any of the systems that preceded it, and one character set to rule them all seems to be the best way to reduce that problem. UNIX basically assumes that whatever character set is being used, it's a superset of ASCII, which can hardly be the fault of Unicode, which was created 20 years later. Heck, in 1998, simply supporting 8-bit characters was a release goal for Debian Hamm, because many Un*x utilities didn't out of the box. That is, you could have any character set you want, as long as it's ASCII.
On any of the pre-Unicode European solutions, an Estonian named Tõnisson would be out of luck in adding his correct name to a document that French and Germans had already added their names to; one byte worked for Western Europe, and who wanted to waste more space for Estonians with names like Tõnisson? If you were lucky enough to be using something that supported ISO-2022 (i.e. someone from East Asia was probably involved), the Estonian could type his name, but not actually search safely for names, as Päts could be encoded various ways, depending on whether a German or an Estonian entered the name.
And - coincidentally - Unicode was the first and usually only character set for hundreds of languages around the world. Speakers of small, less powerful, languages like Lakota or Greenlandic or Xhosa had to resort to font hacks to get any support for the language at all, whereas now it comes free with a decent-sized Unicode font.
Posted Aug 15, 2019 16:32 UTC (Thu)
by iabervon (subscriber, #722)
This is effectively what GCC does with -std=c99 and such. There are ~7 C standards that GCC supports, rather than being able to mix and match paragraphs from different standards.
Posted Aug 15, 2019 17:29 UTC (Thu)
by mathstuf (subscriber, #69389)
If the PHP VM can warn when it sees code which changes behavior under the new policies, fixing them is much easier because usages get called out. When code is OK with the new behavior it says "I'm ready" and the VM just does the new thing instead of warning and doing the old behavior. CMake has been able to keep very strong backwards compatibility using this pattern and I think that PHP would be able to do so as well if it went down a similar route.
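The policy pattern described here can be sketched in Python. This is loosely modeled on CMake's `cmake_policy()`, where an unset policy gets the OLD behavior plus a warning; the `PolicyScope` class and the policy ID `PHP0001` are hypothetical:

```python
import warnings


class PolicyScope:
    """Per-project behavior switches, loosely modeled on CMake's policies."""

    def __init__(self) -> None:
        self._settings: dict[str, str] = {}

    def set(self, policy: str, value: str) -> None:
        if value not in ("OLD", "NEW"):
            raise ValueError("policy value must be 'OLD' or 'NEW'")
        self._settings[policy] = value

    def behavior(self, policy: str) -> str:
        """Return the behavior to use, warning if the code has not opted in."""
        setting = self._settings.get(policy)
        if setting is None:
            warnings.warn(
                f"policy {policy} is not set; using OLD behavior",
                stacklevel=2,
            )
            return "OLD"
        return setting


legacy = PolicyScope()
print(legacy.behavior("PHP0001"))  # OLD, plus a warning pointing at the caller

ready = PolicyScope()
ready.set("PHP0001", "NEW")       # the code says "I'm ready"
print(ready.behavior("PHP0001"))  # NEW, with no warning
```

The warning is what makes migration tractable: every place that still relies on old behavior is reported without breaking it.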
Posted Aug 17, 2019 16:50 UTC (Sat)
by felix.s (guest, #104710)
You mean, like you can with -fno-delete-null-pointer-checks, -fstrict-aliasing, -f{w,t}rapv, -fgnu89-inline, -fms-extensions, -f{un,}signed-char, -fno-asm, and so on? And let's not forget __attribute__((optimize))… Sure, these aren't strictly the same thing; ISO C standards are designed to be largely forward-compatible (portable C89 code free of UB can usually be compiled as C11 with no changes), so you'll have a hard time finding cases where the different versions of the standard flat-out contradict each other, creating incompatible dialects. But these options do change how the compiler handles certain specific situations that are implementation-defined or UB according to the standard, and some code does depend on one dialect or the other.
Posted Aug 15, 2019 14:35 UTC (Thu)
by dskoll (subscriber, #1630)
Extension/module authors would have to either pick one language and stick with it, or maintain two versions of their extensions or modules. This would be a disaster.
Posted Aug 15, 2019 14:36 UTC (Thu)
by dskoll (subscriber, #1630)
Ah, I see the FAQ says you can mix PHP and P++, but color me skeptical on how well that would work in practice for extensions and library code.
Posted Aug 15, 2019 22:22 UTC (Thu)
by roc (subscriber, #30627)
This constrains the kinds of changes you can make between editions, but it does allow you to make a lot of significant backwards-incompatible changes. It works better when your language is strongly statically-typed like Rust.
Python completely failed at 1, 2 and 3, for various reasons.
Posted Aug 15, 2019 22:35 UTC (Thu)
by roc (subscriber, #30627)
4) have a tool that reformats source code automatically (especially line breaks) and encourage a culture of using it routinely
so that you can run that tool after applying the automatic updates from point 2. Not a big deal, though you want this for other reasons too.
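As a concrete instance of an automatic-update tool for the short-tag case, here is a deliberately naive Python sketch. It rewrites `<?` openers to `<?php ` while leaving `<?php`, the `<?=` echo tag (always available since PHP 5.4, regardless of `short_open_tag`) and XML declarations alone. The pattern and the `expand_short_tags` name are assumptions, and the tool does not understand string literals, heredocs or comments, so its output would still need review:

```python
import re

# A '<?' opener that is not already '<?php', '<?=' or an XML declaration,
# plus any spaces or tabs that follow it.
SHORT_OPEN_TAG = re.compile(r"<\?(?!php\b|=|xml\b)[ \t]*", re.IGNORECASE)


def expand_short_tags(source: str) -> str:
    """Rewrite short open tags to '<?php ' (naive: not a real PHP parser)."""
    return SHORT_OPEN_TAG.sub("<?php ", source)


print(expand_short_tags("<p><? echo 1; ?></p>"))  # <p><?php echo 1; ?></p>
print(expand_short_tags("<?echo 2;?>"))           # <?php echo 2;?>
```

A formatter run afterwards would then normalize whatever spacing the rewrite leaves behind.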
Posted Aug 15, 2019 16:24 UTC (Thu)
by juliank (guest, #45896)
Posted Aug 15, 2019 16:39 UTC (Thu)
by xnox (guest, #63320)
Ideally, I would wish to not have PHP at all, but I also don't see it going away any time soon in practice. I wonder how many generations of developers it will take =/
It is a bit of an existential crisis because if it freezes and doesn't evolve anymore it might die too. However, it seems to work out great for LaTeX2e.
Posted Aug 16, 2019 5:49 UTC (Fri)
by da4089 (subscriber, #1195)
Posted Aug 16, 2019 6:37 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
You might dislike Go, but they make great efforts to preserve backwards compatibility.
Posted Aug 16, 2019 7:43 UTC (Fri)
by amacater (subscriber, #790)
Posted Aug 16, 2019 7:45 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
Posted Aug 16, 2019 12:33 UTC (Fri)
by remicardona (guest, #99141)
Posted Aug 17, 2019 0:09 UTC (Sat)
by Freeaqingme (subscriber, #103259)
Also, using an IDE that provides static analysis, using OOP, and declaring strict_types=1 in every file, PHP is a pretty decent programming language. Most of the crap it receive(s|d) is based on PHP4 and people who were able to get something to compile, but in no way could be considered programmers or software engineers.
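For readers unfamiliar with `declare(strict_types=1)`, the following Python sketch loosely models what it changes for an `int` parameter in PHP 7: in the default coercive mode an integer-looking string such as `"42"` is converted, while in strict mode anything that is not already an int is a `TypeError`. It is a simplification under stated assumptions: floats, booleans and leading-numeric strings such as `"5 apples"` are not modeled, and `pass_int_param` is a hypothetical name.

```python
import re

# PHP 7-style integer strings: optional leading whitespace, optional sign, digits.
INTEGER_STRING = re.compile(r"[ \t\n\r\v\f]*[+-]?[0-9]+")


def pass_int_param(value, strict_types: bool) -> int:
    """Loosely model how PHP 7 accepts an argument for an `int` parameter."""
    if type(value) is int:
        return value
    if strict_types:
        raise TypeError(f"must be of type int, {type(value).__name__} given")
    if type(value) is str and INTEGER_STRING.fullmatch(value):
        return int(value)  # coercive mode: "42" becomes 42
    raise TypeError(f"must be of type int, {type(value).__name__} given")


print(pass_int_param("42", strict_types=False))  # 42
print(pass_int_param(7, strict_types=True))      # 7
```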
Posted Aug 22, 2019 11:36 UTC (Thu)
by mina86 (guest, #68442)
Posted Aug 19, 2019 18:11 UTC (Mon)
by acomjean (guest, #117735)
I haven't seen <? /* PHP code */ ?> style in years. Certainly not in anything I've developed in the past decade. Though we've been spoiled as php has maintained easy backward compatibility.
But PHP is now on a much more aggressive upgrade path. I get why nobody wants to test their old PHP code and make sure it works with the new version, but that's what we're looking at now. The PHP 7 series is so much faster than 5.x that I'm surprised there are still laggards. Nobody likes maintaining old versions, but splitting off P++, I feel, just splits the limited development manpower (PHP is not the new hotness; JS is...). This was tried before with the language Hack, by Facebook. It never caught on.
The issue with short_open_tag and the like is the requirement to support the associated code: for every legacy feature, you need to support and maintain the code.
Exactly what is done today with short_open_tag (it is enabled by default, though).
if enable_foo == u'true':
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
BytesWarning: Comparison between bytes and string
How does UTF-8 make everybody bend over backwards?
There are ~7 C standards that GCC supports, rather than being able to mix and match paragraphs from different standards.
-fno-delete-null-pointer-checks
, -fstrict-aliasing
, -f{w,t}rapv
, -fgnu89-inline
, -fms-extensions
, -f{un,}signed-char
, -fno-asm
, and so on? And let's not forget __attribute__((optimize))
…
1) have modules explicitly state which edition they use and make sure all your tools respect that.
2) have tools available from day 1 of the new edition that automatically update code to the new edition as much as possible, and emit clear messages where non-automatic changes need to be made.
3) ensure modules from different editions can be used together, and make that as seamless as possible. For example, Rust introduced "raw identifiers" that let you write identifiers that are reserved words, so if a module API uses an identifier that's a reserved word in a later edition, code in the later edition can still use it.