Python structural pattern matching morphs again
A way to specify multiply branched conditionals in the Python language, akin to the C switch statement, has been a longtime feature request. Over the years, various proposals have been mooted, but none has ever crossed the finish line and made it into the language. A highly ambitious proposal that would solve the multi-branch-conditional problem (and quite a bit more) has been discussed (dissected, perhaps) in the Python community over the last six months or so. We covered some of the discussion in August and September, but the ground has shifted once again, so it is time to see where things stand.
It seems quite possible that this could be the last major change that is made to the language, if it is made at all. As with many mature projects, there is a good deal of conservatism that tends to rear its head when big changes are proposed for Python. But this proposal has the backing of project founder (and former benevolent dictator for life) Guido van Rossum and has attracted support from other core developers, as well as opposition from within that group. It may also depend on one's definition of major, of course, but large syntactic and semantic language changes are definitely facing major headwinds in the Python community these days.
Background
The basic idea behind the "structural pattern matching" proposal is fairly straightforward, but there are some rather deep aspects to it as well. Our previous coverage, as well as the various Python Enhancement Proposals (PEPs) surrounding the feature (linked below), will be helpful to readers who want to dig in a ways. For those who just want the high-level introduction, this example taken from PEP 622 ("Structural Pattern Matching") gives much of the flavor of the proposed feature:
    def make_point_3d(pt):
        match pt:
            case (x, y):
                return Point3d(x, y, 0)
            case (x, y, z):
                return Point3d(x, y, z)
            case Point2d(x, y):
                return Point3d(x, y, 0)
            case Point3d(_, _, _):
                return pt
            case _:
                raise TypeError("not a point we support")
The make_point_3d() function uses the proposed match statement to extract the relevant information from its pt argument, which may be passed as a two-tuple, three-tuple, Point2d, or Point3d. The x, y, and z (if present) are matched in the object passed and assigned to those variables, which are then used to create a Point3d with the right values. The use of "_" as a wildcard is consistent with other languages that have similar constructs, and is even used in a similar fashion as a convention in Python, but is perhaps one of the more contentious parts of the proposal. The final case matches anything at all that has not been matched by an earlier case.
If you squint at that example, it looks ... Python-ish, perhaps. But the case entries have some substantial differences from the existing language. In particular, constructs like Point2d(x, y) do not instantiate a Point2d object, but test if the match argument matches that type. If so, x and y are not looked up in the local scope, but are, instead, assigned to. It is different enough from the usual way of reading Python code that some have called it a domain-specific language inside Python for matching, which is seen (by some) as something to be avoided.
Another contentious part of the proposal is the handling of names, which are always treated as variables that get filled in from the match (called "capture variables"), as opposed to looking the name up and using its current value as a constant to be matched. That does not sit well with some, who mainly think that the capture variables should be indicated with some kind of sigil (e.g. ?var); other uses of names should conform to Python's usual practice. But the long list of authors for PEP 622 unanimously agreed that the common capturing case should not be made "ugly" for consistency with other parts of Python. Part of the reasoning is that other languages which have the feature also default to capture variables for unadorned names.
But programmers will want to be able to use constants in their case entries. The first version of PEP 622 required a sigil in the form of a dot prepended to names that should be used as constants (e.g. .CONSTANT), but that was not wildly popular—to put it mildly. Round two of that PEP switched to requiring constants to be in a namespace, which might be seen as something of a cop-out, since that effectively still requires the dot (e.g. namespace.constant).
Three new PEPs
When last we left the saga, PEP 622 was being handed off to the Python steering council for consideration. The council members discussed the PEP among themselves as well as with the PEP's authors. The result of that was announced by one of those authors, Van Rossum, toward the end of October. It turned out that "there were a lot of problems with the text" of PEP 622, so the authors abandoned it in favor of three new PEPs:
- PEP 634: "Structural Pattern Matching: Specification"
- PEP 635: "Structural Pattern Matching: Motivation and Rationale"
- PEP 636: "Structural Pattern Matching: Tutorial"
Make that four
A few days before Van Rossum's announcement, steering council member Thomas Wouters posted a PEP addressing the use of "_": PEP 640 ("Unused variable syntax"). It would create a new unused variable that can be assigned to, though the binding (or assignment) is not actually performed and that variable cannot be used in any other way. The PEP proposes to use "?" as that variable.
Currently, some Python code conventionally uses "_" for unused variables, though that name has no special treatment in the language. In particular, the "unused" value does get bound to the name "_". It is often used as follows:
    x, _, z = (2, 3, 4)  # x=2, z=4 (but _=3 as well)
    for _ in range(10):
        do_something()
    # _=9 here
Using "unused", "dummy", or other regular names is possible too, of course. The problem that Wouters (and others) see is that the structural pattern matching proposal gives an additional meaning to "_", but does not extend it to the rest of the language. It is this inconsistency that led to the PEP:
Introducing ``?`` as special syntax for unused variables *both inside and outside pattern matching* allows us to retain that consistency. It avoids the conflict with internationalization *or any other uses of _ as a variable*. It makes unpacking assignment align more closely with pattern matching, making it easier to explain pattern matching as an extension of unpacking assignment.
There is one other oddity with "_": it has ... interesting ... behavior in the Python read-eval-print loop (REPL), where "_" is normally assigned to the value of the last-executed expression.
>>> 2+2
4
>>> _
4
If the user explicitly assigns to "_" in the REPL, though, it stops tracking expression results and simply holds the last value that was assigned. So there is a fair amount of established usage of "_" that PEP 640 is trying to sidestep.
In Wouters's posting, he noted that adding "?" as the unused variable had benefits entirely independent of the pattern matching proposal, but he believes they are too small if PEP 634 is not adopted. So he thinks that PEP 640 should be rejected in that case. The reaction to the PEP was generally somewhat negative, though there was not a lot of discussion of the PEP itself in that thread. The main objection is that debugging uses of the unused variable when its value cannot be queried will be difficult.
Or five
Van Rossum's announcement of the three PEPs was also met with a fairly abbreviated thread (at least by the standards set in earlier rounds) that mostly consisted of tangential discussions on various pieces. But, as he was with PEP 622, Mark Shannon is not convinced that this form of pattern matching is needed at all in the language. He argued that it is a bad fit for a dynamically-typed procedural language like Python and that PEP 635 fails to offer a convincing case for the value of the feature (though the arguments have improved since PEP 622, he said).
Shannon had a number of specific areas where he believes that the proposal falls short, which were mostly met with disagreement, but Nick Coghlan noted that he shared some of Shannon's concerns. In fact, Coghlan had just posted an announcement of PEP 642 ("Constraint Pattern Syntax for Structural Pattern Matching") addressing some of those problems. His idea is that the existing assignment syntax can be tweaked slightly to accommodate pattern matching, while retaining the possibility that it could be used elsewhere in the language down the road.
In the original version of the PEP, Coghlan combines literal and value (e.g. namespace.constant) patterns from PEP 634 into "constraint patterns". These constraint patterns can be tested either for equality or identity in a case. He used "?" as a prefix for equality and "?is" for identity and replaced the non-binding "_" wildcard with "?". The end result is that names are looked up and literals used if they are marked with "?"; literals that are not marked would raise a SyntaxError. It would look something like:
    MISSING = 404
    match foo:
        case ?0:
            print('foo equals zero')
        case ?is None:
            print('foo is None')
        case ?MISSING:
            print('foo not found (404)')
        case (a, b):
            print(f'foo is a two-tuple: {a} {b}')
        case _:  # still works, _ is just a normal capture variable
            print('foo is something wildly unexpected')
Steven D'Aprano did not like PEP 642, but he had several suggestions, some of which Coghlan subsequently adopted. In particular, Coghlan dropped the need to have equality markers for literal values and switched away from using "?" entirely. Literal patterns are simply "case 0:", equality uses "==", and identity uses "is". D'Aprano also suggested that the problem with "_" in match is overblown.
Wouters sees things differently, however:
[...] The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same. It reduces the complexity of the language because it no longer uses the same syntax for disparate things.
Tobias Kohn, one of the PEP 622 authors and co-author of PEP 635 with Van Rossum, noted that the idea of "load sigils" had been discussed and, in fact, the authors had settled on dot (".") for that case, but it proved to be unpopular. Kohn said that there is nothing in the current structural pattern matching proposal that precludes adding, say, "?" as a load sigil in the future. But he thinks those kinds of things can wait.
Deciding
While there are five PEPs floating around, two of them are informational in nature (635 and 636), so the steering council needs to decide if it will accept PEP 634 and add structural pattern matching to the language. It also needs to decide whether to augment or modify the feature with either PEP 640 to add "?" as an unused variable or PEP 642 to add constraint patterns and, effectively, load sigils. It could choose to adopt all three since Coghlan had switched PEP 642 to use "__" (double underscore) as its wildcard matching variable.
It is a complicated set of questions; if anything is adopted, it seems likely to have a significant impact on the language for a long time to come. The 2020 steering council will not be making the decision, however. The election for the 2021 steering council is currently underway; it completes on December 16. As reported by Wouters in early November, the current council will make a strong recommendation on the PEPs to the incoming council, which will make the final determination. There is no huge rush since the schedule for Python 3.10 shows the first beta, which is also the feature-freeze date, in early May 2021.
As part of the effort to make that recommendation, steering council member Brett Cannon posted a poll to the Python Discourse instance. He posted to the "Committers" category, where only core developers can comment and answer the poll. There were five options, one rejecting pattern matching entirely, three accepting PEP 634 with and without the other PEPs, and one for those who want pattern matching but not as defined in any of the PEPs.
When the voting closed on November 23, a clear split among core developers was evident. Half of the 34 voters (17) wanted to accept PEP 634 in some form, while 44% (15 voters) did not want pattern matching at all and two voters (6%) wanted pattern matching but not as proposed. The poll is not binding in any way, of course, but it is indicative of the fault lines in the community with regard to the feature. Whichever way the council decides, it is likely to leave a sizable contingent unhappy.
Several commented in the poll thread about why they were voting one way or another; those in favor tended to see ways they could use the feature in their own code and were not overly bothered by any perceived inconsistencies. For the "no pattern matching" folks, Larry Hastings may have spoken for many of them when he said:
I can see how the PEP authors arrived at this approach, and I believe them when they say they thought long and hard about it and they really think this is the best solution. Therefore, since I dislike this approach so much, I’m pessimistic that anybody could come up with a syntax for pattern matching in Python that I would like. That’s why I voted for I don’t want pattern matching rather than I want pattern matching, but not as defined in those PEPs. It’s not that I’m against the whole concept of pattern matching, but I now believe it’s impossible to add it to Python today in a way that I would want.
There is a great deal more discussion in the python-dev mailing list for those who might want to dig in further. Coghlan's post of version two of PEP 642 and a suggestion by David Mertz to use words rather than sigils both led to interesting discussions. Paul Sokolovsky pointed participants to a recent academic paper [PDF] written by the authors of PEP 622 about pattern matching for Python; the paper sparked some discussion. Shannon also posted about some work he has been doing to define the precise semantics of pattern matching, which is something that is currently lacking. And so on.
It is, in short, one of the most heavily discussed Python features of all time. It seems likely that it even surpasses the discussion in the "PEP 572 mess", which brought the walrus operator (":=") to Python, but also led to Van Rossum's retirement. But maybe it only seems as large. In any case, the soon-to-be-elected steering council is in something of an unenviable position, but it seems clear that the question of this style of pattern matching for Python will finally be laid to rest early in 2021—one way or the other.
