"Structural pattern matching" for Python, part 2
We left the saga of PEP 622
("Structural Pattern Matching") at the end of June, but the
discussion of
a Python "match" statement—superficially similar to a C
switch but with extra data-matching features—continued. At this
point, the next steps are up to the Python steering
council, which will determine the fate of the PEP. But there is lots
of discussion to catch up on from the last two months or so.
As a quick review, the match statement is meant to choose a particular code block to execute from multiple options based on conditions specified for each of the separate case entries. The conditions can be simple, as in:
match x:
    case 1:
        print('1')
    case 2:
        print('2')
That simply compares the value of x to the numeric constants, but
match can do far more than that. It is a kind of
generalized pattern matching that can instantiate variables in the
case condition when it gets matched, as the following example from
the PEP shows:
# The target is an (x, y) tuple
match point:
    case (0, 0):
        print("Origin")
    case (0, y):
        print(f"Y={y}")
    case (x, 0):
        print(f"X={x}")
    case (x, y):
        print(f"X={x}, Y={y}")
    case _:
        raise ValueError("Not a point")
Anti-PEP
There are some cognitive hurdles that folks will need to clear in order to understand this new syntax, and that is what is driving many of the complaints about the feature. Within a day after Guido van Rossum posted the first version of the PEP, he asked that commenters refrain from piling onto the huge thread so that the authors could have some time to hash things out. For the most part, that is what happened, but there was some concern that perhaps the opposition's arguments against the PEP would not get a full airing in the PEP itself. Mark Shannon, who had earlier questioned the need for the feature, proposed an "Anti-PEP" as a mechanism for opponents to marshal their arguments:
Whether the ultimate decision is made by the steering committee or by a PEP delegate, it is hard to make a decision between the pros and cons, when the pros are in a single formal document and the cons are scattered across the internet.
An Anti-PEP is a way to ensure that those opposed to a PEP can be heard and, if possible, have a coherent voice. Hopefully, it would also make things a lot less stressful for PEP authors.
Notably, an Anti-PEP would not propose an alternative; it would just collect the arguments against a proposed PEP. But Brett Cannon (and others) thought that there is already another mechanism that can be used for that.
Most other commenters agreed that the "Rejected Ideas" section (or
perhaps a new "Objections" section) is the right place to record such
arguments, though Raymond Hettinger agreed
with Shannon about the need for a more formalized opposition document:
"The current process doesn't make it
likely that a balanced document is created for decision making
purposes." They were in the minority, however, so a formal anti-PEP
process seems improbable.
On July 1, Van Rossum announced
a match statement "playground" that can be used
to test the proposed feature. It is a Binder
instance that runs a Jupyter Python
kernel that has been modified with a "complete implementation of the
PEP". But the playground and other aspects of the process made
Rob Cliffe uneasy; he was concerned that the PEP was being
"railroaded through".
Cliffe was under the impression that PEPs are never implemented until they have been accepted, but several people in the thread pointed out that this is not the case. Chris Angelico said:
Round 2
The second version of the PEP was announced
on July 8. Van Rossum noted that the __match__() protocol
(for customizing a class's matching behavior)
had been removed at the behest of Daniel
Moisset, who was added as the sixth author of the PEP (with Van Rossum,
Brandt Bucher, Tobias Kohn, Ivan Levkivskyi, and Talin). The protocol is
not required for the feature; "postponing it will allow us to design it at a later
time when we have more experience with how `match` is being used".
Moisset was added in part for his contribution of new text that "introduces the subject
matter much more gently than the
first version did".
The other big change was to drop the leading dot for constants (e.g. .CONST) in the case statements. The problem addressed by that feature is that identifier strings in the patterns to be matched can either be a variable to be stored into if there is a match or a constant value to be looked up in order to match it (sometimes called "load-and-compare" values); Python cannot determine which it is without some kind of syntactical element or convention (e.g. all uppercase identifiers are constants). Consider the following example from the announcement:
USE_POLAR = "polar"
USE_RECT = "rect"
[...]
match t:
    case (USE_RECT, real, imag):
        return complex(real, imag)
    case (USE_POLAR, r, phi):
        return complex(r * cos(phi), r * sin(phi))
Python cannot distinguish the USE_RECT constant, which should cause
it to only match tuples with "rect" as the first element, from the
real and imag variables that should be filled in with the
values from the match. Since the original choice of prepending a dot to
the constants was quite unpopular, that was removed. It is a problem that
other languages with this kind of match have struggled with and
the PEP authors have as well; it was the
first issue in their bug tracker. Adding a sigil for the to-be-stored variables,
as has been suggested (e.g. ?real), "makes this common case ugly
and inconsistent". In the end, they have decided to only allow
constants that come from a namespace:
If that part of the proposal is a "deal-breaker", Van Rossum
said, then, when any
other problems have been resolved, that decision could be reconsidered. He
also outlined the other outstanding items, with the authors' position on
them:
Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.
That post predictably set off yet another mega-thread on various aspects of the new syntax. The alignment of else (if added) was one such topic; Stefano Borini suggested that the code to be executed should not be two levels of indentation in from the match, but be more like if statements. Glenn Linderman took that further:
match:
    t
case ("rect", real, imag):
    return complex(real, imag)
case ("polar", r, phi):
    return complex(r * cos(phi), r * sin(phi))
else:
    return None
He compared that syntax favorably with that of try/except/else blocks, in addition to it resolving the else-alignment question. The PEP authors had addressed the idea, noting that there are two possibilities: either the match expression is in its own single-statement block (as Linderman has it), which is unique in the language, or the match and its expression are introducing a block with a colon, but that block is not indented like every other block after a colon in Python.
match:
    expression
case ...

# or
match expression:
case ...
Either would violate a longstanding expectation in Python, so the authors
rejected both possibilities.
Larry Hastings wondered about the special treatment being given to the "_" wildcard match. That symbol acts like a regular identifier, except in case statements, where it does not get bound (assigned to) for a match; it can also be used more than once in a case, which is not allowed for other match variables:
match x:
    case (_, _):  # match any two-element sequence
        print(_)  # _ is not bound by the match
    case (x, x):  # ILLEGAL
Hastings argued that if the same variable can be used more than once and that underscore does get bound, the special case disappears. The cost is an extra store for the binding, which could be optimized away as a dead store if that was deemed important. Moisset pointed out a few technical hurdles, and Van Rossum added more, but also thought it really did not make sense to do the binding for a value that is being explicitly described as a "don't care":
The need for a wildcard pattern has already been explained -- we really want to disallow `Point(x, y, y)` but we really need to allow `Point(z, _, _)`. Generating code to assign the value to `_` seems odd given the clear intent to *ignore* the value.
Compelling case?
As he was with the first version, Shannon is far from convinced that Python
needs match and that the PEP actually lays out a compelling case
for it. He returned to that question in a mid-July
post that pointed out several flaws that he saw in the justification
for the feature.
In general, the examples in the PEP do not show any real-world uses of
the new syntax that he finds compelling.
As he said
in a followup message: "I worry that the PEP is treating pattern matching as an ideal which we
should be striving towards. That is a bad thing, IMO."
Others disagreed with that view, however. Kohn, one of the PEP authors who also participated in that short thread, started his own short thread with a different way to look at the feature. It is not, he said, adding a switch statement to Python; instead, it is providing a mechanism for doing function overloading in various guises:
Kohn went through a few detailed examples of function overloading and the visitor design
pattern, showing how the proposed match feature would bring
multiple benefits to the code. It is worth a read, especially for those
who may not fully see the scope of what the feature can do. On the
flip side, though, Shannon posted
a link to his lengthy
deconstruction of PEP 622, which was met with skepticism, at best.
Several responders felt that it was simply a rehash of various arguments
that had already been made. Van Rossum effectively dismissed
it entirely:
"[...] it just repeats Mark's own arguments, which are exclusively focused on the
examples in the PEP (it's as if Mark read nothing *but* the
examples)".
But Shannon (and others) pointed out his analysis of code in the standard library, which is linked in the critique. He concluded that in the roughly 600,000 lines of Python code in CPython, there were only three or four examples of places where the match statement made things more clear. While Stephen J. Turnbull is in favor of the PEP, he agreed that Shannon's analysis of the standard library was useful. Unlike Shannon, Turnbull thought that most of the pattern-matching rewrites were more clear, but there were not many of them; however, there is still a missing piece:
To the steering council
At this point, the PEP is in the hands of the steering council, which could approve it as written, ask for changes, or reject it entirely. One suspects that an outright rejection would likely kill the idea forever, though some pieces of it could perhaps survive in other guises; Shannon had some suggestions along those lines.
If modification is requested, else seems like a logical addition, though there is no real consensus on how it should be aligned, either within the author group or the Python community at large. The requirement that constants be referred to via a namespace (e.g. color.RED) is another area that could draw attention from the council. Doing things that way requires that match variables not be referred to via a namespace, which has a few downsides, including the inability to assign to self.x in a match pattern. While it was deemed "ugly" (at best), the original idea of requiring constants to have a dot prepended to them (e.g. .RED) would at least remove that restriction on match variables.
The council is in an unenviable position here; one suspects that Van Rossum knows how the members feel after the "PEP 572 mess" that led to his resignation as benevolent dictator for life (BDFL) and thus to the creation of the council. Contentious decisions are draining and the aftermath can be painful as well. In this case, the opposition does not seem as strong as it was to the "walrus operator" from PEP 572, but the change here is far more fundamental and its effects are far-reaching.
The council has not really tipped its hand in the multiple, typically long, threads discussing the feature—most of its members have not participated at all. Clearly a lot of work has gone into the PEP, which may make it somewhat harder to reject outright. The opinion of the language's inventor and former BDFL also likely helps tip the scales somewhat in the PEP's favor; most of the community seemed to like the idea overall, while there were (lots of) quibbles about details. It would not be surprising to find that Python 3.10 (slated for October 2021) shows up with a match statement.
