New features and other changes in Python 3.10
Python 3.10 is proceeding apace; everything looks to be on track for the final release, which is expected on October 4. The beta releases started in early May, with the first of those marking the feature-freeze for this version of the language. There are a number of interesting changes that are coming with Python 3.10, including what is perhaps the "headline feature": structural pattern matching.
As we did with Python 3.9, and Python 3.8 before it, taking a look at what is coming in the (now) yearly major release of the language has become something of a tradition here at LWN. The release notes that are compiled as part of the release process are invaluable in helping track all of the bits and pieces that make up the release. "What's New In Python 3.10" does not disappoint in that regard; those looking for more information about this release are encouraged to give it a look. In addition, we have covered some of these changes as they were discussed and developed over the last year or so.
Headlining
The structural pattern matching feature fills a longstanding hole that many have complained about along the way, but it also does a whole lot more than that. Python has never had a "switch" statement or its equivalent; programmers have relied on a series of if/elif/else blocks to handle the various values of a particular expression instead. But there have been proposals to add a switch statement going back at least 20 years.
A year ago, Python creator Guido van Rossum and a few other folks resurrected the idea, but in a much more sweeping form. That led to multiple large threads on the python-dev mailing list, and to a second version of the Python Enhancement Proposal (PEP) for the feature. After the steering council looked, though, that original proposal became three PEPs (two informational) in October 2020, and two other competing PEPs were added into the mix. In February, the council decided to accept one of the three, PEP 634 ("Structural Pattern Matching: Specification"), along with its two companions. The other two PEPs were rejected.
The basic idea of the feature is that the value being "matched" (the new Python statement is match) can be unpacked in various ways, so that pieces of the object can be extracted. The example that probably gives the most "bang for the buck" comes from PEP 622 ("Structural Pattern Matching"):
    def make_point_3d(pt):
        match pt:
            case (x, y):
                return Point3d(x, y, 0)
            case (x, y, z):
                return Point3d(x, y, z)
            case Point2d(x, y):
                return Point3d(x, y, 0)
            case Point3d(_, _, _):
                return pt
            case _:
                raise TypeError("not a point we support")
Perhaps the most unfamiliar piece of that example is the use of "_" as a "wildcard" (i.e. match anything), which was a major point of contention during the discussions of the feature. But the look of match is only really Pythonic if you squint ... hard. The case statements are unlike anything else in the language, really. If pt is a 2-tuple, the first case will be used and x will get the value of pt[0] and y will get pt[1].
The third and fourth cases are even weirder looking, but the intent should be reasonably clear: objects of those types (Point2d and Point3d) will be matched and the variables will be filled in appropriately. But the normal rules for reading Python are violated, which was another controversial part of the proposal; "case Point2d(x,y):" does not instantiate an object, instead it serves as a template for what is to be matched. The references to x and y in the case do not look up the values of those variables, rather they are used to specify the variables that get assigned from the unpacking.
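To make the binding rules concrete, here is a minimal sketch (the Color enum is invented for illustration; any Python 3.10 will run it) showing the difference between a bare capture name, which always matches and binds, and a dotted "value" pattern, which is compared by value:

    import enum

    class Color(enum.Enum):
        RED = 0
        GREEN = 1

    def describe(c):
        match c:
            case Color.RED:   # dotted name: compared by value, binds nothing
                return "red"
            case other:       # bare name: matches anything, binds to 'other'
                return f"not red: {other}"

    print(describe(Color.RED))    # -> red
    print(describe(Color.GREEN))  # -> not red: Color.GREEN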
There is a lot more to the match statement; those interested should dig into the PEPs for more information, or run the most recent beta (3.10.0b3 at the time of this writing) to try it out. There are also other parts of the syntax (and semantics) that are at least somewhat controversial; the LWN articles, and the mailing-list threads they point to, will help unravel those concerns as well.
Parsing and error reporting
One of the bigger changes that came with Python 3.9 was the new parsing expression grammar (PEG) parser for CPython. The PEG parser was added as the default in 3.9, but the existing LL(1) parser (with "hacks" to get around the one-token lookahead limitation) would remain as an option. In 3.10, that option has disappeared, along with the code for the older parser. In addition, the deprecated parser module has been removed.
Now that there is no requirement to stick (mostly) to LL(1) for CPython parsing, that opens up other possibilities for the syntax of the language. In a semi-prescient post as part of a discussion about the PEG-parser proposal in April 2020, Van Rossum suggested one possibility: "(For example, I've been toying with the idea of introducing a 'match' statement similar to Scala's match expression by making 'match' a keyword only when followed by an expression and a colon.)"
For 3.10, there is another example of a place where the new parser improves the readability of the language: multiple context managers can now be enclosed in parentheses. A longstanding enhancement request was closed in the process. Instead of needing to use the backslash continuation for multi-line with statements, they can be written as follows:
    with (open('long_file_name') as foo,
          open('yet_another_long_file_name') as bar,
          open('somewhat_shorter_name') as baz):
        ...
Various error messages have been improved in this release as well. The SyntaxError exception has better diagnostic output in a number of cases, including pointing to the opening brace or parenthesis when the closing delimiter is missing, rather than pointing to the wrong location or giving the dreaded "unexpected EOF while parsing" message:
    expected = {9: 1, 18: 2, 19: 2, 27: 3, 28: 3, 29: 3, 36: 4, 37: 4,
                38: 4, 39: 4, 45: 5, 46: 5, 47: 5, 48: 5, 49: 5, 54: 6,

    some_other_code = foo()

Previous versions of the interpreter reported confusing places as the location of the syntax error:

    File "example.py", line 3
        some_other_code = foo()
                        ^
    SyntaxError: invalid syntax

but in Python 3.10 a more informative error is emitted:

    File "example.py", line 1
        expected = {9: 1, 18: 2, 19: 2, 27: 3, 28: 3, 29: 3, 36: 4, 37: 4,
                   ^
    SyntaxError: '{' was never closed
That fix was inspired by similar error messages in PyPy. Several other syntax errors, for things like missing commas in dict or list literals, missing colons before blocks (e.g. after while, if, for, etc.), unparenthesized tuples as targets in comprehensions, missing colons in dict literals, and more, have all gotten revamped messages and indicators to make it easier to diagnose the problem. Beyond that, an IndentationError will indicate what kind of block was expecting the indentation, which should help track down a problem of that sort. The AttributeError and NameError exceptions will now give suggestions of similar names, under the assumption that the error is actually a typo; those suggestions are only given if PyErr_Display() is called, however, which is not the case for some alternate read-eval-print loops (REPLs), such as IPython.
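For instance, the name suggestions look something like this in the standard REPL (a sketch; the exact wording of the message may differ slightly):

    >>> import collections
    >>> collections.namedtupel
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'collections' has no attribute 'namedtupel'. Did you mean: namedtuple?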
Type hints
There are several upgrades to the feature introduced in PEP 484 ("Type Hints"). The most visible new feature will likely be the new union operator specified in PEP 604 ("Allow writing union types as X | Y"). As the title indicates, types can now be separated by the "|" operator to indicate that multiple types are accepted. The release notes show that it is a big upgrade in readability:
In previous versions of Python, to apply a type hint for functions accepting arguments of multiple types, typing.Union was used:

    def square(number: Union[int, float]) -> Union[int, float]:
        return number ** 2

Type hints can now be written in a more succinct manner:

    def square(number: int | float) -> int | float:
        return number ** 2
The operator can be used to "or" types in isinstance() and issubclass() calls as well; a short interactive sketch appears after the TypeAlias example below. A new meta-type has been added with PEP 613 ("Explicit Type Aliases") so that static type-checkers and other programs can more easily distinguish type aliases from other module-level variables. As would be expected, the PEP gives lots of examples of the kinds of problems it is meant to solve, but the example in the release notes gives the general idea:
PEP 484 introduced the concept of type aliases, only requiring them to be top-level unannotated assignments. This simplicity sometimes made it difficult for type checkers to distinguish between type aliases and ordinary assignments, especially when forward references or invalid types were involved. Compare:

    StrCache = 'Cache[str]'     # a type alias
    LOG_PREFIX = 'LOG[DEBUG]'   # a module constant

Now the typing module has a special value TypeAlias which lets you declare type aliases more explicitly:

    StrCache: TypeAlias = 'Cache[str]'   # a type alias
    LOG_PREFIX = 'LOG[DEBUG]'            # a module constant
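And here is the promised sketch of the union operator with isinstance() and issubclass() (an interactive session under any Python 3.10):

    >>> isinstance(3.5, int | str)
    False
    >>> isinstance(3.5, int | float)
    True
    >>> issubclass(bool, int | str)   # bool is a subclass of int
    True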
One related change that was planned for 3.10 has been put on the back burner, at least for now. When originally specified in PEP 3107 ("Function Annotations") and PEP 526 ("Syntax for Variable Annotations"), annotations were presented as a way to attach the value of an expression to function arguments, function return values, and variables. The intent was to associate type information that could be used by static type-checkers with those program elements.
Forward references in annotations and a few other problems led to PEP 563 ("Postponed Evaluation of Annotations"), which sought to delay the evaluation of the annotation values until they were actually being used. That new behavior was gated by a __future__ import, but was slated to become the default in 3.10, with no way to request the previous semantics. That would not change things for static type-checkers, which do their own parsing separate from CPython, but it was a rather large change for run-time users of the annotations.
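A minimal sketch of the difference (the Greeting name is hypothetical; any Python 3.7+ runs this): with the __future__ import, annotations are stored as unevaluated strings, which is exactly what run-time consumers have to cope with:

    from __future__ import annotations  # opt in to PEP 563 semantics

    def greet(name: str) -> Greeting:   # Greeting need not exist yet
        ...

    # The annotations are kept as strings, not evaluated objects:
    print(greet.__annotations__)
    # -> {'name': 'str', 'return': 'Greeting'}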
There seems to have been an unspoken belief that run-time users of annotations would be rare—or even nonexistent. But, as the 3.10 alpha process proceeded, it became clear that the PEP 563 solution might not be the best way forward. In PEP 649 ("Deferred Evaluation Of Annotations Using Descriptors"), Larry Hastings pointed out a number of problems he saw with PEP 563 and offered an alternate solution. The maintainer of the pydantic data-validation library, which uses type annotations at run time, noted the problems he has encountered trying to support PEP 563; he implored the steering council to adopt PEP 649 in its stead.
While the council did not do that, it did put the brakes on making PEP 563 the default in order to give everyone some time (roughly a year until Python 3.11) to determine the best course without the time pressure of the imminent feature-freeze. In the meantime, though, the annotation oddities that Hastings noticed elsewhere in the language did get fixed so that annotations are now handled more consistently throughout Python.
Other bits
There are plenty of other features, fixes, and changes coming in the new version. The release notes show a rather eye-opening body of work for roughly a year's worth of development. For example, one of the most basic standard types, int, had a new method added in 3.10: int.bit_count() gives the "population count" of the integer, which is the number of ones in the binary representation of its absolute value. Discussion of this "micro-feature" goes back to 2017.
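A quick interactive illustration:

    >>> bin(42)
    '0b101010'
    >>> (42).bit_count()    # three one-bits in 0b101010
    3
    >>> (-42).bit_count()   # the absolute value is used
    3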
The pair of security vulnerabilities from February that led to fast-tracked releases for all of the supported Python versions have, naturally, been fixed in 3.10. The fix for the buffer overflow when converting floating-point numbers to strings was not mentioned in the release notes, presumably because it is not exactly highly visible—the interpreter simply no longer crashes. The second vulnerability led to a change in the urllib.parse module that users may need to pay attention to.
Back in the pre-HTML5 days, two different characters were allowed in URLs for separating query parameters: ";" and "&". HTML5 restricts the separator character to only be "&", but urllib.parse did not change until a web-cache poisoning vulnerability was reported in January. Now, only a single separator is supported for urllib.parse.parse_qs() and urllib.parse.parse_qsl(), which default to "&". Those changes also affect cgi.parse() and cgi.parse_multipart() because they use the urllib functions. In addition, urllib.parse has been fixed to remove carriage returns, newlines, and tabs from URLs in order to avoid certain kinds of attacks.
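The fix also added a separator keyword argument for code that still needs the old ";" handling; a short sketch (interactive session, Python 3.10):

    >>> from urllib.parse import parse_qs
    >>> parse_qs("a=1&b=2")                  # '&' is the only default separator
    {'a': ['1'], 'b': ['2']}
    >>> parse_qs("a=1;b=2")                  # ';' no longer splits parameters
    {'a': ['1;b=2']}
    >>> parse_qs("a=1;b=2", separator=";")   # but it can be requested explicitly
    {'a': ['1'], 'b': ['2']}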
Another security change is described in PEP 644 ("Require OpenSSL 1.1.1 or newer"). From 3.10 onward, the CPython standard library will require OpenSSL version 1.1.1 or higher in order to reduce the maintenance burden on the core developers. OpenSSL is used by the hashlib, hmac, and ssl modules in the standard library. Maintaining support for multiple older versions of OpenSSL (earlier Pythons support OpenSSL 1.0.2, 1.1.0, and 1.1.1) combined with various distribution-specific choices in building OpenSSL has led to something of a combinatorial explosion in the test matrix. In addition, the other two versions are no longer getting updates; OpenSSL 1.1.1 is a long-term support release, which is slated to be supported until September 2023.
One feature that did not make the cut for the language is indexing using keywords, an oft-requested capability that has now presumably been laid to rest for good. The basic idea is to apply the idea of keyword function arguments to indexing:
    print(matrix[row=4, col=17])
    some_obj[1, 2, a=43, b=47] = 23
    del(grid[x=1, y=0, z=0])
The most recent incarnation is PEP 637 ("Support for indexing with keyword arguments") but the idea (and lengthy discussions) have gone back to 2014—at least. The steering council rejected the PEP in March; "fundamentally we do not believe the benefit is great enough to outweigh the cost of the new syntax". The PEP will now serve as a place for people to point the next time the idea crops up on python-ideas or elsewhere; unless there are major changes in the language or use cases, there may well be no need to discuss the idea yet again. That is part of the value of rejected PEPs, after all.
The future
Development is already in progress for Python 3.11; in fact, there is already a draft of the "what's new" document for the release. It can be expected in October 2022. With luck, it will come with major CPython performance improvements. It will likely also come with the exception groups feature that we looked at back in March; the feature was postponed to 3.11 in mid-April. In addition, of course, there will be lots of other changes, fixes, features, and such, both for 3.11 and for the much nearer 3.10 release. Python marches on.
Index entries for this article:
    Python: Releases
HTML query parameters
Posted Jun 23, 2021 6:46 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

> Back in the pre-HTML5 days, two different characters were allowed in URLs for separating query parameters: ";" and "&". HTML5 restricts the separator character to only be "&"

Oh, that's news to me. I had been thinking of the ; as the "new-style" parameter separator (much more readable when embedded in HTML) and looking forward to the day when it would be used everywhere. Oh well…

HTML query parameters
Posted Jun 23, 2021 7:36 UTC (Wed) by excors (subscriber, #95769) [Link]

As far as I can see, the only thing HTML5 has is an algorithm for encoding application/x-www-form-urlencoded data (which uses only '&' separators) for <form> submission, and related specs have the URLSearchParams() API which decodes the query string as application/x-www-form-urlencoded, and that's about it. And the URL spec says "The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices", so it's hardly a recommendation for using this '&' syntax. Outside of <form> you're free to encode URL parameters however you want, you just need to make sure you implement a matching decoder.

match using "case Point2d(x,y):"
Posted Jun 24, 2021 3:13 UTC (Thu) by douglasbagnall (subscriber, #62736) [Link] (4 responses)

> "case Point2d(x,y):" does not instantiate an object, instead it serves as a template for what is to be matched. The references to x and y in the case do not look up the values of those variables, rather they are used to specify the variables that get assigned from the unpacking.

This description made me realise the parallel to "def Point2d(x,y):", which would also not instantiate a Point2d object, and would also not look up x and y, rather using them to unpack the arguments.

match using "case Point2d(x,y):"
Posted Jun 24, 2021 18:30 UTC (Thu) by samlh (subscriber, #56788) [Link] (1 responses)

Rust takes this a step further: Rust's match statements and function parameter syntax are (intentionally) quite tightly aligned - they are both "patterns". The difference is that function parameter bindings must be irrefutable (always match), while match statement arms don't have to always match.

https://doc.rust-lang.org/reference/expressions/match-exp...
https://doc.rust-lang.org/reference/items/functions.html#...
https://doc.rust-lang.org/reference/patterns.html#refutab...

match using "case Point2d(x,y):"
Posted Jul 1, 2021 13:02 UTC (Thu) by bluss (guest, #47454) [Link]

> A match expression branches on a pattern. The exact form of matching that occurs depends on the pattern.

> As with let bindings, function parameters are irrefutable patterns, so any pattern that is valid in a let binding is also valid as an argument.

> A pattern is said to be refutable when it has the possibility of not being matched by the value it is being matched against. Irrefutable patterns, on the other hand, always match the value they are being matched against.

match using "case Point2d(x,y):"
Posted Jun 24, 2021 23:35 UTC (Thu) by JanC_ (guest, #34940) [Link] (1 responses)

Careful what you say about "not instantiating a Point2d object":

    >>> def Point2d(x,y):
    ...     return (x,y)
    ...
    >>> isinstance(Point2d, object)
    True

🧐

match using "case Point2d(x,y):"
Posted Jun 25, 2021 1:35 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

Functions are first-class objects (as is everything else that has a name, except for keywords). Every object is an instance of the class "object," so writing isinstance(foo, object) is just a fancy way of writing True. In fact, as far as Python is concerned, Point2d is "just" a regular variable that happens to be callable. The same is true of classes and even imported modules (except, of course, that modules are usually not callable).

As it happens, all classes are also instances of the "type" class (it can't be called the "class" class because "class" is a reserved word). This means that you get the following, perhaps surprising, behavior:

    >>> isinstance(object, object)
    True
    >>> isinstance(type, type)
    True
    >>> isinstance(type, object)
    True
    >>> isinstance(object, type)
    True

Obviously this can't be literally true, as one of type or object had to exist first, so they can't both have been instantiated from each other (or from themselves, for that matter). But since they're both built-in classes, Python cheats and statically initializes them. This works because, ultimately, they're all "just" PyTypeObject instances (i.e. structs) at the C level, and so you can manually set all the fields to the correct values and call it good.

Parsing Expression Grammar (PEG) parser for CPython
Posted Jun 25, 2021 10:03 UTC (Fri) by jnareb (subscriber, #46500) [Link] (1 responses)

See "Parsing: a timeline": section "2004: PEG" (https://jeffreykegler.github.io/personal/timeline_v3#h1-2...), and "PEG: Ambiguity, precision and confusion" (http://jeffreykegler.github.io/Ocean-of-Awareness-blog/in...), though one needs to take what is written there with some care, as it is an article from the point of view of the author of a competing parsing algorithm - the table-driven approach based on the prior work of Jay Earley, Joop Leo, John Aycock and R. Nigel Horspool.

Parsing Expression Grammar (PEG) parser for CPython
Posted Jun 25, 2021 11:20 UTC (Fri) by t-v (guest, #112111) [Link]

I think there are several parts:

So how does a PEG parser solve these annoyances? By using an infinite lookahead buffer! The typical implementation of a PEG parser uses something called "packrat parsing", which not only loads the entire program in memory before parsing it, but also allows the parser to backtrack arbitrarily. While the term PEG primarily refers to the grammar notation, the parsers generated from PEG grammars are typically recursive-descent parsers with unlimited backtracking, and packrat parsing makes this efficient by memoizing the rules already matched for each position.
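To make the memoization idea concrete, here is a minimal packrat-style sketch in Python (a toy grammar invented purely for illustration; it is not from the comment, CPython, or any real parser generator). Each rule memoizes its result per input position, so backtracking never re-parses the same span twice:

    from functools import lru_cache

    TEXT = "1+2+3"

    @lru_cache(maxsize=None)            # memoize: position -> parse result
    def parse_num(pos):
        # num <- [0-9]
        if pos < len(TEXT) and TEXT[pos].isdigit():
            return int(TEXT[pos]), pos + 1
        return None

    @lru_cache(maxsize=None)
    def parse_expr(pos):
        # expr <- num '+' expr / num   (PEG ordered choice; backtracks to
        # the second alternative if the first fails part-way through)
        first = parse_num(pos)
        if first is None:
            return None
        value, nxt = first
        if nxt < len(TEXT) and TEXT[nxt] == '+':
            rest = parse_expr(nxt + 1)
            if rest is not None:
                rvalue, rnxt = rest
                return value + rvalue, rnxt
        return value, nxt               # fall back to plain 'num'

    print(parse_expr(0))                # -> (6, 5): value 6, 5 chars consumed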