
New features and other changes in Python 3.10

By Jake Edge
June 23, 2021

Python 3.10 is proceeding apace; everything looks to be on track for the final release, which is expected on October 4. The beta releases started in early May, with the first of those marking the feature-freeze for this version of the language. There are a number of interesting changes that are coming with Python 3.10, including what is perhaps the "headline feature": structural pattern matching.

As we did with Python 3.9, and Python 3.8 before it, taking a look at what is coming in the (now) yearly major release of the language has become something of a tradition here at LWN. The release notes that are compiled as part of the release process are invaluable in helping track all of the bits and pieces that make up the release. "What's New In Python 3.10" does not disappoint in that regard; those looking for more information about this release are encouraged to give it a look. In addition, we have covered some of these changes as they were discussed and developed over the last year or so.

Headlining

The structural pattern matching feature fills a longstanding hole that many have complained about along the way, but it also does a whole lot more than that. Python has never had a "switch" statement or its equivalent; programmers have relied on a series of if/elif/else blocks to handle the various values of a particular expression instead. But there have been proposals to add a switch statement going back at least 20 years.

A year ago, Python creator Guido van Rossum and a few other folks resurrected the idea, but in a much more sweeping form. That led to multiple large threads on the python-dev mailing list, and to a second version of the Python Enhancement Proposal (PEP) for the feature. After the steering council took a look, though, that original proposal became three PEPs (two informational) in October 2020, and two other competing PEPs were added into the mix. In February, the council decided to accept one of the three, PEP 634 ("Structural Pattern Matching: Specification"), along with its two companions. The other two PEPs were rejected.

The basic idea of the feature is that the value being "matched" (the new Python statement is match) can be unpacked in various ways, so that pieces of the object can be extracted. The example that probably gives the most "bang for the buck" comes from PEP 622 ("Structural Pattern Matching"):

def make_point_3d(pt):
    match pt:
        case (x, y):
            return Point3d(x, y, 0)
        case (x, y, z):
            return Point3d(x, y, z)
        case Point2d(x, y):
            return Point3d(x, y, 0)
        case Point3d(_, _, _):
            return pt
        case _:
            raise TypeError("not a point we support")

Perhaps the most unfamiliar piece of that example is the use of "_" as a "wildcard" (i.e. match anything), which was a major point of contention during the discussions of the feature. But the look of match is only really Pythonic if you squint ... hard. The case statements are unlike anything else in the language, really. If pt is a 2-tuple, the first case will be used and x will get the value of pt[0] and y will get pt[1].

The third and fourth cases are even weirder looking, but the intent should be reasonably clear: objects of those types (Point2d and Point3d) will be matched and the variables will be filled in appropriately. But the normal rules for reading Python are violated, which was another controversial part of the proposal; "case Point2d(x,y):" does not instantiate an object, instead it serves as a template for what is to be matched. The references to x and y in the case do not look up the values of those variables, rather they are used to specify the variables that get assigned from the unpacking.

There is a lot more to the match statement; those interested should dig into the PEPs for more information, or run the most recent beta (3.10.0b3 at the time of this writing) to try it out. There are also other parts of the syntax (and semantics) that are at least somewhat controversial; the LWN articles, and the mailing-list threads they point to, will help unravel those concerns as well.

Parsing and error reporting

One of the bigger changes that came with Python 3.9 was the new parsing expression grammar (PEG) parser for CPython. The PEG parser was added as the default in 3.9, but the existing LL(1) parser (with "hacks" to get around the one-token lookahead limitation) would remain as an option. In 3.10, that option has disappeared, along with the code for the older parser. In addition, the deprecated parser module has been removed.

Now that there is no requirement to stick (mostly) to LL(1) for CPython parsing, that opens up other possibilities for the syntax of the language. In a semi-prescient post as part of a discussion about the PEG-parser proposal in April 2020, Van Rossum suggested one possibility: "(For example, I've been toying with the idea of introducing a 'match' statement similar to Scala's match expression by making 'match' a keyword only when followed by an expression and a colon.)"
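That is, in fact, how the feature landed: match and case are "soft keywords" in 3.10, treated as keywords only in the match-statement context, so existing code using those names keeps working. A quick sketch:

```python
# 'match' and 'case' are soft keywords in Python 3.10: they act as
# keywords only at the start of a match statement, so code that already
# uses them as ordinary identifiers is unaffected.
match = "still a valid variable name"
case = 42
print(match, case)
```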

For 3.10, there is another example of a place where the new parser improves the readability of the language: multiple context managers can now be enclosed in parentheses. A longstanding enhancement request was closed in the process. Instead of needing to use the backslash continuation for multi-line with statements, they can be written as follows:

    with (open('long_file_name') as foo,
          open('yet_another_long_file_name') as bar,
          open('somewhat_shorter_name') as baz):
        ...

Various error messages have been improved in this release as well. The SyntaxError exception has better diagnostic output in a number of cases, including pointing to the opening brace or parenthesis when the closing delimiter is missing, rather than pointing to the wrong location or giving the dreaded "unexpected EOF while parsing" message:

    expected = {9: 1, 18: 2, 19: 2, 27: 3, 28: 3, 29: 3, 36: 4, 37: 4,
                38: 4, 39: 4, 45: 5, 46: 5, 47: 5, 48: 5, 49: 5, 54: 6,
    some_other_code = foo()

Previous versions of the interpreter reported a confusing location for the syntax error:

      File "example.py", line 3
        some_other_code = foo()
                        ^
    SyntaxError: invalid syntax

but Python 3.10 emits a more informative error:

      File "example.py", line 1
        expected = {9: 1, 18: 2, 19: 2, 27: 3, 28: 3, 29: 3, 36: 4, 37: 4,
                   ^
    SyntaxError: '{' was never closed

That fix was inspired by similar error messages in PyPy. Several other syntax errors, for things like missing commas in dict or list literals, missing colons before blocks (e.g. after while, if, for, etc.), unparenthesized tuples as targets in comprehensions, missing colons in dict literals, and more, have all gotten revamped messages and indicators to make it easier to diagnose the problem. Beyond that, an IndentationError will indicate what kind of block was expecting the indentation, which should help track down a problem of that sort. The AttributeError and NameError exceptions will now give suggestions of similar names, under the assumption that the error is actually a typo; those suggestions are only given if PyErr_Display() is called, however, which is not the case for some alternate read-eval-print loops (REPLs), such as IPython.
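A small sketch of the typo scenario; note that, per the caveat above, the "Did you mean" hint is added only when the traceback is rendered for display, not to the exception's own message:

```python
# A misspelled name raises NameError as always; in Python 3.10's default
# traceback display the interpreter may append a suggestion such as
# "Did you mean: 'color'?" when a similarly named variable exists.
color = "red"
try:
    print(colour)   # typo for 'color'
except NameError as exc:
    # The exception message itself is unchanged; the suggestion is only
    # produced by PyErr_Display() when the traceback is printed.
    print(exc)      # name 'colour' is not defined
```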

Type hints

There are several upgrades to the feature introduced in PEP 484 ("Type Hints"). The most visible new feature will likely be the new union operator specified in PEP 604 ("Allow writing union types as X | Y"). As the title indicates, types can now be separated by the "|" operator to indicate that multiple types are accepted. The release notes show that it is a big upgrade in readability:

In previous versions of Python, to apply a type hint for functions accepting arguments of multiple types, typing.Union was used:

    def square(number: Union[int, float]) -> Union[int, float]:
        return number ** 2

Type hints can now be written in a more succinct manner:

    def square(number: int | float) -> int | float:
        return number ** 2

The operator can be used to "or" types in isinstance() and issubclass() calls as well. A new meta-type has been added with PEP 613 ("Explicit Type Aliases") so that static type-checkers and other programs can more easily distinguish type aliases from other module-level variables. As would be expected, the PEP gives lots of examples of the kinds of the problems the PEP is meant to solve, but the example in the release notes gives the general idea:

PEP 484 introduced the concept of type aliases, only requiring them to be top-level unannotated assignments. This simplicity sometimes made it difficult for type checkers to distinguish between type aliases and ordinary assignments, especially when forward references or invalid types were involved. Compare:

    StrCache = 'Cache[str]'  # a type alias
    LOG_PREFIX = 'LOG[DEBUG]'  # a module constant

Now the typing module has a special value TypeAlias which lets you declare type aliases more explicitly:

    StrCache: TypeAlias = 'Cache[str]'  # a type alias
    LOG_PREFIX = 'LOG[DEBUG]'  # a module constant

One related change that was planned for 3.10 has been put on the back burner, at least for now. When originally specified in PEP 3107 ("Function Annotations") and PEP 526 ("Syntax for Variable Annotations"), annotations were presented as a way to attach the value of an expression to function arguments, function return values, and variables. The intent was to associate type information that could be used by static type-checkers to those program elements.

Forward references in annotations and a few other problems led to PEP 563 ("Postponed Evaluation of Annotations"), which sought to delay the evaluation of the annotation values until they were actually being used. That new behavior was gated by a __future__ import, but was slated to become the default in 3.10, with no way to request the previous semantics. That would not change things for static type-checkers, which do their own parsing separate from CPython, but it was a rather large change for run-time users of the annotations.
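To make the run-time effect concrete, here is a small sketch of what opting in to PEP 563 does today (the UserRecord name is hypothetical and deliberately left undefined):

```python
# With PEP 563 behavior enabled via the __future__ import, annotations
# are stored as strings and never evaluated at definition time, so
# forward (or even dangling) references do not raise:
from __future__ import annotations

def greet(user: UserRecord) -> None:   # UserRecord is defined nowhere
    print("hello")

print(greet.__annotations__["user"])   # the plain string 'UserRecord'
```

Without the future import, defining greet would raise NameError, since the annotation expression would be evaluated immediately.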

There seems to have been an unspoken belief that run-time users of annotations would be rare—or even nonexistent. But, as the 3.10 alpha process proceeded, it became clear that the PEP 563 solution might not be the best way forward. In PEP 649 ("Deferred Evaluation Of Annotations Using Descriptors"), Larry Hastings pointed out a number of problems he saw with PEP 563 and offered an alternate solution. The maintainer of the pydantic data-validation library, which uses type annotations at run time, noted the problems he has encountered trying to support PEP 563; he implored the steering council to adopt PEP 649 in its stead.

While the council did not do that, it did put the brakes on making PEP 563 the default in order to give everyone some time (roughly a year until Python 3.11) to determine the best course without the time pressure of the imminent feature-freeze. In the meantime, though, the annotation oddities that Hastings noticed elsewhere in the language did get fixed so that annotations are now handled more consistently throughout Python.

Other bits

There are plenty of other features, fixes, and changes coming in the new version. The release notes show a rather eye-opening body of work for roughly a year's worth of development. For example, one of the most basic standard types, int, had a new method added in 3.10: int.bit_count() gives the "population count" of the integer, which is the number of ones in the binary representation of its absolute value. Discussion of this "micro-feature" goes back to 2017.

The pair of security vulnerabilities from February that led to fast-tracked releases for all of the supported Python versions have, naturally, been fixed in 3.10. The fix for the buffer overflow when converting floating-point numbers to strings was not mentioned in the release notes, presumably because it is not exactly highly visible—the interpreter simply no longer crashes. The second vulnerability led to a change in the urllib.parse module that users may need to pay attention to.

Back in the pre-HTML5 days, two different characters were allowed in URLs for separating query parameters: ";" and "&". HTML5 restricts the separator character to only be "&", but urllib.parse did not change until a web-cache poisoning vulnerability was reported in January. Now, only a single separator is supported for urllib.parse.parse_qs() and urllib.parse.parse_qsl(), which default to "&". Those changes also affect cgi.parse() and cgi.parse_multipart() because they use the urllib functions. In addition, urllib.parse has been fixed to remove carriage returns, newlines, and tabs from URLs in order to avoid certain kinds of attacks.
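A short sketch of the new default (the separator keyword argument was added as part of the fix):

```python
from urllib.parse import parse_qs

# '&' is the only separator recognized by default now:
print(parse_qs("a=1&b=2"))                  # {'a': ['1'], 'b': ['2']}
# a ';'-separated query is no longer split on ';' by default:
print(parse_qs("a=1;b=2"))                  # {'a': ['1;b=2']}
# but ';' can still be requested explicitly:
print(parse_qs("a=1;b=2", separator=";"))   # {'a': ['1'], 'b': ['2']}
```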

Another security change is described in PEP 644 ("Require OpenSSL 1.1.1 or newer"). From 3.10 onward, the CPython standard library will require OpenSSL version 1.1.1 or higher in order to reduce the maintenance burden on the core developers. OpenSSL is used by the hashlib, hmac, and ssl modules in the standard library. Maintaining support for multiple older versions of OpenSSL (earlier Pythons support OpenSSL 1.0.2, 1.1.0, and 1.1.1) combined with various distribution-specific choices in building OpenSSL has led to something of a combinatorial explosion in the test matrix. In addition, the other two versions are no longer getting updates; OpenSSL 1.1.1 is a long-term support release, which is slated to be supported until September 2023.

One feature that did not make the cut for the language is indexing using keywords, an oft-requested feature that has now presumably been laid to rest for good. The basic idea is to carry keyword function arguments over to indexing:

    print(matrix[row=4, col=17])
    some_obj[1, 2, a=43, b=47] = 23
    del(grid[x=1, y=0, z=0])

The most recent incarnation is PEP 637 ("Support for indexing with keyword arguments"), but the idea (and lengthy discussions of it) goes back to 2014—at least. The steering council rejected the PEP in March; "fundamentally we do not believe the benefit is great enough to outweigh the cost of the new syntax". The PEP will now serve as a place for people to point the next time the idea crops up on python-ideas or elsewhere; unless there are major changes in the language or use cases, there may well be no need to discuss the idea yet again. That is part of the value of rejected PEPs, after all.

The future

Development is already in progress for Python 3.11; in fact, there is already a draft of the "what's new" document for the release. It can be expected in October 2022. With luck, it will come with major CPython performance improvements. It will likely also come with the exception groups feature that we looked at back in March; the feature was postponed to 3.11 in mid-April. In addition, of course, there will be lots of other changes, fixes, features, and such, both for 3.11 and for the much nearer 3.10 release. Python marches on.



HTML query parameters

Posted Jun 23, 2021 6:46 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

Back in the pre-HTML5 days, two different characters were allowed in URLs for separating query parameters: ";" and "&". HTML5 restricts the separator character to only be "&"
Oh, that’s news to me. I had been thinking of the ; as the “new-style” parameter separator (much more readable when embedded in HTML) and looking forward to the day when it would be used everywhere. Oh well…

HTML query parameters

Posted Jun 23, 2021 7:36 UTC (Wed) by excors (subscriber, #95769) [Link]

I think the article (and the discussion on bugs.python.org) is wrong - HTML5 has neither rules nor opinions on what separator you use. That's a private decision between your web page and your web server. (The security vulnerability comes when the "web server" is multiple components (like a caching proxy and a Python script) that have differing interpretations of separators, and interoperability between server components is way outside HTML's scope.)

As far as I can see, the only thing HTML5 has is an algorithm for encoding application/x-www-form-urlencoded data (which uses only '&' separators) for <form> submission, and related specs have the URLSearchParams() API which decodes the query string as application/x-www-form-urlencoded, and that's about it. And the URL spec says "The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices", so it's hardly a recommendation for using this '&' syntax. Outside of <form> you're free to encode URL parameters however you want, you just need to make sure you implement a matching decoder.

match using "case Point2d(x,y):"

Posted Jun 24, 2021 3:13 UTC (Thu) by douglasbagnall (subscriber, #62736) [Link] (4 responses)

"case Point2d(x,y):" does not instantiate an object, instead it serves as a template for what is to be matched. The references to x and y in the case do not look up the values of those variables, rather they are used to specify the variables that get assigned from the unpacking.

This description made me realise the parallel to “def Point2d(x,y):”, which would also not instantiate a Point2d object, and would also not look up x and y, rather using them to unpack the arguments.

match using "case Point2d(x,y):"

Posted Jun 24, 2021 18:30 UTC (Thu) by samlh (subscriber, #56788) [Link] (1 responses)

Good insight!

Rust takes this a step further:

Rust's match statements and function parameter syntax are (intentionally) quite tightly aligned - they are both "patterns".

The difference is that function parameter bindings must be irrefutable (always match), while match statement arms don't have to always match.

https://doc.rust-lang.org/reference/expressions/match-exp...
> A match expression branches on a pattern. The exact form of matching that occurs depends on the pattern.

https://doc.rust-lang.org/reference/items/functions.html#...
> As with let bindings, function parameters are irrefutable patterns, so any pattern that is valid in a let binding is also valid as an argument.

https://doc.rust-lang.org/reference/patterns.html#refutab...
> A pattern is said to be refutable when it has the possibility of not being matched by the value it is being matched against. Irrefutable patterns, on the other hand, always match the value they are being matched against.

match using "case Point2d(x,y):"

Posted Jul 1, 2021 13:02 UTC (Thu) by bluss (guest, #47454) [Link]

Python 2.x had tuple unpacking in arguments, but it has been removed since then. https://www.python.org/dev/peps/pep-3113/

match using "case Point2d(x,y):"

Posted Jun 24, 2021 23:35 UTC (Thu) by JanC_ (guest, #34940) [Link] (1 responses)

Careful what you say about “not instantiating a Point2d object”:
 >>> def Point2d(x,y):
 ...    return (x,y)
 ...
 >>> isinstance(Point2d, object)
 True
🧐

match using "case Point2d(x,y):"

Posted Jun 25, 2021 1:35 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

Explainer for those who don't Python at a sufficiently high level:

Functions are first-class objects (as is everything else that has a name, except for keywords). Every object is an instance of the class "object," so writing isinstance(foo, object) is just a fancy way of writing True. In fact, as far as Python is concerned, Point2d is "just" a regular variable that happens to be callable. The same is true of classes and even imported modules (except, of course, that modules are usually not callable).

As it happens, all classes are also instances of the "type" class (it can't be called the "class" class because "class" is a reserved word). This means that you get the following, perhaps surprising, behavior:

>>> isinstance(object, object)
True
>>> isinstance(type, type)
True
>>> isinstance(type, object)
True
>>> isinstance(object, type)
True

Obviously this can't be literally true, as one of type or object had to exist first, so they can't both have been instantiated from each other (or from themselves, for that matter). But since they're both built-in classes, Python cheats and statically initializes them. This works because, ultimately, they're all "just" PyTypeObject instances (i.e. structs) at the C level, and so you can manually set all the fields to the correct values and call it good.

Parsing Expression Grammar (PEG) parser for CPython

Posted Jun 25, 2021 10:03 UTC (Fri) by jnareb (subscriber, #46500) [Link] (1 responses)

What about the fact that PEG parsers supposedly do not parse well defined grammar, but some random subset of it?

See "Parsing: a timeline": section "2004: PEG" (https://jeffreykegler.github.io/personal/timeline_v3#h1-2...),
and "PEG: Ambiguity, precision and confusion" (http://jeffreykegler.github.io/Ocean-of-Awareness-blog/in...)
though one needs to take what is written there with some care, as it is an article from the point of view
of the author of competing parsing algorithm - the table-driven approach based on the prior work
of Jay Earley, Joop Leo, John Aycock and R. Nigel Horspool.

Parsing Expression Grammar (PEG) parser for CPython

Posted Jun 25, 2021 11:20 UTC (Fri) by t-v (guest, #112111) [Link]

I think there are several parts:
  • The core complaint appears to be that it is possible to create grammars with surprising behaviour. While indeed that is a drawback, I think the opportunities of hitting this in Python's grammar are somewhat limited. Also, from what I understood, the ("aa" | "a") "a" vs. ("a" | "aa") "a" example might not apply to Python because it actually fully backtracks. From GvR's article on PEG parsers:
    So how does a PEG parser solve these annoyances? By using an infinite lookahead buffer! The typical implementation of a PEG parser uses something called “packrat parsing”, which not only loads the entire program in memory before parsing it, but also allows the parser to backtrack arbitrarily. While the term PEG primarily refers to the grammar notation, the parsers generated from PEG grammars are typically recursive-descent parsers with unlimited backtracking, and packrat parsing makes this efficient by memoizing the rules already matched for each position.
  • I would venture that the author's use of "random" appears misplaced in a theoretic argument. Opaque, it might be, but random is just creating emotion.


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds