When and how to evaluate Python annotations

By Jake Edge
June 9, 2021

Annotations in Python came late to the party; they were introduced in Python 3 as a way to attach information to functions describing their arguments and return values. While that mechanism had obvious applications for adding type information to Python functions, standardized interpretations for the annotations came later with type hints. But evaluating the annotations at function-definition time caused some difficulties, especially with respect to forward references to type names, so a Python Enhancement Proposal (PEP) was created to postpone their evaluation until they were needed. The PEP-described behavior was set to become the default in the upcoming Python 3.10 release, but that is not to be; the postponement of evaluation by default has itself been postponed in the hopes of unwinding things.

History

It is, as might be guessed, a bit of a tangle, which will require some backstory in order to fully understand things. In 2006, PEP 3107 ("Function Annotations") was adopted for Python 3 to allow the value of arbitrary expressions to be associated with a function's arguments and its return value. The __annotations__ dictionary associated with the function would then contain the data. An example from the PEP is instructive, even if it is somewhat contrived:

For example, the following annotation:
    def foo(a: 'x', b: 5 + 6, c: list) -> max(2, 9):
        ...
would result in an __annotations__ mapping of
    {'a': 'x',
     'b': 11,
     'c': list,
     'return': 9}
The return key was chosen because it cannot conflict with the name of a parameter; any attempt to use return as a parameter name would result in a SyntaxError.

The interpretation of the values associated by annotations was specifically left out of the PEP: "[...] this PEP makes no attempt to introduce any kind of standard semantics, even for the built-in types. This work will be left to third-party libraries." That all changed with PEP 484 ("Type Hints"), which was adopted in 2015 for Python 3.5; it added a typing module to the standard library to provide "a standard vocabulary and baseline tools" for type annotations. It did not alter other uses of annotations, nor did it add any run-time type checking; it simply standardized the type information for use by static type-checkers. Variable annotations came the following year with PEP 526 ("Syntax for Variable Annotations"), which appeared in Python 3.6.

Some problems were encountered using these annotations, however. For one, forward references to types that have not yet been defined require string literals instead of type names:

For example, the following code (the start of a simple binary tree implementation) does not work:
    class Tree:
        def __init__(self, left: Tree, right: Tree):
            self.left = left
            self.right = right
To address this, we write:
    class Tree:
        def __init__(self, left: 'Tree', right: 'Tree'):
            self.left = left
            self.right = right
The string literal should contain a valid Python expression (i.e., compile(lit, '', 'eval') should be a valid code object) and it should evaluate without errors once the module has been fully loaded.
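
To make the string-literal workaround concrete, here is a minimal sketch (not part of the quoted PEP text) showing what ends up in __annotations__ for the quoted Tree example and how typing.get_type_hints() turns the strings back into objects. The variable names raw and resolved are only illustrative.

```python
import typing


class Tree:
    # 'Tree' is quoted because the name is not bound until the class
    # statement has finished executing.
    def __init__(self, left: 'Tree', right: 'Tree'):
        self.left = left
        self.right = right


# The raw mapping still holds the string literals as written ...
raw = Tree.__init__.__annotations__

# ... while get_type_hints() evaluates them against the function's globals.
resolved = typing.get_type_hints(Tree.__init__)
```

Here raw maps each parameter to the string 'Tree', while resolved maps each parameter to the Tree class itself, which is why run-time consumers of hints are steered toward get_type_hints() rather than reading __annotations__ directly.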

Another problem is that evaluating these annotations requires computation, so all programs pay the price of the annotations, even if they are never needed. In addition, type checking was not meant to be done at run time, but potentially complex type annotations will be evaluated every time a module gets imported, which slows things down for no real gain.

The problems brought about PEP 563 ("Postponed Evaluation of Annotations"). Its goal, as the name would imply, was to defer the evaluation of the annotations until they were actually needed; instead, they would be stored in __annotations__ as strings. For static type-checkers, there should be no difference, since they are processing the source code anyway, but if the annotations are needed at run time, they will need to be evaluated—that's where the problems start cropping up.

For code that uses type hints, the typing.get_type_hints() function is meant to be used to return the evaluated annotations for any object; any code using the hints at run time should be making that call already, "since a type annotation can be expressed as a string literal". For other uses of annotations, users are expected to call eval() on the string stored in __annotations__. But there is a wrinkle; both of those functions optionally take global and local namespace parameters:

In both cases it's important to consider how globals and locals affect the postponed evaluation. An annotation is no longer evaluated at the time of definition and, more importantly, in the same scope where it was defined. Consequently, using local state in annotations is no longer possible in general. As for globals, the module where the annotation was defined is the correct context for postponed evaluation.
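
The scoping consequence described there can be sketched in a few lines. This example assumes a quoted annotation as a stand-in for what PEP 563 would store; make_handler, Payload, and handler are hypothetical names chosen for illustration.

```python
import typing


def make_handler():
    class Payload:  # visible only inside make_handler()
        pass

    # The quoted annotation stands in for the string PEP 563 would store.
    def handler(item: 'Payload'):
        return item

    return handler, Payload


handler, payload_cls = make_handler()

try:
    typing.get_type_hints(handler)
    lookup_failed = False
except NameError:
    # 'Payload' is not a module-level name, so the default lookup fails.
    lookup_failed = True

# Supplying the missing local name by hand lets the evaluation succeed.
hints = typing.get_type_hints(handler, localns={'Payload': payload_cls})
```

The default lookup only sees the module's globals, so the caller has to know about, and still have access to, the local namespace in which the annotation was written.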

The switch to deferred evaluations was gated behind a __future__ import:

    from __future__ import annotations
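
As a rough illustration of what that import changes, the sketch below compiles a small snippet with the same compiler flag that the __future__ import sets, so that the example does not depend on being the first statement in a file. SOURCE, namespace, stored, and evaluated are illustrative names, and the snippet reuses the expressions from the PEP 3107 example.

```python
import __future__

SOURCE = '''
def foo(a: int, b: 5 + 6) -> max(2, 9):
    ...
'''

# Compile the snippet as if it began with "from __future__ import annotations".
flags = __future__.annotations.compiler_flag
code = compile(SOURCE, '<demo>', 'exec', flags=flags)

namespace = {}
exec(code, namespace)
stored = namespace['foo'].__annotations__

# Every stored value is the source text of the annotation, not its value;
# turning it back into an object is left to the consumer.
evaluated = {name: eval(text, namespace) for name, text in stored.items()}
```

With the flag set, stored holds strings such as 'int', and only the explicit eval() step recovers the objects (int, 11, and 9) that eager evaluation would have placed in the dictionary.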

The feature was available starting with Python 3.7 and the plan was to switch to deferred evaluations as the default behavior for Python 3.10, which is due in October. But in mid-January, Larry Hastings found some inconsistencies in the ways __annotations__ were handled for different types of objects (functions, classes, and modules) while he was working on a new PEP to address some of the problems encountered with PEP 563.

The PEP would eventually get turned into PEP 649 ("Deferred Evaluation Of Annotations Using Descriptors"). In it, Hastings listed a number of problems with the PEP 563 feature. By requiring Python implementations to turn annotations (which, syntactically, can be any valid expression) into strings, PEP 563 introduces difficulties that spread beyond CPython:

It requires Python implementations to stringize their annotations. This is surprising behavior—unprecedented for a language-level feature. Also, adding this feature to CPython was complicated, and this complicated code would need to be reimplemented independently by every other Python implementation.
[...]
It adds an ongoing maintenance burden to Python implementations. Every time the language adds a new feature available in expressions, the implementation's stringizing code must be updated in tandem in order to support decompiling it.

But there are problems even strictly within the CPython ecosystem that PEP 649 also describes. PEP 563 necessitates a code change everywhere that annotations are going to be used. The use of eval() is problematic because it is slow, but it is also unavailable in some contexts because it has been removed for space reasons. All annotations are evaluated at module-level scope and cannot refer to local or class variables. In addition, evaluating the annotations on a class requires a reference to the class's global symbols, "which PEP 563 suggests should be done by looking up that class by name in sys.modules". That too is surprising for a language-level feature like annotations.

Overall, Hastings's analysis, coupled with other problems noted with annotations and PEP 563, gives the appearance of features that had rough edges, with fixes that filed them off, only to be left with more rough (or perhaps even sharp) edges. Annotations were bolted onto the language, then type hints bolted onto annotations, with deferred evaluation added, perhaps somewhat hastily, to fix the forward-reference problems that were introduced by type hints. But now, after several releases as an opt-in feature, deferred evaluation is slated to become the only behavior supported, with no way to opt out for those who never chose to opt in. It is all something of a tangle that seems to need some unsnarling.

Lazy evaluation

Hastings's solution in PEP 649 is to defer the evaluation of the annotations, but to effectively replace them with a function, rather than a string as PEP 563 does. That function would evaluate and return the annotations as a dictionary, while storing the result. It would do so in the same scope as the annotations were made, neatly sidestepping all of the weird scoping and namespace corner (and not-so-corner) cases that arise with PEP 563.

In this new approach, the code to generate the annotations dict is written to its own function which computes and returns the annotations dict. Then, __annotations__ is a "data descriptor" which calls this annotation function once and retains the result. This delays the evaluation of annotations expressions until the annotations are examined, at which point all circular references have likely been resolved. And if the annotations are never examined, the function is never called and the annotations are never computed.

PEP 649 would add a __co_annotations__ attribute to objects that would hold a callable object. The first time __annotations__ is accessed, __co_annotations__() is called, its return value is assigned to __annotations__, and the value of __co_annotations__ is set to None. All of that is described in the PEP, including pseudocode. PEP 649 would be gated behind its own __future__ import (co_annotations), but the idea is that it would replace the behavior of PEP 563, which would eventually be deprecated and removed.
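
The call-once-and-cache handshake can be emulated in pure Python. What follows is only a toy model under stated assumptions, not the PEP's pseudocode or CPython's implementation: LazyAnnotated, its co_annotations attribute, and compute_annotations are hypothetical stand-ins for the real __co_annotations__ machinery.

```python
class LazyAnnotated:
    """Hypothetical holder that mimics the __co_annotations__ handshake."""

    def __init__(self, co_annotations):
        self.co_annotations = co_annotations  # callable, or None once used
        self._annotations = None

    @property
    def annotations(self):
        if self.co_annotations is not None:
            # First access: run the deferred function, keep the result,
            # and drop the function so it is never called again.
            self._annotations = self.co_annotations()
            self.co_annotations = None
        return self._annotations


calls = []


def compute_annotations():
    calls.append('evaluated')
    # Tree is looked up only when this function actually runs.
    return {'left': Tree, 'right': Tree}


holder = LazyAnnotated(compute_annotations)  # Tree does not exist yet


class Tree:
    pass


first = holder.annotations
second = holder.annotations
```

Because the lookup of Tree happens inside an ordinary function, the forward reference resolves without any strings, and the second access returns the cached dictionary without calling the function again.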

In general, the response was favorable toward the PEP back in January. Guido van Rossum, who originated type hints back when he was the benevolent dictator of the language, seemed favorably disposed toward the idea and suggested some further refinements. Others, such as PEP 563 author Łukasz Langa, also expressed support: "I like the clever lazy-evaluation of the __annotations__ using a pre-set code object."

Typing-only annotations?

For several months, though, that's where things stood. Despite a prodding or two, Hastings did not post a second version of PEP 649 until April 11. That was pretty late in the Python 3.10 schedule, which had its first beta (thus feature freeze) coming on May 3. Van Rossum had given up on PEP 649 in the interim. In fact, he seemed to be ready to restrict annotations to types:

Nevertheless I think that it's time to accept that annotations are for types -- the intention of PEP 3107 was to experiment with different syntax and semantics for types, and that experiment has resulted in the successful adoption of a specific syntax for types that is wildly successful.

But annotations have been part of the language for a long time, with other use cases beyond just type hints for static type-checkers; Hastings wondered why it made sense to remove them now:

I'm glad that type hints have found success, but I don't see why that implies "and therefore we should restrict the use of annotations solely for type hints". Annotations are a useful, general-purpose feature of Python, with legitimate uses besides type hints. Why would it make Python better to restrict their use now?

For Van Rossum, though, "typing is, to many folks, a Really Important Concept", and using the same syntax both for type information and for generally undefined "other stuff" is confusing. Hastings is not convinced that type hints are so overwhelmingly important that they should trump other uses:

I'm not sure I understand your point. Are you saying that we need to take away the general-purpose functionality of annotations, that's been in the language since 3.0, and restrict annotations to just type hints... because otherwise an annotation might not be used for a type hint, and then the programmer would have to figure out what it means? We need to take away the functionality from all other use cases in order to lend clarity to one use case?

Van Rossum said: "Yes, that's how I see it." But he was unhappy with how Hastings's effort to change things has played out:

[...] the stringification of annotations has been in the making a long time, with the community's and the SC's [steering council's] support. You came up with a last-minute attempt to change it, using the PEP process to propose to *revert* the decision already codified in PEP 563 and implemented in the master branch. But you've waited until the last minute (feature freeze is in three weeks) and IMO you're making things awkward for the SC (who can and will speak for themselves).

There are Python libraries that use the type annotations at run time, but some have run aground on supporting the deferred evaluations as described in PEP 563. Since it looked like PEP 563 would become the default (with no way to preserve the existing behavior) for 3.10, the developers behind some of those libraries got a bit panicky. That resulted in a strident post (and GitHub issue) from Samuel Colvin, who maintains the pydantic data-validation library. Colvin lists a bunch of other bugs that have come up while trying to support PEP 563 in pydantic, noting:

The reasons are complicated but basically typing.get_type_hints() doesn't work all the time and neither do the numerous hacks we've introduced to try and get fix it. Even if typing.get_type_hints() was faultless, it would still be massively slower than the current semantics or PEP 649 [...]
In short - pydantic doesn't work very well with postponed annotations, perhaps it never will.

While the tone of Colvin's post was deemed over-dramatic and perhaps somewhat divisive, it turns out that others have encountered some of the same kinds of problems. Paul Ganssle pointed to the difficulties supporting the feature in the attrs package (which provided much of the inspiration for the dataclasses feature added in Python 3.7) as an example of how things may be going awry. He suggested a path forward as well:

[...] I wouldn't be surprised if PEP 563 is quietly throwing a spanner in the works in several other places as well), my vote is to leave PEP 563 opt-in until at least 3.11 rather than try to rush through a discussion on and implementation of PEP 649.

There was more discussion along the way, including Langa's look at "PEP 563 in light of PEP 649" and Hastings's ideas on a compromise position meant to please both "camps". There were also side discussions on duck typing, the Python static-typing ecosystem, and more. But mostly, it seemed that folks were just marking time awaiting a pronouncement from the steering council.

Postponement

That came on April 20. Given the timing, the nature of the problems, and the importance of not locking the language into behavior that might not be beneficial long-term, it probably was not much of a surprise that the council "kicked the can down the road" a bit. It decided to postpone making the PEP 563 behavior the default until Python 3.11 at the earliest. It deferred PEP 649 or any other alternatives as well.

There was an assumption that type annotations would only be consumed by static type-checkers, Thomas Wouters said on behalf of the council, but: "There are clearly existing real-world, run-time uses of type annotations that would be adversely affected by this change." The existing users of pydantic (which includes the popular FastAPI web framework) would be impacted by the change, but there are also likely to be uses of annotations at run time that have not yet come to light. The least disruptive option is to roll back to the Python 3.9 behavior, but:

We need to continue discussing the issue and potential solutions, since this merely postpones the problem until 3.11. (For the record, postponing the change further is not off the table, either, for example if the final decision is to treat evaluated annotations as a deprecated feature, with warnings on use.)
For what it’s worth, the SC is also considering what we can do to reduce the odds of something like this happening again, but that’s a separate consideration, and a multi-faceted one at that.

For an optional feature, support for static typing and type hints has poked its nose into other parts of the language over the past five years or so. It was fairly easy to argue that the general-purpose annotations feature could be used to add some static-typing support, but we may be getting to the point where providing better support for typing means deprecating any other uses of annotations, which is not something that seems particularly Pythonic. If typing is to remain optional, and proponents are adamant that it will, other longstanding features should not have to be sacrificed in order to make the optional use case work better.

Collectively taking a deep breath and stepping back to consider possible alternatives is obviously the right approach here. Perhaps some compromise can be found so that all existing users of annotations—especially fringe uses whose developers may be completely unaware that there is even a change on the horizon—can be accommodated. That kind of outcome would be best for everyone concerned, of course. Taking the pressure off for a year or so might just provide enough space to make that happen.

[I would like to thank Salvo Tomaselli for giving us a "heads up" about this topic.]


Posted Jun 15, 2021 6:53 UTC (Tue) by cpitrat (subscriber, #116459)

> There are clearly existing real-world, run-time uses of type annotations that would be adversely affected by this change.

Unless I missed something, none are mentioned in the article. Were they mentioned in the discussion? I wonder what these usages are.

Posted Jun 15, 2021 23:49 UTC (Tue) by timrichardson (subscriber, #72836)

I think you missed the discussion in the article of pydantic, which is a dependency of FastAPI. There were some others mentioned, but pydantic was the headline.

Posted Jun 16, 2021 6:53 UTC (Wed) by cpitrat (subscriber, #116459)

No I didn't. I just pasted the wrong part of the article. Silly me!

What I wanted to ask is whether there's really any usage of annotations for something else than typing, so questioning rather the objections to this part:
"restrict annotations to just type hints"

Posted Jun 16, 2021 11:18 UTC (Wed) by mathstuf (subscriber, #69389)

I remember seeing a macro system based on it. No idea how serious it was or how prevalent, but I really don't like the precedent of "here's an abstract ball, do what you want with it however you see it" then later the developers coming in and saying "so we said it was an abstract ball, but we're making it a bowling ball now; sorry to anyone who was playing basketball with it before, but you're going to have a bad time".

Posted Jun 16, 2021 11:31 UTC (Wed) by cpitrat (subscriber, #116459)

Yeah I get that. I'm quite curious about other applications and especially about whether they can mix with typing. I assume not, which by itself makes having annotations abstract in the first place a dubious choice.

Posted Jun 17, 2021 4:36 UTC (Thu) by jamesh (guest, #1159)

From a quick look at the docs, pydantic is treating annotations as types: the same as static type checkers. The same seems to go for attrs. So they don't seem to be examples of using annotations for non-typing purposes.

Posted Nov 28, 2022 17:06 UTC (Mon) by agarbanzo (guest, #162411)

I think there's a lot of use cases stemming from PEP 593: https://peps.python.org/pep-0593/. It provides a way to attach arbitrary metadata which is ignored by static type checkers. So pretty much by definition any use of PEP 593 is going to be non-typing related. The PEP even gives an example use case: https://peps.python.org/pep-0593/#combining-runtime-and-s...

Posted Aug 31, 2021 16:10 UTC (Tue) by tanriol (guest, #131322)

The `argh` argument parsing library uses function argument annotations as per-argument documentation when generating the help output. Maybe there are some other "minor" uses in the ecosystem.