A revamped Python string-formatting proposal
The proposal to add a more general facility for string formatting to Python, which we looked at in August 2024, has changed a great deal since, so it merits another look. The changes take multiple forms: a new title for PEP 750 ("Template Strings"), a different mechanism for creating and using templates, a new Template type to hold them, and several additional authors for the PEP. Meanwhile, one controversial part of the original proposal, lazy evaluation of the interpolated values, has been changed so that it requires an explicit opt-in (via lambda); template strings are a generalization of f-strings and lazy evaluation was seen by some as a potentially confusing departure from their behavior.
There are a wide variety of use cases for template strings; the previous title of the PEP referred to creating domain-specific languages using them. Obvious examples are safely handling SQL queries or HTML output with user-supplied input. The PEP also has an example with two different approaches to structured logging using template strings.
Template strings use a character tag before their opening quote that modifies the way they are interpreted, much like f-strings do, though there are two main differences. The first is the character tag used to denote them; instead of "f", template strings use a "t". A more fundamental difference is that template strings (also known as "t-strings" for obvious reasons) do not return a string, as f-strings do, but instead return a Template object:
name = 'world'
fstr = f'Hello, {name}' # fstr is "Hello, world"
tstr = t'Hello, {name}' # tstr is an object of type Template
One of the complaints about the original proposal was that it would have allowed arbitrary function names as tags on a string. Given that people would likely want to use short tags, that would tend to pollute the program namespace with short function names. It would have also precluded adding any other tags to Python down the road; currently, the language has others, such as r"" for raw strings and b"" for byte strings. Had the earlier proposal been adopted, no new tags could ever be added, since some program might already be using the same name as a tag for its template strings.
The PEP has been revised several times since we covered it in August; it was updated twice in that original thread in mid-October, on October 17 and then a few days later. That turned an already lengthy thread into something approaching a mega-thread, so the most recent update was posted in its own thread in mid-November. Even there, the PEP has continued to evolve based on suggestions and comments; the version at the time of this writing is covered below, but it could change further before it gets formally proposed to the steering council.
Template
The immutable Template type contains two tuples, one each for the static (strings) and interpolated (interpolations) parts of the string. Entries in the interpolations tuple are Interpolation objects, which store the expression to be evaluated, name in the example above, and its value ('world' from above), along with any format specifications given for the interpolation. Those are the conversion function to be used (e.g. 'r' for repr()) and any output-format specifier (e.g. '.2f' for rounding to two places). The following example, adapted from the PEP, demonstrates how it is meant to work:
name = "World"
template = t"Hello {name}"
assert template.strings[0] == "Hello "
assert template.interpolations[0].expr == "name"
assert template.interpolations[0].value == "World"
assert template.strings[1] == ""
The interpolations tuple will have an entry per interpolation site in the template, so it may be empty. The strings tuple will have one entry per interpolation site plus an extra; empty strings will be used for places where there is no static data, such as the end of the string or between two interpolation sites with no spaces between them. For example:
a = b = 2
template = t"{a} + {b} = {a+b}"
assert template.strings[0] == ""
assert template.interpolations[0].expr == "a"
assert template.interpolations[0].value == 2
assert template.strings[1] == " + "
assert template.interpolations[1].expr == "b"
assert template.interpolations[1].value == 2
assert template.strings[2] == " = "
assert template.interpolations[2].expr == "a+b"
assert template.interpolations[2].value == 4
assert template.strings[3] == ""
The two tuples are meant to be processed in alternating fashion, starting with the first element of strings. An easy way to do that is to iterate over the template; the class provides an __iter__() method so template objects can be used in for loops, for example. It will return strings and interpolations in the order they appear in the template, but it omits the empty strings that are only present in the strings tuple to enforce the alternation. There is also a values() convenience method that returns a tuple of the value attributes of all interpolations in the template.
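The iteration behavior described above can be illustrated without an interpreter that implements the PEP. The following is only a sketch: the Template and Interpolation classes here are hand-written stand-ins (not the PEP's implementation) that model just the strings, interpolations, expr, and value attributes mentioned in the text, and the template is constructed by hand rather than with t"" syntax.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Interpolation:
    """Stand-in for the PEP's Interpolation; only two fields are modeled."""
    value: object
    expr: str


@dataclass(frozen=True)
class Template:
    """Stand-in for the PEP's Template type (an assumption, not the real class)."""
    strings: Tuple[str, ...]
    interpolations: Tuple[Interpolation, ...]

    def __iter__(self) -> Iterator[Union[str, Interpolation]]:
        # Walk the two tuples in alternation, skipping empty static strings.
        for index, text in enumerate(self.strings):
            if text:
                yield text
            if index < len(self.interpolations):
                yield self.interpolations[index]

    def values(self) -> Tuple[object, ...]:
        # Convenience accessor: just the value of each interpolation.
        return tuple(item.value for item in self.interpolations)


# Hand-built equivalent of t"{a} + {b} = {a+b}" with a = b = 2.
template = Template(
    strings=("", " + ", " = ", ""),
    interpolations=(
        Interpolation(2, "a"),
        Interpolation(2, "b"),
        Interpolation(4, "a+b"),
    ),
)

parts = list(template)
for part in parts:
    print(repr(part))
```

The strings tuple has four entries, but iterating yields only five items in total (three interpolations and two non-empty strings), because the empty strings at either end are dropped.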
The programmer can then provide any sort of template processing that they want by creating a function which operates on a Template. For example, an html() function could provide input sanitizing and allow adding HTML attributes via a dictionary, neither of which is possible using f-strings:
evil = "<script>alert('evil')</script>"
template = t"<p>{evil}</p>"
assert html(template) == "<p>&lt;script&gt;alert('evil')&lt;/script&gt;</p>"
attributes = {"src": "shrubbery.jpg", "alt": "looks nice"}
template = t"<img {attributes} />"
assert html(template) == '<img src="shrubbery.jpg" alt="looks nice" />'
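One way such an html() function might be written is sketched below. This is an illustration under stated assumptions rather than the PEP's own code: Template and Interpolation are minimal hand-written stand-ins, the templates are built by hand instead of with t"" syntax, and the escaping rules (html.escape() for values, dictionaries expanded into attributes) are simply one plausible policy.

```python
from dataclasses import dataclass
from html import escape
from typing import Tuple


@dataclass(frozen=True)
class Interpolation:
    """Stand-in for the PEP's Interpolation; only two fields are modeled."""
    value: object
    expr: str


@dataclass(frozen=True)
class Template:
    """Stand-in for the PEP's Template type (an assumption, not the real class)."""
    strings: Tuple[str, ...]
    interpolations: Tuple[Interpolation, ...]


def html(template: Template) -> str:
    """Render a template, escaping values and expanding dicts to attributes."""
    out = []
    for index, text in enumerate(template.strings):
        out.append(text)  # static parts come from the programmer, so trust them
        if index < len(template.interpolations):
            value = template.interpolations[index].value
            if isinstance(value, dict):
                # A dictionary becomes a run of key="value" attributes.
                out.append(" ".join(
                    f'{escape(str(key))}="{escape(str(val))}"'
                    for key, val in value.items()
                ))
            else:
                # Anything else is treated as untrusted text content.
                out.append(escape(str(value), quote=False))
    return "".join(out)


# Hand-built equivalents of t"<p>{evil}</p>" and t"<img {attributes} />".
evil = "<script>alert('evil')</script>"
page = Template(("<p>", "</p>"), (Interpolation(evil, "evil"),))

attributes = {"src": "shrubbery.jpg", "alt": "looks nice"}
img = Template(("<img ", " />"), (Interpolation(attributes, "attributes"),))

print(html(page))
print(html(img))
```

The key point is that the function can tell the static parts of the template from the interpolated values, which is exactly the information an f-string throws away.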
The PEP has a section that shows how f-strings could be implemented using template strings. As with most of the examples, it uses the match-based processing that the PEP authors see as "the expected best practice for many template function implementations". The skeleton of that is as follows:
for item in template:
    match item:
        case str() as s:
            ... # handle each string part
        case Interpolation() as interpolation:
            ... # handle each interpolation
Originally, the expression for an Interpolation was not evaluated until the template-processing function called a getvalue() method, which was a form of lazy evaluation. In contrast, the interpolations in an f-string are evaluated eagerly, at the point where the f-string itself is evaluated. Lazy evaluation was removed from the proposal back in October because of that behavioral difference. Most people, including the PEP authors, think that f-strings should be the starting point for understanding template strings. Template-processing functions can be written to do their own form of lazy evaluation, as the PEP describes; if the function is written to handle a callable as an interpolation, a lambda can be used as the expression. Similarly, asynchronous evaluation can be supported for template strings.
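The opt-in approach to lazy evaluation might look something like the sketch below. As before, this rests on assumptions: Template and Interpolation are hand-written stand-ins, the template is built by hand, and render() and expensive() are hypothetical names invented for the illustration. The only idea taken from the text is that the processing function calls any interpolated value that is callable.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interpolation:
    """Stand-in for the PEP's Interpolation; only two fields are modeled."""
    value: object
    expr: str


@dataclass(frozen=True)
class Template:
    """Stand-in for the PEP's Template type (an assumption, not the real class)."""
    strings: Tuple[str, ...]
    interpolations: Tuple[Interpolation, ...]


def render(template: Template) -> str:
    """Join the template, calling any callable value at render time."""
    parts = []
    for index, text in enumerate(template.strings):
        parts.append(text)
        if index < len(template.interpolations):
            value = template.interpolations[index].value
            if callable(value):
                value = value()  # deferred work happens only here
            parts.append(str(value))
    return "".join(parts)


calls = []


def expensive() -> int:
    """Hypothetical costly computation; records each call for the demo."""
    calls.append("called")
    return 42


# Hand-built equivalent of t"answer: {(lambda: expensive())}".
template = Template(
    strings=("answer: ", ""),
    interpolations=(Interpolation(lambda: expensive(), "lambda: expensive()"),),
)

calls_before_render = len(calls)  # nothing has run yet
result = render(template)
print(result, len(calls))
```

Creating the template only stores the lambda; expensive() does not run until render() decides to call it, so a processing function (a logger with the relevant level disabled, say) could skip the work entirely.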
Filters
There has been a lot of discussion over the last six months or so, but there is a sense that most of the objections and suggestions have been handled in one form or another at this point. One that was passed over was Larry Hastings's strong desire for adding a filter mechanism. He noted that several Python template libraries, including Jinja, Django Template Language, and Mako, all have the concept of a filter and, interestingly, all use the pipe ("|") symbol for filters. The basic idea is to be able to process the strings in an interpolation by feeding them to expressions or functions that modify them. A classic use case in the existing libraries is to escape HTML in the interpolated string. He said that it would be a "misstep" not to include filter syntax in the PEP.
Guido van Rossum, who is one of the PEP authors, disagreed with using "|", since it already has established meanings in Python expressions (bitwise or and set union). The interpolations are already Python expressions, however, so filtering "should be part of the expression syntax". He suggested: "If you want a filter operator it should be a proposal for an additional expression operator, not just in the context of t-strings."
Hastings pointed out that the operator is already overloaded, but acknowledged some ambiguity when using it unadorned in an interpolation expression. He had other ideas for ways to use the pipe symbol, but Van Rossum continued to push back:
If we're looking for a filter operator that doesn't conflict with expressions, we could extend !r, !s, and !a with !identifier, keeping the original three as shorthands for !repr, !str and !ascii, respectively. That would work in f-strings too. But I would recommend making that a separate PEP.
Hastings seemed to like that idea, but had some follow-up questions. Van Rossum wryly pointed out: "Those are all excellent questions for the team working on that PEP — not for me nor for the PEP 750 team. :)" On the other hand, Jinja maintainer David Lord said that he is "not convinced I would add filter syntax if I was rewriting Jinja today"; it has caused confusion with regard to operator precedence and the readability gains are minimal.
Next steps
On January 16, Dave Peck posted the latest updated version of the PEP. Over the past few months, Peck has been doing the editing on the PEP, as well as being one of the more active participants, among the PEP authors, in the discussions. The next day, another PEP author, Paul Everitt, thought that it was likely that the time had come to "start the process of getting on the steering council's radar, as we've integrated multiple rounds of review and feedback".
PEP 750 is an extensive proposal that has seen a lot of effort, some of it going back well beyond the PEP itself. Beyond just the specification, the PEP contains examples, patterns for processing templates, an extensive collection of rejected ideas, and more. These days, Python has no shortage of string-formatting tools, both in the language itself and in libraries and frameworks of various sorts, but the PEP authors clearly see value in another. Before too long, we will presumably see if the steering council shares their enthusiasm.
