Custom string formatters in Python
Python has had formatted string literals (f-strings), a syntactic shorthand for building strings, since the 2016 release of version 3.6. Recently, Jim Baker, Guido van Rossum, and Paul Everitt have proposed PEP 750 ("Tag Strings For Writing Domain-Specific Languages"), which would generalize and expand that mechanism to provide Python library writers with additional flexibility. Reactions to the proposed change were somewhat positive, although there was a good deal of discussion of (and opposition to) the PEP's inclusion of lazy evaluation of template parameters.
The proposal
In Python (since version 3.6), programmers can write f-strings to easily interpolate values into strings:
name = "world"
print(f"Hello, {name}") # Prints "Hello, world"
This is an improvement on the previous methods for string interpolation, because it makes it clear exactly what is being inserted into the string in each location. F-strings do still have some drawbacks, though. In particular, since the expressions inside braces are evaluated when the string is evaluated, they're not suitable for more complex templating. They also make it easier to write some kinds of security bugs — it's tempting to use them to make SQL queries, even though doing so can make code more susceptible to SQL-injection attacks. The PEP aims to fix both of these, by allowing people to use arbitrary functions as string "tags", taking the place of the "f" in f-strings. For example, it would be possible to write a safe sql() function that could be invoked like this:
name = "O'Henry"
# Calls sql().
# The function inserts 'name' into the query, properly escaped.
query = sql"SELECT * FROM Users WHERE name = {name}"
print(query)
# Prints "SELECT * FROM Users WHERE name = 'O''Henry'".
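Tag strings do not exist in any released Python, but the escaping that such a sql() tag would perform can be sketched today with an ordinary function. The names sql_format() and escape_sql_value() below are invented for illustration, and real code should prefer a database driver's parameterized queries:

```python
# A sketch of the escaping a hypothetical sql() tag might perform:
# single quotes in interpolated values are doubled, the standard SQL
# way to keep a value from terminating the string literal early.
def escape_sql_value(value):
    return "'" + str(value).replace("'", "''") + "'"

def sql_format(template, **values):
    # Stand-in for the proposed tag syntax: interpolate values into
    # the template only after escaping them.
    escaped = {key: escape_sql_value(val) for key, val in values.items()}
    return template.format(**escaped)

query = sql_format("SELECT * FROM Users WHERE name = {name}", name="O'Henry")
print(query)
# Prints "SELECT * FROM Users WHERE name = 'O''Henry'"
```

A real sql() tag would receive string fragments and Interpolation objects rather than keyword arguments, but the escaping step would look much the same.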
Other examples of potential uses include automatically escaping strings in the correct way for other formats (shell commands, URLs, regular expressions, etc.), building custom DSLs that can be embedded in Python programs, or partially replacing the use of templating libraries like Jinja.
The proposed syntax works by calling any function used as a string tag with a sequence of arguments representing fragments of the string and interpolation sites. These are values that implement new Decoded and Interpolation protocols for string components and interpolations, respectively. In the example above, sql() would be called with two arguments: one for the first part of the string, and then a second for the interpolation itself. The function is then free to interpret these values in whatever way it likes, and return an arbitrary object (such as, for example, a compiled regular expression). In particular, the expressions inside braces in the affected string aren't evaluated ahead of time, but are instead evaluated when the function calls the .getvalue() method of an Interpolation object — which the function only needs to call if it needs the value. Interpolation objects also include the original expression as a string, and the optional conversion function or format specification if they were provided.
def example(*args):
    # This example assumes it will receive a tag string
    # that starts with a string and contains exactly one
    # interpolation.
    string_part = args[0]    # "Value: "
    interpolation = args[1]  # 2 + 3
    # It can reference the original text, and ask for the value
    # to be computed.
    return f"{string_part}{interpolation.expr} = {interpolation.getvalue()}"

print(example"Value: {2 + 3}")
# Prints "Value: 2 + 3 = 5"
This does, however, lead to some surprising outcomes, such as making it possible to write code that depends on the assignment of a variable after the place where the tag string is defined. This example shows how that could work, as well as demonstrating the PEP's recommended method to deal with a sequence of Decoded and Interpolation values with a match statement:
class Delayed:
    def __init__(self, *args):
        self.args = args

    def __str__(self):
        result = ""
        for a in self.args:
            match a:
                case Decoded() as decoded:
                    result += decoded
                case Interpolation() as interpolation:
                    result += str(interpolation.getvalue())
        return result

name = 'Alice'
fstring = f'My name is {name}'  # Always Alice
delayed = Delayed'My name is {name}'

for name in ['Bob', 'Charlie']:
    print(delayed)
# Prints 'My name is Bob' and 'My name is Charlie'
The PEP describes this behavior as lazy evaluation — although unlike true lazy evaluation in languages like Haskell, the library author does need to explicitly ask for the value to be calculated. Despite the potentially unintuitive consequences, lazy evaluation is a feature that the PEP's authors definitely want to see included in the final version, because of the additional flexibility it allows. Specifically, since a tag function could call .getvalue() zero times, one time, or multiple times, a library author could come up with clever uses that aren't possible with an f-string.
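Since tag strings are not yet implemented, this flexibility can only be approximated today, but the zero-or-many evaluation pattern is easy to demonstrate with explicit callables standing in for Interpolation.getvalue(). All of the names below are invented for illustration:

```python
# Sketch: emulate tag functions that may evaluate an interpolation
# zero, one, or many times. The interpolation is passed as a plain
# callable, standing in for Interpolation.getvalue().
calls = 0

def tracked():
    # Pretend this is an expensive interpolation expression.
    global calls
    calls += 1
    return "value"

def repeat_tag(text, interpolation, times):
    # Evaluates the interpolation as many times as it likes --
    # something an f-string cannot do.
    return text + ", ".join(interpolation() for _ in range(times))

print(repeat_tag("got: ", tracked, 3))  # Prints "got: value, value, value"
assert calls == 3

def skip_tag(text, interpolation):
    # Never evaluates the interpolation at all.
    return text

print(skip_tag("ignored: ", tracked))  # Prints "ignored: "
assert calls == 3  # Still 3: skip_tag() never called it
```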
Discussing naming
The discussion of the change started with Steve Dower expressing concern that this might cause people to fill up the namespace with short function names, which would be a problem for readability. Dower suggested that perhaps the syntax should be changed to have a single generic string tag that converts a format string into a piece of structured data that could be operated on by a normal function. "It's not quite as syntactic sugar, but it's more extensible and safer to use."
Everitt clarified that the PEP does not propose adding any tag functions to the built-ins or standard library, but acknowledged the point. Baker pointed out that there was no syntactical reason that tag names could not contain dots, which would help with the namespace-pollution problem, though that might be undesirable for other reasons.
If dotted names are going to be allowed, "we should just allow any expression", Brandt Bucher said, noting that the same evolution has already occurred for decorators. Unfortunately, that's not possible, Pablo Galindo Salgado explained, because the Python lexer would have no way to know whether it should switch into the f-string parsing mode (which would be used for all tag strings) without information from the parser.
The current syntax does invite some appealing syntactic sugar, however. Paul Moore gave this example, of a "decimal literal" (for the arbitrary-precision arithmetic decimal library), a frequently requested feature:
from decimal import Decimal as D
dec_num = D"2.71828"
Josh Cannon pointed out that the syntax means Python will never get another string prefix, implying that it would make any such changes backward-incompatible. Matt Wozniski agreed, and stated the drawback explicitly. Wozniski later said that Python's existing string prefixes cannot all be implemented in terms of the proposed tag strings.
Discussing semantics
Eric Traut was the first to raise concerns with the lazy evaluation of interpolated expressions, pointing out that it could lead to surprising and unintuitive behavior like that of the Delayed class shown above. Traut suggested requiring programmers to explicitly write lambda: when they want lazy evaluation:
tag"Evaluated immediately: {2 + 3}; Not evaluated immediately: {lambda: name}"
Charlie Marsh agreed with Traut that introducing lazy evaluation would be unintuitive, given that it's a change from how f-strings work. Several people objected to the idea of requiring users to opt into lazy evaluation, though, especially since the suggested lambda expressions would be fairly verbose. Cornelius Krupp pointed out that the intended use case was defining domain-specific languages — so the expectation is that specific uses will vary, and it's up to callers of a particular tag function to read its documentation, just as with any library function. Moore went so far as to say:
I'll be honest, I haven't really gone through the proposal trying to come up with real world use cases, and as a result I don't yet have a feel for how much value lazy evaluation adds, but I am convinced that no-one will ever use lambda as "explicit lazy evaluation".
Another user asked how lazy evaluation interacts with other Python features such as await and yield — which surprisingly worked inside f-strings without a problem when I tried them. Nobody had a compelling answer, although Dower pointed out another particularly pernicious problem for lazy evaluation: how should it interact with context managers (such as locks)? Allowing unmarked lazy evaluation could make for difficult-to-debug problems. Finally, interactions with type checking are a concern, since lazily evaluated expressions complicate scoping.
David Lord, a Jinja maintainer, questioned the entire justification for lazy evaluation. Several times during the discussion, people asserted that lazy evaluation was necessary for implementing template languages like Jinja, but Lord didn't think tag strings were actually sufficient for templating. Jinja templates are often rendered multiple times with different values, which lazy evaluation doesn't make easy — the variables at the definition site of the template would need to be changed. Plus, Jinja already has to parse the limited subset of Python that can be used in its templates; if it has to keep parsing the templates in order to handle variable substitutions, lazy evaluation doesn't help with that. Moore agreed that the proposal would not be a suitable replacement for Jinja as it stands, and that he wished "the PEP had some better examples of actual use cases, though."
Everitt shared that the PEP authors chose to keep the more complex examples out of the PEP for simplicity, but "I think we chose wrong." He provided a link to a lengthy tutorial on building an HTML template system using the proposed functionality.
Baker added that being able to begin sending the start of a template before fully evaluating its later sections was an important use case for web-server gateway interface (WSGI) or asynchronous-server gateway interface (ASGI) applications. He also said that delayed evaluation let logging libraries easily do the right thing by only taking the time to evaluate the interpolations if the message would actually be logged.
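Part of that logging benefit already exists in the standard library: the logging module defers formatting of a message's arguments until a record is actually emitted, although, unlike the proposed lazy interpolations, the argument expressions themselves are still evaluated eagerly. A small demonstration:

```python
import logging

# Count how many times the expensive argument is actually formatted.
formatted = 0

class Expensive:
    def __str__(self):
        global formatted
        formatted += 1
        return "expensive result"

logging.basicConfig(level=logging.WARNING)

# Below the WARNING threshold: the record is never created, so the
# argument is never formatted (though Expensive() is still constructed).
logging.debug("value: %s", Expensive())
assert formatted == 0

# At or above the threshold: the message is emitted and formatted once.
logging.warning("value: %s", Expensive())
assert formatted == 1
```

A tag-string-based logger could go one step further and avoid evaluating the interpolated expressions at all for suppressed messages.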
Discussing backticks
Trey Hunner saw how useful the change could be, but, as someone who teaches Python to new users, he was worried that the syntax would be hard to explain or search for.
If backticks or another symbol were used instead of quotes, I could imagine beginners searching for 'Python backtick syntax'. But the current syntax doesn't have a simple answer to 'what should I type into Google/DDG/etc. to look this up'.
The suggestion to use backticks was interesting, Everitt said, noting that JavaScript uses backticks for similar features. Baker said that they had considered backticks, but thought that matching the existing f-string syntax would be more straightforward. Simon Saint-André also said that backticks can be hard to type on non-English keyboards. Dower noted that backticks have traditionally been banned from Python feature proposals due to the difficulty in distinguishing ` from '.
Discussing history
Alyssa Coghlan shared a detailed comparison between this proposal and her work on PEP 501 ("General purpose template string literals"). In short, Coghlan is not convinced that tag strings offer significantly more flexibility than template string literals as proposed in PEP 501, but she does acknowledge that the PEP 750 variety has a more lightweight syntax. Other participants in the discussion were not convinced that simpler syntax was worth the tradeoff. Coghlan has started collecting ideas from the discussion that might apply equally well to PEP 501.
Barry Warsaw, who worked on early Python internationalization efforts, likewise compared the new PEP to existing practice. He pointed out a few ways that tag strings were unsuitable for internationalization, before saying:
I have to also admit to cringing at the syntax, and worry deeply about the cognitive load this will add to readability. I think it will also be difficult to reason about such code. I don't think the PEP addresses this sufficiently in the "How to Teach This" section. And I think history bears out my suspicion that you won't be able to ignore this new syntax until you need it, because if it's possible to do, a lot of code will find clever new ways to use it and the syntax will creep into a lot of code we all end up having to read.
Baker summarized the most important feedback into a set of possible changes to the proposal, including adopting some ideas from PEP 501, and using decorators to make it easier to write tag functions that typecheckers can understand. Coghlan approved of the changes, although at time of writing other commenters have not weighed in. It remains to be seen how the PEP will evolve from here.
| Index entries for this article | |
|---|---|
| Python | Python Enhancement Proposals (PEP)/PEP 750 |
| Python | Strings |
