Formalizing f-strings
Python's formatted strings, or "f-strings", came relatively late to the language, but have become a popular feature. F-strings allow a compact representation for the common task of interpolating program data into strings, often in order to output them in some fashion. Some restrictions were placed on f-strings to simplify their implementation, but those restrictions are no longer really needed and, in fact, are complicating the CPython parser. That has led to a Python Enhancement Proposal (PEP) to formalize the syntax of f-strings for the benefit of Python users while simplifying the maintenance of the interpreter itself.
Some history
F-strings got their start in 2015, when PEP 498 ("Literal String Interpolation") was accepted for Python 3.6. The PEP added a new way to specify strings that would have values interpolated into them:
answer = 42
reply = f'Obviously, the answer is {answer}'
# reply is now "Obviously, the answer is 42"
But f-strings are far more than just that, because any arbitrary Python
expression can be placed between the curly brackets; the expression will be
evaluated and its value interpolated into the string:
reply = f'More answers: { [ answer+i for i in range(5) ] }'
# "More answers: [42, 43, 44, 45, 46]"
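One way to see that the braces hold a genuine expression, rather than a bit of text, is to look at the syntax tree Python builds for an f-string. The following sketch uses the standard ast module and assumes Python 3.9 or later (for ast.unparse). The name answer is only parsed, never evaluated, so it does not need to be defined.

```python
import ast

# Parse (but do not run) an f-string; "answer" is never looked up.
tree = ast.parse("f'The answer is {answer + 1}!'", mode="eval")
joined = tree.body  # the whole f-string is a single JoinedStr node

# Literal text and replacement fields alternate in joined.values.
part_kinds = [type(part).__name__ for part in joined.values]
print(part_kinds)  # ['Constant', 'FormattedValue', 'Constant']

# The replacement field wraps a full expression subtree, not a string.
field = joined.values[1]
print(type(field.value).__name__, ast.unparse(field.value))  # BinOp answer + 1
```

Because the replacement field is an ordinary expression node, anything that is valid as a Python expression can, in principle, appear there.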
The expression can, of course, contain other strings, including f-strings,
but PEP 498 imposed a limitation due to the (then) existing
parser: whatever type of quote was used to start the f-string could not be
used inside the expression portions (i.e., the parts inside curly
brackets). So, simply cutting and pasting code into an f-string may not
work:
foo = a['x'] + a['y']
f'{foo}' # works, of course
f'{a['x']}' # fails with SyntaxError
f'{a["x"]}' # workaround
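Until the restriction is lifted, there are two common ways around it, and both work on any Python version that has f-strings. The sketch below uses a hypothetical dictionary a standing in for the one above; its keys and values are made up for illustration.

```python
# Hypothetical sample data standing in for the dict "a" used above.
a = {'x': 'spam', 'y': 'eggs'}

# Workaround 1: switch to the other quote character inside the braces.
alt_quotes = f'{a["x"]}'

# Workaround 2: move the expression out of the f-string entirely.
x_value = a['x']
hoisted = f'{x_value}'

print(alt_quotes, hoisted)  # spam spam
```

Both produce the same string; the second is often the more readable choice when the expression is long.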
The current implementation for f-strings simply uses the existing machinery for handling other kinds of specialized strings, such as r'' for raw strings or b'' for byte strings. But f-strings are fundamentally different from the others because of the arbitrary expressions that are allowed. The advent of a new CPython parser for Python 3.9 in 2020 opened up some other possibilities for implementing f-strings.
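The standard tokenize module offers a glimpse of that shared machinery. With the pre-PEP 701 tokenizer, an f-string comes back as a single STRING token, just like r'' and b'' literals, so its expressions have to be dug out and parsed separately afterward. The sketch below assumes that an interpreter with PEP 701's grammar (taken here to mean Python 3.12 or later) instead splits an f-string into FSTRING_START, FSTRING_MIDDLE, and FSTRING_END tokens around ordinary expression tokens; since the exact token list can vary between versions, no output is shown.

```python
import io
import sys
import tokenize


def token_names(source):
    """Return the token type names that tokenize reports for one line of source."""
    readline = io.StringIO(source + "\n").readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


print("Python", sys.version.split()[0])
# Compare a raw string, a bytes literal, and an f-string.
for literal in ("r'raw'", "b'bytes'", "f'{answer + 1}'"):
    print(literal, token_names(literal))
```

On older interpreters all three literals start with a STRING token; only the f-string changes shape once its grammar moves into the parser.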
In 2021, Pablo Galindo Salgado posted to the python-dev mailing list that he was working on moving the parsing of f-strings into the CPython parser. That would mean some of the restrictions could potentially be removed and "we can drop a considerable amount of hand-written code". He was asking for opinions about the idea and on the various options for restrictions that could be lifted. That resulted in a fairly brief discussion (by Python standards at least) that was generally favorable toward the idea.
At the 2022 Python Language Summit in April, Galindo Salgado gave a presentation on the idea, which was greeted with enthusiasm by various core developers, including Eric V. Smith, who developed f-strings and authored PEP 498. So Galindo Salgado teamed up with Batuhan Taskaya and Lysandros Nikolaou to create PEP 701 ("Syntactic formalization of f-strings"), which was announced on the Python discussion forum in mid-December.
PEP 701
The new PEP sets out "to lift some of the restrictions originally formulated in PEP 498 and to provide a formalized grammar for f-strings that can be integrated into the parser directly". It notes that the restrictions were set to be removed in PEP 536 ("Final Grammar for Literal String Interpolation"), but that PEP, written in 2016, has never been implemented and was deferred in 2019. In addition to removing the restriction on reusing the f-string quote delimiter within expressions, as mentioned above, the new PEP would dispense with a few other restrictions: escape sequences using backslashes would be permitted in the expressions, as would comments in multi-line f-strings. The following examples currently fail, but would work as expected if PEP 701 gets adopted:
>>> a = [ 'hello', 'world' ]
>>> f'{"\n".join(a)}'
File "<stdin>", line 1
f'{"\n".join(a)}'
^
SyntaxError: f-string expression part cannot include a backslash
>>> f'''foo {
... bar # a comment about bar
... }'''
File "<stdin>", line 3
}'''
^
SyntaxError: f-string expression part cannot include '#'
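A program can probe whether the running interpreter accepts these forms by compiling them from source text and catching SyntaxError. The sketch below does that for the two examples above plus the quote-reuse case; the snippets are only compiled, not run, so a and bar need not be defined. It also shows the usual portable workaround for the backslash case: binding the string to a name outside the f-string. The helper and variable names are illustrative, and treating Python 3.12 as the first release with PEP 701's grammar is an assumption of this sketch.

```python
import sys

# Source text for the rejected forms; compiled only, never executed.
SNIPPETS = {
    "backslash": r"""f'{"\n".join(a)}'""",
    "comment": "f'''foo {\nbar # a comment about bar\n}'''",
    "quote reuse": "f'{a['x']}'",
}


def parser_accepts(source):
    """Return True if this interpreter can compile source as an expression."""
    try:
        compile(source, "<snippet>", "eval")
    except SyntaxError:
        return False
    return True


results = {name: parser_accepts(src) for name, src in SNIPPETS.items()}
print(sys.version_info[:2], results)

# Portable workaround for the backslash case: keep the escape outside the braces.
a = ['hello', 'world']
newline = "\n"
joined = f'{newline.join(a)}'
```

All three snippets should be rejected or accepted together, depending on which f-string grammar the interpreter uses.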
PEP 701 points out that other languages (such as Ruby, JavaScript, Swift,
and C#) that have string interpolation mechanisms allowing expressions also
allow arbitrary nesting of said expressions. The current limitations are
more or less just annoyances, but they are unnecessary, and removing them would also
substantially simplify the code that parses f-strings.
The main objection to the PEP centers on the ability to reuse quotes within the expression, and the arbitrary nesting that allows, because the feature could be abused in various ways. Steven D'Aprano took exception to two of the examples given in the PEP:
f"These are the things: {", ".join(things)}"
f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"
[...] The first two might be perfectly understandable to the parser, but as a human reader, they make it more complicated and error-prone to work out which quotes delimit the f-string and which do not. [...] I consider the first two examples terrible code which should be discouraged and the fact that your PEP allows it is a point against it, not in favour.
Especially since we can get the same effect by just changing one of the pairs of quotes to '. So in this regard, the PEP doesn't even add functionality. It just encourages people to write code which is harder to read and more error prone.
Galindo Salgado acknowledged those concerns and started a poll to try to gauge the sentiments of the participants in the discussion. Currently, the poll is around two-thirds in favor of allowing quote reuse in the expressions. Paul Moore said that he had voted in favor because he had encountered the problem along the way. "And I think consistency in allowing whatever can be in an expression is easier to explain and understand." He did suggest that the PEP add a warning about overusing the feature, however.
Barry Warsaw agreed with D'Aprano that the "join()" example was "challenging for me to parse", but he can see the consistency argument as well. "But maybe for consistency, the answer should be to let people write terrible, unreadable code!" Galindo Salgado pointed out that there is a cost beyond inconsistency, though:
[...] limiting quote reuse raises quite a lot the complexity. This is because when parsing the expression part now the parser needs to be aware that is parsing an expression inside an f-string with a given [quote], and that becomes even more tricky when f-strings are nested with different quotes.
This doesn't mean that this invalidates the "code smell" argument by any means: I just want to give some context on the maintenance point.
Mark Shannon thought that adding the quote-reuse restriction back in would be irritating:
Personally, I found the prohibition on reusing the same quote mildly annoying.
f"You have the following in your basket: {", ".join(items)}."
seems perfectly fine to me.
But I think I would find the restriction much more annoying if I knew that it was unnecessary, and that extra work had been put in just to stop me.
There are some horrific f-string abuses that are already possible, as Marc-André Lemburg demonstrated. "Should we really head on in the same direction even more ?" He supports the PEP because it simplifies the implementation and "removes some annoying bits (e.g. the backslash limitation)", but he would keep the current quote restriction, since it rules out arbitrary nesting of f-strings, which is a good thing in his mind.
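The nesting limit Lemburg would like to keep follows directly from the quote restriction: each nested f-string has to use a different kind of quote from the ones enclosing it, and Python has only four kinds (', ", ''', and """). The sketch below shows the deepest nesting possible under the current rules, and evaluates a five-level version that reuses a single quote only when running on Python 3.12 or later, which is assumed here to mean an interpreter with PEP 701's grammar.

```python
import sys

# Current rules: every level needs its own kind of quote, so four levels
# is the maximum.
four_deep = f"""{f'''{f'{f"{1+1}"}'}'''}"""
print(four_deep)  # 2

# Under PEP 701 the same quote can be reused at every level. The code is
# kept as source text and evaluated only on 3.12+, because older parsers
# reject it with a SyntaxError.
five_deep_src = 'f"{f"{f"{f"{f"{1+1}"}"}"}"}"'
if sys.version_info >= (3, 12):
    print(eval(five_deep_src))  # 2
```

Whether the second form is a welcome consistency or an invitation to abuse is exactly the disagreement in the thread.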
"Shashwat" suggested that the possibility of misuse is not a good enough reason to disallow quote reuse (and thus arbitrary nesting) in the parser; "such restrictions belong in linters and code formatters rather than in the language grammar itself". Lemburg agreed with that point, but also thought the PEP should be changed to discourage that usage. "At the moment, it reads in a way which promotes reusing the same quotes inside f-string expressions as a feature."
There were also concerns expressed about difficulties in doing syntax highlighting in editors and other tools, but the consensus seems to be that those tools can and will be taught to handle the quote reuse. In fact, any of those tools that support multiple languages have probably already had to deal with this issue, since quote reuse inside interpolated expressions is fairly widespread in other languages. In light of the poll and the discussion, the PEP authors decided to keep the lifting of the quote-reuse restriction, though a new, lengthy "Considerations regarding quote reuse" section was added to the PEP. The thread also contained some detailed discussion of the guts of the implementation for CPython and other Python dialects; the results of that have been incorporated into the PEP as well.
Much of the argument around quote reuse seems to boil down to readability, which is highly subjective. But unreadable code can (and will) be written, even if people differ in their view of which particular constructs fail their criteria. As we saw in the discussions about None-aware operators, readability is often simply in the eye of the beholder.
The discussion has pretty much wound down at this point, so it would not be a surprise to see the PEP make its way to the steering council for pronouncement before long, which means it could be coming in Python 3.12 in October. It seems a foregone conclusion that the idea of formalizing f-strings and replacing the hand-written parser code will be accepted; that will reduce the code maintenance and could lead to better error messages for f-strings, like those that have been added elsewhere for CPython. The council could perhaps require that the quote constraint be retained, but that seems unlikely given the general reception; discouraging abuses of the feature via the PEP and various tools may well be enough.
