Formalizing f-strings
Python's formatted strings, or "f-strings", came relatively late to the language, but have become a popular feature. F-strings allow a compact representation for the common task of interpolating program data into strings, often in order to output them in some fashion. Some restrictions were placed on f-strings to simplify their implementation, but those restrictions are not really needed anymore and, in fact, are complicating the CPython parser. That has led to a Python Enhancement Proposal (PEP) to formalize the syntax of f-strings for the benefit of Python users while simplifying the maintenance of the interpreter itself.
Some history
F-strings got their start in 2015, when PEP 498 ("Literal String Interpolation") was accepted for Python 3.6. The PEP added a new way to specify strings that would have values interpolated into them:
    answer = 42
    reply = f'Obviously, the answer is {answer}'
    # reply is now "Obviously, the answer is 42"
But f-strings are far more than just that, because any arbitrary Python expression can be placed between the curly brackets; the expression will be evaluated and its value interpolated into the string:
    reply = f'More answers: { [ answer+i for i in range(5) ] }'
    # "More answers: [42, 43, 44, 45, 46]"
The expression can, of course, contain other strings, including f-strings, but PEP 498 imposed a limitation due to the (then) existing parser. Whatever type of quote was used to start the f-string could not be used inside the expression portions (i.e. the parts inside curly brackets). So, simply cutting and pasting code into an f-string may not work:
    foo = a['x'] + a['y']
    f'{foo}'      # works, of course
    f'{a['x']}'   # fails with SyntaxError
    f'{a["x"]}'   # workaround
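Whether the same-quote form is accepted depends on the interpreter doing the parsing, so one way to explore the restriction is to hand both spellings to compile() as source text. This is a minimal sketch; the helper name accepts and the variables workaround and reuse are made up for illustration:

```python
def accepts(source):
    """Return True if this interpreter can compile the given expression text."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Alternating quotes: valid on every Python version that has f-strings.
workaround = "f'{a[\"x\"]}'"
# Same quote inside the expression: rejected under the PEP 498 rules.
reuse = "f'{a['x']}'"

print("workaround compiles:", accepts(workaround))
print("quote reuse compiles:", accepts(reuse))
```

The second result is deliberately not hard-coded, since it is exactly the behavior that the proposal would change.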
The current implementation for f-strings simply uses the existing machinery for handling other kinds of specialized strings, such as r'' for raw strings or b'' for byte strings. But f-strings are fundamentally different from the others because of the arbitrary expressions that are allowed. The advent of a new CPython parser for Python 3.9 in 2020 opened up some other possibilities for implementing f-strings.
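The "one opaque string" treatment can be observed from Python itself with the standard tokenize module. The sketch below lists the token types that the running interpreter produces for a small f-string; token_names is a hypothetical helper. On interpreters using the string-literal machinery described above, the whole f-string should show up as a single STRING token; the assumption here is that an interpreter implementing PEP 701 (as released in Python 3.12) instead reports separate FSTRING_START and FSTRING_END tokens around the pieces.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that tokenize reports for the source text."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# An f-string that is valid under both the old and the proposed rules.
names = token_names("f'{a[\"x\"]}'\n")
print(names)
```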
In 2021, Pablo Galindo Salgado posted to the python-dev mailing list that he was working on moving the parsing of f-strings into the CPython parser. That would mean some of the restrictions could potentially be removed and "we can drop a considerable amount of hand-written code". He was asking for opinions about the idea and on the various options for restrictions that could be lifted. That resulted in a fairly brief discussion (by Python standards at least) that was generally favorable toward the idea.
At the 2022 Python Language Summit in April, Galindo Salgado gave a presentation on the idea, which was greeted with enthusiasm from various core developers, including Eric V. Smith, who developed f-strings and authored PEP 498. So Galindo Salgado teamed up with Batuhan Taskaya and Lysandros Nikolaou to create PEP 701 ("Syntactic formalization of f-strings"), which was announced on the Python discussion forum in mid-December.
PEP 701
The new PEP sets out "to lift some of the restrictions originally formulated in PEP 498 and to provide a formalized grammar for f-strings that can be integrated into the parser directly". It notes that the restrictions were set to be removed in PEP 536 ("Final Grammar for Literal String Interpolation") but that PEP, written in 2016, has never been implemented and was deferred in 2019. In addition to removing the restriction on reusing the f-string quote delimiter within expressions, as mentioned above, the new PEP would dispense with a few other restrictions: escape sequences using backslashes would be permitted in the expressions as would comments in multi-line f-strings. The following examples fail today, but would work as expected if PEP 701 gets adopted:
    >>> a = [ 'hello', 'world' ]
    >>> f'{"\n".join(a)}'
      File "<stdin>", line 1
        f'{"\n".join(a)}'
                         ^
    SyntaxError: f-string expression part cannot include a backslash
    >>> f'''foo {
    ... bar # a comment about bar
    ... }'''
      File "<stdin>", line 3
        }'''
            ^
    SyntaxError: f-string expression part cannot include '#'
PEP 701 points out that other languages (such as Ruby, JavaScript, Swift, and C#) that have string interpolation mechanisms allowing expressions also allow arbitrary nesting of said expressions. The current limitations are more or less just annoyances, but they are unnecessary, and removing them would also substantially simplify the code that parses f-strings.
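Until those restrictions are lifted, both failures can be sidestepped by moving the offending piece out of the expression. A minimal sketch of the usual workarounds follows; the names newline, joined, bar, and text are illustrative only:

```python
a = ["hello", "world"]

# Backslash workaround: keep the escape sequence outside the braces.
newline = "\n"
joined = f"{newline.join(a)}"

# Comment workaround: the comment about bar lives outside the f-string,
# while the expression itself may still span several lines.
bar = 7
text = f"""foo {
    bar
}"""

print(joined)
print(text)
```

Both forms behave the same whether or not the PEP is adopted, so code written this way would not need to change.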
The main objection to the PEP centers around the ability to reuse the quotes within the expression, and the arbitrary nesting it allows, due to the ability to abuse the feature in various ways. Steven D'Aprano took exception to two of the examples given in the PEP:
    f"These are the things: {", ".join(things)}"
    f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"
    [...]
The first two might be perfectly understandable to the parser, but as a human reader, they make it more complicated and error-prone to work out which quotes delimit the f-string and which do not.
[...] I consider the first two examples terrible code which should be discouraged and the fact that your PEP allows it is a point against it, not in favour.
Especially since we can get the same effect by just changing one of the pairs of quotes to '. So in this regard, the PEP doesn't even add functionality. It just encourages people to write code which is harder to read and more error prone.
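D'Aprano's point about swapping one pair of quotes can be made concrete. The sketch below rewrites the two contested examples with single quotes inside the expressions; the sample values for things and source are invented, and str.removesuffix() requires Python 3.9 or later:

```python
things = ["spam", "eggs"]
source = "module.py"

# Same results as the contested examples, but the inner quotes differ from
# the outer ones, so this parses under the PEP 498 rules as well.
listing = f"These are the things: {', '.join(things)}"
rule = f"{source.removesuffix('.py')}.c: $(srcdir)/{source}"

print(listing)
print(rule)
```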
Galindo Salgado acknowledged those concerns and started a poll to try to gauge the sentiments of the participants in the discussion. Currently, the poll is around two-thirds in favor of allowing quote reuse in the expressions. Paul Moore said that he had voted in favor because he had encountered the problem along the way. "And I think consistency in allowing whatever can be in an expression is easier to explain and understand." He did suggest that the PEP add a warning about overusing the feature, however.
Barry Warsaw agreed with D'Aprano that the "join()" example was "challenging for me to parse", but he can see the consistency argument as well. "But maybe for consistency, the answer should be to let people write terrible, unreadable code!" Galindo Salgado pointed out that there is a cost beyond inconsistency, though:
[...] limiting quote reuse raises quite a lot the complexity. This is because when parsing the expression part now the parser needs to be aware that is parsing an expression inside an f-string with a given [quote], and that becomes even more tricky when f-strings are nested with different quotes.
This doesn't mean that this invalidates the "code smell" argument by any means: I just want to give some context on the maintenance point.
Mark Shannon thought that adding the quote-reuse restriction back in would be irritating:
Personally, I found the prohibition on reusing the same quote mildly annoying.
    f"You have the following in your basket: {", ".join(items)}."
seems perfectly fine to me.
But I think I would find the restriction much more annoying if I knew that it was unnecessary, and that extra work had been put in just to stop me.
There are some horrific f-string abuses that are already possible, as Marc-André Lemburg demonstrated. "Should we really head on in the same direction even more?" He supports the PEP because it simplifies the implementation and "removes some annoying bits (e.g. the backslash limitation)", but keeping the current quote restriction removes the possibility of arbitrary nesting of f-strings, which is a good thing in his mind.
"Shashwat" suggested that the possibility of misuse is not a good enough reason to disallow quote reuse, thus nesting, in the parser; "such restrictions belong in linters and code formatters rather than in the language grammar itself". Lemburg agreed with that point, but also thought the PEP should be changed to discourage that usage. "At the moment, it reads in a way which promotes reusing the same quotes inside f-string expressions as a feature."
There were also concerns expressed about difficulties in doing syntax highlighting in editors and other tools, but the consensus seems to be that those tools can and will be taught to handle the quote reuse. In fact, any of those tools that support multiple languages have probably already had to deal with this issue since its existence is fairly widespread. In light of the poll and the discussion, the PEP authors decided to keep the lifting of the quote-reuse restriction, though a new, lengthy "Considerations regarding quote reuse" section was added to the PEP. The thread also contained some detailed discussion of the guts of the implementation for CPython and other Python dialects. The results of that have been incorporated as well.
Much of the argument around quote reuse seems to boil down to readability, which is highly subjective. But unreadable code can (and will) be written, even if people differ in their view of which particular constructs fail their criteria. As we saw in the discussions about None-aware operators, readability is often simply in the eye of the beholder.
The discussion has pretty much wound down at this point, so it would not be a surprise to see the PEP make its way to the steering council for pronouncement before long, which means it could be coming in Python 3.12 in October. It seems a foregone conclusion that the idea of formalizing f-strings and replacing the hand-written parser code will be accepted; that will reduce the code maintenance and could lead to better error messages for f-strings, like those that have been added elsewhere for CPython. The council could perhaps require that the quote constraint be retained, but that seems unlikely given the general reception; discouraging abuses of the feature via the PEP and various tools may well be enough.
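As an illustration of leaving the policing to tools, a lint check for quote reuse could be quite small. The sketch below is not taken from any existing linter: the names find_quote_reuse, HAS_FSTRING_TOKENS, reused, and clean are hypothetical, and it assumes an interpreter whose tokenize module reports FSTRING_START and FSTRING_END tokens, as Python 3.12 does after implementing PEP 701. On older interpreters it simply reports nothing.

```python
import io
import tokenize

# True only where tokenize splits f-strings into separate tokens.
HAS_FSTRING_TOKENS = "FSTRING_START" in tokenize.tok_name.values()


def _delimiter(token_text):
    """Return the quote delimiter of a string token, skipping prefixes like f or rb."""
    body = token_text.lstrip("rRbBfFuU")
    return body[:3] if body[:3] in ('"""', "'''") else body[:1]


def find_quote_reuse(source):
    """Return (row, column) positions of strings that reuse the enclosing f-string's quotes."""
    findings = []
    open_fstrings = []  # delimiters of the f-strings currently being scanned
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        name = tokenize.tok_name[tok.type]
        if name in ("STRING", "FSTRING_START"):
            if open_fstrings and _delimiter(tok.string) == open_fstrings[-1]:
                findings.append(tok.start)
            if name == "FSTRING_START":
                open_fstrings.append(_delimiter(tok.string))
        elif name == "FSTRING_END":
            open_fstrings.pop()
    return findings


reused = 'msg = f"These are the things: {", ".join(things)}"\n'
clean = "msg = f'{a[\"x\"]}'\n"
print(find_quote_reuse(reused))
print(find_quote_reuse(clean))
```

A check along these lines only compares delimiters; deciding when reuse is actually unreadable would remain a style question for each project.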
Posted Jan 10, 2023 23:25 UTC (Tue)
by KaiRo (subscriber, #1987)
Posted Jan 11, 2023 2:31 UTC (Wed)
by JoeBuck (subscriber, #2330)
Posted Jan 11, 2023 11:43 UTC (Wed)
by smurf (subscriber, #17840)
Posted Jan 11, 2023 8:24 UTC (Wed)
by SLi (subscriber, #53131)
It's not. It's because we have separate opening and closing parentheses.
I'm not sure how seriously I'm proposing this, but separate opening and closing quotes exist. Perhaps Python should support them.
Posted Jan 11, 2023 9:11 UTC (Wed)
by dtlin (subscriber, #36537)
In Perl, q(...)/q<...>/q[...]/q{...} can be used instead of '...', and qq(...)/qq<...>/qq[...]/qq{...} can be used instead of "...". I don't think this approach would be accepted in Python, though.
Posted Jan 11, 2023 18:56 UTC (Wed)
by mbunkus (subscriber, #87248)
my $s = qq|This is "nice", I guess, and $this_var will be interpolated!|;
Even C++ has raw strings with custom delimiters now: R"delimiter( raw_characters )delimiter", e.g.
auto s = R"!(This is "nice", I guess!)!";
Posted Jan 12, 2023 2:39 UTC (Thu)
by NYKevin (subscriber, #129325)
(On top of that, Python has, perhaps surprisingly, inherited the "consecutive " "string " "literal " "concatenation" behavior of C, so you can just use two separate string literals if absolutely necessary, and the compiler will concatenate them for you, as if you had written a single string literal. Of course, Python also has peephole optimization and basic constant folding, so this is arguably unnecessary - you could just use plus to concatenate.)
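The adjacent-literal behavior mentioned here also applies when one of the pieces is an f-string, which offers another way to avoid a quote clash. A small sketch, with name, implicit, and explicit as illustrative variables:

```python
name = "world"

# Adjacent literals are joined by the compiler, even when the pieces use
# different quote styles or only some of them are f-strings.
implicit = "Hello, " f"{name}" '!'

# The same result spelled out with the + operator.
explicit = "Hello, " + f"{name}" + "!"

print(implicit)
print(implicit == explicit)
```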
Posted Jan 12, 2023 6:37 UTC (Thu)
by SLi (subscriber, #53131)
When I think about it, it just seems kind of nonsensical to use the same thing to start and end something.
Posted Jan 11, 2023 19:00 UTC (Wed)
by pavon (guest, #142617)
Posted Jan 11, 2023 20:18 UTC (Wed)
by dtlin (subscriber, #36537)
Whose left/right quote marks? The French-style guillemets «...» are more visually distinct than either the English-style “...” and less confusable with other punctuation symbols than the German-style „...“.
If those don't stand out enough, there are many CJK brackets to choose from: 《...》, 「...」, 『...』, 【...】, and more.
Of course, I'm not being entirely serious here, but for comparison, Raku allows for all of those quotes, and then some.
Posted Jan 11, 2023 21:17 UTC (Wed)
by mathstuf (subscriber, #69389)
https://jakubmarian.com/map-of-quotation-marks-in-europea...
Posted Jan 12, 2023 2:33 UTC (Thu)
by stephen.pollei (subscriber, #125364)
Posted Jan 12, 2023 11:00 UTC (Thu)
by shiar (subscriber, #67206)
Posted Jan 11, 2023 8:51 UTC (Wed)
by marcH (subscriber, #57642)
While readability has always been strongly encouraged in Python, it is still possible to write obfuscated code in any language with a small effort. As long as there are obvious, more readable alternatives then no one usually cares or talks about obfuscation possibilities. So only because of this historical "accident" people became passionate about this non-question. Whether obfuscated code is allowed or not looks like serious... bikeshedding compared to other, actually important considerations like the implementation's complexity.
> discouraging abuses of the feature via the PEP and various tools may well be enough.
Indeed where this belongs.
Posted Jan 11, 2023 11:54 UTC (Wed)
by eru (subscriber, #2753)
Nice work, but not sure if this is so useful in real life. f-strings (and similar features in other programming languages) turn the code unreadable, if the expression being expanded is complex (basically anything beyond a variable reference or a very simple expression). Anything more complicated is best handled by introducing auxiliary variables. Or just by using the string concatenation operator.
Posted Jan 11, 2023 17:53 UTC (Wed)
by iabervon (subscriber, #722)
For that matter, what constitutes "a very simple expression" is really dependent on the language's common idioms and the capabilities of widely-deployed syntax highlighting. Comma-separated strings from a list is probably that idiomatic in Python, especially if the ".join(" portion doesn't appear in the string literal color, and it's a common thing to only want for a message anyway.
Posted Jan 12, 2023 13:46 UTC (Thu)
by nbecker (subscriber, #35200)
Posted Jan 18, 2023 0:23 UTC (Wed)
by heiner (guest, #158880)
r'$x^2+\sigma^%(coef)i$' % {'coef': 4}
Posted Jan 18, 2023 9:20 UTC (Wed)
by anselm (subscriber, #2796)
If you don't mind the extra braces, then fr'$x^2+\sigma^{{{coef}}}$' should work. (Note how you can combine f'…' and r'…', so fr'\alpha' will do the Right Thing even if '\a' would otherwise come out as '\x07'.) Tedious, perhaps, but not impossible.
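Both suggestions for the LaTeX case can be checked side by side. In this sketch coef is given a sample value, and with_fr and with_percent are illustrative names; note that the triple-brace form keeps literal braces around the interpolated value, while the %-formatting form does not add any:

```python
coef = 4

# Raw f-string: doubled braces are literal, the innermost pair interpolates coef.
with_fr = fr'$x^2+\sigma^{{{coef}}}$'

# Raw string with %-formatting: braces need no escaping here.
with_percent = r'$x^2+\sigma^%(coef)i$' % {'coef': coef}

print(with_fr)
print(with_percent)
```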
Posted Jan 25, 2023 15:41 UTC (Wed)
by NRArnot (subscriber, #3033)
d = dict( a=1, b=2)
It's a similar issue, I think, to shadowing in C/C++. We can have
    int foo;
    code;
    if (condition) {
        double foo;
        use(foo);
    }
and the behavior is well-defined but not recommended, so we have -Wshadow. The {...} in an f-string is a similar kind of inner scope. Re-use of the same quote character can be confusing and perhaps might produce a warning, but it's not difficult to parse, and forbidding it can be more work than letting it pass.
In Ruby, %q(...)/%q<...>/%q[...]/%q{...} can be used instead of '...', and %Q(...)/%Q<...>/%Q[...]/%Q{...} can be used instead of "...".
(They both allow for other delimiters as well, but then open and close are the same character.)
So does (or will) Perl: use experimental 'extra_paired_delimiters' in v5.36.
Usually, getting LaTeX to work in Python strings means using raw r'' strings. But that's not possible with f-strings.
I don't propose a solution. An example of text you might want in matplotlib using latex could be:
f'$x^2+\sigma^{coef}$'
I'm trying to see why this isn't a very small storm in a teacup. We have two sorts of triple-quotes! This works just fine:
print( f'''{d.get('b')}''' )