
Formalizing f-strings

By Jake Edge
January 10, 2023

Python's formatted strings, or "f-strings", came relatively late to the language, but have become a popular feature. F-strings allow a compact representation for the common task of interpolating program data into strings, often in order to output them in some fashion. Some restrictions were placed on f-strings to simplify the implementation of them, but those restrictions are not really needed anymore and, in fact, are complicating the CPython parser. That has led to a Python Enhancement Proposal (PEP) to formalize the syntax of f-strings for the benefit of Python users while simplifying the maintenance of the interpreter itself.

Some history

F-strings got their start in 2015, when PEP 498 ("Literal String Interpolation") was accepted for Python 3.6. The PEP added a new way to specify strings that would have values interpolated into them:

    answer = 42
    reply = f'Obviously, the answer is {answer}'
    # reply is now "Obviously, the answer is 42"

But f-strings are far more than just that, because any arbitrary Python expression can be placed between the curly brackets; the expression will be evaluated and its value interpolated into the string:

    reply = f'More answers: { [ answer+i for i in range(5) ] }'
    # "More answers: [42, 43, 44, 45, 46]"

The expression can, of course, contain other strings, including f-strings, but PEP 498 imposed a limitation due to the (then) existing parser: whatever type of quote was used to start the f-string could not be used inside the expression portions (i.e., the parts inside curly brackets). So, simply cutting and pasting code into an f-string may not work:

    a = {'x': 1, 'y': 2}
    foo = a['x'] + a['y']
    f'{foo}'     # works, of course
    f'{a['x']}'  # fails with SyntaxError
    f'{a["x"]}'  # workaround
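Whether a given interpreter enforces this restriction can be probed without crashing the program by handing the source text to the built-in compile(). The sketch below is illustrative only: the helper name accepts() is hypothetical, and the result for same-quote reuse depends on the Python version running it, since only an interpreter that implements PEP 701 (Python 3.12 and later) accepts it.

```python
import sys


def accepts(source: str) -> bool:
    """Return True if `source` compiles as a single expression.

    accepts() is a hypothetical helper for this sketch. It only
    compiles the code, so names such as `a` need not be defined.
    """
    try:
        compile(source, "<probe>", "eval")
    except SyntaxError:
        return False
    return True


# Switching to the other quote type inside the braces always works.
alt_quotes = accepts("""f'{a["x"]}'""")

# Reusing the f-string's own quote needs a PEP 701 parser (3.12+).
same_quotes = accepts("""f'{a['x']}'""")

print(sys.version_info[:2], alt_quotes, same_quotes)
```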

The current implementation of f-strings simply uses the existing machinery for handling other kinds of specialized strings, such as r'' for raw strings or b'' for byte strings. But f-strings are fundamentally different from the others because of the arbitrary expressions that are allowed. The arrival of a new CPython parser in Python 3.9 in 2020 opened up some other possibilities for implementing f-strings.
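One way to see that difference is to look at the syntax tree Python builds: a raw string becomes a single constant, while an f-string becomes a JoinedStr node whose parts include ordinary expression nodes. A minimal sketch using the standard ast module; the exact ast.dump() text varies between Python versions, so only node types are inspected, and the name answer is only parsed, never evaluated.

```python
import ast


def top_node(source: str) -> ast.expr:
    """Parse a single expression and return its root AST node."""
    return ast.parse(source, mode="eval").body


raw = top_node(r"r'C:\temp'")
fstr = top_node("f'Obviously, the answer is {answer + 1}'")

# A raw string is just a constant; the r prefix only changes how
# backslashes are read.
print(type(raw).__name__)                        # Constant

# An f-string is a JoinedStr: literal text plus FormattedValue nodes
# that wrap an arbitrary expression (here a BinOp).
print(type(fstr).__name__)                       # JoinedStr
print([type(v).__name__ for v in fstr.values])   # ['Constant', 'FormattedValue']
print(type(fstr.values[1].value).__name__)       # BinOp
```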

In 2021, Pablo Galindo Salgado posted to the python-dev mailing list that he was working on moving the parsing of f-strings into the CPython parser. That would mean some of the restrictions could potentially be removed and "we can drop a considerable amount of hand-written code". He was asking for opinions about the idea and on the various options for restrictions that could be lifted. That resulted in a fairly brief discussion (by Python standards at least) that was generally favorable toward the idea.

At the 2022 Python Language Summit in April, Galindo Salgado gave a presentation on the idea, which was greeted with enthusiasm from various core developers, including Eric V. Smith who developed f-strings and authored PEP 498. So Galindo Salgado teamed up with Batuhan Taskaya and Lysandros Nikolaou to create PEP 701 ("Syntactic formalization of f-strings"), which was announced on the Python discussion forum in mid-December.

PEP 701

The new PEP sets out "to lift some of the restrictions originally formulated in PEP 498 and to provide a formalized grammar for f-strings that can be integrated into the parser directly". It notes that the restrictions were set to be removed in PEP 536 ("Final Grammar for Literal String Interpolation"), but that PEP, written in 2016, has never been implemented and was deferred in 2019. In addition to removing the restriction on reusing the f-string quote delimiter within expressions, as mentioned above, the new PEP would dispense with a few other restrictions: escape sequences using backslashes would be permitted in the expressions, as would comments in multi-line f-strings. The following examples currently fail, but would work as expected if PEP 701 is adopted:

    >>> a = [ 'hello', 'world' ]
    >>> f'{"\n".join(a)}'
      File "<stdin>", line 1
        f'{"\n".join(a)}'
                         ^
    SyntaxError: f-string expression part cannot include a backslash

    >>> f'''foo {
    ... bar # a comment about bar
    ... }'''
      File "<stdin>", line 3
        }'''
            ^
    SyntaxError: f-string expression part cannot include '#'

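Until then, the usual workaround for the backslash restriction is to keep the backslash outside the braces, for example in a variable, or to spell the character without one. A minimal sketch; the name sep is illustrative:

```python
a = ['hello', 'world']

# The backslash lives outside the braces, so the current parser accepts it.
sep = "\n"
joined = f'{sep.join(a)}'

# chr(10) is another backslash-free way to spell a newline.
also_joined = f'{chr(10).join(a)}'

print(joined == also_joined)   # True
print(joined)
```

Under PEP 701, the original f'{"\n".join(a)}' form would produce the same string directly.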
PEP 701 points out that other languages with string-interpolation mechanisms that allow expressions (such as Ruby, JavaScript, Swift, and C#) also allow arbitrary nesting of those expressions. The current limitations are more or less just annoyances, but they are also unnecessary, and removing them would substantially simplify the code that parses f-strings.
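Under the current rules, nesting depth is bounded by the number of distinct quote types, because each level needs a delimiter that no enclosing level is using. The sketch below shows four levels, the most the existing parser allows, and then probes a version that reuses the same quote at every level; the variable names are arbitrary, and the probe is passed through eval() only so that older interpreters can still run the file.

```python
import sys

answer = 42

# Each nesting level needs a quote type that no enclosing level is
# using. There are only four (""" ''' " '), so this is as deep as the
# pre-PEP-701 parser can go.
four_levels = f"""{f'''{f"{f'{answer}'}"}'''}"""
print(four_levels)   # 42

# Reusing the same quote at every level, as PEP 701 permits, only
# compiles on a parser that implements PEP 701 (Python 3.12+).
five_source = 'f"{f"{f"{f"{f"{answer}"}"}"}"}"'
try:
    five_levels = eval(five_source)
except SyntaxError:
    five_levels = None
print(sys.version_info[:2], five_levels)
```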

The main objection to the PEP centers around the ability to reuse the quotes within the expression, and the arbitrary nesting it allows, due to the ability to abuse the feature in various ways. Steven D'Aprano took exception to two of the examples given in the PEP:

    f"These are the things: {", ".join(things)}"

    f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"

    [...]
The first two might be perfectly understandable to the parser, but as a human reader, they make it more complicated and error-prone to work out which quotes delimit the f-string and which do not.

[...] I consider the first two examples terrible code which should be discouraged and the fact that your PEP allows it is a point against it, not in favour.

Especially since we can get the same effect by just changing one of the pairs of quotes to '. So in this regard, the PEP doesn't even add functionality. It just encourages people to write code which is harder to read and more error prone.
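D'Aprano's alternative, switching the inner pair of quotes to single quotes, produces the same strings with the current parser. A minimal sketch; the values of things and source are made up for illustration, and str.removesuffix() requires Python 3.9 or later:

```python
things = ["spam", "eggs", "ham"]
source = "module.py"

# Same expressions as the PEP examples, but the inner literals use '
# so they cannot be mistaken for the end of the "-delimited f-string.
listing = f"These are the things: {', '.join(things)}"
rule = f"{source.removesuffix('.py')}.c: $(srcdir)/{source}"

print(listing)   # These are the things: spam, eggs, ham
print(rule)      # module.c: $(srcdir)/module.py
```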

Galindo Salgado acknowledged those concerns and started a poll to try to gauge the sentiments of the participants in the discussion. Currently, the poll is around two-thirds in favor of allowing quote reuse in the expressions. Paul Moore said that he had voted in favor because he had encountered the problem along the way. "And I think consistency in allowing whatever can be in an expression is easier to explain and understand." He did suggest that the PEP add a warning about overusing the feature, however.

Barry Warsaw agreed with D'Aprano that the "join()" example was "challenging for me to parse", but he can see the consistency argument as well. "But maybe for consistency, the answer should be to let people write terrible, unreadable code!" Galindo Salgado pointed out that there is a cost beyond inconsistency, though:

[...] limiting quote reuse raises quite a lot the complexity. This is because when parsing the expression part now the parser needs to be aware that is parsing an expression inside an f-string with a given [quote], and that becomes even more tricky when f-strings are nested with different quotes.

This doesn't mean that this invalidates the "code smell" argument by any means: I just want to give some context on the maintenance point.

Mark Shannon thought that adding the quote-reuse restriction back in would be irritating:

Personally, I found the prohibition on reusing the same quote mildly annoying. f"You have the following in your basket: {", ".join(items)}." seems perfectly fine to me.

But I think I would find the restriction much more annoying if I knew that it was unnecessary, and that extra work had been put in just to stop me.

There are some horrific f-string abuses that are already possible, as Marc-André Lemburg demonstrated. "Should we really head on in the same direction even more ?" He supports the PEP because it simplifies the implementation and "removes some annoying bits (e.g. the backslash limitation)", but keeping the current quote restriction removes the possibility of arbitrary nesting of f-strings, which is a good thing in his mind.

"Shashwat" suggested that the possibility of misuse is not a good enough reason to disallow quote reuse, thus nesting, in the parser; "such restrictions belong in linters and code formatters rather than in the language grammar itself". Lemburg agreed with that point, but also thought the PEP should be changed to discourage that usage. "At the moment, it reads in a way which promotes reusing the same quotes inside f-string expressions as a feature."

There were also concerns expressed about difficulties in doing syntax highlighting in editors and other tools, but the consensus seems to be that those tools can and will be taught to handle the quote reuse. In fact, any of those tools that support multiple languages have probably already had to deal with this issue, since quote reuse inside interpolated expressions is fairly widespread in other languages. In light of the poll and the discussion, the PEP authors decided to keep the lifting of the quote-reuse restriction, though a new, lengthy "Considerations regarding quote reuse" section was added to the PEP. The thread also contained some detailed discussion of the guts of the implementation for CPython and other Python dialects. The results of that have been incorporated as well.
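The suggestion that such restrictions belong in linters can be illustrated with a rough sketch. The check below, find_quote_reuse(), is hypothetical (it is not part of any existing linter); it flags string literals that reuse the delimiter of an f-string enclosing them, roughly the pattern the current grammar forbids. It assumes Python 3.12 or later, where the tokenize module reports f-strings as separate FSTRING_START, FSTRING_MIDDLE, and FSTRING_END tokens; on older versions an f-string arrives as a single STRING token, so the function finds nothing.

```python
import io
import token
import tokenize

# These token types only exist on Python 3.12+; None on older versions.
FSTRING_START = getattr(token, "FSTRING_START", None)
FSTRING_END = getattr(token, "FSTRING_END", None)


def _quote(literal: str) -> str:
    """Return the opening delimiter of a string token, without prefixes."""
    body = literal.lstrip("rRbBuUfF")
    return body[:3] if body[:3] in ('"""', "'''") else body[0]


def find_quote_reuse(source: str) -> list:
    """Hypothetical lint check: return (line, col) positions of string
    literals that reuse the delimiter of any f-string enclosing them.
    Delimiters are compared exactly, so ' and ''' count as different."""
    hits = []
    open_quotes = []  # delimiters of the f-strings currently open
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if FSTRING_START is not None and tok.type == FSTRING_START:
            quote = _quote(tok.string)
            if quote in open_quotes:
                hits.append(tok.start)
            open_quotes.append(quote)
        elif FSTRING_END is not None and tok.type == FSTRING_END:
            open_quotes.pop()
        elif tok.type == tokenize.STRING and _quote(tok.string) in open_quotes:
            hits.append(tok.start)
    return hits


print(find_quote_reuse("f\"{', '.join(things)}\"\n"))  # []
print(find_quote_reuse('f"{", ".join(things)}"\n'))    # one hit on 3.12+
```

Keeping the check in a tool like this leaves the grammar simple while still letting projects opt in to the stricter style.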

Much of the argument around quote reuse seems to boil down to readability, which is highly subjective. But unreadable code can (and will) be written, even if people differ in their view of which particular constructs fail their criteria. As we saw in the discussions about None-aware operators, readability is often simply in the eye of the beholder.

The discussion has pretty much wound down at this point, so it would not be a surprise to see the PEP make its way to the steering council for pronouncement before long, which means it could be coming in Python 3.12 in October. It seems a foregone conclusion that the idea of formalizing f-strings and replacing the hand-written parser code will be accepted; that will reduce the code maintenance and could lead to better error messages for f-strings, like those that have been added elsewhere for CPython. The council could perhaps require that the quote constraint be retained, but that seems unlikely given the general reception; discouraging abuses of the feature via the PEP and various tools may well be enough.


Index entries for this article
Python: Language specification
Python: Python Enhancement Proposals (PEP)/PEP 701
Python: Strings



Formalizing f-strings

Posted Jan 10, 2023 23:25 UTC (Tue) by KaiRo (subscriber, #1987)

IMHO, it's fine to allow quote reuse in this PEP, but the PEP-8 guidelines/standard should be extended to discourage quote reuse at least in single-line f-strings.

Formalizing f-strings

Posted Jan 11, 2023 2:31 UTC (Wed) by JoeBuck (subscriber, #2330)

It's a similar issue, I think, to shadowing in C/C++. We can have

    int foo;
    /* ... code using the outer foo ... */
    if (condition) {
        double foo;   /* shadows the outer foo */
        use(foo);
    }

and the behavior is well-defined but not recommended, so we have -Wshadow. The {...} in an f-string is a similar kind of inner scope. Re-use of the same quote character can be confusing and perhaps might produce a warning, but it's not difficult to parse, and forbidding it can be more work than letting it pass.

Formalizing f-strings

Posted Jan 11, 2023 11:43 UTC (Wed) by smurf (subscriber, #17840)

"Discourage" isn't the same as "forbid", though. With this change, I can now copy+paste an arbitrary expression directly from my code into an f-string that reports its value, and I don't have to bother with quoting or any other may-not-appear-in-an-fstring-ism.

Formalizing f-strings

Posted Jan 11, 2023 8:24 UTC (Wed) by SLi (subscriber, #53131)

I started to wonder why nested quotes seem harder to parse than nested parentheses. Is it simply because we're used to parentheses?

It's not. It's because we have separate opening and closing parentheses.

I'm not sure how seriously I'm proposing this, but separate opening and closing quotes exist. Perhaps Python should support them.

Formalizing f-strings

Posted Jan 11, 2023 9:11 UTC (Wed) by dtlin (subscriber, #36537)

In Perl, q(...)/q<...>/q[...]/q{...} can be used instead of '...', and qq(...)/qq<...>/qq[...]/qq{...} can be used instead of "...".
In Ruby, %q(...)/%q<...>/%q[...]/%q{...} can be used instead of '...', and %Q(...)/%Q<...>/%Q[...]/%Q{...} can be used instead of "...".
(They both allow for other delimiters as well, but then open and close are the same character.)

I don't think this approach would be accepted in Python, though.

Formalizing f-strings

Posted Jan 11, 2023 18:56 UTC (Wed) by mbunkus (subscriber, #87248)

In Perl you can also use any other single-character delimiter, not just the various pairs of parentheses, both for single & double quotes. E.g.

my $s = qq|This is "nice", I guess, and $this_var will be interpolated!|;

Even C++ has raw strings with custom delimiters now: R"delimiter( raw_characters )delimiter", e.g.

auto s = R"!(This is "nice", I guess!)!";

Formalizing f-strings

Posted Jan 12, 2023 2:39 UTC (Thu) by NYKevin (subscriber, #129325)

Python mostly doesn't need it, because of r-prefixed triple quoted strings. Unless you *happen* to need to write three consecutive apostrophes *and* three consecutive double quotation marks in the *same* string literal, it's Good Enough, and so it naturally displaces any alternative based on what other languages may happen to be doing.

(On top of that, Python has, perhaps surprisingly, inherited the "consecutive " "string " "literal " "concatenation" behavior of C, so you can just use two separate string literals if absolutely necessary, and the compiler will concatenate them for you, as if you had written a single string literal. Of course, Python also has peephole optimization and basic constant folding, so this is arguably unnecessary - you could just use plus to concatenate.)
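The two techniques described in this comment can be compared in a short sketch; the sample sentence is arbitrary:

```python
# Adjacent literals are joined by the compiler, so each piece can use
# whichever quote character suits its own contents.
concatenated = 'She said "hi"' " and it's fine."

# A triple-quoted (here also raw) literal can hold both ' and " directly,
# as long as the text has no run of three apostrophes.
triple = r'''She said "hi" and it's fine.'''

print(concatenated == triple)   # True
```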

Formalizing f-strings

Posted Jan 12, 2023 6:37 UTC (Thu) by SLi (subscriber, #53131)

I'm probably as used to straight ASCII quotation marks (and irritated by other styles) as anyone, but I feel this could be a case where we don't understand what we're missing out on. Would you also say we wouldn't need separate left and right parentheses if we could just triple them, like |||?

When I think about it, it just seems kind of nonsensical to use the same thing to start and end something.

Formalizing f-strings

Posted Jan 11, 2023 19:00 UTC (Wed) by pavon (guest, #142617)

I think that the left and right quote marks aren't different enough to stand out in the way that left and right parentheses do. It is more on the level of a subtle typesetting improvement than something that jumps out at you, at least for the English convention. Which brings up the complication of there being multiple variations on the opening quote character (is it high or low, and does the character face upwards or downwards?). Worse, some languages like Finnish and Swedish use what English would consider the closing quote mark for both opening and closing. So do you allow all these variations, or just some? I don't know that it would end up being an improvement, other than letting you copy-paste from a word processor without breaking the code :)

Formalizing f-strings

Posted Jan 11, 2023 20:18 UTC (Wed) by dtlin (subscriber, #36537)

Whose left/right quote marks? The French-style guillemets «...» are more visually distinct than the English-style “...” and less confusable with other punctuation symbols than the German-style „...“.

If those don't stand out enough, there are many CJK brackets to choose from: 《...》, 「...」, 『...』, 【...】, and more.

Of course, I'm not being entirely serious here, but for comparison, Raku allows for all of those quotes, and then some.

Formalizing f-strings

Posted Jan 11, 2023 21:17 UTC (Wed) by mathstuf (subscriber, #69389)

Looking at this page, Sweden and Finland use "fancy quotes", but only the "end" one for both delimiters. `»quoted»` is also recognized apparently.

https://jakubmarian.com/map-of-quotation-marks-in-europea...

Raku

Posted Jan 12, 2023 2:33 UTC (Thu) by stephen.pollei (subscriber, #125364)

There is actually a lot that I love about Raku... I wish it were a bit more popular and had better support.

Formalizing f-strings

Posted Jan 12, 2023 11:00 UTC (Thu) by shiar (subscriber, #67206)

So does (or will) Perl: use experimental 'extra_paired_delimiters' in v5.36.

Formalizing f-strings

Posted Jan 11, 2023 8:51 UTC (Wed) by marcH (subscriber, #57642)

If the initial implementation had allowed quote re-use then I suspect this discussion would have never happened at all.

While readability has always been strongly encouraged in Python, it is still possible to write obfuscated code in any language with a small effort. As long as there are obvious, more readable alternatives then no one usually cares or talks about obfuscation possibilities. So only because of this historical "accident" people became passionate about this non-question. Whether obfuscated code is allowed or not looks like serious... bikeshedding compared to other, actually important considerations like the implementation's complexity.

> discouraging abuses of the feature via the PEP and various tools may well be enough.

Indeed, that is where this belongs.

Formalizing f-strings

Posted Jan 11, 2023 11:54 UTC (Wed) by eru (subscriber, #2753)

Nice work, but I'm not sure this is so useful in real life. f-strings (and similar features in other programming languages) make the code unreadable if the expression being expanded is complex (basically anything beyond a variable reference or a very simple expression). Anything more complicated is best handled by introducing auxiliary variables, or just by using the string concatenation operator.

Formalizing f-strings

Posted Jan 11, 2023 17:53 UTC (Wed) by iabervon (subscriber, #722)

It'll be very useful to be able to do, even if you never keep code that does it: you can copy some complex expression from nearby code into a log message and see what's really happening in the code you're debugging, without a lot of effort making sure to get refactoring right and not disturb the local scope. Then, when you understand what you should really make the code do, you'll check something in that doesn't have the unreadable code.

For that matter, what constitutes "a very simple expression" really depends on the language's common idioms and the capabilities of widely deployed syntax highlighting. Building a comma-separated string from a list is probably that idiomatic in Python, especially if the ".join(" portion doesn't appear in the string-literal color, and it's a common thing to want only for a message anyway.

Formalizing f-strings

Posted Jan 12, 2023 13:46 UTC (Thu) by nbecker (subscriber, #35200)

My problem with f-strings is they don't cooperate well with LaTeX, which uses backslash and braces. This is often seen in text for matplotlib.
Usually, getting LaTeX to work in Python strings requires raw r'' strings. But that's not possible with f-strings.
I don't propose a solution. An example of text you might want in matplotlib using latex could be:
f'$x^2+\sigma^{coef}$'

Formalizing f-strings

Posted Jan 18, 2023 0:23 UTC (Wed) by heiner (guest, #158880)

There's still plain old %, which works well with dicts. You can do:

r'$x^2+\sigma^%(coef)i$' % {'coef': 4}

Formalizing f-strings

Posted Jan 18, 2023 9:20 UTC (Wed) by anselm (subscriber, #2796)

If you don't mind the extra braces, then fr'$x^2+\sigma^{{{coef}}}$' should work. (Note how you can combine f'…' and r'…', so fr'\alpha' will do the Right Thing even if '\a' would otherwise come out as '\x07'.)

Tedious, perhaps, but not impossible.
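The two workarounds suggested in this subthread can be compared directly; coef = 4 is an arbitrary example value. In an f-string, doubled braces produce literal braces, so the three braces around coef yield one literal brace on each side of the interpolated value:

```python
coef = 4

# Raw string plus old-style % formatting with a dict.
percent = r'$x^2+\sigma^%(coef)i$' % {'coef': coef}

# The fr prefix keeps backslashes literal; {{ and }} become literal
# braces around the interpolated {coef}.
fstring = fr'$x^2+\sigma^{{{coef}}}$'

print(percent)   # $x^2+\sigma^4$
print(fstring)   # $x^2+\sigma^{4}$
```

The braced form matters once the exponent has more than one character: LaTeX reads \sigma^10 as a superscript 1 followed by a plain 0, while \sigma^{10} raises both digits.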

Formalizing f-strings

Posted Jan 25, 2023 15:41 UTC (Wed) by NRArnot (subscriber, #3033)

I'm trying to see why this isn't a very small storm in a teacup. We have two sorts of triple-quotes! This works just fine:

d = dict( a=1, b=2)
print( f'''{d.get('b')}''' )


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds