
Easier Python string formatting

By Jonathan Corbet
September 10, 2015
Some languages pride themselves on providing many ways to accomplish any given task. Python, instead, tends to focus on providing a single solution to most problems. There are exceptions, though; the creation of formatted strings would appear to be one of them. Despite the fact that there are (at least) three mechanisms available now, Python's developers have just adopted a plan to add a fourth. With luck, this new formatting mechanism (slated for Python 3.6) will improve the traditionally cumbersome string-formatting facilities available in Python.

Like many interpreted languages, Python is used heavily for string processing tasks. At the output end, that means creating formatted text. Currently, there are three supported ways to get the same result:

    'The answer is %d' % (42,)

    'The answer is {answer}'.format(answer=42)

    import string
    s = string.Template('The answer is $answer')
    s.substitute(answer=42)

The traditional "%" operator suffers from some interesting lexical traps and only supports a small number of types. The format() string method is more flexible, but is somewhat verbose, and the Template class seems to combine the shortcomings of the previous two methods and throws in yet another syntax to boot. All three methods require a separation between the format string and the values that are to be formatted into it, increasing verbosity and, arguably, decreasing readability, while other languages have facilities that do not require that separation.
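As a rough illustration, the sketch below runs the three mechanisms side by side and then triggers one of the better-known traps of the "%" operator: a tuple on the right-hand side is treated as the argument list. The variable names (point, trapped, safe) are invented for this example.

```python
import string

answer = 42

# The three existing mechanisms, each producing the same text.
percent = 'The answer is %d' % (answer,)
formatted = 'The answer is {answer}'.format(answer=answer)
templated = string.Template('The answer is $answer').substitute(answer=answer)

# One lexical trap with "%": a bare tuple on the right-hand side is
# unpacked as the argument list rather than formatted as one value.
point = (3, 4)
try:
    trapped = 'The point is %s' % point
except TypeError as exc:
    trapped = 'TypeError: ' + str(exc)

# Wrapping the value in a one-element tuple avoids the trap.
safe = 'The point is %s' % (point,)
```

In every case the values to be formatted sit outside the string they are formatted into, which is the separation that f-strings are meant to remove.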

f-strings

Other languages, such as Perl and Ruby, have more concise string-formatting operations. With the debut of the string interpolation mechanism described in PEP 498, Python will have a similar facility. This PEP introduces a new type of string, called an "f-string" ("formatted string") denoted by an "f" character before the opening quote:

    f'This is an f-string'

F-strings thus join the short list of special string types in Python; others include r'raw' and b'byte' strings. The thing that makes an f-string special is that it is evaluated as a particular type of expression when it is executed. Thus, to replicate the above examples:

    answer = 42
    f'The answer is {answer}'

As can be seen, f-strings obtain the value to be formatted directly from the local (and global) namespace; there is no need to pass it in as a parameter to a formatting function or operator. Beyond that, though, what appears between the brackets can be an arbitrary expression:

    answer = 42
    f'The answer is not {answer+1}'
    f'The root of the answer is {math.sqrt(answer)}'

So formatted output can be created with expressions of just about any complexity. These expressions might even have side effects, though one suspects that would rarely be a good idea.
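A short sketch, assuming an interpreter with f-string support (Python 3.6 or later), shows both points: ordinary expressions inside the brackets, and an expression whose side effect changes the result from one evaluation to the next. The queue list and next_item() helper are made up for the illustration.

```python
import math

answer = 42
queue = ['first', 'second']

plain = f'The answer is {answer}'
arithmetic = f'The answer is not {answer + 1}'
root = f'The root of the answer is {math.sqrt(answer)}'

# An expression with a side effect: each evaluation pops an item,
# so the same f-string literal yields different text the second time.
def next_item():
    return f'Next up: {queue.pop(0)}'

one = next_item()
two = next_item()
```

After the two calls the queue is empty, which is exactly the sort of hidden state change that makes side effects in a format string hard to follow.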

Under the hood, the execution of f-strings works by evaluating each expression found in curly brackets, then invoking the __format__() method on each result. So the following two lines would have an equivalent effect:

    f'The answer is {answer}'
    'The answer is ' + answer.__format__('')

A format string to be passed to __format__() can be appended to the expression with a colon, thus, for example:

    f'The answer is {answer:04d}'
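To watch this mechanism at work, a small class can record whatever format string it is handed. The sketch below assumes Python 3.6 or later; the Verbose class is hypothetical and exists only for this illustration.

```python
class Verbose:
    """Hypothetical class that records the format spec it receives."""

    def __init__(self, value):
        self.value = value
        self.specs = []

    def __format__(self, spec):
        # Remember the spec, then delegate to the wrapped value.
        self.specs.append(spec)
        return format(self.value, spec)

answer = Verbose(42)

bare = f'The answer is {answer}'        # spec is the empty string
padded = f'The answer is {answer:04d}'  # spec is '04d'
```

With no colon, __format__() receives an empty string; with a colon, it receives everything after the colon, without any "%" prefix.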

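One can also add "!s" to pass the value to str() first, "!r" to use repr(), or "!a" to use ascii(). The conversion is written after the expression and before any colon-separated format string, and the format string is then applied to the converted value, which is a string at that point. So, once again, the following two lines would do the same thing:

    f'The answer is {answer!r:>4}'
    'The answer is ' + repr(answer).__format__('>4')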
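The three conversions are easiest to tell apart with a value containing a non-ASCII character. The following is a minimal sketch, assuming Python 3.6 or later; the name variable is invented for the example.

```python
answer = 42
name = 'caf\u00e9'

as_str = f'{name!s}'     # str(name)
as_repr = f'{name!r}'    # repr(name), quotes included
as_ascii = f'{name!a}'   # ascii(name), non-ASCII characters escaped

# The conversion is applied first, then the format spec is applied to
# the converted value, which is a str at that point.
combined = f'[{answer!r:>6}]'
equivalent = '[' + format(repr(answer), '>6') + ']'
```

Because the converted value is a string, a numeric format code such as "d" would be rejected after a conversion; string-oriented codes like alignment and width are the ones that apply.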

That is the core of the change. There are other details, of course; see the PEP for the full story. The PEP was accepted by Python benevolent dictator for life Guido van Rossum on September 8, so, unless something goes surprisingly wrong somewhere, f-strings will be a part of the Python 3.6 release.

Where next?

PEP 498 was somewhat controversial over the course of its development. There were a number of concerns about how f-strings fit into the Python worldview in general, but there was also a specific concern: security. In particular, Nick Coghlan expressed concerns that f-strings would make it easy to write insecure code. Examples would be usage like:

    os.system(f'cat {file}')
    SQL.run(f'select {column} from {table}')

In either case, if any of the values substituted into the strings are supplied by the user, the result could be the compromise of the whole system. The problem is not that f-strings make it possible to incorporate untrusted data into trusted strings — that can just as easily be done with existing string-formatting mechanisms. And the problem is certainly not that f-strings make string formatting easier in general; Nick's specific concern is that f-strings will be the easiest way to put strings together, while more secure methods remain harder. Using an f-string to format an SQL query will be easier to code (and to read later) than properly escaping the parameters, so developers will be drawn toward the insecure alternative.
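The shell case can be sketched without running any command. The example below assumes Python 3.6 or later and uses the standard library's shlex.quote() as one example of the "more secure but more work" route; the hostile file name is made up.

```python
import shlex

# Hypothetical user-supplied value carrying a shell metacharacter.
file = 'notes.txt; rm -rf ~'

# Direct interpolation hands the shell two commands instead of one.
unsafe = f'cat {file}'

# shlex.quote() wraps the value so the shell sees a single argument.
quoted = f'cat {shlex.quote(file)}'

# Tokenizing both strings shows how many separate words each contains.
unsafe_args = shlex.split(unsafe)
quoted_args = shlex.split(quoted)
```

The safe version is only one function call longer, but it is a call the developer has to remember to make, which is the heart of the concern.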

His suggestion, as described in PEP 501, is to make the secure way as easy to use as the insecure way. The result is "i-strings"; they look a lot like f-strings in that the syntax is nearly identical:

    i'The answer is {answer}'

There is a key difference, though: while f-strings produce a formatted string immediately on execution, i-strings delay that formatting. An explicit call to a format function is required to do the job. To see the difference, consider the two lines below, which have equivalent effect:

    print(f'The answer is {answer}')
    print(format(i'The answer is {answer}'))

The key to Nick's proposal is that format() can be replaced with another formatting function that knows how to escape dangerous characters in the intended usage scenario. Thus:

    os.system(sh(i'cat {file}'))
    SQL.run(sql(i'select {column} from {table}'))

The sh() formatter would ensure that no shell metacharacters get through, while sql() would prevent SQL-injection attacks. These formatters would be easy enough to use that developers would not be tempted to bypass them. Just as importantly, static analysis software could easily distinguish between safe and unsafe string usage for a given API, making it possible to automatically detect when the wrong type of string is being used.
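Since i-strings do not exist in any released Python, the idea can only be approximated. The sketch below is not the PEP 501 design; it uses a hypothetical Deferred class that keeps the template and its values apart until a formatter is chosen, handles simple field names only, and builds its sh() on shlex.quote().

```python
import shlex
import string

class Deferred:
    """Hypothetical stand-in for an i-string: template plus values, unrendered."""

    def __init__(self, template, **values):
        self.template = template
        self.values = values

    def render(self, convert):
        """Build the final string, passing each substituted value through convert."""
        parts = []
        for literal, field, spec, _conv in string.Formatter().parse(self.template):
            parts.append(literal)
            if field is not None:
                parts.append(convert(format(self.values[field], spec or '')))
        return ''.join(parts)

def plain(deferred):
    """Counterpart of format(): substitute the values unchanged."""
    return deferred.render(str)

def sh(deferred):
    """Hypothetical shell formatter: quote each value for a POSIX shell."""
    return deferred.render(shlex.quote)

file = 'notes.txt; rm -rf ~'
command = Deferred('cat {file}', file=file)

as_text = plain(command)
as_shell = sh(command)
```

The template is written once, and the choice of formatter decides whether the hostile file name ends up as raw text or as a single quoted shell argument.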

PEP 501 has been through a long series of revisions, involving significant changes, since first being posted. At times the syntax was rather more complicated, prompting Guido to ask: "Have I died and gone to Perl?". Nick's proposal had originally been intended as an alternative to PEP 498, but, over time, Nick warmed to the f-string approach and came out in favor of its adoption. PEP 501 remains outstanding, though, and will likely be pursued as an extension to f-strings.

That work, too, could conceivably happen in time for the 3.6 release, which is planned to happen in late 2016. Given its volatile history thus far, chances are that the end result will look somewhat different from what has been proposed to date. However it turns out, though, Python should no longer have to defer to other languages when it comes to the ease of creating formatted output.


Easier Python string formatting

Posted Sep 10, 2015 14:03 UTC (Thu) by richmoore (guest, #53133) [Link] (8 responses)

There's another existing way too:

'the answer is %(answer)s' % ({'answer': 42})

Easier Python string formatting

Posted Sep 10, 2015 14:45 UTC (Thu) by corbet (editor, #1) [Link] (7 responses)

I had just grouped that in as a use of the % operator — didn't want to get into all of the details of it. Especially if you start to get into "%locals()" and other dirty tricks like that...

Easier Python string formatting

Posted Sep 10, 2015 21:24 UTC (Thu) by Fats (guest, #14882) [Link] (5 responses)

But your .format() example then really looks as verbose on purpose.
You could also do:
    'The answer is {}'.format(42)

Easier Python string formatting

Posted Sep 10, 2015 21:38 UTC (Thu) by corbet (editor, #1) [Link] (1 responses)

Whatever. I would posit that

    'The answer is {}'.format(answer)

is still more verbose than:

    f'The answer is {answer}'

The people working on this facility would seem to agree. But a couple of folks have said this now, so perhaps this feeling isn't universal?

Easier Python string formatting

Posted Sep 10, 2015 22:25 UTC (Thu) by maney (subscriber, #12630) [Link]

There's "explicit is better than implicit", though I would agree that that's more of a guideline than a rule. But my immediate reaction from partway through the article, as shared in #chipy, looked like this:

There should be one obviously correct way to do things. Except for string formatting, where three is okay, but four would be better.

Easier Python string formatting

Posted Sep 10, 2015 22:08 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

However, let's consider this:

"Today is {} {}".format(month, date)

Now, let's add some localization:

"Heute ist {} {}".format(date, month)

See the difference? String interpolation is a very useful tool.
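A minimal sketch of that difference, using str.format() with named fields as the stand-in for interpolation by name; the values and the templates dictionary are invented for the example:

```python
month, date = 'September', 10

# Positional fields: the translation must change the argument order in code.
english = 'Today is {} {}'.format(month, date)
german_positional = 'Heute ist {} {}'.format(date, month)

# Named fields: the call stays the same and only the template changes,
# so a translator can reorder the placeholders freely.
values = {'month': month, 'date': date}
templates = {
    'en': 'Today is {month} {date}',
    'de': 'Heute ist {date} {month}',
}
rendered = {lang: text.format(**values) for lang, text in templates.items()}
```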

Easier Python string formatting

Posted Sep 11, 2015 21:01 UTC (Fri) by mgedmin (subscriber, #34497) [Link] (1 responses)

This raises a good point. I know that

_("Today is {day} {month}").format(...)

works. How will you translate f-strings?

Easier Python string formatting

Posted Sep 11, 2015 21:49 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

I guess the same way. You just don't need to specify all the inputs in the .format() part.

Actually, a couple of years back, I had a helper that used "locals()" method to get all the in-scope variables for the format() call.

Easier Python string formatting

Posted Sep 10, 2015 23:36 UTC (Thu) by vapier (guest, #15768) [Link]

i don't generally consider them the same since they have different syntax. these are not equiv:

d = {'answer': 1}
'The answer = {answer}'.format(**d)
'The answer = %(answer)d' % d

the first form has to be:

'The answer = {answer:d}'.format(**d)

and the python team themselves document them in completely different sections:

https://docs.python.org/2/library/stdtypes.html#string-fo...
https://docs.python.org/2/library/string.html#format-stri...

Easier Python string formatting

Posted Sep 10, 2015 16:07 UTC (Thu) by talex (guest, #19139) [Link]

A similar system is used in E, except that rather than escaping bad characters, it typically creates a structured object instead of a string. Consider this E code:

def name := "O'No! $@?"
sql`INSERT INTO users (userName, karma) VALUES ($name, 0)`

Here, the template is first parsed as an SQL prepared statement, then the arguments are filled in. There's nothing special about the "sql" prefix. The E parser expands this to:

sql__quasiParser.valueMaker("INSERT INTO users (userName, karma) VALUES (${0}, 0)").substitute([name])

sql__quasiParser is an ordinary object (in this case, the result of opening a database connection). By creating objects with such names, any prefix can be defined. Similar parsers can be used for XML, regular expressions, JSON, etc.

There's no risk of using plain string interpolation by mistake, because something that wants XML will expect an XML element, not a string.
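For comparison, a loosely similar separation of statement and arguments is available in Python through DB-API placeholders. This is only a sketch using the standard sqlite3 module and an in-memory database, not an equivalent of E's quasi-parsers:

```python
import sqlite3

name = "O'No! $@?"

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (userName TEXT, karma INTEGER)')

# The statement text and the value travel separately; the driver binds
# the value to the "?" placeholder, so the quote in name needs no escaping.
conn.execute('INSERT INTO users (userName, karma) VALUES (?, 0)', (name,))

rows = conn.execute('SELECT userName, karma FROM users').fetchall()
conn.close()
```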

See: http://wiki.erights.org/wiki/Walnut/Ordinary_Programming/...

Easier Python string formatting

Posted Sep 10, 2015 17:04 UTC (Thu) by dashesy (guest, #74652) [Link] (2 responses)

There is already a plenitude of excellent template magic in Python, like Jinja, and Django has its own. Why is the language bothering with providing an (unavoidably) inferior solution to what is just a `pip install` away? The best way to do `foo()` in Python is this:
pip install foo_bar
then
from foo_bar import foo
foo()
As if it is not already annoying in one code-base to see strings with single ('), double (") or triple (""") quotes (I know PEP discourages some forms), along with b or r or u and now f too! Scattered with `.decode(..)` and `.encode(..)` clutter to make Py3 happy.

Easier Python string formatting

Posted Sep 22, 2015 10:42 UTC (Tue) by MKesper (subscriber, #38539) [Link] (1 responses)

.decode() and .encode() clutter is what I remove when converting from Py2 to Py3.

Easier Python string formatting

Posted Sep 22, 2015 17:14 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Uhm, how? You still have to encode/decode incoming data.

Easier Python string formatting

Posted Sep 10, 2015 17:28 UTC (Thu) by josh (subscriber, #17465) [Link]

> The format() string method is more flexible, but is somewhat verbose

.format() doesn't need to be any more verbose than the % operator, if you don't use names:

'The answer = {}'.format(42)

Easier Python string formatting

Posted Sep 10, 2015 22:42 UTC (Thu) by flussence (guest, #85566) [Link]

That snide jab at Perl at the end is heavily ironic. Python's newly added «f""» modifier is identical in function and syntax to Perl 6's «q:c""», which predates it by several years. The only real difference is in implementation: Python's is hardcoded into the language, whereas Perl's is extensible and allows user-defined constructs (say, «q:sql""»).

(«i""» sounds like a 1:1 copy of Javascript's template strings, FWIW.)

Easier Python string formatting

Posted Sep 11, 2015 11:53 UTC (Fri) by jezuch (subscriber, #52988) [Link] (1 responses)

> i'The answer is {answer}'
>
> There is a key difference, though: while f-strings produce a formatted string immediately on execution, i-strings delay that formatting. An explicit call to a format function is required to do the job.

Groan. It's not fixing the insecurity, it's delaying the insecurity. Either use a proper placeholder system (whatever it is in your language/library) or go back to PHP :)

Easier Python string formatting

Posted Sep 14, 2015 8:38 UTC (Mon) by epa (subscriber, #39769) [Link]

Many 'proper placeholder systems' end up working by string interpolation in the end (I am thinking of DBMSes which don't have a separate wire protocol for declaring and setting placeholders). As long as the interpolation is done sanely and with knowledge of the proper syntax rules, what's the problem?

Easier Python string formatting

Posted Sep 17, 2015 7:59 UTC (Thu) by Rynor (guest, #85400) [Link]

This is just silly and so contradictory to the Zen of Python.
I'd say .format() already handles all the required functionality in a concise way.

"Explicit is better than implicit."
Using variables from the local and global scope is just asking for unclear code.

"There should be one-- and preferably only one --obvious way to do it."
As said before, there are already too many alternatives. Why add another that just duplicates functionality?

And finally..
"Special cases aren't special enough to break the rules."


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.