Easier Python string formatting
Like many interpreted languages, Python is used heavily for string processing tasks. At the output end, that means creating formatted text. Currently, there are three supported ways to get the same result:
'The answer is %d' % (42,)
'The answer is {answer}'.format(answer=42)
import string
s = string.Template('The answer is $answer')
s.substitute(answer=42)
The traditional "%" operator suffers from some interesting lexical traps and only supports a small number of types. The format() string method is more flexible, but is somewhat verbose, and the Template class seems to combine the shortcomings of the previous two methods and throws in yet another syntax to boot. All three methods require a separation between the format string and the values that are to be formatted into it, increasing verbosity and, arguably, decreasing readability, while other languages have facilities that do not require that separation.
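The text does not say which traps it has in mind. One commonly cited example, offered here as an illustration and not as the author's list, is that a tuple on the right of "%" is treated as the whole argument list, so a single tuple value gets mistaken for several arguments. A short Python 3 sketch (the `label()` function is just an illustrative name):

```python
def label(value):
    # "%" treats a tuple on its right-hand side as the full argument list.
    return 'Value: %s' % value


print(label(42))                   # Value: 42

try:
    label((1, 2))                  # two arguments supplied for one "%s"
    tuple_trap_raised = False
except TypeError as exc:
    tuple_trap_raised = True
    print('TypeError:', exc)

# Wrapping the value in a one-element tuple removes the ambiguity...
print('Value: %s' % ((1, 2),))     # Value: (1, 2)
# ...and str.format() has no such special case.
print('Value: {}'.format((1, 2)))  # Value: (1, 2)
```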
f-strings
Other languages, such as Perl and Ruby, have more concise string-formatting operations. With the debut of the string interpolation mechanism described in PEP 498, Python will have a similar facility. This PEP introduces a new type of string, called an "f-string" ("formatted string"), denoted by an "f" character before the opening quote:
f'This is an f-string'
F-strings thus join the short list of special string types in Python; others include r'raw' and b'byte' strings. The thing that makes an f-string special is that it is evaluated as a particular type of expression when it is executed. Thus, to replicate the above examples:
answer = 42
f'The answer is {answer}'
As can be seen, f-strings obtain the value to be formatted directly from the local (and global) namespace; there is no need to pass it in as a parameter to a formatting function or operator. Beyond that, though, what appears between the brackets can be an arbitrary expression:
import math
answer = 42
f'The answer is not {answer+1}'
f'The root of the answer is {math.sqrt(answer)}'
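Because the braces hold ordinary expressions, evaluated where the f-string appears, a function's local variables are picked up without being passed anywhere. A minimal sketch (Python 3.6 or later); `describe()` is an illustrative name, not part of any API:

```python
import math


def describe(answer):
    # Each {...} is evaluated at run time, in this function's scope.
    return [
        f'The answer is {answer}',
        f'The answer is not {answer + 1}',
        # ":.3f" limits the float to three decimal places.
        f'The root of the answer is {math.sqrt(answer):.3f}',
        f'The answer has {len(str(answer))} digits',
    ]


for line in describe(42):
    print(line)
```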
So formatted output can be created with expressions of just about any complexity. These expressions might even have side effects, though one suspects that would rarely be a good idea.
Under the hood, the execution of f-strings works by evaluating each expression found in curly brackets, then passing each result to the built-in format(), which invokes that object's __format__() method. So the following two lines would have an equivalent effect:
f'The answer is {answer}'
'The answer is ' + answer.__format__('')
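One way to watch the __format__() call happen is a toy class (hypothetical, for illustration only) that records each format specification it receives. The f-string, the built-in format(), and a direct __format__('') call all produce the same text, and each passes an empty specification:

```python
class Tagged:
    """Toy class (for illustration only) that logs each format spec it is given."""

    def __init__(self, value):
        self.value = value
        self.specs_seen = []

    def __format__(self, spec):
        # format(), str.format() and f-strings all end up calling this method.
        self.specs_seen.append(spec)
        return f'<{self.value}>'


answer = Tagged(42)
via_fstring = f'The answer is {answer}'
via_builtin = 'The answer is ' + format(answer)
via_dunder = 'The answer is ' + answer.__format__('')

print(via_fstring)        # The answer is <42>
print(answer.specs_seen)  # ['', '', '']
```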
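A format specification, written in the same mini-language used by format() and str.format() (not printf-style "%" codes), can be appended to the expression after a colon; for example, with answer set to 42, this yields "The answer is 0042":
f'The answer is {answer:04d}'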
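A few more specifications from that mini-language, as a quick sketch. The last part checks that a printf-style "%04d" is rejected with a ValueError rather than being interpreted:

```python
answer = 42
width = 6

examples = {
    'zero-padded to width 4': f'{answer:04d}',
    'right-aligned in 8 columns': f'{answer:>8}',
    'hexadecimal': f'{answer:x}',
    'with an explicit sign': f'{answer:+d}',
    # Replacement fields may nest inside the spec itself.
    'width taken from a variable': f'{answer:0{width}d}',
}
for label, text in examples.items():
    print(f'{label:28} [{text}]')

# printf-style specifications are not part of this mini-language.
try:
    f'{answer:%04d}'
    printf_style_rejected = False
except ValueError:
    printf_style_rejected = True
print('"%04d" rejected:', printf_style_rejected)
```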
One can also add "!s" directly after the expression, before any colon, to pass the value through str() first, "!r" to use repr(), or "!a" to use ascii(). So, once again, the following two lines would do the same thing:
f'The answer is {answer!r:>6}'
'The answer is ' + repr(answer).__format__('>6')
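The three conversions differ most visibly on strings. In this sketch the sample name is arbitrary; it contains one non-ASCII character (written as an escape so the source stays ASCII) to show what "!a" does. The last assertions confirm that the conversion runs first and the format specification is then applied to its result:

```python
name = 'Zo\u00eb'   # "Zoë", written with an escape so the source stays ASCII
answer = 42

plain = f'{name}'          # no conversion: the str is formatted as-is
as_str = f'{name!s}'       # str() first (a no-op for a str)
as_repr = f'{name!r}'      # repr() first: adds quotes
as_ascii = f'{name!a}'     # ascii(): like repr(), but escapes non-ASCII characters

assert plain == as_str == name
assert as_repr == "'" + name + "'"
assert as_ascii == "'Zo\\xeb'"

# The conversion is applied first; the format spec then applies to its result.
assert f'The answer is {answer!r:>6}' == 'The answer is ' + format(repr(answer), '>6')
assert f'The answer is {answer!r:>6}' == 'The answer is     42'
print(as_ascii)
```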
That is the core of the change. There are other details, of course; see the PEP for the full story. The PEP was accepted by Python benevolent dictator for life Guido van Rossum on September 8, so, unless something goes surprisingly wrong somewhere, f-strings will be a part of the Python 3.6 release.
Where next?
PEP 498 was somewhat controversial over the course of its development. There were a number of concerns about how f-strings fit into the Python worldview in general, but there was also a specific concern: security. In particular, Nick Coghlan expressed concerns that f-strings would make it easy to write insecure code. Examples would be usage like:
os.system(f'cat {file}')
SQL.run(f'select {column} from {table}')
In either case, if any of the values substituted into the strings are supplied by the user, the result could be the compromise of the whole system. The problem is not that f-strings make it possible to incorporate untrusted data into trusted strings — that can just as easily be done with existing string-formatting mechanisms. And the problem is certainly not that f-strings make string formatting easier in general; Nick's specific concern is that f-strings will be the easiest way to put strings together, while more secure methods remain harder. Using an f-string to format an SQL query will be easier to code (and to read later) than properly escaping the parameters, so developers will be drawn toward the insecure alternative.
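The concern is easy to reproduce with the standard library's sqlite3 module. In this sketch the table and the attacker-supplied string are invented for illustration: splicing the value in with an f-string changes the meaning of the query, while a "?" placeholder hands it to the database purely as data. Placeholders only cover values, though; identifiers such as the {column} and {table} in the example above cannot be bound this way, which is part of what makes that case hard to secure.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT, karma INTEGER)')
conn.executemany('INSERT INTO users VALUES (?, ?)', [('alice', 10), ('bob', 3)])

# Attacker-controlled input that closes the quote and adds a condition.
user_input = "nobody' OR '1'='1"

# Unsafe: the f-string splices the input into the SQL text itself.
unsafe_query = f"SELECT name FROM users WHERE name = '{user_input}' ORDER BY name"
leaked = conn.execute(unsafe_query).fetchall()

# Safer: the placeholder keeps the input out of the SQL text entirely.
safe = conn.execute(
    'SELECT name FROM users WHERE name = ? ORDER BY name', (user_input,)
).fetchall()

print('f-string query:', leaked)      # every row comes back
print('parameterized query:', safe)   # no rows: nobody has that odd name
conn.close()
```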
His suggestion, as described in PEP 501, is to make the secure way as easy to use as the insecure way. The result is "i-strings"; they look a lot like f-strings in that the syntax is nearly identical:
i'The answer is {answer}'
There is a key difference, though: while f-strings produce a formatted string immediately on execution, i-strings delay that formatting. An explicit call to a format function is required to do the job. To see the difference, consider the two lines below, which have equivalent effect:
print(f'The answer is {answer}')
print(format(i'The answer is {answer}'))
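Python has no i-strings, but the idea of deferring the formatting step can be approximated in ordinary code. The sketch below is hypothetical and is not PEP 501's actual design: a DeferredTemplate object holds a str.format()-style template plus values captured when it is created (passed explicitly, since a plain class cannot see the caller's variables), and builds the string only when a renderer asks for it. Defining __format__() lets the built-in format() act as the default renderer, loosely mirroring the format(i'...') call above. Format specifications are honored; conversions such as !r are ignored to keep it short.

```python
import string
from typing import Any, Callable, Mapping


class DeferredTemplate:
    """Hypothetical stand-in for an i-string: a template plus captured values,
    formatted only when a renderer asks for it."""

    def __init__(self, template: str, values: Mapping[str, Any]) -> None:
        self.template = template
        self.values = dict(values)

    def render(self, convert: Callable[[Any], str] = str) -> str:
        # Literal text is copied unchanged; each {field[:spec]} becomes
        # convert(value), after which the optional format spec is applied.
        parts = []
        for literal, field, spec, _conversion in string.Formatter().parse(self.template):
            parts.append(literal)
            if field is not None:
                parts.append(format(convert(self.values[field]), spec or ''))
        return ''.join(parts)

    def __format__(self, spec: str) -> str:
        # Lets the built-in format() serve as the "plain" renderer.
        return self.render()


answer = 42
deferred = DeferredTemplate('The answer is {answer}, says {who}',
                            {'answer': answer, 'who': 'Deep Thought'})

# Nothing has been formatted yet; that happens here, on request.
print(format(deferred))        # plain rendering
print(deferred.render(repr))   # a different renderer, same captured values
```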
The key to Nick's proposal is that format() can be replaced with another formatting function that knows how to escape dangerous characters in the intended usage scenario. Thus:
os.system(sh(i'cat {file}'))
SQL.run(sql(i'select {column} from {table}'))
The sh() formatter would ensure that no shell metacharacters get through, while sql() would prevent SQL-injection attacks. These formatters would be easy enough to use that developers would not be tempted to bypass them. Just as importantly, static analysis software could easily distinguish between safe and unsafe string usage for a given API, making it possible to automatically detect when the wrong type of string is being used.
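The sh() and sql() helpers are not defined in the text, so what follows is only a hypothetical sketch of what they might do. They are written as plain functions taking a str.format()-style template and an explicit dictionary of values, since Python itself has no i-strings. sh() runs each value through the standard library's shlex.quote(); sql() does not escape anything, but turns each field into a "?" placeholder with a separate parameter tuple, the usual way to keep values out of the query text. As with real placeholders, this covers values only; table and column names would need separate handling.

```python
import shlex
import sqlite3
import string
from typing import Any, Iterator, Mapping, Optional, Tuple


def _fields(template: str) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (literal_text, field_name) pairs from a str.format()-style template."""
    for literal, field, _spec, _conversion in string.Formatter().parse(template):
        yield literal, field


def sh(template: str, values: Mapping[str, Any]) -> str:
    """Hypothetical sh() renderer: shell-quote every interpolated value."""
    out = []
    for literal, field in _fields(template):
        out.append(literal)
        if field is not None:
            out.append(shlex.quote(str(values[field])))
    return ''.join(out)


def sql(template: str, values: Mapping[str, Any]) -> Tuple[str, tuple]:
    """Hypothetical sql() renderer: emit '?' placeholders plus a parameter
    tuple instead of pasting values into the query text."""
    out, params = [], []
    for literal, field in _fields(template):
        out.append(literal)
        if field is not None:
            out.append('?')
            params.append(values[field])
    return ''.join(out), tuple(params)


# Shell: the hostile file name stays a single, inert argument.
cmd = sh('cat {file}', {'file': 'notes.txt; rm -rf ~'})
print(cmd)   # cat 'notes.txt; rm -rf ~'

# SQL: the value travels as a bound parameter, never as SQL text.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.execute('INSERT INTO users VALUES (?)', ('alice',))
query, params = sql('SELECT name FROM users WHERE name = {name}',
                    {'name': "x' OR '1'='1"})
rows = conn.execute(query, params).fetchall()
print(query, params, rows)
conn.close()
```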
PEP 501 has been through a long series of revisions, involving significant changes, since first being posted. At times the syntax was rather more complicated, prompting Guido to ask: "Have I died and gone to Perl?" Nick's proposal had originally been intended as an alternative to PEP 498, but, over time, Nick warmed to the f-string approach and came out in favor of its adoption. PEP 501 remains outstanding, though, and will likely be pursued as an extension to f-strings.
That work, too, could conceivably happen in time for the 3.6 release, which is planned to happen in late 2016. Given its volatile history thus far, chances are that the end result will look somewhat different from what has been proposed to date. However it turns out, though, Python should no longer have to defer to other languages when it comes to the ease of creating formatted output.
Posted Sep 10, 2015 14:03 UTC (Thu)
by richmoore (guest, #53133)
'the answer is %(answer)s' % ({'answer': 42})
Posted Sep 10, 2015 14:45 UTC (Thu)
by corbet (editor, #1)
Posted Sep 10, 2015 21:24 UTC (Thu)
by Fats (guest, #14882)
Posted Sep 10, 2015 21:38 UTC (Thu)
by corbet (editor, #1)
is still more verbose than:
The people working on this facility would seem to agree. But a couple of folks have said this now, so perhaps this feeling isn't universal?
Posted Sep 10, 2015 22:25 UTC (Thu)
by maney (subscriber, #12630)
There should be one obviously correct way to do things. Except for string formatting, where three is okay, but four would be better.
Posted Sep 10, 2015 22:08 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
"Today is {} {}".format(month, date)
Now, let's add some localization:
"Heute ist {} {}".format(date, month)
See the difference? String interpolation is a very useful tool.
Posted Sep 11, 2015 21:01 UTC (Fri)
by mgedmin (subscriber, #34497)
_("Today is {day} {month}").format(...)
works. How will you translate f-strings?
Posted Sep 11, 2015 21:49 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
Actually, a couple of years back, I had a helper that used the locals() function to get all the in-scope variables for the format() call.
Posted Sep 10, 2015 23:36 UTC (Thu)
by vapier (guest, #15768)
d = {'answer': 1}
the first form has to be:
'The answer = {answer:d}'.format(**d)
and the python team themselves document them in completely different sections:
https://docs.python.org/2/library/stdtypes.html#string-fo...
Posted Sep 10, 2015 16:07 UTC (Thu)
by talex (guest, #19139)
def name := "O'No! $@?"
Here, the template is first parsed as an SQL prepared statement, then the arguments are filled in. There's nothing special about the "sql" prefix. The E parser expands this to:
sql__quasiParser.valueMaker("INSERT INTO users (userName, karma) VALUES (${0}, 0)").substitute([name])
sql__quasiParser is an ordinary object (in this case, the result of opening a database connection). By creating objects with such names, any prefix can be defined. Similar parsers can be used for XML, regular expressions, JSON, etc.
There's no risk of using plain string interpolation by mistake, because something that wants XML will expect an XML element, not a string.
See: http://wiki.erights.org/wiki/Walnut/Ordinary_Programming/...
Posted Sep 10, 2015 17:04 UTC (Thu)
by dashesy (guest, #74652)
Posted Sep 22, 2015 10:42 UTC (Tue)
by MKesper (subscriber, #38539)
Posted Sep 22, 2015 17:14 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
Posted Sep 10, 2015 17:28 UTC (Thu)
by josh (subscriber, #17465)
.format() doesn't need to be any more verbose than the % operator, if you don't use names:
'The answer = {}'.format(42)
Posted Sep 10, 2015 22:42 UTC (Thu)
by flussence (guest, #85566)
(«i""» sounds like a 1:1 copy of JavaScript's template strings, FWIW.)
Posted Sep 11, 2015 11:53 UTC (Fri)
by jezuch (subscriber, #52988)
Groan. It's not fixing the insecurity, it's delaying the insecurity. Either use proper placeholder system (whatever it is in your language/library) or go back to PHP :)
Posted Sep 14, 2015 8:38 UTC (Mon)
by epa (subscriber, #39769)
Posted Sep 17, 2015 7:59 UTC (Thu)
by Rynor (guest, #85400)
"Explicit is better than implicit."
"There should be one-- and preferably only one --obvious way to do it."
And finally...
I had just grouped that in as a use of the % operator — didn't want to get into all of the details of it. Especially if you start to get into "%locals()" and other dirty tricks like that...
But your .format() example then really looks as verbose on purpose.
You could also do:
'The answer is {}'.format(42)
Whatever. I would posit that
'The answer is {}'.format(answer)
f'The answer is {answer}'
'The answer = {answer}'.format(**answer)
'The answer = %(answer)d' % answer
https://docs.python.org/2/library/string.html#format-stri...
sql`INSERT INTO users (userName, karma) VALUES ($name, 0)`
There is already plenty of excellent template magic in Python, like Jinja, and Django has its own. Why is the language bothering to provide an (unavoidably) inferior solution to what is just a `pip install` away?
The best way to do `foo()` in Python is this:
pip install foo_bar
then
from foo_bar import foo
foo()
As if it is not already annoying in one code base to see strings with single ('), double ("), or triple (""") quotes (I know PEP discourages some forms), along with b or r or u, and now f too! Scattered with `.decode(..)` and `.encode(..)` clutter to make Py3 happy.
> There is a key difference, though: while f-strings produce a formatted string immediately on execution, i-strings delay that formatting. An explicit call to a format function is required to do the job.
I'd say .format() already handles all the required functionality in a concise way.
Using variables from the local and global scope is just asking for unclear code.
As said before, there are already too many alternatives. Why add another that just duplicates functionality?
"Special cases aren't special enough to break the rules."