Custom string formatters in Python
Python has had formatted string literals (f-strings), a syntactic shorthand for building strings, since 2015. Recently, Jim Baker, Guido van Rossum, and Paul Everitt have proposed PEP 750 ("Tag Strings For Writing Domain-Specific Languages") which would generalize and expand that mechanism to provide Python library writers with additional flexibility. Reactions to the proposed change were somewhat positive, although there was a good deal of discussion of (and opposition to) the PEP's inclusion of lazy evaluation of template parameters.
The proposal
In Python (since version 3.6), programmers can write f-strings to easily interpolate values into strings:
name = "world"
print(f"Hello, {name}")  # Prints "Hello, world"
This is an improvement on the previous methods for string interpolation, because it makes it clear exactly what is being inserted into the string in each location. F-strings do still have some drawbacks, though. In particular, since the expressions inside braces are evaluated when the string is evaluated, they're not suitable for more complex templating. They also make it easier to write some kinds of security bugs — it's tempting to use them to make SQL queries, even though doing so can make code more susceptible to SQL-injection attacks. The PEP aims to fix both of these, by allowing people to use arbitrary functions as string "tags", taking the place of the "f" in f-strings. For example, it would be possible to write a safe sql() function that could be invoked like this:
name = "O'Henry"
# Calls sql().
# The function inserts 'name' into the query properly escaped.
query = sql"SELECT * FROM Users WHERE name = {name}"
print(query)  # Prints "SELECT * FROM Users WHERE name = 'O''Henry'"
Other examples of potential uses include automatically escaping strings in the correct way for other formats (shell commands, URLs, regular expressions, etc.), building custom DSLs that can be embedded in Python programs, or partially replacing the use of templating libraries like Jinja.
The proposed syntax works by calling any function used as a string tag with a sequence of arguments representing fragments of the string and interpolation sites. These are values that implement new Decoded and Interpolation protocols for string components and interpolations, respectively. In the example above, sql() would be called with two arguments: one for the first part of the string, and then a second for the interpolation itself. The function is then free to interpret these values in whatever way it likes, and return an arbitrary object (such as, for example, a compiled regular expression). In particular, the expressions inside braces in the affected string aren't evaluated ahead of time, but are instead evaluated when the function calls the .getvalue() method of an Interpolation object — which the function only needs to call if it needs the value. Interpolation objects also include the original expression as a string, and the optional conversion function or format specification if they were provided.
def example(*args):
    # This example assumes it will receive a tag string
    # that starts with a string and contains exactly one
    # interpolation
    string_part = args[0]    # "Value: "
    interpolation = args[1]  # 2 + 3
    # It can reference the original text, and ask for the value
    # to be computed.
    return f"{string_part}{interpolation.expr} = {interpolation.getvalue()}"

print(example"Value: {2 + 3}")  # Prints "Value: 2 + 3 = 5"
This does, however, lead to some surprising outcomes, such as making it possible to write code that depends on the assignment of a variable after the place where the tag string is defined. This example shows how that could work, as well as demonstrating the PEP's recommended method to deal with a sequence of Decoded and Interpolation values with a match statement:
class Delayed:
    def __init__(self, *args):
        self.args = args

    def __str__(self):
        result = ""
        for a in self.args:
            match a:
                case Decoded() as decoded:
                    result += decoded
                case Interpolation() as interpolation:
                    result += str(interpolation.getvalue())
        return result

name = 'Alice'
fstring = f'My name is {name}'  # Always Alice
delayed = Delayed'My name is {name}'
for name in ['Bob', 'Charlie']:
    print(delayed)  # Prints 'My name is Bob' and 'My name is Charlie'
The PEP describes this behavior as lazy evaluation — although unlike true lazy evaluation in languages like Haskell, the library author does need to explicitly ask for the value to be calculated. Despite the potentially unintuitive consequences, lazy evaluation is a feature that the PEP's authors definitely want to see included in the final version, because of the additional flexibility it allows. Specifically, since a tag function could call .getvalue() zero times, one time, or multiple times, a library author could come up with clever uses that aren't possible with an f-string.
Discussing naming
The discussion of the change started with Steve Dower expressing concern that this might cause people to fill up the namespace with short function names, which would be a problem for readability. Dower suggested that perhaps the syntax should be changed to have a single generic string tag that converts a format string into a piece of structured data that could be operated on by a normal function. "It's not quite as syntactic sugar, but it's more extensible and safer to use."
Everitt clarified that the PEP does not propose adding any tag functions to the built-ins or standard library, but acknowledged the point. Baker pointed out that there was no syntactical reason that tag names could not contain dots, which would help with the namespace-pollution problem, though that might be undesirable for other reasons.
If dotted names are going to be allowed, "we should just allow any expression", Brandt Bucher said, noting that the same evolution has already occurred for decorators. Unfortunately, that's not possible, Pablo Galindo Salgado explained, because the Python lexer would have no way to know whether it should switch into the f-string parsing mode (which would be used for all tag strings) without information from the parser.
The current syntax does invite some appealing syntactic sugar, however. Paul Moore gave this example of a "decimal literal" (using the arbitrary-precision decimal library), a frequently requested feature:
from decimal import Decimal as D
dec_num = D"2.71828"
Josh Cannon pointed out that the syntax means Python could never add another string prefix, since any new prefix might already be in use as a tag name; such a change would be backward-incompatible. Matt Wozniski agreed, and stated the drawback explicitly. Wozniski later said that Python's existing string prefixes cannot all be implemented in terms of the proposed tag strings.
Discussing semantics
Eric Traut was the first to raise concerns with the lazy evaluation of interpolated expressions, pointing out that it could lead to surprising and unintuitive consequences, such as the behavior of the Delayed class shown above. Traut suggested asking programmers to explicitly write lambda: when they want lazy evaluation:
tag"Evaluated immediately: {2 + 3}; Not evaluated immediately: {lambda: name}"
Charlie Marsh agreed with Traut that introducing lazy evaluation would be unintuitive, given that it's a change from how f-strings work. Several people objected to the idea of requiring users to opt into lazy evaluation, though, especially since the suggestion of just using lambda expressions would be fairly verbose. Cornelius Krupp pointed out that the intended use case was to define domain-specific languages — so the expectation is that specific uses will vary, and it's up to callers of a particular tag function to read its documentation, just like with any library function. Moore went so far as to say:
I'll be honest, I haven't really gone through the proposal trying to come up with real world use cases, and as a result I don't yet have a feel for how much value lazy evaluation adds, but I am convinced that no-one will ever use lambda as "explicit lazy evaluation".
Another user asked how lazy evaluation interacts with other Python features such as await and yield — which surprisingly worked inside f-strings without a problem when I tried them. Nobody had a compelling answer, although Dower pointed out another particularly pernicious problem for lazy evaluation: how should it interact with context managers (such as locks)? Allowing unmarked lazy evaluation could make for difficult-to-debug problems. Finally, interactions with type checking are a concern, since lazy evaluation has implications for scoping.
David Lord, a Jinja maintainer, questioned the entire justification for lazy evaluation. Several times during the discussion, people asserted that lazy evaluation was necessary for implementing template languages like Jinja, but Lord didn't think tag strings were actually sufficient for templating. Jinja templates are often rendered multiple times with different values, which lazy evaluation doesn't make easy — the variables at the definition site of the template would need to be changed. Plus, Jinja already has to parse the limited subset of Python that can be used in its templates; if it has to keep parsing the templates in order to handle variable substitutions, lazy evaluation doesn't help with that.

Moore agreed that the proposal would not be a suitable replacement for Jinja as it stands, and that he wished "the PEP had some better examples of actual use cases, though."
Everitt shared that the PEP authors chose to keep the more complex examples out of the PEP for simplicity, but "I think we chose wrong." He provided a link to a lengthy tutorial on building an HTML template system using the proposed functionality.
Baker added that being able to begin sending the start of a template before fully evaluating its later sections was an important use case for web-server gateway interface (WSGI) or asynchronous-server gateway interface (ASGI) applications. He also said that delayed evaluation let logging libraries easily do the right thing by only taking the time to evaluate the interpolations if the message would actually be logged.
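Baker's logging point can be sketched without any new syntax: if interpolations arrive as zero-argument callables, the logger can skip evaluating them when the message is filtered out. All names below (log_tag, expensive_repr) are invented for illustration:

```python
evaluations = 0

def expensive_repr():
    # Stands in for a costly value computation that should only
    # run when the log message will actually be emitted.
    global evaluations
    evaluations += 1
    return "[big dump]"

def log_tag(enabled, *parts):
    # Only evaluate deferred parts if the message will be logged.
    if not enabled:
        return None
    return "".join(p() if callable(p) else p for p in parts)

log_tag(False, "state: ", expensive_repr)  # filtered: never evaluated
message = log_tag(True, "state: ", expensive_repr)
print(message)  # prints "state: [big dump]"
```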
Discussing backticks
Trey Hunner saw how useful the change could be, but, as someone who teaches Python to new users, he was worried that the syntax would be hard to explain or search for.
If backticks or another symbol were used instead of quotes, I could imagine beginners searching for 'Python backtick syntax'. But the current syntax doesn't have a simple answer to 'what should I type into Google/DDG/etc. to look this up'.
The suggestion to use backticks was interesting, Everitt said, noting that JavaScript uses backticks for similar features. Baker said that they had considered backticks, but thought that matching the existing f-string syntax would be more straightforward. Simon Saint-André also said that backticks can be hard to type on non-English keyboards. Dower noted that backticks have traditionally been banned from Python feature proposals due to the difficulty in distinguishing ` from '.
Discussing history
Alyssa Coghlan shared a detailed comparison between this proposal and her work on PEP 501 ("General purpose template literal strings"). In short, Coghlan is not convinced that tag strings offer significantly more flexibility than the template string literals proposed in PEP 501, but she does acknowledge that the PEP 750 variety has a more lightweight syntax. Other participants in the discussion were not convinced that the simpler syntax was worth the tradeoff. Coghlan has started collecting ideas from the discussion that might apply equally well to PEP 501.
Barry Warsaw, who worked on early Python internationalization efforts, likewise compared the new PEP to existing practice. He pointed out a few ways that tag strings were unsuitable for internationalization, before saying:
I have to also admit to cringing at the syntax, and worry deeply about the cognitive load this will add to readability. I think it will also be difficult to reason about such code. I don't think the PEP addresses this sufficiently in the "How to Teach This" section. And I think history bears out my suspicion that you won't be able to ignore this new syntax until you need it, because if it's possible to do, a lot of code will find clever new ways to use it and the syntax will creep into a lot of code we all end up having to read.
Baker summarized the most important feedback into a set of possible changes to the proposal, including adopting some ideas from PEP 501, and using decorators to make it easier to write tag functions that typecheckers can understand. Coghlan approved of the changes, although at time of writing other commenters have not weighed in. It remains to be seen how the PEP will evolve from here.
Index entries for this article:
- Python: Python Enhancement Proposals (PEP)/PEP 750
- Python: Strings
Posted Aug 16, 2024 16:15 UTC (Fri)
by NYKevin (subscriber, #129325)
This strikes me as a huge problem. You cannot just close off an entire avenue of syntax expansions without a very good reason. This PEP strikes me as a mediocre reason at best.
If you want delayed-evaluation strings, you can just give them a dedicated prefix (e.g. d"{foo}" evaluates foo when you str() it, and f"{foo}" continues to eagerly evaluate foo as it does now), and then you don't need a whole tag protocol. But you can get most of the way there with lambda: f"{foo}" already (you can't get all the way there, because the PEP proposes using a nonstandard scoping rule, but that's neither here nor there).
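As a minimal sketch, the `lambda: f"{foo}"` workaround mentioned above already gives most of the delayed behavior, because the closure re-reads foo each time it is called:

```python
foo = "first"
delayed = lambda: f"value is {foo}"  # evaluation happens at call time

foo = "second"
print(delayed())  # prints "value is second"
```

As the comment notes, this differs from the PEP in its scoping details, but the deferral itself needs no new syntax.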
The SQL example is unconvincing, because nobody should be escaping SQL like that - you should be passing the unformatted string all the way through to the database engine as a prepared statement. For HTML, you probably have to use something like bleach (preferably, something which is not deprecated, but I'm unfamiliar with this space), which is way too complicated to act as a tag function (you need options to specify allowlists etc.). And so it goes.
Posted Aug 16, 2024 16:34 UTC (Fri)
by kleptog (subscriber, #1183)
It could do that. The formatter could replace all the {foo} in the SQL string with a placeholder, and capture the values in a separate list. Then the result can be sent as a single object sql string+parameters to the database engine.
Though I feel that the lazy evaluation would be a footgun here, I wouldn't expect it in the majority of cases.
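The placeholder mechanism described above can be sketched with the standard-library string.Formatter class. The sql_template() helper and its keyword-argument calling convention are invented for illustration; a real tag function would receive the string parts and interpolations directly:

```python
from string import Formatter

def sql_template(template, **values):
    # Replace each {name} site with a %s placeholder and collect the
    # corresponding values, so the pair can be handed to a database
    # driver as a single parameterized statement.
    query_parts, params = [], []
    for literal, field, _spec, _conv in Formatter().parse(template):
        query_parts.append(literal)
        if field is not None:
            query_parts.append("%s")
            params.append(values[field])
    return "".join(query_parts), params

q, p = sql_template("SELECT * FROM Users WHERE name = {name}",
                    name="O'Henry")
print(q)  # prints "SELECT * FROM Users WHERE name = %s"
print(p)  # prints ["O'Henry"]
```

The value never touches the query text, so no client-side escaping is needed at all.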
Posted Aug 16, 2024 17:01 UTC (Fri)
by NYKevin (subscriber, #129325)
Just Say No.
Posted Aug 16, 2024 19:11 UTC (Fri)
by nhaehnle (subscriber, #114772)
Posted Aug 18, 2024 8:01 UTC (Sun)
by intelfx (subscriber, #130118)
And… how is that different to what exists today?
If you want to guard against X, then you guard against X, either at the type system level or in runtime or both.
You clearly have an axe to grind with this PEP, but the counterarguments are unconvincing.
Posted Aug 18, 2024 9:41 UTC (Sun)
by xecycle (subscriber, #140261)
Posted Aug 16, 2024 19:14 UTC (Fri)
by yeltsin (guest, #171611)
https://learn.microsoft.com/en-us/ef/core/querying/sql-qu...
Posted Aug 17, 2024 18:44 UTC (Sat)
by mokki (subscriber, #33200)
Where column might need " quotes added around itself, the userGroup must not have any quoting or escaping, and name should have single quotes.
Posted Aug 18, 2024 7:20 UTC (Sun)
by kleptog (subscriber, #1183)
I'm not really convinced this is a good use case, because it's only really helpful for trivial queries (using ? instead of {foo} isn't that hard) and for more complicated cases it doesn't help at all. At some point you need something more capable, like SQLAlchemy.
Posted Aug 18, 2024 17:13 UTC (Sun)
by mokki (subscriber, #33200)
Posted Aug 20, 2024 17:31 UTC (Tue)
by iabervon (subscriber, #722)
Posted Aug 16, 2024 16:36 UTC (Fri)
by daroc (editor, #160859)
I originally had an SQL example that produced a prepared statement, not a string, but the example code was a lot less clear, and we worried about it obscuring the main point. In any case, delaying the evaluation of the whole string is not quite as flexible. I believe the PEP would make it possible to write mini-DSLs with actual loops in them, like so:
Of course, whether that flexibility will be useful, I can't say. But that's why simply delaying evaluation of the whole string is not a replacement for the PEP's authors' use case.
Posted Aug 16, 2024 16:56 UTC (Fri)
by NYKevin (subscriber, #129325)
If you want to parse a string into a Turing-complete language, you should not be using something as implicit as a tag function. You should be using the parsing functions in https://docs.python.org/3/library/string.html to explicitly pull it apart, and then write your own normal Python code to implement an interpreter loop. That is frankly not very complicated, and not at all deserving of syntactic sugar.
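For reference, the parsing facility NYKevin points at is string.Formatter.parse(), which splits a template lexically, without evaluating anything, leaving ordinary code to decide what each field means:

```python
from string import Formatter

# Each yielded tuple is (literal_text, field_name, format_spec,
# conversion); field names are not evaluated, just extracted.
template = "Value: {2 + 3} and {x!r:>10}"
parts = list(Formatter().parse(template))
for literal, field, spec, conv in parts:
    print(repr(literal), repr(field), repr(spec), repr(conv))
```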
Posted Aug 18, 2024 8:03 UTC (Sun)
by intelfx (subscriber, #130118)
Posted Aug 18, 2024 15:56 UTC (Sun)
by NYKevin (subscriber, #129325)
Posted Aug 19, 2024 0:09 UTC (Mon)
by intelfx (subscriber, #130118)
Posted Aug 19, 2024 3:56 UTC (Mon)
by NYKevin (subscriber, #129325)
> Sorry, this has just crossed the line from "unconvincing" to "caricature".
Could you please do me a favor, and never post comments like this in any FOSS community ever again?
Posted Aug 16, 2024 18:33 UTC (Fri)
by npws (subscriber, #168248)
Posted Aug 19, 2024 7:42 UTC (Mon)
by siim@p6drad-teel.net (subscriber, #72030)
Posted Aug 19, 2024 10:11 UTC (Mon)
by daroc (editor, #160859)
That's almost exactly what PEP 501 ("General purpose template literal strings") provides.
Posted Aug 19, 2024 11:19 UTC (Mon)
by kleptog (subscriber, #1183)
Posted Aug 19, 2024 11:57 UTC (Mon)
by mb (subscriber, #50428)
Posted Aug 19, 2024 21:36 UTC (Mon)
by edgewood (subscriber, #1123)
Posted Aug 16, 2024 16:18 UTC (Fri)
by jak90 (subscriber, #123821)
Posted Aug 18, 2024 5:42 UTC (Sun)
by alonz (subscriber, #815)
The issue includes a very interesting discussion of the pros/cons of the proposed syntax, and a proposed way forward that is much more lightweight (and is, incidentally, much more similar to PEP 501).
Posted Aug 18, 2024 11:00 UTC (Sun)
by thomas.poulsen (subscriber, #22480)
Is it also used in other languages?
Posted Aug 18, 2024 15:54 UTC (Sun)
by geofft (subscriber, #59789)
Posted Aug 19, 2024 15:11 UTC (Mon)
by rweikusat2 (subscriber, #117920)
Posted Aug 21, 2024 5:16 UTC (Wed)
by xecycle (subscriber, #140261)
Well, an AST approach is just my random thought and I'm not really sure about it; but I believe sending program text over the wire is the ultimate cause of all this harm.
Posted Aug 21, 2024 7:50 UTC (Wed)
by farnz (subscriber, #17727)
But SQL itself is not a text-based language; it's a syntax tree with a standard textual serialization form. The issue you're describing is people constructing the syntax tree manually, because languages don't provide good tooling for building ASTs, but do provide good tooling for manipulating text.
Fundamentally, the issue is that we're working at the level of serialized data forms, because that's simpler than writing things to manipulate an AST and then serialize it; and if SQL didn't have a standardized textual form, we'd just come up with per-database textual serializations of the AST; after all, we don't manipulate other forms of code as ASTs even though they're also syntax trees with standard textual serialization forms, we tend to manipulate them as text.
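farnz's point can be made concrete with a toy example: if a query is built as a tree and only serialized at the end, quoting becomes the serializer's job rather than the query author's. The classes below are invented for illustration, not any real library's API:

```python
from dataclasses import dataclass

@dataclass
class Eq:
    # A single equality predicate in the WHERE clause.
    column: str
    value: str

@dataclass
class Select:
    table: str
    where: Eq

def to_sql(q):
    # Serialization is the only place quoting happens, so a value
    # containing a quote cannot change the query's structure.
    escaped = q.where.value.replace("'", "''")
    return (f"SELECT * FROM \"{q.table}\" "
            f"WHERE \"{q.where.column}\" = '{escaped}'")

query = Select("Users", Eq("name", "O'Henry"))
print(to_sql(query))  # prints: SELECT * FROM "Users" WHERE "name" = 'O''Henry'
```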
Posted Aug 21, 2024 11:11 UTC (Wed)
by Wol (subscriber, #4433)
When you do an exam paper at school, you basically do it mini-proof by mini-proof, each building on the previous proofs. Like a series of sequential statements in your classical programming language.
SQL is just one huge proof, that is very difficult to break down into simple chunks, hence it's extremely easy to make mistakes. I guess I'm probably just rewording that comment about "AST tree", just from a different view-point. But that's personally why I hate SQL - it's just far too easy to screw up ...
Posted Aug 21, 2024 11:31 UTC (Wed)
by farnz (subscriber, #17727)
There's a deeper issue there, though - SQL is "just" a representation of relational algebra, and we know (as in have mathematical proofs for) how to mechanically translate any other query description to relational algebra in a form that executes at least as fast on a relational database as it would on a database designed around the query description language you chose. Now, there may be a relational query that's faster than this and gets the same results, but we can do that mechanical translation.
You may notice a parallel here - we have the same relationship between programming languages and the machines they run on, where we know how to translate (e.g.) C source to AArch64 machine code such that running the resulting output on an AArch64 processor is as fast as "running" the C source on a direct implementation of the C abstract machine. You do drop down from C to AArch64 machine code manually in a few places, but by and large, we don't write assembly that much, instead just targeting the places where the performance gain is critical, or where we can't do the thing we want to do in C.
But for SQL, we insist on writing SQL by hand, even when we'd be better off relying on a higher-level layer that translates from some other query format to relational. And that's a bigger deal than whether we write our low-level query in a textual serialization format, or as a syntax tree that our code serializes.
Posted Aug 22, 2024 8:42 UTC (Thu)
by taladar (subscriber, #68407)
Posted Aug 22, 2024 9:21 UTC (Thu)
by farnz (subscriber, #17727)
TBF, that used to be what people said about higher level programming languages than macro assemblers, too. The difference is that we improved the implementations of higher level programming languages, whereas with databases, we've just said "fall back to raw SQL" even though a better implementation could do a lot better (as, for example, LINQ shows in the Microsoft world).
Posted Aug 23, 2024 12:09 UTC (Fri)
by Wol (subscriber, #4433)
As such, a large chunk of functionality that belongs in the database, is encoded in SQL instead. So SQL is massively OVER-designed.
As a corollary, because all this functionality is in SQL, relational databases (where this functionality *should* be) don't bother to include it.
To repeat - that's why, when you run a query language DESIGNED for Fourth Normal Form, over a database that IMPLEMENTS Fourth Normal Form, it will knock seven bells out of SQL/Relational.
Posted Aug 23, 2024 12:54 UTC (Fri)
by kleptog (subscriber, #1183)
That's not a meaningful distinction. The SQL dialect a database supports by definition maps 1-to-1 on the features of the database. The first thing you need to realise is that "SQL" as a language is not something that anyone actually uses, because everyone has their own dialect. It just specifies a (barely useful) minimum.
This is especially noticeable if you're used to something like PostgreSQL and then are required to use MS SQL. So much stuff you took for granted just doesn't work.
Posted Aug 23, 2024 17:18 UTC (Fri)
by Wol (subscriber, #4433)
Does that include things like "join" (apparently one of the most expensive relational operations)?
Pick query language has no (well pretty much no) concept of join, it belongs in the database schema itself. (I say "pretty much" because the query language can call into database functions.)
In fact an awful lot of what SQL considers an "absolute minimum" probably belongs in the Pick schema.
Posted Aug 22, 2024 11:24 UTC (Thu)
by atnot (subscriber, #124910)
This is a lot less true than people say it is for any modern implementation of SQL, see:
> This idea is particularly sticky because it was more or less true 50 years ago, and it's a passable mental model to use when learning sql. But it's an inadequate mental model for building new sql frontends, designing new query languages, or writing tools like ORMs that abstract over sql.
Posted Aug 22, 2024 18:05 UTC (Thu)
by kleptog (subscriber, #1183)
[1] https://wiki.postgresql.org/wiki/Turing_Machine_(with_recursive)
Posted Aug 21, 2024 15:20 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
We already have them. They're called "SQL dialects" :) You can't simply take a complicated query for MSSQL and run it on Postgres or Oracle. Even such details as quoting strings differ.
If you want to see what a true "AST-based SQL" could have looked like, then check LINQ in C#: https://learn.microsoft.com/en-us/dotnet/csharp/linq/
Posted Jan 17, 2025 2:24 UTC (Fri)
by zahlman (guest, #175387)
The current version proposes a single, common `t` prefix for template strings, which can then be processed by other arbitrary functions:
> Template strings are a generalization of f-strings, using a t in place of the f prefix. Instead of evaluating to str, t-strings evaluate to a new type... Templates provide developers with access to the string and its interpolated values *before* they are combined.
Additionally, interpolation values are captured at the time the Template instance is created, so processing code can access a `.value` attribute of the Interpolation instead of calling a method. (But, as before, the underlying text between the braces is also captured - so, in principle, a formatter could also explicitly look up names in a different namespace.)
This neatly solves all the problems highlighted in the original article here.
The Template can be passed around and re-interpolated as desired, and the act of interpolation is explicit. The values to use, by default, are bound early rather than being looked up at interpolation time - which makes it much clearer how it works (with late binding, there would be a choice between binding to the context where the Template was instantiated, versus the context where it's interpolated) and much simpler to implement.
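That early-binding behavior can be emulated today. The Template and Interpolation classes below are simplified stand-ins for illustration, not the actual types the revised PEP specifies:

```python
class Interpolation:
    # The value is captured when the template is built, alongside
    # the original expression text for formatters that want it.
    def __init__(self, value, expression):
        self.value = value
        self.expression = expression

class Template:
    def __init__(self, *parts):
        self.parts = parts

def render(t):
    # A formatter is just ordinary code walking the captured parts.
    return "".join(
        str(p.value) if isinstance(p, Interpolation) else p
        for p in t.parts
    )

name = "Alice"
t = Template("My name is ", Interpolation(name, "name"))
name = "Bob"  # too late: the value was already captured
print(render(t))  # prints "My name is Alice"
```

Contrast this with the Delayed example in the article, where rebinding name changed the output: with early binding, the template is a stable value that can be rendered any number of times.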
There is no longer any worry about namespace pollution, since the formatters are ordinary code that can use ordinary techniques for namespacing.
There is no longer any need to specify how to associate formatting functions with custom prefixes, nor to figure out support for the custom prefixes in the grammar. In particular, there's no need to figure out the scope of what can be used as a prefix. (I suppose it would have been possible to just add a grammar rule to match an expression immediately followed by a string literal. But that would interfere with implicit literal string concatenation, and raise questions about chaining formatters; and if it were intended to forbid whitespace between the expression and the literal - like how the `f` prefix currently works - then it would become a weird special case in the grammar.)
There is no longer any need to contemplate any special ways to request early or late binding (and Traut's suggested use of `lambda` doesn't actually make much sense here).
Finally, the new feature becomes at least as easy to research (and SEO-optimize about) as "f-strings", because now we can just have "t-strings" - there's always just that one prefix to search for.
----
The bittersweet part - all of this follows Dower's suggestion:
> Dower suggested that perhaps the syntax should be changed to have a single generic string tag that converts a format string into a piece of structured data that could be operated on by a normal function.
But it also produces a system that *works just like* what was proposed in PEP 501. It seems that, in broad strokes, Coghlan's design - from the era when f-strings were developed - was really the right way to do it all along. A shame it took over 9 years to figure this out.
Posted Jan 17, 2025 3:09 UTC (Fri)
by jake (editor, #205)
Just fyi, this article has been free to view since a week or so after its publication ... since August sometime ...
> and in the interim, the proposal has been changed substantially.
indeed, look for an article on said changes coming your way rather soon :)
very nice summary of the changes ... if you email lwn@lwn.net, we can arrange that you get a subscriber link when the article comes out :)
thanks,
jake
Forwards incompatibility is a real problem
For example:
query = sql"""SELECT id,{column} FROM "User_{userGroup}" WHERE name = {name}"""
Not an expected use case
So if the sql syntax looks like f-strings that work with database sql (with sql-injection risks), but works differently depending on which sql driver provides the sql string interpolator and on whether the interpolator uses DB placeholders or just escapes in the client, then users will most likely be very confused.
The right way to think about this format is like a parameterized statement, but with newer-style, nicer syntax. That is:
cursor.execute(sql"SELECT id FROM table WHERE name={name};")
is the same thing as:
cursor.execute("SELECT id FROM table WHERE name=%s;", name)
and it has the same differences from f-strings that writing "SELECT id, %s FROM table;" % ("column",) does from giving cursor.execute() a string with %s still in it.
dsl"""
<ul>
{for x in list:}
<li>{x}</li>
{endfor}
</ul>
"""
Really? When did the string module become a parser? Sorry, this has just crossed the line from "unconvincing" to "caricature".
Frankly, this seems like a much better approach all round. I really don't see the benefit of:
runquery(sql"SELECT {column} FROM {table} WHERE column={value};")
over
runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
while the latter feels like it requires less mental load.
Given that kleptog was replying to a comment referencing PEP 501, I assumed it is syntax that had been proposed in that PEP. And indeed it is.
I wish the Python community all the best in figuring out the optimum scope for this problem. Java has string templates as a preview feature per JEP 430 and following (soon landing as a stable feature according to JEP 465) that not only could produce interpolated strings, but e.g. a PreparedStatement for a given SQL connection with the right template processor.
String formatters that produce more than strings
You're a bit outdated... This JEP was actually withdrawn (JEP 465, issue 8323333).
Non-standard string literals
Yes, it's used in JavaScript (as mentioned in the PEP) with backticks. Try this in your browser console:
function myinterpolate(segments, ...values) {
return JSON.stringify({"segments": segments, "values": values});
}
myinterpolate`two plus two is ${2+2}!`
which prints
'{"segments":["two plus two is ","!"],"values":[4]}'
SQL Injection
SQL as a syntax tree
Attitudes to higher level query layers
https://www.scattered-thoughts.net/writing/unexplanations...
A Bittersweet Postmortem