A literal string type for Python

Posted Apr 14, 2022 16:31 UTC (Thu) by tialaramex (subscriber, #21167)
In reply to: A literal string type for Python by milesrout
Parent article: A literal string type for Python

I fear that the "it even works if you've actually done stuff with the strings, using methods we think are safe" part undoes too much of the initial value of requiring literals. I understand entirely why they did it, but my instinct is that they've unlocked far too much here. Mechanically, it's obviously possible to use the capabilities marked "safe" to produce arbitrary LiteralStrings at runtime, at which point these are clearly not literal strings. And in the sort of "Oops, I am not really a programmer" code where this safety was most necessary, I think that capability is more rather than less likely to be accessible to an attacker.

"EAT BABIES" // clearly expresses your intent, it's your fault, you wrote that.

"EAT" + " BABIES" // I can see why they felt they should make this work: they've cited examples that do this, and it genuinely is still clear at this point what your intent was, although I think it should be discouraged anyway.

doComplicatedStuffBasedOnUserInput("DO NOT EAT BABIES", input) // this still type checks as LiteralString, and might be EAT BABIES, yet we can hardly claim that we're reflecting clear programmer intent when that happens.

The reason to want literals here rather than allowing arbitrary strings is to get closer to requiring intent. I'd rather give up the second example than, as this PEP does, allow the third example the opportunity to set fire to everything and pretend that's "safe".

Rust of necessity has to require actual literals in formatting (not merely constant strings) because the formatting work is done via the macro system, and the macro system can't see inside variables. But even though many Rust programmers would welcome more sophisticated behaviour, I personally prefer the literal requirement.

Posted Apr 14, 2022 16:45 UTC (Thu) by mb (subscriber, #50428)

Why do you think your third example is unsafe?

Posted Apr 15, 2022 20:28 UTC (Fri) by tialaramex (subscriber, #21167)

Because the resulting string can be EAT BABIES, and it's entirely possible that the programmer did not anticipate the circumstances which allow that? If we're OK with that, then this entire exercise was futile, as we could have also blessed arbitrary strings.

Posted Apr 16, 2022 8:56 UTC (Sat) by mb (subscriber, #50428)

The issue literal strings are trying to solve is user inputs being pasted into places where a hardcoded string is expected. That's to ensure that the program user cannot get arbitrary control over these strings.

It's not supposed to prevent the programmer from hardcoding the wrong string.

Posted Apr 16, 2022 21:28 UTC (Sat) by tialaramex (subscriber, #21167)

And that's why I have a problem with:

doComplicatedStuffBasedOnUserInput("DO NOT EAT BABIES", input)

"DO NOT EAT BABIES" is blessed as a LiteralString because it is one. No problem so far. But "cleverly" this proposal allows operations on a LiteralString (such as truncation, concatenation, duplication and splitting) to produce another LiteralString. So if doComplicatedStuffBasedOnUserInput has a bug, as it may well do, it can end up producing quite unexpected results, such as "EAT BABIES", and yet they're blessed as LiteralString anyway via this rationale.
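
To make the worry concrete, here is a minimal Python sketch of how such a helper could go wrong. The function name do_complicated_stuff, its word-dropping behaviour and the bug are all hypothetical, invented for illustration. The LiteralString annotations assume the PEP's rule, as described above, that splitting and joining LiteralString values yields LiteralString again.

```python
try:
    from typing import LiteralString  # available from Python 3.11
except ImportError:
    LiteralString = str  # runtime-only fallback so the sketch runs on older versions


def do_complicated_stuff(template: LiteralString, user_input: str) -> LiteralString:
    """Hypothetical helper: hide as many leading words as the user requests."""
    # Splitting a LiteralString gives a list of LiteralString pieces.
    words = template.split(" ")
    try:
        hidden = int(user_input)
    except ValueError:
        hidden = 0
    # Bug: nothing limits how many leading words the user may drop.
    # Joining LiteralString pieces with a literal separator is still LiteralString.
    return " ".join(words[hidden:])
```

With this sketch, do_complicated_stuff("DO NOT EAT BABIES", "2") evaluates to "EAT BABIES", and nothing in the signature hints that the user's number decided that.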

Thus, the program user in fact gets arbitrary control over these strings in at least some cases, whereas that's definitely not the situation in languages where there's an actual literal string type. In exchange, Python gets to write "WO" + "RDS" and have that be a LiteralString, whereas in the other languages it is not. I think that's a bad trade, despite being very clever.

Posted Apr 19, 2022 4:58 UTC (Tue) by NYKevin (subscriber, #129325)

Looking through Appendix C of the PEP (which lists the operations supported), I see removeprefix/removesuffix, but not slicing (__getitem__), so you would have to write something like s.removeprefix("DO NOT ") to get the outcome which you describe (i.e. you *can't* write s[:7] or something like that). If you explicitly write removeprefix("DO NOT "), then IMHO it's your own damn fault for removing a prefix which you apparently wanted to keep.
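
For reference, a small sketch of what the two spellings do at runtime (str.removeprefix needs Python 3.9 or later). The slice that drops the seven characters of "DO NOT " is s[7:]. Whether a type checker treats either result as LiteralString depends on its stubs and is not something this snippet demonstrates.

```python
s = "DO NOT EAT BABIES"

# Explicit: the prefix being discarded is spelled out in the source.
explicit = s.removeprefix("DO NOT ")

# Positional: same result, but the 7 could just as well come from a variable.
sliced = s[7:]

# removeprefix() returns the string unchanged when the prefix is absent.
unchanged = s.removeprefix("PLEASE ")
```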

Posted Apr 19, 2022 11:06 UTC (Tue) by mathstuf (subscriber, #69389)

> you *can't* write s[:7] or something like that

If there were a LiteralNumber, one might be able to do that, but without one there's no difference between a literal 7 and a 7 coming in from "the outside" through a variable. Though there are a number of other methods taking SupportsIndex that might now look suspicious to me…
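
A sketch of the kind of call in question, assuming (as I read the PEP's overloads and the typeshed stubs) that str.__mul__ and str.split accept any SupportsIndex and still return LiteralString when the string operands are literal. The function names and parameters are made up for illustration.

```python
try:
    from typing import LiteralString  # available from Python 3.11
except ImportError:
    LiteralString = str  # runtime-only fallback so the sketch runs on older versions


def repeat_marker(count_from_user: int) -> LiteralString:
    # "=" is a literal, but the count is not; the result length is user-controlled.
    return "=" * count_from_user


def tail_after(splits_from_user: int) -> LiteralString:
    # maxsplit comes from outside, so the user picks where the tail starts.
    return "DO NOT EAT BABIES".split(" ", splits_from_user)[-1]
```

Here tail_after(2) evaluates to "EAT BABIES" even though every string operand in the function is a literal.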

Posted Apr 19, 2022 15:23 UTC (Tue) by gbleaney (guest, #158077)

PEP author here. Appendix B provides a trivial function for turning any regular external string into a 'LiteralString':
https://peps.python.org/pep-0675/#appendix-b-limitations

If a developer wants to circumvent the protections of 'LiteralString', they can easily do it. They don't even need fancy functions like the example we gave; they can just add a '# pyre-ignore' (or the equivalent lint suppression comment for their typechecker of choice). The goal is to protect against accidental mistakes, not malicious or implausible behaviour by developers.
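
For readers who don't want to follow the link, the function is roughly of the following shape. This is a sketch written from memory rather than a quotation of the PEP, and the lookup table here is deliberately tiny (a real one would cover every character of interest), so treat the details as assumptions.

```python
from typing import Dict, List

try:
    from typing import LiteralString  # available from Python 3.11
except ImportError:
    LiteralString = str  # runtime-only fallback so the sketch runs on older versions


def make_literal(s: str) -> LiteralString:
    # Every value in the table is a literal, so each lookup result is a LiteralString.
    # Illustrative subset only; characters outside it raise KeyError.
    letters: Dict[str, LiteralString] = {
        " ": " ",
        "A": "A",
        "B": "B",
        "E": "E",
        "I": "I",
        "S": "S",
        "T": "T",
    }
    # Rebuild the untrusted string one character at a time from literal pieces.
    output: List[LiteralString] = [letters[c] for c in s]
    return "".join(output)
```

The result is character-for-character identical to the untrusted input, yet each step only combines literals, which is why a type checker following the PEP's rules has no grounds to object.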

Posted Apr 24, 2022 13:39 UTC (Sun) by tialaramex (subscriber, #21167)

I guess my problem is that I'm less confident the problematic case is "implausible".

If I'm correct, the proof would of course likely arrive too late: this PEP succeeds, everybody gets used to the behaviour as documented, and then a hole is found in some code, say, a popular Django app, where users can manipulate a LiteralString so as to cause mischief. I'm certain that the instinct will be to blame the app programmer, but that misses the whole point of these protections: programmers are human and as such lack foresight.

To be quite fair, the other way forward can also be dangerous. In C++, for example, std::format() resolutely insists on a constant format string, so that's pretty safe (it needn't be a literal, but it can't be sensitive to user input as that's not constant). But that necessitates providing std::vformat(), which does not take a constant format string, and so programmers may be tempted to call std::vformat() rather than refactor some code to ensure the format strings are actually constant... Defensive programming is possible, maybe even encouraged, but it's probably easier to do the Wrong Thing™ in many cases than it should be.

Posted Apr 25, 2022 7:50 UTC (Mon) by farnz (subscriber, #17727)

To a large extent, though, these sound like the same problem as unsafe in Rust; sure, I can wrap all sorts of crawling horrors in unsafe, and have a Safe Rust API on top so that when you look at my crate's documentation, it's not obvious that I've done this.

And similar to Unsafe Rust, the answer is tool-assisted review of code you're planning to use that highlights the areas of code that need extra attention - just as a Rust-aware review system calls out unsafe wherever it appears for extra human attention, so a Python-aware review system needs to call out manipulation of LiteralString that results in a LiteralString typed output for extra human attention.