A literal string type for Python

Posted Apr 15, 2022 20:28 UTC (Fri) by tialaramex (subscriber, #21167)
In reply to: A literal string type for Python by mb
Parent article: A literal string type for Python

Because the resulting string can be EAT BABIES, and it's entirely possible that the programmer did not anticipate the circumstances which allow that? If we're OK with that, then this entire exercise was futile, as we could have also blessed arbitrary strings.



A literal string type for Python

Posted Apr 16, 2022 8:56 UTC (Sat) by mb (subscriber, #50428) [Link] (6 responses)

The issue literal strings are trying to solve is user inputs being pasted into places where a hardcoded string is expected. That's to ensure that the program user cannot get arbitrary control over these strings.

It's not supposed to prevent the programmer from hardcoding the wrong string.

A literal string type for Python

Posted Apr 16, 2022 21:28 UTC (Sat) by tialaramex (subscriber, #21167) [Link] (5 responses)

And that's why I have a problem with:

doComplicatedStuffBasedOnUserInput("DO NOT EAT BABIES", input)

"DO NOT EAT BABIES" is blessed as a LiteralString because it is. No problem so far. But "cleverly" this proposal allows operations (such as truncation, concatenation, duplication and splitting) on LiteralString to produce a LiteralString, and so if doComplicatedStuffBasedOnUserInput has a bug, as it may well do, it can end up producing quite unexpected results, such as "EAT BABIES" and yet they're blessed as LiteralString anyway via this rationale.

Thus, the program user in fact gets arbitrary control over these strings in at least some cases, whereas that's definitively not the situation in languages where there's an actual literal string type. In exchange, Python gets to write "WO" + "RDS" and have that be a LiteralString whereas in the other languages it is not. I think that's a bad trade, despite being very clever.
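A minimal Python sketch of the scenario described above. The function name `do_complicated_stuff` and its bug are hypothetical, invented purely for illustration; the one assumption about the proposal is that `removeprefix` called on a `LiteralString` with a literal argument is typed as returning a `LiteralString`. `LiteralString` is imported only for the type checker, so the snippet should run on any Python that has `str.removeprefix` (3.9 or later).

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import LiteralString  # Python 3.11+, needed only by the type checker


def do_complicated_stuff(template: LiteralString, user_input: str) -> LiteralString:
    """Hypothetical helper with a logic bug: user input picks the code path."""
    if user_input.strip().lower() == "short":
        # Bug: the programmer meant to trim something harmless, not the negation.
        # Both operands are literal, so the result is assumed to stay "blessed".
        return template.removeprefix("DO NOT ")
    return template


print(do_complicated_stuff("DO NOT EAT BABIES", "full"))   # DO NOT EAT BABIES
print(do_complicated_stuff("DO NOT EAT BABIES", "short"))  # EAT BABIES
```

Nothing user-supplied is ever concatenated into the result, yet the user still chooses which of the two strings reaches the sensitive API.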

A literal string type for Python

Posted Apr 19, 2022 4:58 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (4 responses)

Looking through Appendix C of the PEP (which lists the operations supported), I see removeprefix/removesuffix, but not slicing (__getitem__), so you would have to write something like s.removeprefix("DO NOT ") to get the outcome which you describe (i.e. you *can't* write s[:7] or something like that). If you explicitly write removeprefix("DO NOT "), then IMHO it's your own damn fault for removing a prefix which you apparently wanted to keep.

A literal string type for Python

Posted Apr 19, 2022 11:06 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (3 responses)

> you *can't* write s[:7] or something like that

If there were LiteralNumber, one might be able to do that, but without, there's no difference between a literal 7 and a 7 coming in from "the outside" through a variable. Though there are a number of other methods that take SupportsIndex that might now be suspicious to me…
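To make the `SupportsIndex` concern concrete, here is a small Python sketch. It assumes that `str.split` on a `LiteralString` is typed as returning a list of `LiteralString` no matter where `maxsplit` came from; that is my reading rather than something confirmed in this thread, so check it against your own type checker. The helper `pick_tail` is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import LiteralString  # Python 3.11+, needed only by the type checker


def pick_tail(template: LiteralString, n: int) -> LiteralString:
    """Hypothetical helper: n is a plain int and may come from the outside."""
    # The integer decides how much of the literal survives, but its origin
    # is not recorded anywhere in the types.
    return template.split(" ", n)[-1]


print(pick_tail("DO NOT EAT BABIES", 0))  # DO NOT EAT BABIES
print(pick_tail("DO NOT EAT BABIES", 2))  # EAT BABIES
```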

A literal string type for Python

Posted Apr 19, 2022 15:23 UTC (Tue) by gbleaney (guest, #158077) [Link] (2 responses)

PEP author here. Appendix B provides a trivial function for turning any regular external string into a 'LiteralString':
https://peps.python.org/pep-0675/#appendix-b-limitations

If a developer wants to circumvent the protections of 'LiteralString', they can easily do it. They don't even need fancy functions like the example we gave; they can just add a '# pyre-ignore' (or the equivalent lint suppression comment for their type checker of choice). The goal is to protect against accidental mistakes, not malicious or implausible behaviour by developers.
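For readers who have not looked at Appendix B, the following is a rough Python sketch in the spirit of that kind of function, not a copy of the PEP's example. The name `launder` and the deliberately tiny character table are my own assumptions; a real version would list every character it wants to support. It relies on `"".join(...)` over literal pieces being typed as a `LiteralString`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import LiteralString  # Python 3.11+, needed only by the type checker


def launder(s: str) -> LiteralString:
    """Hypothetical: rebuild an arbitrary str out of hardcoded literal pieces."""
    # Every value in the table is a literal, so the type checker sees only
    # literal strings. The table is truncated here for brevity.
    table: dict[str, LiteralString] = {
        "A": "A", "B": "B", "E": "E", "I": "I", "S": "S", "T": "T", " ": " ",
    }
    pieces: list[LiteralString] = [table[ch] for ch in s]  # KeyError if unsupported
    return "".join(pieces)


external = "EAT " + "BABIES"  # stands in for a string arriving from outside
print(launder(external))  # EAT BABIES
```

The output has the same content as the untrusted input but is typed as a literal, which is why the protection can only target accidents.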

A literal string type for Python

Posted Apr 24, 2022 13:39 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (1 responses)

I guess my problem is that I'm less confident the problematic case is "implausible".

If I'm correct, the proof would of course likely arrive too late: this PEP succeeds, everybody gets used to the behaviour as documented, and then a hole is found in some code, say, a popular Django app, where users can manipulate a LiteralString so as to cause mischief. I'm certain that the instinct will be to blame the app programmer, but of course that's missing the whole point of these protections: programmers are human and as such lack foresight.

To be quite fair, the other way forward can also be dangerous. In C++ for example std::format() resolutely insists on a constant format string, so that's pretty safe (it needn't be a literal, but it can't be sensitive to user input as that's not constant), but it necessitates providing std::vformat() which does not take a constant format string, and so programmers may be tempted to call std::vformat() rather than re-factor some code to ensure the format strings are actually constant... Defensive programming is possible, maybe even encouraged, but it's probably easier to do the Wrong Thing™ in many cases than it should be.

A literal string type for Python

Posted Apr 25, 2022 7:50 UTC (Mon) by farnz (subscriber, #17727) [Link]

To a large extent, though, these sound like the same problem as unsafe in Rust; sure, I can wrap all sorts of crawling horrors in unsafe, and have a Safe Rust API on top so that when you look at my crate's documentation, it's not obvious that I've done this.

And similar to Unsafe Rust, the answer is tool-assisted review of code you're planning to use that highlights the areas of code that need extra attention - just as a Rust-aware review system calls out unsafe wherever it appears for extra human attention, so a Python-aware review system needs to call out manipulation of LiteralString that results in a LiteralString typed output for extra human attention.
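As a very rough idea of what such a review aid could look like, here is a purely syntactic Python sketch built on the standard `ast` module. It does not know any types, so it flags every call to a watched method name whether or not a `LiteralString` is involved; a real tool would need type information from a checker. The watch list and the function name are hypothetical.

```python
from __future__ import annotations

import ast

# Hypothetical watch list: str methods that can shrink or rearrange a string.
WATCHED = {"removeprefix", "removesuffix", "split", "rsplit", "replace", "strip"}


def flag_string_manipulation(source: str) -> list[tuple[int, str]]:
    """Return (line number, method name) for every call to a watched method."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in WATCHED
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


sample = 'query = template.removeprefix("DO NOT ")\nparts = query.split(" ")\n'
for lineno, method in flag_string_manipulation(sample):
    print(f"line {lineno}: review call to .{method}()")
```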

