LWN: Comments on "A literal string type for Python" https://lwn.net/Articles/891082/ This is a special feed containing comments posted to the individual LWN article titled "A literal string type for Python". en-us Sun, 21 Sep 2025 00:54:51 +0000 Sun, 21 Sep 2025 00:54:51 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net A literal string type for Python https://lwn.net/Articles/914236/ https://lwn.net/Articles/914236/ craig.francis <div class="FormattedComment"> Hi nye, bit weird to see you mention the PHP RFC for is_literal(), I'm the author :-)<br> <p> I completely agree with everything you said - taint checking is flawed, concatenation is fine, and the extra functions PEP 675 includes make me feel a bit uncomfortable as well (but, to be fair, I cannot think of a vulnerability from them, I just can't say with 100% confidence they will be fine for every single context).<br> <p> Anyway... I'm just looking at how the Python implementation works, now that 3.11 is out, because I need to go back to the PHP Internals Developers to try again.<br> <p> As I'm not a Python developer, do you think the following is a good example of this feature being used:<br> <p> <a rel="nofollow" href="https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/python/main.py">https://github.com/craigfrancis/php-is-literal-rfc/blob/m...</a><br> <p> <a rel="nofollow" href="https://eiv.dev/python-pyre/">https://eiv.dev/python-pyre/</a><br> <p> ---<br> <p> As to reasons for the PHP RFC rejection... 
it was not clear, most people who voted against did not comment (bit weird, considering RFC stands for "Request for Comments"), two people didn't want it to support string concatenation (they believe rejecting concatenation would help find issues, but I've found that hasn't been the case; instead, allowing it makes adoption much easier due to the amount of existing code that uses concatenation), three people believe these checks should only be done by Static Analysis (the most optimistic stat I can find is 33% of PHP developers use Static Analysis[0], which I support, and can now be done with the `literal-string` type in Psalm and PHPStan, but I don't believe it will ever get to 100%), one person believed this should be solved through better documentation... and someone thought the idea was flawed, because a *malicious* developer could write the user-value into a new PHP file (e.g. `&lt;?php return "$user_value"; ?&gt;`), and execute it.<br> <p> [0] <a rel="nofollow" href="https://www.jetbrains.com/lp/devecosystem-2021/php/">https://www.jetbrains.com/lp/devecosystem-2021/php/</a><br> </div> Wed, 09 Nov 2022 17:00:29 +0000 A literal string type for Python https://lwn.net/Articles/892699/ https://lwn.net/Articles/892699/ nye <div class="FormattedComment"> The problem with taint checking is that experience has shown that - even if it&#x27;s always correct, which it often isn&#x27;t - it leads a surprising number of programmers to assume that because the evil bit isn&#x27;t set, then it must therefore be good. In other words, taint checking separates data into &quot;definitely unsafe&quot; and &quot;might be safe assuming you&#x27;re using it correctly, whatever that might mean&quot;, whereas many developers treat it as meaning &quot;maybe unsafe&quot; versus &quot;definitely safe&quot;.<br> <p> By restricting the feature to simply &quot;is this a literal string, or derived from literal strings purely by means of concatenation&quot;[0], the meaning is well-defined and easier to understand. 
In other words, it depends less upon programmer education, which is a strategy that has been repeatedly proven ineffective.<br> <p> There is some discussion about this in <a rel="nofollow" href="https://wiki.php.net/rfc/is_literal">https://wiki.php.net/rfc/is_literal</a> if you&#x27;re interested - that&#x27;s the proposal for a very similar feature in PHP, which sadly did not pass for reasons I&#x27;ve not yet investigated.<br> <p> [0] This PEP is a bit broader than that and does include some operations that create substrings, which makes me uncomfortable.<br> </div> Tue, 26 Apr 2022 15:25:02 +0000 A literal string type for Python https://lwn.net/Articles/892495/ https://lwn.net/Articles/892495/ farnz <p>To a large extent, though, these sound like the same problem as <tt>unsafe</tt> in Rust; sure, I can wrap all sorts of crawling horrors in <tt>unsafe</tt>, and have a Safe Rust API on top so that when you look at my crate's documentation, it's not obvious that I've done this. <p>And similar to Unsafe Rust, the answer is tool-assisted review of code you're planning to use that highlights the areas of code that need extra attention - just as a Rust-aware review system calls out <tt>unsafe</tt> wherever it appears for extra human attention, so a Python-aware review system needs to call out manipulation of <tt>LiteralString</tt> that results in a <tt>LiteralString</tt> typed output for extra human attention. Mon, 25 Apr 2022 07:50:51 +0000 A literal string type for Python https://lwn.net/Articles/892469/ https://lwn.net/Articles/892469/ tialaramex <div class="FormattedComment"> I guess my problem is that I&#x27;m less confident the problematic case is &quot;implausible&quot;.<br> <p> If I&#x27;m correct the proof of course would likely arrive too late. ie, this PEP succeeds, everybody gets used to the behaviour as documented, and then a hole is found in some code, say, a popular Django app, where users can manipulate a LiteralString so as to cause mischief. 
I&#x27;m certain that the instinct will be to blame the app programmer, but of course that&#x27;s missing the whole point of these protections: programmers are human and as such lack foresight.<br> <p> To be quite fair, the other way forward can also be dangerous. In C++ for example std::format() resolutely insists on a constant format string, so that&#x27;s pretty safe (it needn&#x27;t be a literal, but it can&#x27;t be sensitive to user input as that&#x27;s not constant), but it necessitates providing std::vformat() which does not take a constant format string, and so programmers may be tempted to call std::vformat() rather than re-factor some code to ensure the format strings are actually constant... Defensive programming is possible, maybe even encouraged, but it&#x27;s probably easier to do the Wrong Thing™ in many cases than it should be.<br> </div> Sun, 24 Apr 2022 13:39:43 +0000 A literal string type for Python https://lwn.net/Articles/891875/ https://lwn.net/Articles/891875/ gbleaney <div class="FormattedComment"> PEP author here. Appendix B provides a trivial function for turning any regular external string into a &#x27;LiteralString&#x27;:<br> <a href="https://peps.python.org/pep-0675/#appendix-b-limitations">https://peps.python.org/pep-0675/#appendix-b-limitations</a><br> <p> If a developer wants to circumvent the protections of &#x27;LiteralString&#x27;, they can easily do it. They don&#x27;t even need fancy functions like the example we gave, they can just add a &#x27;# pyre-ignore&#x27; (or equivalent lint suppression comment for their typechecker of choice). 
The goal is to protect against accidental mistakes, not malicious or implausible behaviour by developers.<br> </div> Tue, 19 Apr 2022 15:23:25 +0000 A literal string type for Python https://lwn.net/Articles/891822/ https://lwn.net/Articles/891822/ mathstuf <div class="FormattedComment"> <font class="QuotedText">&gt; you *can&#x27;t* write s[:7] or something like that</font><br> <p> If there were LiteralNumber, one might be able to do that, but without, there&#x27;s no difference between a literal 7 and a 7 coming in from &quot;the outside&quot; through a variable. Though there are a number of other methods that take SupportsIndex that might now be suspicious to me…<br> </div> Tue, 19 Apr 2022 11:06:38 +0000 A literal string type for Python https://lwn.net/Articles/891760/ https://lwn.net/Articles/891760/ NYKevin <div class="FormattedComment"> Looking through Appendix C of the PEP (which lists the operations supported), I see removeprefix/removesuffix, but not slicing (__getitem__), so you would have to write something like s.removeprefix(&quot;DO NOT &quot;) to get the outcome which you describe (i.e. you *can&#x27;t* write s[:7] or something like that). If you explicitly write removeprefix(&quot;DO NOT &quot;), then IMHO it&#x27;s your own damn fault for removing a prefix which you apparently wanted to keep.<br> </div> Tue, 19 Apr 2022 04:58:58 +0000 A literal string type for Python https://lwn.net/Articles/891626/ https://lwn.net/Articles/891626/ felix.s <p>There is already a PEP for that, <a rel="nofollow" href="https://peps.python.org/pep-0501/">PEP 501</a>. 
It was (as I recall) drafted in parallel with <a rel="nofollow" href="https://peps.python.org/pep-0498/">PEP 498</a>, but the former was deferred because the Python developers wanted to get ‘further experience’ with naïve interpolation, as if they failed to understand that this will just incentivise developers to introduce bugs into their code by reaching for the injection-prone naïve f-strings instead, the very thing they are trying to paper over here. Or perhaps, if we take the deferral rationale seriously, they deliberately aimed for the PHP approach, where they first introduce a simplistic, half-baked design into the language, only to have to resort to painful after-the-fact fixes over the following years. <p>Meanwhile, JavaScript has had an equivalent to PEP 501 (tagged template strings) from the very start, added at the same time as naïve interpolation. I still find it hardly ideal (it's a bit too easy to simply forget the tag), and say what you want about the TC39, but at least they seem to understand developer incentives well. At this point, and I can’t believe I am saying this, I am starting to see JavaScript as a better Python than Python. Sun, 17 Apr 2022 11:30:59 +0000 A literal string type for Python https://lwn.net/Articles/891602/ https://lwn.net/Articles/891602/ tialaramex <div class="FormattedComment"> And that&#x27;s why I have a problem with:<br> <p> doComplicatedStuffBasedOnUserInput(&quot;DO NOT EAT BABIES&quot;, input)<br> <p> &quot;DO NOT EAT BABIES&quot; is blessed as a LiteralString because it is. No problem so far. 
But &quot;cleverly&quot; this proposal allows operations (such as truncation, concatenation, duplication and splitting) on LiteralString to produce a LiteralString, and so if doComplicatedStuffBasedOnUserInput has a bug, as it may well do, it can end up producing quite unexpected results, such as &quot;EAT BABIES&quot; and yet they&#x27;re blessed as LiteralString anyway via this rationale.<br> <p> Thus, the program user in fact gets arbitrary control over these strings in at least some cases, whereas that&#x27;s definitively not the situation in languages where there&#x27;s an actual literal string type. In exchange, Python gets to write &quot;WO&quot; + &quot;RDS&quot; and have that be a LiteralString whereas in the other languages it is not. I think that&#x27;s a bad trade, despite being very clever.<br> </div> Sat, 16 Apr 2022 21:28:07 +0000 A literal string type for Python https://lwn.net/Articles/891509/ https://lwn.net/Articles/891509/ mb <div class="FormattedComment"> The issue literal strings are trying to solve is user inputs being pasted into places where a hardcoded string is expected. That&#x27;s to ensure that the program user cannot get arbitrary control over these strings.<br> <p> It&#x27;s not supposed to prevent the programmer from hardcoding the wrong string.<br> </div> Sat, 16 Apr 2022 08:56:16 +0000 A literal string type for Python https://lwn.net/Articles/891485/ https://lwn.net/Articles/891485/ tialaramex <div class="FormattedComment"> Because the resulting string can be EAT BABIES, and it&#x27;s entirely possible that the programmer did not anticipate the circumstances which allow that? 
If we&#x27;re OK with that, then this entire exercise was futile, as we could have also blessed arbitrary strings.<br> </div> Fri, 15 Apr 2022 20:28:01 +0000 A literal string type for Python https://lwn.net/Articles/891369/ https://lwn.net/Articles/891369/ iabervon <div class="FormattedComment"> A related thing I&#x27;d like to see would be something like ft&quot;SELECT * FROM data WHERE user_id = {user_id}&quot;, that evaluates to (&quot;SELECT * FROM data WHERE user_id = {}&quot;, (&quot;user123&quot;,)), and then conn.execute((LiteralString, args)) could use the fact that Python has support for parsing format strings and working with the result in more interesting ways than making a new str. That is, it would replace each substitution with a &quot;?&quot; and add the value to the arguments, instead of replacing the substitution with the string representation of the value.<br> <p> As a side note, secure coders these days have gotten really bad at writing insecure code. The example in the PEP just won&#x27;t work at all, since it&#x27;ll result in returning rows where data.user_id = data.user123 rather than ones where data.user_id = &#x27;user123&#x27;. The insecure code that people would actually write is f&quot;SELECT * FROM data WHERE user_id = &#x27;{user_id}&#x27;;&quot;, with a set of single quotes that the attack could close.<br> </div> Thu, 14 Apr 2022 19:17:23 +0000 A literal string type for Python https://lwn.net/Articles/891366/ https://lwn.net/Articles/891366/ mb <div class="FormattedComment"> Why do you think your third example is unsafe?<br> </div> Thu, 14 Apr 2022 16:45:34 +0000 A literal string type for Python https://lwn.net/Articles/891364/ https://lwn.net/Articles/891364/ tialaramex <div class="FormattedComment"> I fear that the &quot;it even works if you&#x27;ve actually done stuff with the strings&quot; using methods we think are safe - undoes too much of the initial value of requiring literals. 
I understand entirely why they did it, but my instinct is that they&#x27;ve unlocked far too much here. Mechanically it&#x27;s obviously possible to use the capabilities marked &quot;safe&quot; to produce arbitrary LiteralStrings at runtime, at which point these are clearly not literal strings and I think in the sort of &quot;Oops, I am not really a programmer&quot; code where this safety was most necessary, that&#x27;s more rather than less likely to be accessible to an attacker.<br> <p> &quot;EAT BABIES&quot; // clearly expresses your intent, it&#x27;s your fault, you wrote that.<br> <p> &quot;EAT&quot; + &quot; BABIES&quot; // I can see why they felt like they should make this work, they&#x27;ve cited examples that do this and it genuinely is still clear at this point what your intent was, although I think it should be discouraged anyway.<br> <p> doComplicatedStuffBasedOnUserInput(&quot;DO NOT EAT BABIES&quot;, input) // this still type checks as LiteralString, and might be EAT BABIES yet we can hardly claim now that we&#x27;re reflecting clear programmer intent when that happens.<br> <p> The reason to want literals here rather than allowing arbitrary strings is to get closer to requiring intent. I&#x27;d rather give up the second example than, as this PEP does, allow the third example the opportunity to set fire to everything and pretend that&#x27;s &quot;safe&quot;.<br> <p> Rust of necessity has to require actual literals in formatting (not merely constant strings) because the formatting work is done via the macro system, and the macro system can&#x27;t see inside variables. But even though more sophisticated behaviour would be welcomed by many Rust programmers, I personally prefer the literal requirement.<br> </div> Thu, 14 Apr 2022 16:31:18 +0000 A literal string type for Python https://lwn.net/Articles/891310/ https://lwn.net/Articles/891310/ NRArnot <div class="FormattedComment"> Django has something similar. 
When ordinary strings are rendered into an HTML template, the HTML special characters are escaped (&quot;&lt;&quot; into &quot;&amp;lt;&quot; etc.). The programmer can create &quot;safe&quot; strings (type SafeString) which behave like ordinary strings in most ways, but which do not get HTML-escaped. They remain safe until they are combined with ordinary strings by concatenation etc.<br> </div> Thu, 14 Apr 2022 08:39:27 +0000 A literal string type for Python https://lwn.net/Articles/891300/ https://lwn.net/Articles/891300/ ovitters <div class="FormattedComment"> I didn&#x27;t read the article, but the problem reminds me of why Bugzilla uses the taint mode of Perl. It was really handy to catch loads of user input handling bugs. Despite that, Bugzilla still was affected by a few user input security issues. Basically, things that were marked as safe were (despite reviews) eventually proven not to be safe. I really appreciated the taint mode and it&#x27;s surprising how few programming languages have a similar concept, given that handling user input is so common.<br> </div> Thu, 14 Apr 2022 07:26:22 +0000 A literal string type for Python https://lwn.net/Articles/891277/ https://lwn.net/Articles/891277/ milesrout <div class="FormattedComment"> This seems like a very good design. It speaks volumes that (at least from what I can tell from reading this article) the type system did not need to be modified to make this work. Many type systems in common use today could not represent this sort of subtyping relationship, as doing LiteralString.__add__(LiteralString) would give back a str. And it all disappears into the background without getting in your way if you just want to write normal Python without type annotations.<br> </div> Thu, 14 Apr 2022 01:20:35 +0000
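For readers following the thread, here is a minimal sketch of the behaviour the comments debate, assuming Python 3.9+ and the standard-library sqlite3 module. The names run_query and untrusted_input are hypothetical helpers invented for illustration; they are not part of PEP 675 or any library. LiteralString is only enforced by a type checker that implements the PEP, so at runtime the annotation has no effect.

```python
import sqlite3

try:
    from typing import LiteralString  # available in Python 3.11+
except ImportError:
    # Assumption: on older interpreters fall back to str so the sketch still runs.
    LiteralString = str  # type: ignore[misc,assignment]


def run_query(conn: sqlite3.Connection, sql: LiteralString, params: tuple = ()) -> list:
    """Hypothetical helper: the SQL text must be built from literals only."""
    return conn.execute(sql, params).fetchall()


def untrusted_input() -> str:
    """Hypothetical stand-in for data arriving from outside the program."""
    return "user123"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (user_id TEXT, value TEXT)")
conn.execute("INSERT INTO data VALUES ('user123', 'hello')")

# Concatenating two literals: a PEP 675-aware checker should still treat
# the result as a LiteralString, so this call is expected to type-check.
query = "SELECT value FROM data " + "WHERE user_id = ?"
rows = run_query(conn, query, (untrusted_input(),))

# By contrast, interpolating a plain str is expected to be flagged by the checker:
#   run_query(conn, f"SELECT value FROM data WHERE user_id = '{untrusted_input()}'")

# removeprefix is among the methods listed in the PEP's Appendix C, so the
# result below is still typed as LiteralString, which is the concern raised above.
message = "DO NOT EAT BABIES".removeprefix("DO NOT ")
print(rows, message)
```

The user-supplied value travels as a bound parameter rather than as part of the SQL text, which is the usage the type is meant to encourage; the removeprefix line shows why some commenters are uneasy about how much the type permits.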