
Dynamism or performance: pick two

Posted Aug 5, 2025 18:01 UTC (Tue) by q3cpma (subscriber, #120859)
In reply to: Dynamism or performance: pick two by q3cpma
Parent article: Python performance myths and fairy tales

Still, for the Python conundrum at hand, where the language obviously can't redesign itself right now, I think doing what Common Lisp optimizing compilers do might be a good idea: when you toggle a flag, type declarations change from assertions into something "trusted blindly by the compiler", so that the usual static-typing optimizer can do its job (https://www.sbcl.org/manual/#Declarations-as-Assertions).
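
A minimal sketch of what that toggle could look like in today's Python, using a hypothetical `typed` decorator (the names `TRUSTED`, `typed`, and `add` are illustrative, not a real proposal or API): in safe mode the annotations are checked at call time, as assertions; in "trusted" mode the checks are skipped and an optimizer would be free to rely on them.

```python
import functools
import inspect

# Hypothetical flag in the spirit of SBCL's declarations-as-assertions:
# False = annotations are checked (assertions), True = trusted blindly.
TRUSTED = False

def typed(func):
    sig = inspect.signature(func)
    hints = func.__annotations__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not TRUSTED:  # declarations act as assertions in safe mode
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                expected = hints.get(name)
                if expected is not None and not isinstance(value, expected):
                    raise TypeError(f"{name} is not a {expected.__name__}")
        return func(*args, **kwargs)
    return wrapper

@typed
def add(x: int, y: int) -> int:
    return x + y
```

In trusted mode a real optimizer would do far more than skip the check, of course; the point is only that the same declaration serves both roles.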

If Python coupled that mechanism (again, it should be possible to enable it only at scope level, or at least at function level) with a way to declare that a specific function/variable is frozen (and thus inlinable), making any later change to it undefined behavior, it might work without upending the ecosystem.

Perl got a strict mode; I think Python could get a "fast" one.



Dynamism or performance: pick two

Posted Aug 6, 2025 14:32 UTC (Wed) by DemiMarie (subscriber, #164188) [Link]

UB is not a good idea unless you want to gain C’s insecurity as well. Exact type checks (“is this exactly of this type?”) are cheap.

The "guard" trick from JITs (including PyPy)

Posted Aug 6, 2025 17:20 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

There's a trick you can use, called a "guard", that would avoid it being UB.

A guard is a cheap-to-evaluate runtime test that returns either "true" or "false". If the thing you're testing could get expensive to check exactly, you handle that by quickly returning a safe approximation of the result; for example, your guard might look at type hints and return "yes, the runtime types match the hints" or "no, the runtime types might not match". In the case of depending on the type of a global variable, you might approximate by saying "no, the runtime types might not match" whenever any global variable has changed since the guard was created; this is OK, because the worst case is that it pushes you down the slow path even when the fast path would have been safe.

You then generate one or more guards as you generate an optimized function; the essential guard is one that tells you to go down the slow path if runtime types don't match the type hints you depended on. You can then add other guards to tell you if you need to regenerate the optimized path, or to take you down a very fast path in limited circumstances (e.g. a guard that says that if all the input values are less than 2**28, then you can use 32 bit registers throughout).

Finally, you put a conditional at the top of every optimized function: "if the guard is true, call the optimized form; else, go down the slow path".
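
Putting the pieces together in plain Python (all names here are illustrative; the "fast" and "slow" functions stand in for specialised machine code and the generic interpreter path respectively, and the guard uses the cheap exact-type and 2**28 range checks mentioned above):

```python
TAKEN = []  # records which path ran, purely for illustration

def guard(x, y):
    # cheap exact-type checks plus the 2**28 range check
    return type(x) is int and type(y) is int and x < 2**28 and y < 2**28

def fast_add(x, y):
    TAKEN.append("fast")   # stands in for code using 32-bit registers
    return x + y

def slow_add(x, y):
    TAKEN.append("slow")   # stands in for the generic slow path
    return x + y

def add(x, y):
    # the conditional at the top of the optimized function
    return fast_add(x, y) if guard(x, y) else slow_add(x, y)
```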

From here, there's a whole bunch of research you can draw upon (some of it incorporated into PyPy) that helps you minimise the number of guard checks you actually run at runtime. For example, you might know that if the guard of function bar returns "true" on the first loop iteration, it will continue to return "true" on all future iterations, and so hoist the guard outside the loop; or that if you combine the guards for foo and bar, then foo can directly call the optimized form of bar, bypassing its guard check.
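
A simplified sketch of guard hoisting (hypothetical names; real systems hoist guards in generated code, not by hand): because nothing in the loop body can invalidate the guard, it is checked once before the loop instead of on every iteration.

```python
def guard_holds(values):
    # cheap guard: every element is exactly an int (exact type checks)
    return all(type(v) is int for v in values)

def sum_values(values):
    # hoisted guard: evaluated once, since the loop body cannot change
    # the types of the elements it iterates over
    if guard_holds(values):
        total = 0
        for v in values:
            total += v          # fast path, no per-iteration checks
        return total
    return sum(values)          # generic slow path
```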

The result of all this is teaching people that "high performance" Python requires writing code such that the guards are simple and never false (no touching globals outside of top-level functions, for example, and making sure that your code type-checks). People who ignore that (for debugging, or because they don't care) still get working Python; people who pay attention get fast Python.

The "guard" trick from JITs (including PyPy)

Posted Aug 6, 2025 18:24 UTC (Wed) by euclidian (subscriber, #145308) [Link] (1 responses)

Just thinking about the guard trick - and I am sure there are ways of fixing this - doesn't the basic way of doing it break with multithreaded Python (without the GIL)?

I.e., you have passed your guard check and are in your optimized code, but another thread breaks one of the assumptions you guarded on - by modifying a global, say.

The "guard" trick from JITs (including PyPy)

Posted Aug 11, 2025 9:08 UTC (Mon) by farnz (subscriber, #17727) [Link]

The most trivial fix that works is to put the guard next to the access, rather than at the beginning of the function.

You can then test the guard inside the optimized code, and continue in the optimized path if it passes, or switch back to the slow path if it fails. You'd usually have the "guard failed" path be the one that saves state ready for the slow path - so the optimized code can keep things in registers while the guard passes, but if it ever fails at runtime (e.g. you optimized based on the value of a global, and that global has changed), you store everything and switch back to the slow path.

There are more complex options, of course, but that's the simplest; indeed, the literature on guard motion and elision sometimes assumes that you'll place a guard next to each place where you depend on a guarded assumption, and then rely on being able to show that moving the guard earlier (possibly combining it with other guards), or even eliding it completely, is acceptable.

This does require guards to be extremely cheap to evaluate; your optimized code is going to check a lot of them unless your guard motion + elision system is very good.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds