Shaw: Python 3.13 gets a JIT

Posted Jan 11, 2024 9:03 UTC (Thu) by gabrielesvelto (guest, #83884)
In reply to: Shaw: Python 3.13 gets a JIT by lwnuser573
Parent article: Shaw: Python 3.13 gets a JIT

I find it kind of odd that they claim this is a new technique, because something similar - under the name of inline-threading - has been around since forever. Here's the original article about it (1998): https://dl.acm.org/doi/pdf/10.1145/277650.277743

And here's SableVM's implementation for a practical application (2003): https://www.sable.mcgill.ca/publications/papers/2003-2/sa...

Shaw: Python 3.13 gets a JIT

Posted Jan 11, 2024 10:46 UTC (Thu) by atnot (subscriber, #124910)

I had the same thought: this reminds me a lot of what Firefox calls the "baseline JIT":

> The Baseline JIT. Each bytecode instruction is compiled directly to a small piece of machine code.

https://hacks.mozilla.org/2019/08/the-baseline-interprete...

Apple WebKit's JavaScriptCore seems to use the same concept and terminology:

> In a nutshell, the Baseline JIT is a template JIT and what that means is that it generates specific machine code for the bytecode operation. There are two key factors that allow the Baseline JIT a speed up over the LLInt:
> Removal of interpreter dispatch. Interpreter dispatch is the costliest part of interpretation, since the indirect branches used for selecting the implementation of an opcode are hard for the CPU to predict.

https://zon8.re/posts/jsc-internals-part2-the-llint-and-b...

Shaw: Python 3.13 gets a JIT

Posted Jan 11, 2024 12:00 UTC (Thu) by Wol (subscriber, #4433)

My immediate reaction to this is p-code.

Compile a high-level language to p-code, and you're going to be spending nearly all your time inside internal function calls, and your interpreter is going to be doing very little.

Okay, it's not a JIT, and it hasn't patched your code into machine code, but the effect is pretty much the same ...

Cheers,
Wol

Shaw: Python 3.13 gets a JIT

Posted Jan 11, 2024 12:29 UTC (Thu) by qwertyface (subscriber, #84167)

I had a quick scan of those papers, and I don't think that they're talking about quite the same class of thing: copy-and-patch seems to be an implementation strategy for code generation within a somewhat conventional JIT, whereas inline threading is more like a style of "JIT". Which isn't to say that the copy-and-patch approach is a new technique (it's not; as I understand it, it's used by a Lua interpreter).

Basically, copy-and-patch is a cheap and (relatively) easy approach to generating executable machine code at runtime. Instead of having to know how to write machine code for your platform (e.g. the encodings of different instructions), you write short template functions in C, compile them with clang at build time (LLVM apparently provides a lot more control over calling conventions than GCC), and store the resulting code as data embedded in the executable (with some preprocessing). Then, at runtime, dynamic code generation is just memory-to-memory copies followed by some fixups.
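To make the "copy, then fix up" step concrete, here is a minimal sketch in Python. It is not CPython's implementation: the `Stencil` class, the `emit` function and the three stencils are hypothetical names, and the x86-64 bytes are written by hand as stand-ins for what a build-time clang run would produce. The sketch only builds the byte string; it does not execute it.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Stencil:
    """Pre-built machine code plus the offsets of its 32-bit holes."""
    code: bytes
    holes: tuple = ()


# Hypothetical stencils: hand-written x86-64 bytes with zeroed immediates,
# standing in for templates compiled from C at build time.
LOAD_CONST = Stencil(b"\xb8\x00\x00\x00\x00", (1,))  # mov eax, imm32
ADD_CONST = Stencil(b"\x05\x00\x00\x00\x00", (1,))   # add eax, imm32
RETURN = Stencil(b"\xc3")                             # ret


def emit(program):
    """Copy each stencil into the output buffer, then patch its holes."""
    out = bytearray()
    for stencil, args in program:
        base = len(out)
        out += stencil.code  # the "copy" step
        for offset, value in zip(stencil.holes, args):
            # the "patch" step: write a little-endian 32-bit immediate
            struct.pack_into("<i", out, base + offset, value)
    return bytes(out)


code = emit([(LOAD_CONST, (40,)), (ADD_CONST, (2,)), (RETURN, ())])
print(code.hex(" "))
```

A real implementation would also have to place the result in executable memory and handle relocations other than plain immediates, which this sketch leaves out.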

I guess a lot of code generation is template driven, the question really is how you get the templates.

Shaw: Python 3.13 gets a JIT

Posted Jan 11, 2024 13:26 UTC (Thu) by gabrielesvelto (guest, #83884)

What you described is exactly the way this was implemented in SableVM. That's why I don't think this is a novel technique.

Shaw: Python 3.13 gets a JIT

Posted Jan 11, 2024 15:05 UTC (Thu) by qwertyface (subscriber, #84167)

I don't think that they claim it's a novel technique, just that the JIT is new to CPython. It isn't the usual way to implement this either; it's not what PyPy does, for example, nor HotSpot or the CLR.

Shaw: Python 3.13 gets a JIT

Posted Jan 11, 2024 13:53 UTC (Thu) by anton (subscriber, #25547)

Inline threading is copying without patching. There is still a VM instruction pointer around, and it is used for getting at the immediate arguments that are patched in with copy-and-patch, including, for control flow, the VM-level branch targets; so inline threading performs control flow by loading the new VM instruction pointer, then looking up the machine code there, and jumping to it.
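A toy model of that difference, in Python rather than machine code, might look like the following. Everything here is hypothetical (the `VM` class, the two-instruction "push"/"add" VM, the function names), and Python callables stand in for copied machine-code fragments. The point is only where the immediate comes from: the inline-threaded fragment reads it through the VM instruction pointer, while the patched fragment has it baked in and needs no VM instruction pointer at all. Control flow is left out.

```python
class VM:
    """Toy stack VM state: code cells, a VM instruction pointer, a stack."""

    def __init__(self, cells):
        self.cells = cells
        self.ip = 0
        self.stack = []


# Inline threading: fragments are generic and copied unchanged.
def push_fragment(vm):
    vm.stack.append(vm.cells[vm.ip + 1])  # immediate fetched via the VM ip
    vm.ip += 2


def add_fragment(vm):
    right = vm.stack.pop()
    left = vm.stack.pop()
    vm.stack.append(left + right)
    vm.ip += 1


FRAGMENTS = {"push": (push_fragment, 2), "add": (add_fragment, 1)}


def inline_thread(cells):
    """Concatenate the unmodified fragments for straight-line VM code."""
    block, i = [], 0
    while i < len(cells):
        fragment, width = FRAGMENTS[cells[i]]
        block.append(fragment)
        i += width
    return block


# Copy-and-patch: the immediate is patched into the copied fragment.
def make_push(const):
    def patched_push(stack):
        stack.append(const)  # no VM ip and no load from the VM code
    return patched_push


def patched_add(stack):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)


def copy_and_patch(cells):
    """Emit one specialized fragment per VM instruction."""
    block, i = [], 0
    while i < len(cells):
        if cells[i] == "push":
            block.append(make_push(cells[i + 1]))
            i += 2
        else:  # this toy VM has only one other instruction, "add"
            block.append(patched_add)
            i += 1
    return block


cells = ["push", 40, "push", 2, "add"]

vm = VM(cells)
for fragment in inline_thread(cells):
    fragment(vm)

stack = []
for fragment in copy_and_patch(cells):
    fragment(stack)

print(vm.stack, vm.ip, stack)
```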

In "Retargeting JIT compilers by using C-compiler generated executable code" we used a copy-and-patch approach to get rid of the VM instruction pointer. We derived the patch information by comparing two variants of each code fragment with different immediate arguments (and some architecture-specific knowledge about possible encodings). I'll have to read "Copy-and-Patch Compilation" to find out what they did differently.
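The variant-comparison idea can be sketched roughly as follows. This is an illustration, not the code from the paper: `find_holes` is a hypothetical name, the two fragments are hand-written x86-64 bytes (`mov eax, imm32; ret`) rather than compiler output, and the sketch ignores the architecture-specific encoding knowledge mentioned above. It also assumes the two immediates differ in every byte, otherwise a hole would be reported as shorter than it really is.

```python
def find_holes(variant_a, variant_b):
    """Return (offset, length) runs where two code fragments differ."""
    if len(variant_a) != len(variant_b):
        raise ValueError("variants must have the same length")
    holes = []
    start = None
    for i, (a, b) in enumerate(zip(variant_a, variant_b)):
        if a != b:
            if start is None:
                start = i  # a differing run begins here
        elif start is not None:
            holes.append((start, i - start))  # the run ended at i
            start = None
    if start is not None:
        holes.append((start, len(variant_a) - start))
    return holes


# The same fragment built with two different immediates:
# mov eax, 0x11111111; ret  versus  mov eax, 0x22222222; ret
variant_a = b"\xb8\x11\x11\x11\x11\xc3"
variant_b = b"\xb8\x22\x22\x22\x22\xc3"

holes = find_holes(variant_a, variant_b)
print(holes)
```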

Earlier work in that vein was "Automatic, template-based run-time specialization", but that's in the context of a specializer rather than a language implementation, and they used a different way to get the patch information.