Shaw: Python 3.13 gets a JIT
Posted Jan 11, 2024 15:21 UTC (Thu) by Wol (subscriber, #4433) In reply to: Shaw: Python 3.13 gets a JIT by anton
Parent article: Shaw: Python 3.13 gets a JIT
Or increase the weight (as in the amount of real work each individual instruction does), which again reduces the relative impact of the interpreter.
Cheers,
Wol
Posted Jan 11, 2024 15:44 UTC (Thu) by qwertyface (subscriber, #84167)
Posted Jan 11, 2024 17:35 UTC (Thu) by anton (subscriber, #25547)
Python has followed the path of heavy-weight stuff since the start, and where it works, it's fine. E.g., when there is just a little work in Python that just calls library functions in C, and the lion's share of work is done in that library. But when it does not work fine, it results in programs that run much slower than C code and also quite a bit slower than code from a code-copying interpreter with light-weight operations.
E.g., I translated my_mul above into Forth code that's close to the Python version (using locals and a while loop) rather than idiomatic:
: my_mul {: x y -- s :}
x 1 begin {: s i :}
i y < while
s x +
i 1 +
repeat
s ;
gforth-fast, a code-copying interpreter (without patching), produces the following code:
$7F2156294F80 >l 1->1
0x00007f2155f3c002: mov %rbp,%rax
0x00007f2155f3c005: add $0x8,%r13
0x00007f2155f3c009: lea -0x8(%rbp),%rbp
0x00007f2155f3c00d: mov %r8,-0x8(%rax)
0x00007f2155f3c011: mov 0x0(%r13),%r8
$7F2156294F88 >l 1->0
0x00007f2155f3c015: mov %rbp,%rax
0x00007f2155f3c018: lea -0x8(%rbp),%rbp
0x00007f2155f3c01c: mov %r8,-0x8(%rax)
$7F2156294F90 @local0 0->1
0x00007f2155f3c020: mov 0x0(%rbp),%r8
$7F2156294F98 lit 1->1
$7F2156294FA0 #1
0x00007f2155f3c024: mov %r8,0x0(%r13)
0x00007f2155f3c028: sub $0x8,%r13
0x00007f2155f3c02c: mov 0x20(%rbx),%r8
0x00007f2155f3c030: add $0x28,%rbx
$7F2156294FA8 >l 1->1
0x00007f2155f3c034: mov %rbp,%rax
0x00007f2155f3c037: add $0x8,%r13
0x00007f2155f3c03b: lea -0x8(%rbp),%rbp
0x00007f2155f3c03f: mov %r8,-0x8(%rax)
0x00007f2155f3c043: mov 0x0(%r13),%r8
$7F2156294FB0 >l 1->0
0x00007f2155f3c047: mov %rbp,%rax
0x00007f2155f3c04a: lea -0x8(%rbp),%rbp
0x00007f2155f3c04e: mov %r8,-0x8(%rax)
$7F2156294FB8 @local1 0->1
0x00007f2155f3c052: mov 0x8(%rbp),%r8
$7F2156294FC0 @local3 1->1
0x00007f2155f3c056: mov %r8,0x0(%r13)
0x00007f2155f3c05a: mov 0x18(%rbp),%r8
0x00007f2155f3c05e: sub $0x8,%r13
$7F2156294FC8 < ?branch 1->1
$7F2156294FD0 ?branch
$7F2156294FD8 <my_mul+$A8>
0x00007f2155f3c062: add $0x38,%rbx
0x00007f2155f3c066: mov 0x8(%r13),%rax
0x00007f2155f3c06a: add $0x10,%r13
0x00007f2155f3c06e: mov -0x8(%rbx),%rsi
0x00007f2155f3c072: cmp %r8,%rax
0x00007f2155f3c075: mov 0x0(%r13),%r8
0x00007f2155f3c079: jl 0x7f2155f3c083
0x00007f2155f3c07b: mov (%rsi),%rax
0x00007f2155f3c07e: mov %rsi,%rbx
0x00007f2155f3c081: jmp *%rax
$7F2156294FE0 @local0 1->2
0x00007f2155f3c083: mov 0x0(%rbp),%r15
$7F2156294FE8 @local2 2->3
0x00007f2155f3c087: mov 0x10(%rbp),%r9
$7F2156294FF0 + 3->2
0x00007f2155f3c08b: add %r9,%r15
$7F2156294FF8 @local1 2->1
0x00007f2155f3c08e: mov %r15,-0x8(%r13)
0x00007f2155f3c092: sub $0x10,%r13
0x00007f2155f3c096: mov %r8,0x10(%r13)
0x00007f2155f3c09a: mov 0x8(%rbp),%r8
$7F2156295000 lit+ 1->1
$7F2156295008 #1
0x00007f2155f3c09e: add 0x28(%rbx),%r8
$7F2156295010 lp+2 1->1
0x00007f2155f3c0a2: add $0x10,%rbp
$7F2156295018 branch 1->1
$7F2156295020 <my_mul+$28>
0x00007f2155f3c0a6: mov 0x40(%rbx),%rbx
0x00007f2155f3c0aa: mov (%rbx),%rax
0x00007f2155f3c0ad: jmp *%rax
0x00007f2155f3c0af: nop
$7F2156295028 @local0 1->1
0x00007f2155f3c0b0: mov %r8,0x0(%r13)
0x00007f2155f3c0b4: sub $0x8,%r13
0x00007f2155f3c0b8: mov 0x0(%rbp),%r8
$7F2156295030 lp+!# 1->1
$7F2156295038 #32
0x00007f2155f3c0bc: add $0x18,%rbx
0x00007f2155f3c0c0: add -0x8(%rbx),%rbp
$7F2156295040 ;s 1->1
0x00007f2155f3c0c4: mov (%r14),%rbx
0x00007f2155f3c0c7: add $0x8,%r14
0x00007f2155f3c0cb: mov (%rbx),%rax
0x00007f2155f3c0ce: jmp *%rax
Note that the + (integer addition) at $7F2156294FF0 is implemented with one instruction. There is also a "heavy-weight" VM instruction lit+ that results from combining the sequence 1 +. It also results in one instruction (on AMD64).
However, Forth does not have arbitrary-length integers nor run-time type checking. Python is by necessity more heavy-weight, but it should be possible to check the types of s and x to be small integers at the start, and then compile the + of s+x into
add %r9,%r15
jo slow_path
and do that in a copy-and-patch system.
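For readers who prefer Python to assembly, here is a rough sketch of the fast-path/slow-path idea. It is only a model under stated assumptions, not what CPython or gforth actually does: it assumes 64-bit machine words, the names is_small, add_small and slow_add are made up for illustration, and the body of my_mul is reconstructed from the Forth translation rather than copied from the earlier Python version.

```python
# Assumption: "small integer" means it fits a signed 64-bit machine word.
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def is_small(v):
    """Stand-in for the up-front type check (hypothetical helper)."""
    return type(v) is int and INT64_MIN <= v <= INT64_MAX


def slow_add(a, b):
    """Slow path: generic, arbitrary-precision addition (hypothetical helper)."""
    return a + b


def add_small(a, b):
    """Model of 'add; jo slow_path' for two values already known to be small.

    Returns (result, still_small). The range test plays the role of the
    overflow flag that 'jo' would branch on.
    """
    r = a + b
    if r < INT64_MIN or r > INT64_MAX:
        return slow_add(a, b), False
    return r, True


def my_mul(x, y):
    """Same loop shape as the Forth version: s starts at x, i at 1."""
    s, i = x, 1
    fast = is_small(s) and is_small(x)  # type check done once, at the start
    while i < y:
        if fast:
            s, fast = add_small(s, x)   # one add plus an overflow check
        else:
            s = slow_add(s, x)          # stay on the generic path
        i += 1
    return s
```

As long as the sum stays within a machine word, each iteration does only the cheap add and the overflow test; once it overflows, the loop falls back to the generic addition for the rest of the run.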