4K stacks by default?
Posted Apr 25, 2008 4:22 UTC (Fri) by jzbiciak
In reply to: 4K stacks by default?
Parent article: 4K stacks by default?
If parameters are passed on the stack, the argument frame exists on the stack for the entire duration of the function. If those same arguments are passed in registers, they exist only as long as they're needed. If they're unused, consumed before a function call, or passed down the call chain, they don't need to go to the stack.
The only things that need to go on the stack as you go down the call chain are values that are live across the call and don't have other storage: compiler temps and arguments that are used after the call.
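To make the three cases concrete, here is a small C sketch. The function names are made up for illustration and are not taken from any real code base, and what a given compiler actually does with each argument depends on the target and optimization level.

```c
/* Hypothetical helper standing in for the next function down the call chain. */
int callee(int x)
{
    return x * 2;
}

/*
 * a: consumed before the call, so it is dead by the time we call.
 * b: passed down the call chain, so it is not needed afterwards.
 * c: used after the call, so it is live across the call and needs a
 *    callee-saved register or a stack slot.
 */
int live_across_demo(int a, int b, int c)
{
    int before = a + 1;               /* last use of a */
    int result = callee(b + before);  /* b and before are consumed here */
    return result + c;                /* c is the only argument still live */
}
```

With register-passed arguments, only c has to be kept somewhere safe while callee() runs. With stack-passed arguments, all three slots stay allocated for the whole function regardless.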
I haven't looked at the document linked above, but I wouldn't be surprised if the x86-64 calling convention also splits the GPRs between caller-saves and callee-saves, thereby also reducing the number of slots reserved for values that are live across calls.
Separate from compiler temps and live-across-call values are spill values. In my experience, modern compilers allocate a stack frame once at the start of a function and maintain it through the life of the function (alloca() being a notable exception, allocating beyond the static frame). If a function has a lot of spilled values, these too get statically allocated. x86 has only half as many general-purpose registers as x86-64 (8 versus 16), resulting in greater numbers of spilled variables as well.
How about an example? Here's the function prolog from ay8910_write in my Intellivision emulator, compiled for x86:
    subl    $60, %esp           #,
The function allocates a 60-byte stack frame for itself, in addition to 12 bytes for arguments 2 through 4. (Only the first argument gets passed in a register, as I recall.) That's 72 bytes. Here's the same function prolog on x86-64:
    movq    %r13, -24(%rsp)     #,
    movq    %r14, -16(%rsp)     #,
    movq    %rdi, %r13          # bus, bus
    movq    %r15, -8(%rsp)      #,
    movq    %rbx, -48(%rsp)     #,
    movl    %edx, %r15d         # addr, addr
    movq    %rbp, -40(%rsp)     #,
    movq    %r12, -32(%rsp)     #,
    subq    $56, %rsp           #,
This version allocates 56 bytes and has all its arguments passed in registers. That's 16 bytes smaller.
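The arithmetic behind those numbers, as a tiny C sketch. The helper names are hypothetical, and the 4-byte argument slot on x86 is an assumption inferred from the 12 bytes quoted for three arguments. The return address is left out on both sides, as it is in the figures above.

```c
#include <stddef.h>

/* Stack bytes attributed to one call: the function's own frame plus any
   arguments the caller had to pass on the stack. */
size_t stack_footprint(size_t frame_bytes, size_t stack_args, size_t arg_slot_bytes)
{
    return frame_bytes + stack_args * arg_slot_bytes;
}

/* x86: 60-byte frame, arguments 2 through 4 on the stack (assumed 4 bytes each). */
size_t x86_footprint(void)
{
    return stack_footprint(60, 3, 4);   /* 60 + 12 = 72 */
}

/* x86-64: 56-byte frame, every argument passed in a register. */
size_t x86_64_footprint(void)
{
    return stack_footprint(56, 0, 8);   /* 56 + 0 = 56 */
}
```

The difference, x86_footprint() - x86_64_footprint(), comes to the 16 bytes mentioned above, even though every saved register on x86-64 is twice as wide.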
I picked this function not because it's some extraordinary function, but rather because it's moderately sized with a moderate number of arguments, and it's smack dab in the middle of a call chain. And it's in production code.