It is not "pointer memory bandwidth bound test cases".
A VM like Python's uses a *lot* of pointers:
- a list of 'n' items is a buffer of 'n' pointers. Same for tuples.
- a dictionary of 'n' items is a buffer of ~6*n pointers
- every string item carries a pointer
- every instance is a dictionary plus a couple of pointers
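The list/tuple claim above is easy to check from Python itself: on a 64-bit CPython build, `sys.getsizeof` shows each container slot costing one machine word. A minimal sketch (the exact header sizes vary between CPython versions, so it measures the per-item delta rather than hard-coding them):

```python
import sys

# One machine word per slot: 8 bytes on a 64-bit build, 4 on a 32-bit one.
ptr = 8 if sys.maxsize > 2**32 else 4

lst = list(range(1000))
tup = tuple(range(1000))

# Per-item cost of the container's pointer buffer, ignoring the fixed
# object header (and any over-allocation slack a list may carry).
list_slot = (sys.getsizeof(lst) - sys.getsizeof([])) / len(lst)
tuple_slot = (sys.getsizeof(tup) - sys.getsizeof(())) / len(tup)

print(list_slot, tuple_slot)
```

On a typical 64-bit CPython this prints roughly 8.0 for both: the containers really are just buffers of pointers, before counting the pointed-to objects themselves.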
C programmers think in terms of memory buffers, but dynamic languages, where objects are handled by reference, are built on tons of pointers; that is what makes them dynamic. And yes, making all those pointers half their size matters a lot. Imagine looking something up in a list: the list's buffer is fetched into the cache and the pointers are traversed while searching for the item. Fetching a 2K buffer is better than fetching a 4K buffer. In fact, x86 might be more suitable than x86-64 for such VMs!
(It would be very interesting to see some Python benchmarks for x32 vs x86, nonetheless)
Now, one may say "if you want speed, do it in C". However, making a dynamic language faster benefits thousands of programs written in that language, which matters to some people.
Using pointer offsets costs one extra indirection and negates a big part of the cache benefit. On the other hand, being able to address more than 4G of objects is overkill.
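To make the offset idea (and its extra indirection) concrete, here is a hypothetical sketch in Python: 32-bit offsets into an arena stand in for compressed pointers. The names `arena`, `alloc`, and `deref` are all invented for illustration, not anything from CPython:

```python
import array

arena = []                  # stands in for a sub-4-GiB heap arena
offsets = array.array('I')  # 32-bit unsigned "compressed pointers"

def alloc(obj):
    """Place an object in the arena; hand back an offset, not an address."""
    arena.append(obj)
    return len(arena) - 1

def deref(off):
    """The extra indirection the text mentions: offset -> arena lookup."""
    return arena[off]

offsets.append(alloc("hello"))
print(deref(offsets[0]))
```

Each "pointer" is 4 bytes (`offsets.itemsize == 4`), but every access now goes through the arena base first, which is the cost being weighed against the smaller footprint.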