> real 32-bit ARM servers ... 4x the cache, 4x the RAM
With the ARM instruction set, unless you use Thumb, every instruction is encoded in 32 bits, so 256 assembly instructions take up 1 KB.
You will need a lot more instruction cache than an IA-32 processor to compete, probably around twice as much.
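
To make the arithmetic concrete, here is a quick sketch. The function names and the 32 KB cache size are just illustrative assumptions. It only models fixed-width encodings (4 bytes for ARM, 2 bytes for the original 16-bit Thumb); it does not try to model IA-32, whose instructions vary in length.

```python
ARM_BYTES_PER_INSN = 4    # fixed 32-bit ARM encoding
THUMB_BYTES_PER_INSN = 2  # original 16-bit Thumb encoding


def code_size_bytes(instruction_count: int, bytes_per_insn: int) -> int:
    """Bytes occupied by a run of fixed-width instructions."""
    return instruction_count * bytes_per_insn


def instructions_that_fit(cache_bytes: int, bytes_per_insn: int) -> int:
    """How many fixed-width instructions fit in a cache of the given size."""
    return cache_bytes // bytes_per_insn


if __name__ == "__main__":
    print(code_size_bytes(256, ARM_BYTES_PER_INSN))    # 1024 bytes, i.e. 1 KB
    print(code_size_bytes(256, THUMB_BYTES_PER_INSN))  # 512 bytes

    # Hypothetical 32 KB instruction cache, chosen only for illustration.
    cache = 32 * 1024
    print(instructions_that_fit(cache, ARM_BYTES_PER_INSN))    # 8192
    print(instructions_that_fit(cache, THUMB_BYTES_PER_INSN))  # 16384
```

So for the same cache size, the 32-bit encoding holds half as many instructions as the 16-bit Thumb one.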