Probably you'd only get to use half of each register -- odd-numbered registers for odd slices, even for even -- when doing carry arithmetic.
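To make the "half of each register" point concrete, here is a rough sketch of one possible reading: each 64-bit lane holds a 32-bit limb in its low half, and the high half is left as headroom for carries that get resolved later. It is plain C rather than intrinsics. The names `lanes_add` and `lanes_normalize` and the 32/64 split are my own assumptions for illustration, not something from the proposal itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: one 32-bit limb in the low half of each 64-bit lane,
   with the high half reserved as headroom for deferred carries. */
#define LIMB_BITS 32
#define LIMB_MASK UINT64_C(0xFFFFFFFF)

/* Lane-wise add with no carry propagation between lanes. This is the part
   that could map onto a packed 64-bit add (e.g. SSE2 PADDQ). */
void lanes_add(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Sequential fix-up: move whatever spilled into each lane's high half
   into the next lane. Returns the carry out of the top limb. */
uint64_t lanes_normalize(uint64_t *x, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* The lane must have headroom left, or the sum below would wrap. */
        assert(x[i] <= UINT64_MAX - carry);
        uint64_t t = x[i] + carry;
        x[i] = t & LIMB_MASK;
        carry = t >> LIMB_BITS;
    }
    return carry;
}
```

The cost is visible in the layout: only 32 of every 64 bits carry payload, so you need twice as many lanes (or registers) for the same operand width, in exchange for being able to do the adds in parallel and fix the carries up afterwards.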
A great advantage of an SSE ABI is that the kernel promises not to touch those registers. If it could then be persuaded not to push the other registers, context switches ought to get very quick -- great for interrupt latency. That is, until you build the kernel using the SSE ABI, too...