Aha, glad I'm not the only one with this issue. We recently found out that if you have more than 8 CPUs, the round-robin IRQ distribution of the IO-APIC is disabled, so *all* IRQs from a network card (soft and hard) end up on one CPU. As you point out, the scheduler doesn't handle this very well. (It was the first time I saw one CPU spending 90% of its time in kernel space while the other CPUs were almost idle.)
The only response we got from kernel developers was that "round-robin IRQs suck", which completely sidesteps the point that what happens now doesn't work at all, and that any perceived suckiness of round-robin IRQs would at least be spread evenly.
Threaded IRQs do seem to be the solution here; I hope they are implemented soon.
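For anyone who wants to check whether they are seeing the same pile-up, here is a rough Python sketch that parses `/proc/interrupts`-style text and reports what share of each IRQ's interrupts landed on its busiest CPU. The function names (`parse_interrupts`, `busiest_cpu_share`) and the `SAMPLE` text are made up for illustration; the numbers are not from our machine, and the exact column layout may differ between kernel versions.

```python
# Made-up sample in the usual /proc/interrupts layout (illustrative numbers only).
SAMPLE = """
           CPU0       CPU1       CPU2       CPU3
  0:        120          0          0          0   IO-APIC-edge      timer
 24:     982113          0          0          0   IO-APIC-fasteoi   eth0
 25:        500        510        495        505   IO-APIC-fasteoi   ahci
ERR:          0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: [count per CPU]}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = rest.split()[:ncpus]
        # Summary rows such as ERR have fewer columns than CPUs; skip them.
        if len(per_cpu) < ncpus or not all(f.isdigit() for f in per_cpu):
            continue
        counts[label.strip()] = [int(f) for f in per_cpu]
    return counts


def busiest_cpu_share(per_cpu):
    """Fraction of an IRQ's interrupts handled by its busiest CPU."""
    total = sum(per_cpu)
    return max(per_cpu) / total if total else 0.0


for irq, per_cpu in parse_interrupts(SAMPLE).items():
    print(f"{irq:>4}  busiest CPU share: {busiest_cpu_share(per_cpu):.0%}  {per_cpu}")
```

On a live box you would pass in `open("/proc/interrupts").read()` instead of `SAMPLE`. A share close to 100% on a busy network IRQ is the pattern described above; an evenly spread IRQ on four CPUs would sit near 25%.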