How is this related to MSI-X, the system whereby network cards can assert different MSI interrupts based on a checksum in the header. This allows the load to be spread across CPUs in much the same way as suggested above.
I'm also wondering how this interacts with PCAP. If you have a machine with a dozen processes attached to an interface, then the packet needs to be copied to several different places in userspace (assuming MMAP ring-buffers). These are all going to be running on different CPUs so I don't think the above processing will help. But perhaps the actual BPF filtering can be spread out over multiple CPUs?
I ran into a problem this week, where IO-APIC round-robin interrupt routing is disabled on machines with >= 8 CPUs, which means if you don't have MSI-X you have >50% of one CPU dedicated to interrupt processing. The scheduler doesn't know this, leading to some odd effects. So if the above system works on ordinary MSI network cards this could be a solution,