> The kernel implementation is thoroughly debugged, mature, patched regularly,
Agreed.
> and faster to boot.
No. Right now, the driver gets the packet, later the kernel gets around to looking at it (maybe doing checksums, etc.), and then much later userspace requests it. If each of these happens on a different CPU, you will waste thousands (tens of thousands?) of instructions because of CPU cache misses and data locking issues.
To become a better programmer, read this: http://lwn.net/Articles/250967/
It's really long, but pay attention to the parts where loops get 10x faster just by re-arranging data structures.
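If you want the flavor of it without reading the whole thing, here is a rough sketch of one common re-arrangement of this kind: transposing the second matrix before a matrix multiply so the inner loop reads both operands sequentially. This is my own illustration, not code from the article. The function names and the flat row-major layout are assumptions I made up for the example. Python's interpreter overhead hides most of the cache effect, so treat it as showing the shape of the change; the big wins show up in compiled code on matrices too large to fit in cache.

```python
from array import array


def matmul_naive(a, b, n):
    """Multiply two n x n matrices stored row-major in flat arrays."""
    c = array("d", [0.0]) * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                # b is read with stride n: every step jumps a whole row,
                # which on real hardware tends to touch a new cache line.
                total += a[i * n + k] * b[k * n + j]
            c[i * n + j] = total
    return c


def transpose(m, n):
    """Return the transpose of an n x n row-major matrix."""
    t = array("d", [0.0]) * (n * n)
    for i in range(n):
        for j in range(n):
            t[j * n + i] = m[i * n + j]
    return t


def matmul_transposed(a, b, n):
    """Same product, but b is re-arranged first so both reads are sequential."""
    bt = transpose(b, n)
    c = array("d", [0.0]) * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                # Both a and bt are now read with stride 1.
                total += a[i * n + k] * bt[j * n + k]
            c[i * n + j] = total
    return c
```

Same arithmetic, same result, different memory access pattern. The only thing that changed is how the data is laid out before the loop runs.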
> The real problem here is what Apache does after it reads data from a socket.
Agreed. Tuning TCP sockets is icing on the cake once you've solved the current bottleneck.
> Here's one uninformed idea: accept connections and read HTTP requests in one master process, asynchronously.
But why does that process *have* to be Apache? Just put another web server in front of it (Nginx, Varnish, etc.). Apache is really a "big and expensive" single-threaded application server (mod_php, mod_passenger, mod_perl, etc.). In fact, Apache isn't especially good at serving static files either. I have to admit, Nginx has the best architecture (but consider Varnish if caching is a big win).
It's like when you go to the big warehouse stores, and before you get to the front of the line, some guy with a scanner has already scanned your cart, and all you do is pay at the register without waiting.
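To make the "guy with a scanner" concrete, here is a minimal sketch of what such a front end does, using Python's asyncio. It is not how Nginx or Varnish are implemented. The names `handle_client`, `serve`, and `backend` are hypothetical: `backend` stands in for the expensive application server and is just an async function that takes the complete request head and returns response bytes. The sketch only handles the request head (no POST bodies, no keep-alive).

```python
import asyncio


async def handle_client(reader, writer, backend):
    """Buffer one slow client's request, then hand it over in one piece."""
    try:
        # Waits for the blank line that ends the HTTP request head.
        # While this client dribbles bytes in, the event loop is free
        # to service every other connection.
        head = await reader.readuntil(b"\r\n\r\n")
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        # Client hung up early or sent an oversized head: drop it
        # without ever bothering the backend.
        writer.close()
        return

    # Only now does the expensive worker get involved, and it sees
    # the whole request at once.
    response = await backend(head)

    writer.write(response)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def serve(backend, host="127.0.0.1", port=8080):
    """Accept connections in a single process (host/port are placeholders)."""

    async def on_connect(reader, writer):
        await handle_client(reader, writer, backend)

    server = await asyncio.start_server(on_connect, host, port)
    async with server:
        await server.serve_forever()
```

With some async function `my_backend(head)` of your own, you would start it with `asyncio.run(serve(my_backend))`. The point is that the cheap process absorbs all the waiting on slow clients, and the heavy worker is only tied up for the time it actually takes to produce a response.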