Low-latency Ethernet device polling
Network interfaces, like most reasonable peripheral devices, are capable of interrupting the CPU whenever a packet arrives. But even a moderately busy interface can handle hundreds or thousands of packets per second; per-packet interrupts would quickly overwhelm the processor with interrupt-handling work, leaving little time for getting useful tasks done. So most interface drivers will disable the per-packet interrupt when the traffic level is high enough and, with cooperation from the core networking stack, occasionally poll the device for new packets. There are a number of advantages to doing things this way: vast numbers of interrupts can be avoided, incoming packets can be more efficiently processed in batches, and, if packets must be dropped in response to load, they can be discarded in the interface before they ever hit the network stack. Polling is thus a win for almost all situations where there is any significant amount of traffic at all.
Extreme low-latency users see things differently, though. The time between a packet's arrival and the next poll is just the sort of latency that they are trying to avoid. Re-enabling interrupts is not a workable solution either; interrupts, too, are a source of latency. Thus the drive for user-space solutions where an application can simply poll the interface for new packets whenever it is prepared to handle new messages.
Eliezer Tamir has posted an alternative solution in the form of the low-latency Ethernet device polling patch set. With this patch, an application can enable polling for new packets directly in the device driver, with the result that those packets will quickly find their way into the network stack.
The patch adds a new member to the net_device_ops structure:
int (*ndo_ll_poll)(struct napi_struct *dev);
This function should cause the driver to check the interface for new packets and flush them into the network stack if they exist; it should not block. The return value is the number of packets it pushed into the stack, or zero if no packets were available. Other return values include LL_FLUSH_BUSY, indicating that ongoing activity prevented the processing of packets (the inability to take a lock would be an example) or LL_FLUSH_FAILED, indicating some sort of error. The latter value will cause polling to stop; LL_FLUSH_BUSY, instead, appears to be entirely ignored.
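The return-value contract described above can be illustrated with a small user-space mock. This is only a sketch under stated assumptions: struct mock_rx_queue, mock_ll_poll(), MOCK_POLL_BUDGET, and the numeric values given to LL_FLUSH_FAILED and LL_FLUSH_BUSY are invented for illustration and are not taken from the patch set or from any real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the patch set defines the real constants. */
#define LL_FLUSH_FAILED (-1)
#define LL_FLUSH_BUSY   (-2)

/* Hypothetical cap on packets flushed per call, so each call stays short. */
#define MOCK_POLL_BUDGET 4

/* Hypothetical stand-in for a driver's per-queue receive state. */
struct mock_rx_queue {
    bool locked;     /* another context currently holds the queue lock */
    bool hw_error;   /* the device is in a failed state */
    int  pending;    /* packets waiting in the receive ring */
    int  delivered;  /* packets handed to the (mock) network stack */
};

/*
 * Mock of an ndo_ll_poll()-style callback: never blocks, and returns the
 * number of packets pushed into the stack, 0 if none were available,
 * LL_FLUSH_BUSY if the lock could not be taken, or LL_FLUSH_FAILED on error.
 */
int mock_ll_poll(struct mock_rx_queue *q)
{
    int done = 0;

    assert(q != NULL);

    if (q->hw_error)
        return LL_FLUSH_FAILED;
    if (q->locked)
        return LL_FLUSH_BUSY;   /* a trylock failed; do not wait for it */

    q->locked = true;
    while (q->pending > 0 && done < MOCK_POLL_BUDGET) {
        q->pending--;
        q->delivered++;
        done++;
    }
    q->locked = false;

    return done;
}
```

A real driver would use its own locking primitives and receive-ring cleanup routine here; the mock only shows how the four kinds of return value are distinguished.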
Within the networking stack, the ndo_ll_poll() function will be called whenever polling the interface seems like the right thing to do. One obvious case is in response to the poll() system call. Sockets marked as non-blocking will only poll once; otherwise polling will continue until some packets destined for the relevant socket find their way into the networking stack, up until the maximum time controlled by the ip_low_latency_poll sysctl knob. The default value for that knob is zero (meaning that the interface will only be polled once), but the "recommended value" is 50µs. The end result is that, if unprocessed packets exist when poll() is called (or arrive shortly thereafter), they will be flushed into the stack and made available immediately, with no need to wait for the stack itself to get around to polling the interface.
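The polling policy just described (poll once for non-blocking sockets or a zero time limit, otherwise keep polling until data arrives or the limit expires) might look roughly like the following user-space simulation. All names here (struct mock_sock, mock_device_poll(), mock_sk_poll_ll()), the simulated one-microsecond cost per poll, and the value of LL_FLUSH_FAILED are assumptions made for this sketch; the actual kernel code is structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only; the patch set defines the real constant. */
#define LL_FLUSH_FAILED (-1)

/* Hypothetical stand-in for a socket plus the device behind it. */
struct mock_sock {
    bool     nonblocking;       /* socket marked non-blocking */
    bool     failed;            /* simulated device failure */
    int      rx_queued;         /* packets queued for this socket */
    int      empty_polls_left;  /* simulated polls that will find nothing */
    uint64_t now_us;            /* simulated clock */
};

/* Simulated driver poll: each call costs one microsecond of fake time. */
static int mock_device_poll(struct mock_sock *sk)
{
    sk->now_us++;
    if (sk->failed)
        return LL_FLUSH_FAILED;
    if (sk->empty_polls_left > 0) {
        sk->empty_polls_left--;
        return 0;
    }
    sk->rx_queued++;
    return 1;
}

/*
 * Poll the device at least once. Keep polling only while the socket is
 * blocking, nothing has arrived for it, and the time limit (standing in
 * for the ip_low_latency_poll value) has not run out. A failure stops
 * polling immediately. Returns true if data is now queued.
 */
bool mock_sk_poll_ll(struct mock_sock *sk, uint64_t limit_us)
{
    uint64_t deadline = sk->now_us + limit_us;

    do {
        if (mock_device_poll(sk) == LL_FLUSH_FAILED)
            break;
    } while (sk->rx_queued == 0 && !sk->nonblocking &&
             sk->now_us < deadline);

    return sk->rx_queued > 0;
}
```

With a limit of zero the loop body runs exactly once, which matches the default behavior described above; with a limit of 50 the simulated socket busy-waits for up to 50 simulated microseconds.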
Another patch in the series adds another call site in the TCP code. If a read() is issued on an established TCP connection and no data is ready for return to user space, the driver will be polled to see if some data can be pushed into the system. So there is no need for a separate poll() call to get polling on a TCP socket.
This patch set makes polling easy to use by applications; once it is configured into the kernel, no application changes are needed at all. On the other hand, the lack of application control means that every poll() or TCP read() will go into the polling code and, potentially, busy-wait for as long as the ip_low_latency_poll knob allows. It is not hard to imagine that, on many latency-sensitive systems, the hard response-time requirements really only apply to some connections, while others have no such requirements. Polling on those less-stringent sockets could, conceivably, create new latency problems on the sockets that the user really cares about. So, while no reviewer has called for it yet, it would not be surprising to see the addition of a setsockopt() operation to enable or disable polling for specific sockets before this code is merged.
It almost certainly will be merged at some point; networking maintainer Dave Miller responded to an earlier posting with "I just wanted to say that I like this work a lot." There are still details to be worked out and, presumably, a few more rounds of review to be done, so low-latency sockets may not be ready for the 3.11 merge window. But it would be surprising if this work took much longer than that to get into the mainline kernel.
