NETIF_F_LLTX and race conditions
[Posted February 1, 2005 by corbet]
Network drivers must provide a function (
hard_start_xmit()) for
the networking layer to call whenever it decides the time has come to send
out a packet. Normally, calls to
hard_start_xmit() are serialized
with a spinlock (
xmit_lock) in the
net_device structure.
In this way, the networking subsystem guarantees that it will not attempt
to send multiple packets simultaneously on the same interface.
This method works, but it is not quite ideal, especially for
high-performance network adaptors. Most drivers already implement
their own internal locking, rendering xmit_lock redundant. The
xmit_lock can also cause a certain amount of cache line bouncing
on SMP systems with a lot of networking traffic. To work around these
problems, the NETIF_F_LLTX "feature" flag was added in 2.6.9. If
a driver sets NETIF_F_LLTX on its interface, it is declaring that
it performs its own locking, and its hard_start_xmit() function
will be called without the xmit_lock held.
All seemed well for a while, but, back in December, Roland Dreier noticed a problem. When a network driver
notices that an interface's transmit buffers are too full to accept any
more packets, it calls netif_stop_queue() to inform the networking
layer. Its hard_start_xmit() method should then not be called
until the driver (with a call to netif_wake_queue()) indicates
that new packets can, once again be accepted. Network drivers thus can
count on not being asked to transmit packets when they have stopped the
queue.
Unless, as it turns out, they have set NETIF_F_LLTX. The lack of
transmit locking in the networking layer itself leads to a situation where
hard_start_xmit() can be called simultaneously on multiple
processors; hard_start_xmit() is supposed to handle that situation
with its own locking. But, if one hard_start_xmit() call fills
the transmit buffer and stops the queue, the second call will proceed in a
state it had not expected: it has a packet to transmit but no place to put
it. In most cases, this race leads to a strange error message in the
system logs. In a poorly-written driver, worse things could happen.
Roland's initial problem report included a patch which silenced the log
message. The networking hackers did not like
that solution, however; they feared that it could hide serious
(unrelated) bugs. So they set out to come up with a better solution. The
result was a lengthy patch which made some significant changes to how
network driver locking works. Uses of xmit_lock were changed to
disable interrupts, so that lock could be used in interrupt handlers as
well. Drivers could then use xmit_lock (rather than their own
lock) for internal locking. The NETIF_F_LLTX flag was redefined
to indicate that the transmit routine was completely lockless, a condition
which only applies to certain types of software device. The end result was
most of the advantages of NETIF_F_LLTX but with the race condition
solved. A version of this patch was merged as part of 2.6.11-rc2.
Unfortunately, there were some difficulties. The locking changes led to
deadlocks in certain situations where the driver would try to grab a lock
already held by the networking code which called it. Network drivers had
to be careful not to do anything (such as spin_unlock_irq()) which
would enable interrupts while xmit_lock was held.
dev_kfree_skb() could no longer be called in any place where
xmit_lock was held, since its use is not legal when interrupts are
disabled. Overall, there were enough problems with this approach that the
patch was backed out after the -rc2 release, and the developers started
over.
The current approach, as proposed by David
Miller, is to leave things as they are and silence the log message. The
patch has been tweaked a bit since first proposed by Roland in December; it
now tries to distinguish the NETIF_F_LLTX race from other (more
serious) calls to hard_start_xmit() with the transmit buffer
full. This is done by checking to see if the queue has been stopped; if
so, it is a harmless race and transmission of the packet is silently
deferred. If the queue is still running, however, then something has gone
wrong somewhere. This change must be made in all drivers which use
NETIF_F_LLTX - a relatively small set. It's a small change, but
it is a change in the rules for network drivers and worth being aware of.
(
Log in to post comments)