NETIF_F_LLTX and race conditions

[Posted February 1, 2005 by corbet]

Network drivers must provide a function (hard_start_xmit()) for the networking layer to call whenever it decides the time has come to send out a packet. Normally, calls to hard_start_xmit() are serialized with a spinlock (xmit_lock) in the net_device structure. In this way, the networking subsystem guarantees that it will not attempt to send multiple packets simultaneously on the same interface.

This method works, but it is not quite ideal, especially for high-performance network adaptors. Most drivers already implement their own internal locking, rendering xmit_lock redundant. The xmit_lock can also cause a certain amount of cache line bouncing on SMP systems with a lot of networking traffic. To work around these problems, the NETIF_F_LLTX "feature" flag was added in 2.6.9. If a driver sets NETIF_F_LLTX on its interface, it is declaring that it performs its own locking, and its hard_start_xmit() function will be called without the xmit_lock held.
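
The effect of the flag can be sketched with a small user-space model. This is not the kernel's code: MODEL_F_LLTX, struct model_dev and both functions are hypothetical stand-ins, and the "lock" is a plain boolean rather than a spinlock. The sketch only shows the decision described above, namely that the core wraps the driver call in xmit_lock unless the driver has declared that it does its own locking.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the feature bit; not the kernel's value. */
#define MODEL_F_LLTX 0x1u

struct model_dev {
    unsigned int features;
    bool xmit_lock_held;      /* stands in for the xmit_lock spinlock */
    bool lock_seen_by_driver; /* what the driver observed when it was called */
};

/* Driver transmit routine: records whether the core held the lock. */
static int model_hard_start_xmit(struct model_dev *dev)
{
    dev->lock_seen_by_driver = dev->xmit_lock_held;
    return 0;
}

/* Core transmit path: takes the lock only for drivers without the flag. */
int model_core_xmit(struct model_dev *dev)
{
    bool need_lock = !(dev->features & MODEL_F_LLTX);
    int ret;

    if (need_lock)
        dev->xmit_lock_held = true;
    ret = model_hard_start_xmit(dev);
    if (need_lock)
        dev->xmit_lock_held = false;
    return ret;
}
```

In this model, a device with features set to 0 sees the lock held inside its transmit routine, while one with MODEL_F_LLTX set does not.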

All seemed well for a while, but, back in December, Roland Dreier noticed a problem. When a network driver notices that an interface's transmit buffers are too full to accept any more packets, it calls netif_stop_queue() to inform the networking layer. Its hard_start_xmit() method should then not be called until the driver (with a call to netif_wake_queue()) indicates that new packets can once again be accepted. Network drivers thus can count on not being asked to transmit packets when they have stopped the queue.
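
The stop/wake contract can be illustrated with a single-threaded user-space model. Everything here is an assumption made for illustration: the two-slot ring, struct model_nic and the function names are hypothetical, and the boolean queue_stopped merely plays the role of the state that netif_stop_queue() and netif_wake_queue() manipulate.

```c
#include <stdbool.h>

#define MODEL_RING_SIZE 2 /* hypothetical, tiny on purpose */

struct model_nic {
    int in_flight;      /* packets occupying transmit slots */
    bool queue_stopped; /* set when the ring fills, cleared on completion */
};

/* Driver side: take one slot and stop the queue when none remain. */
static void model_driver_xmit(struct model_nic *nic)
{
    nic->in_flight++;
    if (nic->in_flight == MODEL_RING_SIZE)
        nic->queue_stopped = true;  /* role of netif_stop_queue() */
}

/* Core side: never calls the driver while the queue is stopped. */
bool model_core_send(struct model_nic *nic)
{
    if (nic->queue_stopped)
        return false;               /* packet stays queued in the core */
    model_driver_xmit(nic);
    return true;
}

/* Completion path: free a slot and let the core resume sending. */
void model_tx_complete(struct model_nic *nic)
{
    if (nic->in_flight > 0)
        nic->in_flight--;
    if (nic->queue_stopped && nic->in_flight < MODEL_RING_SIZE)
        nic->queue_stopped = false; /* role of netif_wake_queue() */
}
```

With one caller at a time, the driver in this model is never entered with a full ring, which is the guarantee drivers had come to rely on.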

Unless, as it turns out, they have set NETIF_F_LLTX. The lack of transmit locking in the networking layer itself leads to a situation where hard_start_xmit() can be called simultaneously on multiple processors; hard_start_xmit() is supposed to handle that situation with its own locking. But, if one hard_start_xmit() call fills the transmit buffer and stops the queue, the second call will proceed in a state it had not expected: it has a packet to transmit but no place to put it. In most cases, this race leads to a strange error message in the system logs. In a poorly-written driver, worse things could happen.
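
The problematic ordering can be replayed deterministically in user space. This is a sketch under stated assumptions, not driver code: the names are hypothetical, the two "processors" are simply two sequences of calls interleaved by hand, and the ring is given a single free slot so that the first transmission fills it.

```c
#include <stdbool.h>

enum model_result { MODEL_SENT, MODEL_RING_FULL };

struct model_nic {
    int free_slots;
    bool queue_stopped;
};

/* Step 1, done by the core: is the queue running? */
bool model_core_may_send(const struct model_nic *nic)
{
    return !nic->queue_stopped;
}

/* Step 2, done by the driver under its own private lock. */
enum model_result model_driver_xmit(struct model_nic *nic)
{
    if (nic->free_slots == 0)
        return MODEL_RING_FULL;    /* a packet in hand, but no place to put it */
    nic->free_slots--;
    if (nic->free_slots == 0)
        nic->queue_stopped = true; /* ring is now full: stop the queue */
    return MODEL_SENT;
}

/* Replay: both callers pass step 1 before either one reaches step 2. */
enum model_result model_replay_race(void)
{
    struct model_nic nic = { .free_slots = 1, .queue_stopped = false };
    bool a_ok = model_core_may_send(&nic);
    bool b_ok = model_core_may_send(&nic);
    enum model_result b_result = MODEL_SENT;

    if (a_ok)
        model_driver_xmit(&nic);   /* A fills the ring and stops the queue */
    if (b_ok)
        b_result = model_driver_xmit(&nic);
    return b_result;               /* what the second caller experienced */
}
```

Nothing spans both steps once xmit_lock is out of the picture, so the second caller's view of the queue state is stale by the time it enters the driver.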

Roland's initial problem report included a patch which silenced the log message. The networking hackers did not like that solution, however; they feared that it could hide serious (unrelated) bugs. So they set out to come up with a better solution. The result was a lengthy patch which made some significant changes to how network driver locking works. Uses of xmit_lock were changed to disable interrupts, so that lock could be used in interrupt handlers as well. Drivers could then use xmit_lock (rather than their own lock) for internal locking. The NETIF_F_LLTX flag was redefined to indicate that the transmit routine was completely lockless, a condition which only applies to certain types of software device. The end result was most of the advantages of NETIF_F_LLTX but with the race condition solved. A version of this patch was merged as part of 2.6.11-rc2.

Unfortunately, there were some difficulties. The locking changes led to deadlocks in certain situations where the driver would try to grab a lock already held by the networking code which called it. Network drivers had to be careful not to do anything (such as spin_unlock_irq()) which would enable interrupts while xmit_lock was held. dev_kfree_skb() could no longer be called in any place where xmit_lock was held, since its use is not legal when interrupts are disabled. Overall, there were enough problems with this approach that the patch was backed out after the -rc2 release, and the developers started over.

The current approach, as proposed by David Miller, is to leave things as they are and silence the log message. The patch has been tweaked a bit since first proposed by Roland in December; it now tries to distinguish the NETIF_F_LLTX race from other (more serious) calls to hard_start_xmit() with the transmit buffer full. This is done by checking to see if the queue has been stopped; if so, it is a harmless race and transmission of the packet is silently deferred. If the queue is still running, however, then something has gone wrong somewhere. This change must be made in all drivers which use NETIF_F_LLTX - a relatively small set. It's a small change, but it is a change in the rules for network drivers and worth being aware of.
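
The check described above might look roughly like the following user-space model. It is an illustration only: the enum, the structure and the function name are hypothetical, the message text is invented, and a real driver would perform the test while holding its own transmit lock. The point is the branch taken when the ring is full, which separates the harmless race from a genuine error.

```c
#include <stdbool.h>
#include <stdio.h>

enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

struct model_nic {
    int free_slots;
    bool queue_stopped;
    int errors_logged; /* counts complaints, so callers can inspect them */
};

enum model_tx model_hard_start_xmit(struct model_nic *nic)
{
    if (nic->free_slots == 0) {
        if (!nic->queue_stopped) {
            /* Ring full while the queue is running: something is wrong. */
            nic->errors_logged++;
            fprintf(stderr, "model: ring full while queue is awake\n");
        }
        /* Queue already stopped: the harmless race, so defer silently. */
        return MODEL_TX_BUSY;
    }

    nic->free_slots--;
    if (nic->free_slots == 0)
        nic->queue_stopped = true; /* ring is full: stop the queue */
    return MODEL_TX_OK;
}
```

In both full-ring cases the packet is handed back for a later attempt; only the case with the queue still running produces a log entry.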

Index entries for this article
Kernel	Device drivers
Kernel	NETIF_F_LLTX
Kernel	Networking/hard_start_xmit() locking
Kernel	Race conditions

NETIF_F_LLTX and race conditions

Posted Feb 3, 2005 6:02 UTC (Thu) by jwb (guest, #15467) [Link] (7 responses)

With these network locking changes, and a brand new SCSI layer, I can hardly wait to roll out the new stable kernel on all my production machines on the day it is released!

NETIF_F_LLTX and race conditions

Posted Feb 3, 2005 15:40 UTC (Thu) by melauer (guest, #2438) [Link] (3 responses)

> With these network locking changes, and a brand new SCSI layer, I can
> hardly wait to roll out the new stable kernel on all my production
> machines on the day it is released!

The definition of "stable kernel" has changed. The latest kernel release in an even-numbered series is not the "stable kernel" anymore. Now that releases which just fix bugs (e.g. security holes) and releases which add features have been thoroughly conflated, that's the way it's gotta be. The latest kernel release from your distro or hardware vendor is the "stable kernel" now. Presumably it's an older kernel with backported bugfixes.

stable kernel

Posted Feb 4, 2005 0:48 UTC (Fri) by giraffedata (guest, #1954) [Link] (2 responses)

> Presumably it's an older kernel with backported bugfixes.

And that is, incidentally, probably based on 2.4.

stable kernel

Posted Feb 6, 2005 1:17 UTC (Sun) by barryn (subscriber, #5996) [Link] (1 responses)

> And that is, incidentally, probably based on 2.4.

Not if you're running any of the following distributions (and some others too):

Fedora Core 2 or 3
(once it comes out) Red Hat Enterprise Linux 4 (or recompiled clones thereof)
SuSE Linux Enterprise Server 9
SuSE Linux 9.1 or 9.2
Ubuntu
Mandrake 10.x

2.6 is slowly but steadily taking over...

stable kernel

Posted Feb 6, 2005 3:38 UTC (Sun) by giraffedata (guest, #1954) [Link]

I presume you're saying that the distributions mentioned are providing stabilized 2.6-based kernels, or recommending them, or abandoning support of 2.4-based kernels.

But I still maintain that if you find a stable Linux kernel, it's more likely to be based on 2.4, because these 2.6-based ones simply aren't stable in the way we got used to in the 2.4 days. The code in 2.6-based kernels is substantially newer and less exposed than in the 2.4-based ones.

I'm still hopeful that the distributions will stick with an old 2.6 level and let it stabilize, but so far I haven't seen the evidence that they will. If they frequently "upgrade" by grabbing all of Linus's recent changes, we'll still have to look to something 2.4-based for any kind of stability.

Stability of release kernels

Posted Feb 3, 2005 19:51 UTC (Thu) by shane (subscriber, #3335) [Link] (2 responses)

If you need to run the absolute latest kernel, and care about stability, then you need to set up a test environment and test each release before putting it into production. This is true no matter what pre-release testing procedure is built into the kernel release cycle - chances are no kernel developer has your production environment as a desktop machine!

If you're lazy, you wait for a kernel to age, like a fine wine.

Or you can do what most people do, and use the kernel that comes with your distribution.

Stability of release kernels

Posted Feb 4, 2005 0:54 UTC (Fri) by giraffedata (guest, #1954) [Link] (1 responses)

> If you're lazy, you wait for a kernel to age, like a fine wine.

Aging doesn't make the bugs go away. And bugs are always there.

The kind of aging you're talking about happens to a series of kernels, not a particular one, and is more precisely called "stabilizing." That doesn't happen any more in kernel.org kernel series, but does in some Linux distribution kernel series.

Stability of release kernels

Posted Feb 5, 2005 2:45 UTC (Sat) by set (guest, #4788) [Link]

Another choice would be either Alan Cox's -ac kernel series or Andres Salomon's -as series:
http://www.acm.cs.rpi.edu/~dilinger/patches/2.6.10/as3/

(from his first 2.6.10-as1 announcement:)
"I'm announcing a new kernel tree; -as. The goal of this tree is to form
a stable base for vendors/distributors to use for their kernels. In
order to do this, I intend to include only security fixes and obvious
bugfixes, from various sources. I do not intend to include driver
updates, large subsystem fixes, cleanups, and so on. Basically, this is
what I'd want 2.6.10.1 to contain."