Eliminating tasklets

Posted Jun 28, 2007 10:22 UTC (Thu) by rwmj (subscriber, #5474)
I've always been confused by tasklets (and the bh's which preceded them, IIRC).

Can someone explain to me (a dabbler in the kernel at best) why tasklets are needed, and why you can't just execute the work inside the interrupt handler? Or alternatively give an example of work which can be done neither inside the handler nor in the context of the process, but needs to go in a tasklet instead?


Posted Jun 28, 2007 12:58 UTC (Thu) by nevets (subscriber, #11875) [Link]

My post about the tasklet-to-workqueue conversion contained a reference to a nice paper

Softirqs and tasklets replaced bottom halves, because bottom halves were a large bottleneck on SMP systems. If a bottom half was running on one CPU, no other bottom half could run on any other CPU. It's obvious why that wouldn't scale.

The difference between softirqs and tasklets is that a softirq is guaranteed to run on the CPU it was scheduled on, whereas tasklets don't have that guarantee. Also, the same tasklet cannot run on two separate CPUs at the same time, whereas a softirq can. Don't confuse the tasklet restriction with that of the bottom halves. Two different tasklets can run on two different CPUs, just not the same one.

Now to answer your question. I can't argue why we have tasklets (I'm trying to get rid of them ;-) but I'll give the best example of why we have softirqs. That's the networking code. Say you get a network packet. But to process that packet, it takes a lot of work. If you do that in the interrupt handler, no other interrupts can happen on that IRQ line. That would cause a large latency to incoming interrupts and perhaps you'll overflow the buffers and drop packets. So the interrupt handler only moves the data off to a network receive queue, and returns. But this packet still needs to be processed right away. Before anything else. So it goes off to a softirq for processing. Now you still allow for interrupts to come in. Perhaps the network interrupt comes in again on another CPU. The other CPU can start processing that packet with a softirq on that CPU, even before the first packet was done processing.
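To make that split concrete, here is a small userspace model in C. It is only a sketch and not kernel code: `model_rx_queue`, `model_irq_handler` and `model_softirq` are made-up names, and the fixed queue size is an assumption chosen for illustration. The "interrupt handler" only stores the packet and returns, while the deferred function does the expensive part later.

```c
#include <assert.h>
#include <stddef.h>

#define RXQ_SIZE 8  /* hypothetical receive queue depth (power of two) */

/* Hypothetical stand-in for a network receive queue. */
struct model_rx_queue {
    int pkts[RXQ_SIZE];
    size_t head;      /* index of the next packet to process */
    size_t tail;      /* index of the next free slot */
    int processed;    /* packets handled by the deferred work */
    int dropped;      /* packets lost because the queue was full */
};

/* "Top half": only moves the packet onto the queue and returns.
 * Returns 0 on success, -1 if the queue overflowed. */
static int model_irq_handler(struct model_rx_queue *q, int pkt)
{
    assert(q != NULL);
    if (q->tail - q->head == RXQ_SIZE) {
        q->dropped++;
        return -1;
    }
    q->pkts[q->tail % RXQ_SIZE] = pkt;
    q->tail++;
    return 0;
}

/* Deferred work: drains the queue with interrupts (in this model) free
 * to keep arriving. Returns the number of packets processed. */
static int model_softirq(struct model_rx_queue *q)
{
    int done = 0;

    assert(q != NULL);
    while (q->head != q->tail) {
        /* The expensive protocol processing of
         * q->pkts[q->head % RXQ_SIZE] would go here. */
        q->head++;
        q->processed++;
        done++;
    }
    return done;
}
```

In this model, ten back-to-back interrupts with no deferred run in between overflow the eight-slot queue and drop two packets, which is the buffer overflow case described above. Running `model_softirq()` between bursts keeps the queue empty.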

See how this can scale well? But the same tasklet can't run on two different CPUs, so it doesn't have this advantage. In fact, if a tasklet is scheduled to run on one CPU but is waiting for other tasklets to finish, and you try to schedule it on another CPU that's not currently processing tasklets, it will notice that the tasklet is already scheduled to run and not do anything. So tasklets are not so reliable when it comes to latencies. Hence I'm working on getting rid of them, since I don't believe they accomplish what people think they do.
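The "already scheduled, so do nothing" behaviour can be sketched with a userspace model in C. This is an illustration under assumptions, not the kernel implementation: `model_tasklet`, `model_tasklet_schedule` and `model_run_tasklets` are hypothetical names, and a plain `bool` stands in for the kernel's scheduled-state bit.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NCPUS 2  /* hypothetical CPU count for the model */

/* Hypothetical stand-in for a tasklet. */
struct model_tasklet {
    bool scheduled;  /* models the "already scheduled" state */
    int queued_on;   /* CPU whose list holds the tasklet, -1 if none */
    int runs;        /* how many times the tasklet body has run */
};

/* Try to schedule the tasklet on `cpu`. Returns true if it was queued,
 * false if it was already pending and the request was ignored. */
static bool model_tasklet_schedule(struct model_tasklet *t, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NCPUS);
    if (t->scheduled)
        return false;  /* pending elsewhere: the idle CPU gets nothing */
    t->scheduled = true;
    t->queued_on = cpu;
    return true;
}

/* Run pending tasklet work on `cpu`. Only the CPU that holds the
 * tasklet on its list can run it. Returns true if it ran. */
static bool model_run_tasklets(struct model_tasklet *t, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NCPUS);
    if (!t->scheduled || t->queued_on != cpu)
        return false;
    t->scheduled = false;
    t->queued_on = -1;
    t->runs++;
    return true;
}
```

If the tasklet is queued on CPU 0 behind other work, a second `model_tasklet_schedule()` call from idle CPU 1 returns false, and the work still has to wait for CPU 0.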

Posted Jul 2, 2007 12:52 UTC (Mon) by rankincj (subscriber, #4865) [Link]

At least one device I know receives network data via "bulk" URBs, and I believe that URB callback functions are run in the hard IRQ context of the USB hub device. Is there a better place than a tasklet to offload the work into in this case?

Posted Jul 5, 2007 5:43 UTC (Thu) by HalfMoon (guest, #3211) [Link]

All networking drivers, USB or otherwise, hand packets off to be processed in a network tasklet. So no matter what that particular device's driver does, most of the work is already done in a tasklet.

If that USB networking device uses the "usbnet" framework, it won't do much at all in hardirq context. That driver just queues its RX packets to its own tasklet, then immediately resubmits the URB with a new skbuff. (And then the bulk-IN callback can be called immediately with the next packet. For high speed devices, it's quite realistic to get multiple back-to-back packets like that.) So: only "usb stuff" is done in hardirq context, and all the network stuff is done in a tasklet.

There are other USB network drivers which work differently, mostly older drivers for older chips ... thing is, to get the best throughput on a USB network device you need to maintain a queue of packets in the hardware, and only the usbnet framework does that.

Posted Jun 28, 2007 14:50 UTC (Thu) by arjan (subscriber, #36785) [Link]

If you do all the work in the irq handler, latency will suck... remember that irq handlers often run with irqs disabled (and at minimum, its own irq will not happen even if others might).

Offloading the "hard work" out of the hard irq handler means that you can service the hardware short and sweet, with the lowest latency possible, and that the longer-running work gets batched and processed efficiently...

Posted Jun 29, 2007 21:15 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

But note that the latency that gets improved is the latency of processing interrupts, not the latency of anything a process does. When you consider that a tasklet can't sleep and runs before the CPU returns to regular process stuff, and limit your view to single CPU systems, it isn't as clear that rescheduling interrupt handling for a different time helps any latency. A program that gets interrupted still is not going to get control back until all that interrupt processing is done.

Here's the latency that gets improved: Consider 10 interrupts of the same class that happen one after another. The first 9 take 1ms to service and nobody's urgently waiting for the result. #10 only takes a microsecond, and if you don't respond within 1ms, expensive hardware will go idle. Without tasklets, those interrupts get serviced in order of arrival, so expensive hardware will be idle for 8 ms. With tasklets, you make the code for 1-9 reschedule their work to tasklets (only takes a microsecond to reschedule) and #10 completes in 10 microseconds, soon enough to keep the expensive hardware busy.
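The arithmetic in that example can be checked with a few lines of C. This is just a sketch of the numbers given above (1 ms per slow interrupt, 1 microsecond for the fast one, 1 microsecond to reschedule, a 1 ms deadline); the function and constant names are made up for the illustration.

```c
#include <assert.h>

/* All times in microseconds, taken from the example above. */
enum {
    SLOW_WORK_US  = 1000, /* full service time of interrupts 1-9 */
    FAST_WORK_US  = 1,    /* service time of interrupt #10 */
    DEFER_COST_US = 1,    /* cost of handing work off to a tasklet */
    DEADLINE_US   = 1000  /* #10 must be answered within this time */
};

/* Completion time of the last (fast) interrupt when every earlier
 * handler does its full work inline, in order of arrival. */
static long finish_inline_us(int n_slow)
{
    assert(n_slow >= 0);
    return (long)n_slow * SLOW_WORK_US + FAST_WORK_US;
}

/* Completion time of the last interrupt when the earlier handlers
 * only reschedule their work to tasklets. */
static long finish_deferred_us(int n_slow)
{
    assert(n_slow >= 0);
    return (long)n_slow * DEFER_COST_US + FAST_WORK_US;
}

/* How long the expensive hardware sits idle past its deadline. */
static long idle_us(long finish_us)
{
    return finish_us > DEADLINE_US ? finish_us - DEADLINE_US : 0;
}
```

With nine slow interrupts ahead of it, `finish_inline_us(9)` comes to 9001 microseconds, so the hardware idles for about 8 ms. `finish_deferred_us(9)` is 10 microseconds, well inside the deadline.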

Posted Jun 30, 2007 6:47 UTC (Sat) by dlang (subscriber, #313) [Link]

With workqueues it's not the case that all the interrupt-related processing must be completed before userspace gets a chance to run again. With tasklets that is the case. So the switch means that a userspace program that's waiting for some data doesn't need to keep getting delayed while the CPU is handling other incoming data.

