Improving lost and spurious IRQ handling
One problem which is familiar to driver authors is missing interrupts. A driver will typically set up an I/O operation, get it started, then wait until an interrupt indicating completion arrives. If that interrupt never shows up, the driver can end up waiting for a very long time. Missing interrupts can have a number of causes, including flaky devices or an interrupt routing problem somewhere in the system. Either way, if the driver author has not anticipated this situation and taken the appropriate measures - setting a timeout, for example - things will not end well.
Waiting for interrupt timeouts will slow a device's performance considerably, though. That problem can be mitigated by polling the device state frequently, but rapid polling has its own costs. In an attempt to obtain the best results consistently, Tejun Heo's patch adds a new driver API:
    #include <linux/interrupt.h>

    struct irq_expect *init_irq_expect(unsigned int irq, void *dev_id);
    void expect_irq(struct irq_expect *exp);
    void unexpect_irq(struct irq_expect *exp, bool timedout);
A call to init_irq_expect() will allocate an opaque token to be used with the other two functions; it should be passed the interrupt number of interest and the same dev_id value as was used to allocate the interrupt initially. When the driver initiates an action which should result in a device interrupt, it should make a call to expect_irq(). When the operation is completed, unexpect_irq() should be called, with timedout indicating whether the operation timed out (the interrupt did not arrive). Note that it's not necessary for the driver to free the struct irq_expect structure; that will happen automatically when the interrupt is released.
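The calling sequence described above can be sketched as follows. Because this API comes from a proposed patch, the sketch is a user-space mock rather than driver code: the three functions are stubbed out, the fields inside struct irq_expect are invented so the mock can record what happened (the real structure is opaque), and struct my_dev with its my_dev_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins for the proposed kernel API. The real struct
 * irq_expect is opaque; these fields are assumptions that exist only so
 * the mock can record what the driver did. */
struct irq_expect {
    unsigned int irq;
    void *dev_id;
    bool expecting;        /* true between expect_irq() and unexpect_irq() */
    unsigned int timeouts; /* operations that ended with timedout == true */
};

static struct irq_expect mock_token; /* the kernel would allocate this */

struct irq_expect *init_irq_expect(unsigned int irq, void *dev_id)
{
    mock_token = (struct irq_expect){ .irq = irq, .dev_id = dev_id };
    return &mock_token;
}

void expect_irq(struct irq_expect *exp)
{
    exp->expecting = true;
}

void unexpect_irq(struct irq_expect *exp, bool timedout)
{
    exp->expecting = false;
    if (timedout)
        exp->timeouts++;
}

/* Hypothetical driver state. */
struct my_dev {
    unsigned int irq;
    struct irq_expect *exp;
};

/* Called once, after the driver has requested its interrupt with the
 * same dev_id that is passed here. */
void my_dev_setup(struct my_dev *dev, unsigned int irq)
{
    dev->irq = irq;
    dev->exp = init_irq_expect(irq, dev);
    assert(dev->exp != NULL);
}

/* Start an operation that should end with a completion interrupt. */
void my_dev_start_io(struct my_dev *dev)
{
    /* ... program the hardware to begin the operation ... */
    expect_irq(dev->exp);
}

/* Called when the operation is over, either from the completion path
 * (timedout == false) or from the driver's own timeout (timedout == true). */
void my_dev_finish_io(struct my_dev *dev, bool timedout)
{
    unexpect_irq(dev->exp, timedout);
}
```

The point of the shape is that every expect_irq() is paired with exactly one unexpect_irq(), and the driver's own timeout path is what supplies timedout == true.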
A call to expect_irq() will initiate polling on the given interrupt line, where "polling" means making an occasional call to the device's interrupt handler. Initially, that polling is quite slow. If it turns out that the device is dropping interrupts (as indicated by the timedout parameter to unexpect_irq()), the polling frequency will be increased - up to once every millisecond. Working devices should interrupt before the slow poll period passes, so the result should be no real polling at all on reliable devices. If there is a problem with interrupt delivery, though, the kernel will automatically take responsibility for poking the interrupt handler when interrupts are expected.
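One way to picture the adaptive polling rate is as an interval that shrinks each time an expected interrupt times out. The sketch below is only an illustration of that idea: the one-millisecond floor comes from the description above, but the starting interval, the halving on timeout, and the doubling back on success are all assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define POLL_SLOW_MS 3000u /* assumed starting interval, not from the patch */
#define POLL_FAST_MS 1u    /* fastest rate mentioned: once per millisecond */

/* Return the next polling interval, given the current one and whether the
 * last expected interrupt timed out. Halving on a timeout and doubling
 * back on success are illustrative choices only. */
unsigned int next_poll_interval_ms(unsigned int cur_ms, bool timedout)
{
    assert(cur_ms >= POLL_FAST_MS && cur_ms <= POLL_SLOW_MS);

    if (timedout) {
        unsigned int next = cur_ms / 2;
        return next < POLL_FAST_MS ? POLL_FAST_MS : next;
    }

    /* The interrupt arrived: drift back toward slow polling. */
    if (cur_ms > POLL_SLOW_MS / 2)
        return POLL_SLOW_MS;
    return cur_ms * 2;
}

/* Count how many consecutive timeouts it takes to reach the fastest rate. */
unsigned int timeouts_until_fastest(void)
{
    unsigned int ms = POLL_SLOW_MS;
    unsigned int n = 0;

    while (ms > POLL_FAST_MS) {
        ms = next_poll_interval_ms(ms, true);
        n++;
    }
    return n;
}
```

With these assumed constants a reliable device never leaves the slow interval, which matches the claim that working hardware should see no real polling at all.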
This interface works well if the driver knows when to expect interrupts, but not all devices work that way. For hardware which can interrupt at any time, there is an "IRQ watching" API instead:
void watch_irq(unsigned int irq, void *dev_id);
This function will begin polling of the specified interrupt line; it will also initiate tracking of interrupt delivery status. If it determines that interrupts are being lost (as indicated by an IRQ_HANDLED return status from a polled call to the handler, which means the device had work pending that no real interrupt announced), it will continue to poll at a higher frequency. Otherwise, eventually, interrupt delivery will be deemed to be reliable and polling will be turned off.
Tejun's patch also changes the way that the kernel responds to spurious interrupts - those which no driver is interested in. Current kernels count the number of interrupts on each line for which no handler returned IRQ_HANDLED; if 99,900 out of 100,000 interrupts are spurious, the kernel loses patience, disables the interrupt line forevermore, and starts polling the line instead. There is a real cost to this action, which is why the kernel allows spurious interrupts to get to such a high proportion of the total. Once the response is triggered, there is no going back, even if the spurious interrupts were the result of a brief hardware glitch.
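The counting done by current kernels can be sketched in user space as below. It is loosely modeled on what the kernel's note_interrupt() does with the irq_count and irqs_unhandled counters, using a window of 100,000 interrupts and a trip point of more than 99,900 unhandled ones; struct irq_stats and both functions are hypothetical simplifications, and the real code also handles reporting and other details omitted here.

```c
#include <assert.h>
#include <stdbool.h>

#define SPURIOUS_WINDOW    100000u /* interrupts per evaluation window */
#define SPURIOUS_THRESHOLD  99900u /* more unhandled than this trips the check */

/* Hypothetical per-line bookkeeping for this sketch. */
struct irq_stats {
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many nobody claimed */
    bool disabled;               /* line has been shut off */
};

/* Record one interrupt. `handled` is true if some handler returned
 * IRQ_HANDLED. Returns true if this interrupt caused the line to be
 * disabled. */
bool note_interrupt_sketch(struct irq_stats *st, bool handled)
{
    if (st->disabled)
        return false;

    if (!handled)
        st->irqs_unhandled++;

    if (++st->irq_count < SPURIOUS_WINDOW)
        return false;

    /* End of the window: evaluate, then start a fresh one. */
    st->irq_count = 0;
    if (st->irqs_unhandled > SPURIOUS_THRESHOLD) {
        st->disabled = true; /* no path back in this model */
        return true;
    }
    st->irqs_unhandled = 0;
    return false;
}

/* Feed one full window containing `unhandled` spurious interrupts and
 * report whether the line ended up disabled. */
bool window_trips(unsigned int unhandled)
{
    struct irq_stats st = {0};
    unsigned int i;

    assert(unhandled <= SPURIOUS_WINDOW);
    for (i = 0; i < SPURIOUS_WINDOW; i++)
        note_interrupt_sketch(&st, i >= unhandled);
    return st.disabled;
}
```

Nothing in this model ever clears the disabled flag, which is the one-way behavior the patch sets out to change.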
With the adaptive polling mechanisms put into place to support the above features, the kernel is also able to take a more flexible approach to handling of spurious interrupts. 9,900 bad interrupts out of 10,000 are now enough to cause the spurious interrupt handling mechanism to kick in; as before, it disables the interrupt and begins polling. After a period, though, the new code will reenable the interrupt line, just to see what happens. If the source of spurious interrupts has stopped, the interrupt can be used as before. If, instead, spurious interrupts are still being delivered, the line will be blocked again for a longer period of time.
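The retry behavior can be illustrated with a simple back-off counter. This is a sketch under stated assumptions, not the patch's actual logic: the article says only that the line is blocked for a longer period each time, so the initial period, the doubling, the cap, and the reset after a clean trial are all invented here, as are the names.

```c
#include <assert.h>

#define BLOCK_INITIAL_MS   1000u /* assumed first blocking period */
#define BLOCK_MAX_MS     600000u /* assumed upper bound on the period */

/* Hypothetical back-off state for one interrupt line. */
struct spurious_backoff {
    unsigned int block_ms; /* how long to block the line next time */
};

void backoff_init(struct spurious_backoff *b)
{
    b->block_ms = BLOCK_INITIAL_MS;
}

/* Called each time the line is disabled because of spurious interrupts.
 * Returns how long to keep it disabled (and polled) before the next trial
 * re-enable, and lengthens the period for the following occasion. */
unsigned int backoff_block(struct spurious_backoff *b)
{
    unsigned int wait = b->block_ms;

    assert(wait >= BLOCK_INITIAL_MS && wait <= BLOCK_MAX_MS);
    b->block_ms = (wait > BLOCK_MAX_MS / 2) ? BLOCK_MAX_MS : wait * 2;
    return wait;
}

/* Called when a trial re-enable shows the spurious interrupts have
 * stopped; the line stays enabled and the period starts over. */
void backoff_clear(struct spurious_backoff *b)
{
    b->block_ms = BLOCK_INITIAL_MS;
}
```

A brief glitch costs one short blocking period under this scheme, while a persistently noisy line spends most of its time disabled and polled.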
There has not been a lot of discussion of this patch set so far; one comment worried that polling could cause users not to realize that there are problems in their systems. But Tejun says that this kind of response is required to get reasonably solid behavior out of flaky hardware, and nobody seems to want to challenge that claim. So it seems fairly likely that a future version of this patch will find its way into the mainline at some point.
Posted Jun 17, 2010 7:41 UTC (Thu)
by mjthayer (guest, #39183)
Strangely enough my thought was the reverse - that the system would be well suited for gathering diagnostic information that could be used to alert users to potential problems.
Posted Jun 17, 2010 20:49 UTC (Thu)
by tialaramex (subscriber, #21167)
It's worth recording the information and making it available to anyone who enquires, but that's about it. I'd say it's like the occasional deprecated API feature: those get mentioned in dmesg but they aren't (in any system I've seen) pushed to desktop notifiers etc., because users who can't run dmesg are most likely powerless to fix them, and generally an upstream will already know about it and be in the process of developing a fix.
Posted Jun 18, 2010 6:46 UTC (Fri)
by jzbiciak (guest, #5246)
Proactive alerts would be bad unless a massive failure is imminent; in that case, go ahead and alert me. So, spurious interrupts? Don't tell me proactively. Hard drive about to croak? Give me a pie in the face if you have to!
That said, it would be nice to have a "Why is it slow?" button that can go round up all the suspicious things it's seen lately, such as:
Basically, round up anything vaguely suspicious and say "Uh, here," and maybe stop there. That is, aim it at a semi-expert or motivated tinkerer diagnosing a slow computer. Trying to give advice to less clued users based on some sort of expert system database is asking for trouble and confusion. Better to leave it somewhat opaque and leave it to the educated and motivated to interpret it.
A recent example from my Windows laptop: Video acceleration "dies" if I have VPN up and running while also running dual head. (At least, that's the only common factor I've identified.) I first noticed it because everything "got slow" to varying degrees. If I had a "Why's it slow?" button, it should put that event at the top of the list, even if it can't tell me what to do about it. On a previous laptop, it "got slow" due to HD timeouts. The list goes on.
These spurious and dropped interrupts are natural candidates for such a list.
Improving lost and spurious IRQ handling 100% fix
Posted May 31, 2016 19:06 UTC (Tue)
by stevedonato (guest, #109054)
In my opinion a simple 100% fix is this: the kernel should start a timer prior to issuing any I/O request. If the timer interrupt pops and the I/O request has not yet completed, the missing interrupt handler ("MIH") can post an I/O error, timeout, etc. back to the original requester of the I/O.
If the I/O completes normally, the timer is canceled before returning to the task scheduler.
Starting a hardware timer should not take any CPU time during its wait time.
While additional code has to take into account the type of device, the maximum timeout it should wait for, etc., this is a simple table-driven list of items.
IBM uses this approach in all operating systems.
Posted Jun 18, 2010 21:20 UTC (Fri)
by giraffedata (guest, #1954)
"When the operation is completed, unexpect_irq() should be called, with timedout indicating whether the operation timed out (the interrupt did not arrive)"
Does this mean the device driver should call unexpect_irq()? How does it know the interrupt did not arrive, given that someone calls the driver's interrupt handler (because of expect_irq()) even if it didn't?