Improving lost and spurious IRQ handling

Posted Jun 17, 2010 20:49 UTC (Thu) by tialaramex (subscriber, #21167)
In reply to: Improving lost and spurious IRQ handling by michaeljt
Parent article: Improving lost and spurious IRQ handling

Alerts must be actionable. If you tell me "Foo: Bar happened. Quux!" and I can't do anything about it ("throw your Foo away" is rarely an acceptable option) then I just feel like Linux is pointlessly screaming at me.

It's worth recording the information and making it available to anyone who enquires, but that's about it. I'd say it's like the occasional deprecated API feature: those get mentioned in dmesg but they aren't (in any system I've seen) pushed to desktop notifiers etc., because users who can't run dmesg are most likely powerless to fix them, and generally an upstream will already know about it and be in the process of developing a fix.

Improving lost and spurious IRQ handling

Posted Jun 18, 2010 6:46 UTC (Fri) by jzbiciak (subscriber, #5246)

Proactive alerts would be bad unless a massive failure is imminent; in that case, go ahead and alert me. So, spurious interrupts? Don't tell me proactively. Hard drive about to croak? Give me a pie in the face if you have to!

That said, it would be nice to have a "Why is it slow?" button that can go round up all the suspicious things it's seen lately, such as:

  • Spurious interrupts
  • Dropped interrupts
  • Kernel oopses that didn't panic the system
  • HD command timeouts. (Much more common for me back in the IDE days.)
  • Wacky numbers on my network interfaces (i.e. gobs of dropped/collided/whatever packets)
  • Unusual temperature readings
  • ...etc, etc, etc.

Basically, round up anything vaguely suspicious and say "Uh, here," and maybe stop there. That is, aim it at a semi-expert or motivated tinkerer diagnosing a slow computer. Trying to give advice to less clued users based on some sort of expert system database is asking for trouble and confusion. Better to leave it somewhat opaque and leave it to the educated and motivated to interpret it.
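As a rough illustration of the "round it up and say 'Uh, here'" idea, here is a minimal Python sketch that covers just two items from the list above. It assumes a Linux system where /proc/interrupts carries ERR and MIS summary rows (as on x86) and /proc/net/dev has its usual sixteen counters per interface. The function names are made up for this example, and it deliberately reports raw counters without trying to interpret them.

```python
from pathlib import Path


def interrupt_anomalies(text):
    """Return nonzero ERR/MIS counters from /proc/interrupts-style text."""
    found = {}
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        label = label.strip()
        if label in ("ERR", "MIS"):
            fields = rest.split()
            if fields and fields[0].isdigit() and int(fields[0]) > 0:
                found[label] = int(fields[0])
    return found


def network_anomalies(text):
    """Return nonzero error/drop/collision counters per interface
    from /proc/net/dev-style text."""
    found = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        counters = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
            "collisions": int(fields[13]),
        }
        nonzero = {key: value for key, value in counters.items() if value}
        if nonzero:
            found[name.strip()] = nonzero
    return found


def collect_report():
    """Gather whatever looks suspicious; skip sources that cannot be read."""
    report = {}
    sources = (
        ("interrupts", "/proc/interrupts", interrupt_anomalies),
        ("network", "/proc/net/dev", network_anomalies),
    )
    for key, path, parser in sources:
        try:
            result = parser(Path(path).read_text())
        except OSError:
            continue  # not Linux, or the file is not readable
        if result:
            report[key] = result
    return report
```

Calling collect_report() returns a plain dictionary of whatever was nonzero, which matches the "leave it somewhat opaque" approach: the tool lists, the human interprets.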

A recent example from my Windows laptop: Video acceleration "dies" if I have VPN up and running while also running dual head. (At least, that's the only common factor I've identified.) I first noticed it because everything "got slow" to varying degrees. If I had a "Why's it slow?" button, it should put that event at the top of the list, even if it can't tell me what to do about it. On a previous laptop, it "got slow" due to HD timeouts. The list goes on. These spurious and dropped interrupts are natural candidates for such a list.

Improving lost and spurious IRQ handling 100% fix

Posted May 31, 2016 19:06 UTC (Tue) by stevedonato (guest, #109054)

Missing interrupts can currently cause Linux to hang because it does not have a perfected "Missing Interrupt Handler" (MIH).
In my opinion a simple 100% fix is for the kernel to start a timer before issuing any I/O request. If the timer interrupt pops and the I/O request has not yet completed, the MIH can post an I/O error or timeout back to the original requester of the I/O.
If the I/O completes normally, the timer is canceled before returning to the task scheduler.
Starting a hardware timer should not take any CPU time during its wait time.
The additional code has to take into account the type of device and the maximum timeout it should wait for, but this is a simple table-driven list of items.
IBM uses this approach in all of its operating systems.
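
The scheme described above can be sketched in user space. This is only an analogy in Python, not kernel code: threading.Timer stands in for the hardware timer, the class name IORequest and the timeout values in the table are invented for illustration, and complete() plays the role of the normal completion interrupt.

```python
import threading

# Hypothetical per-device-class timeouts in seconds (illustrative values only).
TIMEOUTS = {"disk": 30.0, "tape": 120.0, "default": 10.0}


class IORequest:
    """An I/O request guarded by a missing-interrupt timer."""

    def __init__(self, device_class, timeouts=TIMEOUTS):
        self._done = threading.Event()
        self._lock = threading.Lock()
        self.result = None
        self.error = None
        # Table-driven lookup of how long to wait for this device class.
        limit = timeouts.get(device_class, timeouts["default"])
        self._timer = threading.Timer(limit, self._missing_interrupt)
        self._timer.daemon = True

    def start(self):
        """Arm the timer; call this before issuing the actual I/O."""
        self._timer.start()

    def complete(self, result):
        """Normal completion path. Returns False if the timeout already fired."""
        with self._lock:
            if self._done.is_set():
                return False
            self._timer.cancel()
            self.result = result
            self._done.set()
            return True

    def _missing_interrupt(self):
        """Timer path: post a timeout error back to the requester."""
        with self._lock:
            if self._done.is_set():
                return
            self.error = TimeoutError("no completion interrupt received")
            self._done.set()

    def wait(self):
        """Block the requester until completion or timeout."""
        self._done.wait()
        if self.error is not None:
            raise self.error
        return self.result
```

The lock makes completion and timeout mutually exclusive, so whichever path runs first decides the outcome and a late completion is reported as such instead of overwriting the posted error. The waiting thread sleeps in wait(), so the requester uses no CPU time while the timer is pending.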

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds