Subtle interactions in the embedded world - what bugs can teach us

February 22, 2012

This article was contributed by Neil Brown

I must admit that I love bugs. I also hate bugs which leads to a very conflicted relationship with the blighters, but for now I just want to focus on the positive - I love bugs. The bugs I am talking about are, of course, the incorrect behavior of software systems due to incorrect code and the reason that I love them is that they are so educational. There are a number of reasons for this, the significant one for the present being that they provide motivation and focus to read and study new code.

One of the freedoms that free and open source software provides is the freedom to study the source code. However having that freedom doesn't mean it's easy to use it. When faced with a large body of code such as the Linux kernel it can be hard to know where to start, or when to move on. There is no story-line, no curriculum, no plot to guide your study. This is where a bug comes in: it can provide a story line. This may be a particular sequence of events or a particular combination of features, or just a starting point to spiral out from. But by providing a clear focus and a clear finishing point, it makes study easier and more rewarding.

More than a phone

I have recently been spending some time trying to make the Linux kernel run well on the GTA04 "Phoenux" replacement motherboard for the Openmoko "Freerunner" and "Neo 1973" mobile phones. This is an attempt to revitalize the Openmoko effort to provide an open (or at least "as open as possible") platform for a mobile computing device. It fits in the same case, uses the same display and speakers, but provides more memory, faster processor, and faster networking plus a few other extras. During this (ongoing) effort I have discovered a number of bugs and missing features, most of which only came with smaller learning experiences. One, though, stands out for leading me on a somewhat longer path of discovery than most, and so I would like to share that path and those discoveries so that others might benefit. I should note first that the path is still open. I have not actually fixed the bug, so others could follow the path themselves. Anyone keen to do that should stop after the next paragraph, get a GTA04, and start hunting themselves. Others can read on.

The symptom of the bug is easy to describe. Like most computer systems the GTA04 has an RTC (real time clock) which has an alarm function. When the time in the clock matches the time in the alarm, it generates an interrupt. The problem is that, though this interrupt works perfectly while the system is awake, it does not reliably wake the device when it is asleep (i.e. in suspend-to-RAM state). Other interrupts configured much the same way (for example data received on the serial console) wake the system with 100% reliability. However the RTC alarm almost always fails. Understanding this bug leads on a path through the interrupt management code, the suspend management code, and even the USB code. But it must start with a brief description of the hardware.

In the GTA04 there are two particular integrated circuits (ICs) that are relevant. One is the OMAP3 SoC (system-on-a-chip) from TI. The other is the PMIC (power management IC) which is referred to as "twl4030" in the various drivers - various because, like the OMAP3, it is a multi-function chip combining diverse functions such as battery charging, keyboard control, and the USB electrical interface. There are a number of connections between these chips but only two are relevant for now. These are a single interrupt line and an I2C bus.

The interrupt line signals the interrupt controller (INTC) in the OMAP3 that something has happened. The I2C (inter-integrated-circuit) bus is a simple 2-wire bus that allows the CPU to read and write registers in the PMIC. In particular it can be used to find out which of the several functions generated the interrupt, and why. It can also be used to clear the source of the interrupt.

Having one interrupt line that indicates any of a number of possible events like this is not uncommon. For example the UARTs (Universal Asynchronous Receiver Transmitter - a serial port) in the OMAP3 each have a single interrupt, but it can be generated by a character arriving, a character transmit having completed, or an error condition. As with most such devices there is a register that can be inspected which has one bit for each possible interrupt source.

When all the interrupt sources are related to a single device and handled by a single driver this is all quite easy to manage. However when the different interrupt sources are related to logically separate devices, having to manage all these behind a single interrupt could get clumsy.

So many interruptions

To address this situation, Linux has an abstraction called an "irq_chip". An irq_chip represents a set of interrupt sources each of which is assigned an interrupt number (as listed in /proc/interrupts). It provides functions to enable or disable each interrupt, to allow the interrupt to wake the device from suspend, to set the trigger type (edge or level), and various other functions. It also arranges that the interrupt handler for each interrupt will be called when appropriate.

irq_chips with sources

Using this, the driver for the INTC in the OMAP3 defines an irq_chip for all the interrupts that it knows about directly, and the driver for the PMIC can define a separate irq_chip which describes the various interrupt sources in the PMIC. Each irq_chip will provide very different functions to configure and control the interrupt sources, but will provide a uniform interface to all device drivers. When the single PMIC interrupt arrives at the INTC, it will call an interrupt handler which will examine the PMIC registers and then call the appropriate handler that was registered with the second irq_chip. This way the drivers for the various parts of the PMIC don't need to know too much about each other - the irq_chip mediates between them.

In our PMIC there is a "Primary Interrupt Status Register" which has one bit for each module that can generate interrupts. Each of these modules has their own "Secondary Interrupt Status Register" to report what actually caused the interrupt, much like the UART described earlier. There is normally no need for a tertiary irq_chip to represent these registers as the one driver manages the whole module and all of its interrupts.

There are two exceptions, only one of which interests us. There is a "power interrupts" module in the PMIC which combines interrupts from a diverse range of sources: press of the power button, insertion of USB cable, over-temperature warning, and the RTC alarm. These are sufficiently varied that a separate irq_chip makes sense here. The twl4030-irq driver manages all this quite elegantly, providing a secondary irq_chip for the device as a whole, and supporting tertiary irq_chips, including for the Power Interrupts module.

So when the RTC alarm triggers, the primary interrupt is handled by the INTC. This calls an interrupt handler which inspects the PMIC, determines that a power interrupt is active and calls the relevant handler. It in turn examines the PMIC again and finds that the RTC alarm has fired, and so calls the handler that was registered for that particular interrupt. This all works perfectly. But it doesn't seem to wake the device from suspend. At least, not always. It seems that something happens on the way to suspend that interferes with this.

A funny thing happened on the way to suspend

One of the many things that happens on the way to suspend is that each individual interrupt gets disabled - unless it is flagged as IRQF_NO_SUSPEND in which case it is left alone. However this doesn't mean exactly what it sounds like it means. Being "disabled" just means that the handler routine will not be run, it doesn't mean that the interrupt will not be generated. We have a different word for that, which is "masking". When an interrupt is masked the originating source of the interrupt is told to never post that interrupt.

Linux uses a lazy scheme for disabling interrupts. When the disable request is made, the fact is recorded in internal data structures, but that is all. If the interrupt is subsequently delivered, only then might the interrupt be masked. This can be a useful optimization as masking an interrupt can take a lot longer than just setting a flag in memory.

So, on the way to suspend, interrupts are disabled but not masked. If the interrupt does actually arrive before we reach full suspend, the fact is recorded. If it was an interrupt that should wake from suspend, this is detected in check_wakeup_irqs() and suspend aborts. If the interrupt doesn't arrive before full suspend, then it is still unmasked and will successfully wake up the device, which will resume and then handle the interrupt. This might all seem a bit complex, but once it is fully understood it is actually quite neat and it works well ... except for my RTC alarm.

Who turned out the lights?

As promised we will need to glimpse at the USB code as well and now is the time to bring that in. Like most modern phones, the GTA04 can be attached to a computer or a charger via a USB port. When the USB interface detects 5 volts on the power line, it will route this to the battery charger (BCI) which will use it to charge the battery and power the device.

This is relevant because - when tracing is added to the twl4030-irq driver - we see that during suspend we get an interrupt from the PMIC which turns out to be due to the battery charger losing power and saying "Hey, just thought you should know that I'm running on battery again now". What is happening is that when the USB port is told to go to sleep it turns off its various power supply regulators. One of these (USB3V1) also powers the voltage comparator which determines if 5V is present. As soon as that presence is no longer signaled, the current stops being routed to the BCI, and the BCI justifiably complains.

This results in the interrupt line from the PMIC being asserted which causes the interrupt management code to run. This notices that the interrupt was disabled, and so masks it. When we get to check_wakeup_irqs(), the PMIC IRQ is pending, but as it has not been set for wakeup it is just ignored. The net result here is that all interrupts from the PMIC are ignored during suspend, so the RTC alarm doesn't work, the power button doesn't wake the device up, and plugging or unplugging the USB cable has no effect either.

So now that we understand the problem ...

So, what to do? We could just unplug the USB cable when testing the RTC alarm. Then the BCI would not notice the current disappearing and so wouldn't generate an interrupt. This certainly works - once. It doesn't seem to work a second time, but we'll leave that issue for the moment.

We could tell the USB controller not to turn off USB3V1 when the BCI is active and power is available on the USB bus. This is certainly a good idea and would mean that the battery can charge during suspend. It also brings up the interesting issue of how these separate drivers should communicate, but it doesn't really solve the general problem, just this specific one.

It could be that an interrupt occurs that we really do want to ignore during suspend, that we have even explicitly disabled (but due to lazy-masking has not been masked). So we want to be able to mask those interrupts without masking the main interrupt. A few moments reflection on what we have learned so far suggests that setting IRQF_NO_SUSPEND on the intermediate interrupts should help. That way they will not be disabled during suspend so the interrupt management will trace through the chain of irq_chips far enough to find the originating interrupt source and will mask just that. The other interrupt source will still be available.

This may sound convincing, but reality has a way of getting in the way: testing shows that just setting IRQF_NO_SUSPEND is not enough. Diving back into the code and the data sheets reveals an interesting and important fact. The PMIC allows each individual interrupt source to be masked, but it does not allow a whole module as a set of interrupt sources to be masked. So when an interrupt arrives from the BCI (one of the modules) the IRQ manager notices that interrupt is disabled and so the secondary irq_chip is asked to mask it. But as the irq_chip cannot mask it (the hardware doesn't support that), it simply ignores the request. You can imagine what happens next - an unending stream of attempts to mask the BCI interrupt, none of which are effective and the interrupt keeps re-firing.

The root problem seems to be that while the TWL4030 PMIC appears to have an interrupt structure that lends itself to being represented as a small tree of irq_chips, this isn't actually the case. The details of the behavior of suspend makes it essential that a working "mask" function be provided, and that cannot be provided for modules in the twl4030, only for individual interrupts. So it seems that a complete fix requires some change to the structure of the irq_chips in the twl4030 driver. Possibly we could make the structure simpler (no tree), or possibly we could make the implementation more clever in some way. The important point is that each interrupt that is visible to the Linux IRQ management code must support being masked in the hardware in some way.

Another funny thing on the way to resume

As mentioned, after one successful wake-from-alarm, the RTC alarm in the GTA04 doesn't work a second time. In fact it stops working completely, even when not suspended. It seems that something has gone wrong on the way to resume.

During resume all the interrupts that were disabled by suspend are re-enabled. The three interrupts which relate to the RTC - the base PMIC interrupt, the interrupt for the "power interrupts" module, and the actual RTC interrupt - are enabled in that order with PMIC first. As this interrupt is still asserted (as it is what triggered resume) the PMIC interrupt handler runs as soon as it is enabled. It reads the interrupt status register to see which module was responsible, and triggers the relevant interrupt for that module.

This secondary interrupt is still disabled so nothing happens. In particular the interrupt source is not cleared, nor is it masked, so we again get a repeated sequence of attempts to handle an interrupt which cannot be handled. This one isn't unending as the loop that enables all the interrupts is still running and it eventually gets to the one for the target module (the power interrupts module). This is triggered again and this time it actually does something. It reads the final status register (thus clearing the interrupt) and calls the interrupt handler for the RTC alarm. Unfortunately that is still disabled so nothing else happens.

Eventually the RTC interrupt is enabled but by now it is too late. In some cases an interrupt that fires while disabled gets the IRQS_PENDING flag set so that when it is re-enabled, the interrupt is resent. But that is not the case when the twl4030-irq interrupt controller tries to handle a tertiary IRQ. Whether this is a bug or a misunderstood feature is as yet unclear. Another journey of discovery would be needed to fully understand that issue. Meanwhile, because the interrupt is delivered but the interrupt service routine isn't run, the handshaking between the driver and the device gets confused and no other interrupt is ever seen from the RTC.

We can work around this problem by a simple expedient. When re-enabling the interrupts during resume, do it in the reverse order. This ensures that the RTC interrupt is enabled before it is called, and everything works. This even feels like the "right" thing to do - it is balanced for "enable" to happen in the reverse order of "disable". But even if it is "right", it is not really a fix. There would still be bugs lurking and ready to spring out again.

Journey's end

The point of this little tour is not to point out bugs in the code or to shame the relevant developers. In fact the code is in general quite elegant, well structured, largely very effective, and the developers are only to be congratulated. Rather, the point is to highlight the complexity of the task being tackled by developers working the embedded space:

There are complex interactions between distinct hardware. Here the USB interface and the battery charger have important interdependencies that need to be reflected in the drivers. Some interdependencies already are but there are subtleties that are easy to miss.
The requirements of an "irq_chip" are not really documented anywhere, and given the current rate of development there is a good chance that such documentation would be out of date. Extracting the requirements from the code is far from trivial. In this case a seemingly correct and obvious implementation was only found to be deeply wrong by a fairly obscure test case.
The IRQF_NO_SUSPEND flag is clearly important, but not easy to understand. It is fairly obvious what it does, but much less obvious why you would want to do that. I imagine that as a developer my approach would be to not set it until I found a bug which could be fixed by setting it. Then I would set it and hope it didn't break anything. This is not really a robust way to do development.

With complex hardware interactions and complex software dependencies it is not surprising that we seem to see a new class of problems in the embedded space. As we have observed, there are often ways to work around the bug rather than fix it, and that is usually much simpler and quicker. They may be tempting, but for this developer at least they are nowhere near as satisfying. Because if you follow the bug all the way to the root cause, and if you understand all that is happening and find the right fix, then you have not only helped other future developers, but have learned a whole lot in the process. And that is why I love bugs.

Index entries for this article
GuestArticles	Brown, Neil

Subtle interactions in the embedded world - what bugs can teach us

Posted Feb 23, 2012 7:39 UTC (Thu) by kleptog (subscriber, #1183) [Link] (2 responses)

Thanks for the interesting article. This is a situation I think almost unique to embedded systems: you have an interface (the hardware) which is essentially well known but unchangeable. Most unchangeable interfaces are also opaque.

I must admit too that I like fixing bugs more than writing new code. I just wish the bugs wouldn't appear on a production system just before I was about to go home. You learn more from bugs, but it's generally true that you learn more from mistakes than successes.

Not unique to embedded

Posted Feb 23, 2012 17:30 UTC (Thu) by hollis (subscriber, #6768) [Link]

This is definitely not unique to embedded systems, and really "Well known but unchangeable" is a better description of an x86 PC than an embedded system.

In embedded systems the hardware is often not well known. Maybe it's a brand new component, or maybe due to the variety of the ecosystem it's your first personal encounter with it.

Also, embedded hardware is more malleable -- there's freedom to select alternate components (even ones that don't conform to decades of PC tradition), or with in-house designs you can at least complain to the hardware designer.

Subtle interactions in the embedded world - what bugs can teach us

Posted Feb 23, 2012 18:12 UTC (Thu) by iabervon (subscriber, #722) [Link]

I personally learned a lot about the MSI interrupt mechanism for PCI and Linux interrupt handling in general by debugging a server which would start ignoring its hard drive interrupt if it didn't use the hard drive for more than a couple of hours.

Subtle interactions in the embedded world - what bugs can teach us

Posted Feb 24, 2012 2:54 UTC (Fri) by mhelsley (guest, #11324) [Link]

Assuming the bug is in a project I use regularly this is a great place to start. Adding features also works. I just think of an unusual string that I saw in the "interface" that's associated with the bug or feature. Unusual strings I look for incorporate infrequent words, long words, mispellings (beneficial!), non-idiomatic use of whitespace, etc. I avoid numbers since those tend to get printed with printf format strings (C-like code) or come from variables (the rest). I try to choose unusual strings that are as closely related to my desired bug/feature as possible. A well-crafted grep (and an SSD for large code repositories) will quickly find one or more of those and that's where I start reading the code.

The great thing about this method is I've already got context for this portion of the code without even reading it. I've used this code and know what behavior it produces (bug or not) -- often I even know enought about the data to help recognize variables and/or guess the likely control flow I care about. In fact, I suspect that adding a small feature is usually an easier route to start with the project's code. The problem is a small feature will not force me to gain a very deep understanding of the code like a difficult bug would -- so this is where it's critical to know my personal learning style and what I hope to get out of the experience (if anything more than an easy fix or small feature).

In contrast, without this context selecting a feature or fishing a bug out of bugzilla to get a cold read relies heavily on technical experience and conjecture to help me rapidly acquire context. For me hands-on experience with a project -- even non-technical hands-on experience -- really counts.

Of course this is also much harder if we're talking about code that's in lower layers of the "stack" (as with the kernel here). Those layers can often be reached with this method, a good packaging system (e.g. apt or yum on top of rpm or dpkg), and much patience.

Subtle interactions in the embedded world - what bugs can teach us

Posted Feb 24, 2012 20:20 UTC (Fri) by alvieboy (guest, #51617) [Link]

Neil: a huge thanks for this article. Most people forget that hardware and software actually need to interact in very subtle ways. Although I was not involved in GTA itself, I was once part of the #gta02-core team. And I do a lot of embedded stuff, but in different areas. Back then we struggled to get HW and SW to co-operate, which was never an easy task (and swsuspend/resume was one of them, but there were others) like patching kernel to disable SD card clock so to not interfere with GPS, and so on [a huge thanks to Werner Almesberger for his amazing work on the GTA/#GTA02-core project].

Just a note for some people: Soc-based platforms are, nowadays, very complex devices. All designs aim to cut costs, but they need to evolve and be adapted due to sourcing problems (see Raspberry Pi xtal problems, for example). Hardware people (like myself, but I'm both hardware and software) focus primarily on these [cost,source] problems, and delegate on software people the task of actually putting things to work. Unfortunately this can lead to problems, because the two (software and hardware) teams [usually] only talk to each other when the hardware design is already finished. I wished that, before specifying the HW specs, the SW guys could look at it and help minimizing integration problems.

Again, thanks a lot for your clear explanation of the problem.

Subtle interactions in the embedded world - what bugs can teach us

Posted Feb 27, 2012 20:52 UTC (Mon) by geuder (guest, #62854) [Link] (2 responses)

Looks like a very interesting article, although at least I cannot get all the details at the first reading.

However, I wonder how the author can just talk about about "suspend"? It's more than 2 years that I was struggling with OMAP3 documenentation, but I vaguely remember that there were 4(?) levels of suspend. Clocks and power are controlled separately. The deeper you suspend the more power you save, however at the price that waking up is more expensive (and maybe some interrupts are completely lost???). On top of that the SoC has numerous clock domains and power domains.

Whether this complexity has its part when some interrupt handling works sometimes but not always or always only once?

Subtle interactions in the embedded world - what bugs can teach us

Posted Feb 29, 2012 8:04 UTC (Wed) by neilbrown (subscriber, #359) [Link] (1 responses)

The various functions in the OMAP3 module do indeed have 4 levels of power management - Active, Idle, Retension, and Off. "Idle" means all clocks have been turned off, I don't really understand exactly what "retention" means. The other two should be self explanatory. As you say, the deeper levels use less power and take longer to wake up from.

The runtime power management - which for the CPU is called 'cpuidle' - attempts to switch to the lowest power state which still meets the current needs.

This is all done quite independently of "suspend".

"suspend" is an OS concept more than a hardware concept - at least it in on ARM. On X86 it is also a firmware concept imposed by the BIOS (once the APM bios, now the ACPI bios).

To Linux "suspend" means:
. - freeze all processes
. - give extra encouragement to devices to go to low power states
. - disable some interrupts

and wait for a non-disabled interrupt.

Some devices don't need extra encouragement, and will always power down as much as possible. Others might only powerdown when they are not in use (i.e. no process has the device file open), and the extra encouragement from suspend powers them down anyway. Others might stay powered on a little bit so they can wake the system up, and this might be configurable.

So when I was talking about "suspend" it was definitely the OS-level concept, not the chip-level concept.

Hope that helps.

Subtle interactions in the embedded world - what bugs can teach us

Posted Feb 29, 2012 18:31 UTC (Wed) by geuder (guest, #62854) [Link]

Thanks for your reply.

When reading the article again, I see that you indeed wrote "suspend to RAM".

My fault probably was that when reading an article about a phone and OMAP3 I was implicitly thinking N900. And as far as I know N900 does not use suspend to RAM at all.

So obviously your GTA04 software is more advanced in that respect.

I have no idea what is the state machine of the various OMAP domains with respect to those power management states when you wake up from suspend to RAM. But I'm quite sure that not everything is fully active instantaneously after the first pulse on the interrupt line. Whether the reason for the missed interrupt is buried somewhere there?

Subtle interactions in the embedded world - what bugs can teach us

Posted Mar 1, 2012 16:01 UTC (Thu) by mentor (guest, #80761) [Link] (1 responses)

I don't understand how the hardware setup prevents a mask operation succeeding. Each subchip has a full IMR; surely masking can be achieved by writing ~0 to all of them?

Subtle interactions in the embedded world - what bugs can teach us

Posted Mar 2, 2012 21:15 UTC (Fri) by neilbrown (subscriber, #359) [Link]

> Each subchip has a full IMR; surely masking can be achieved by writing ~0 to all of them?

True. Maybe this comes under:

> or possibly we could make the implementation more clever in some way

The point is/was that the fact that masking at every level is needed, but that need isn't at all obvious.

Doing something more clever with the masking may indeed be part of the solution to meet that need.

What bugs can teach us

Posted Mar 2, 2012 9:23 UTC (Fri) by ldo (guest, #40946) [Link]

“This is where a bug comes in: it can provide a story line.” This sounds like something I’ve told several different people who’ve asked me how to start learning about a large, existing code base. My advice to them was to have some specific purpose in mind: fix a particular bug, add a particular feature, something like that. This kind of thing helps to focus your mind as you beat your path through the software thicket, so that you aim for a specific destination instead of wandering off and getting lost.

Oh, and your details on tracking the problem down are the kind of thing I like to write about in my “Puzzle Page” series. :)