Realtime using the PRU
Realtime applications on Linux are generally run on a PREEMPT_RT kernel, but Ron Birkett presented an alternative at the 2015 Embedded Linux Conference (ELC) in San Jose: using the programmable realtime unit (PRU) available on some ARM chips. It is, in fact, a popular alternative, as several of the Linux drone makers presenting at the conference were using the PRU to offload various realtime tasks from Linux. That was the main difference Birkett said he noticed from ELC Europe last October: there, he had been the only one talking about PRUs.
Birkett introduced himself as a firmware developer, "not a Linux kernel guy", working on Sitara ARM processors for TI. The PRU is a special-purpose RISC processor designed to help meet realtime requirements by minimizing response latency. Beyond that, he said, what the PRU can do when it is hooked up to Linux is "really cool".
Realtime does not mean "ultra fast", he reminded attendees. Instead, it means that there is determinism in the system; that events will happen when the system designer thinks they will happen. If you need to get up for work at 6am every day or lose your job, he said, that is a realtime requirement at some level. Typically, though, realtime means more than just deterministic latency; it also means that the response time will be quite low.
But realtime systems were being built for years before the PRU existed. So why does the PRU do determinism better than the other options? That is the question he set out to answer in the talk, he said.
Realtime is typically only one element of the complex systems we are building these days. Getting realtime response usually means trading away throughput. But what if you want both throughput and realtime response? Normally, you can't have both.
The AM335x system-on-chip (SoC) family (as found in the BeagleBone Black) has a Cortex-A8 CPU that is designed for throughput. It has a deep pipeline and multiple levels of cache in front of memory. But caching sacrifices determinism, he said. That leads to people using a 2GHz processor to solve a 200MHz problem, because they have to design for the worst-case latency: a cold cache, an empty pipeline, and slow RAM access.
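The worst case, not the average, is what a realtime design has to budget for. One rough way to observe it on the Linux side is a cyclictest-style loop: wake up at absolute periodic deadlines and record how late each wakeup actually was. The following is a minimal sketch in C, assuming a POSIX system with CLOCK_MONOTONIC and clock_nanosleep(); the function names are hypothetical, and the cyclictest tool from the rt-tests suite does this far more thoroughly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Advance a timespec by ns nanoseconds, keeping tv_nsec normalized. */
static void ts_add_ns(struct timespec *ts, int64_t ns)
{
    int64_t total = ts_to_ns(ts) + ns;
    ts->tv_sec = (time_t)(total / 1000000000LL);
    ts->tv_nsec = (long)(total % 1000000000LL);
}

/*
 * Sleep until each absolute deadline (start + k * period_ns) and return
 * the largest observed wakeup lateness in nanoseconds, or -1 on error.
 */
int64_t max_wakeup_latency_ns(int64_t period_ns, int iterations)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        int rc;

        ts_add_ns(&next, period_ns);
        /* TIMER_ABSTIME avoids drift from accumulated relative sleeps;
         * clock_nanosleep() returns the error number directly. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        int64_t late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Calling max_wakeup_latency_ns(1000000, 1000) runs a 1ms loop for about a second; running it with and without background load shows how far the tail latency can move on a cached, pipelined CPU.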
Enter the PRU. Typically, there are two PRU cores, each with its own dedicated instruction and data RAM, without any caching. There is a dedicated interconnect between the two PRUs, with some shared RAM, an interrupt controller, and some peripherals. The PRU can reach out to the rest of the system, but determinism is lost when it does. Keeping everything the realtime portion of the application needs inside the PRU "box" guarantees fixed access times: access to the instruction RAM takes one cycle, and the data and shared RAM can be accessed in three cycles.
Each PRU core is a 32-bit RISC processor that runs at 200MHz. There is no pipeline and instructions are executed in a single 5ns cycle. He showed the system diagram for one particular flavor of PRU, which had a scratchpad to move the entire register set between the PRU cores, an interrupt controller, some dedicated peripherals (e.g. a UART), and a fast I/O interface that has 30 general-purpose (GP) inputs and 32 GP outputs per PRU core.
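Because every instruction takes one 5ns cycle and the local memory latencies are fixed, the timing of a PRU code path can be worked out on paper rather than measured. Here is a minimal sketch of that arithmetic in C, assuming a simplified model built from the figures quoted in the talk: one cycle per ordinary instruction and three cycles per data or shared RAM access. The names are hypothetical, and accesses that leave the PRU subsystem are deliberately not modeled, since they are not deterministic.

```c
#include <stdint.h>

#define PRU_CLOCK_HZ    200000000u  /* 200MHz, as quoted in the talk */
#define CYCLES_PER_INSN 1u          /* no pipeline: one cycle each */
#define CYCLES_PER_RAM  3u          /* data or shared RAM access */

/* Nanoseconds per PRU cycle: 1e9 / 200e6 = 5ns. */
static uint32_t pru_cycle_ns(void)
{
    return 1000000000u / PRU_CLOCK_HZ;
}

/*
 * Execution time of a straight-line code path, given its count of
 * ordinary instructions and of data/shared-RAM accesses. Under this
 * model the answer is the same on every run.
 */
uint64_t pru_path_ns(uint32_t plain_insns, uint32_t ram_accesses)
{
    uint64_t cycles = (uint64_t)plain_insns * CYCLES_PER_INSN
                    + (uint64_t)ram_accesses * CYCLES_PER_RAM;
    return cycles * pru_cycle_ns();
}
```

For example, a loop body of eight ordinary instructions plus two data-RAM loads comes to 8 + 2 × 3 = 14 cycles, or 70ns, every time through.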
The PRU's general-purpose I/O has direct access to the pins, unlike other processors where multiple levels of controllers and other hardware sit between the processor and the pin. There are several different input modes, including a 16-bit parallel capture. By way of comparison, his team wrote a small program for both the CPU and the PRU that simply toggles an LED attached to a GP pin. Looking at the output on an oscilloscope, the 1GHz Cortex-A8 needed 200ns to transition the pin, while the PRU could do it in 5ns, a single cycle.
There are lots of different things that can be done with such a device, Birkett said. For drones, one PRU is often used to handle the radio-control interface, while the other drives the pulse-width modulation (PWM) signals for the motors. The PRU subsystem also includes dedicated peripherals that can serve as an extra UART, timer, or PWM controller accessible from the Cortex-A8.
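Generating motor PWM on the PRU largely comes down to counting cycles. As an illustration, assume the hobby-style convention common on drones, where an electronic speed controller expects a pulse of roughly 1000 to 2000µs repeated every 20ms (50Hz); faster update rates exist, and the exact values depend on the controller. At 200MHz, one microsecond is 200 PRU cycles. The sketch below, with hypothetical names, converts pulse widths to cycle counts and computes the output level at a given cycle, as a software PWM loop would.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRU_CYCLES_PER_US 200u  /* 200MHz clock: 200 cycles per microsecond */

/* Convert a duration in microseconds to PRU cycles. */
static inline uint32_t us_to_cycles(uint32_t us)
{
    return us * PRU_CYCLES_PER_US;
}

/*
 * Software PWM: the output is high for the first high_cycles of every
 * period_cycles window. A PRU loop would evaluate this each tick and
 * write the result straight to the output pin.
 */
static inline bool pwm_level(uint64_t now_cycles, uint32_t period_cycles,
                             uint32_t high_cycles)
{
    return (now_cycles % period_cycles) < high_cycles;
}

/* Map a throttle value in [0, 1000] onto the assumed 1000-2000us pulse,
 * clamping out-of-range input. */
static inline uint32_t throttle_to_pulse_us(uint32_t throttle)
{
    if (throttle > 1000u)
        throttle = 1000u;
    return 1000u + throttle;
}
```

With these assumptions, half throttle is a 1500µs pulse, or 300,000 cycles high out of a 4,000,000-cycle period.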
The PRU does not support interrupts. Instead, it must poll the interrupt controller to determine if an interrupt has occurred. Polling is more deterministic; asynchronous interrupts can cause jitter in the execution time.
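The polling model can be sketched in plain C. Here a variable stands in for the interrupt status the PRU would read from its interrupt controller, and each pass of the main loop checks it at a fixed point, servicing pending events in a fixed priority order. The bit layout, event numbering, and handler table are hypothetical, and a real system would acknowledge events through the interrupt controller rather than by clearing a variable. The point is that an event is only noticed at a known place in the loop, so its arrival cannot preempt code in the middle of a timing-critical sequence.

```c
#include <stdint.h>

#define NUM_EVENTS 4

/* Stand-in for the hardware's pending-event status bits. */
static volatile uint32_t pending_events;

typedef void (*event_handler)(void);

/* Per-event counters so the sketch's behavior can be observed. */
static unsigned handled[NUM_EVENTS];

static void on_event0(void) { handled[0]++; }
static void on_event1(void) { handled[1]++; }
static void on_event2(void) { handled[2]++; }
static void on_event3(void) { handled[3]++; }

static const event_handler handlers[NUM_EVENTS] = {
    on_event0, on_event1, on_event2, on_event3,
};

/*
 * One pass of the polling loop: snapshot the pending bits, acknowledge
 * them, then run handlers in bit order (event 0 has highest priority).
 * Returns the number of events serviced.
 */
unsigned poll_once(void)
{
    uint32_t snapshot = pending_events & ((1u << NUM_EVENTS) - 1u);
    unsigned count = 0;

    pending_events &= ~snapshot;
    for (unsigned i = 0; i < NUM_EVENTS; i++) {
        if (snapshot & (1u << i)) {
            handlers[i]();
            count++;
        }
    }
    return count;
}
```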
So the PRU makes a great complement to a high-end core like the Cortex-A8, he said. PRUs are available in the AM335x and AM437x and planned for more SoCs in the future.
An audience member asked about support for I2C on the PRU. Birkett said that it is easily done in software on the PRU, as are SPI and other communication protocols. There are no open-source implementations yet, but there are plans to release code for them over time.
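As a rough illustration of a protocol done in software, here is a minimal bit-banged SPI transmit (mode 0, most-significant bit first). On the PRU the pin writes would go directly to the output register; here a hypothetical set_pins() hook records each write so the sketch can run anywhere. The pin assignments and the hook are assumptions, and a real implementation would also add delays to respect the target device's maximum clock rate. I2C follows the same idea but needs open-drain signaling, acknowledgments, and clock stretching, so it is more involved.

```c
#include <stdint.h>

/* Hypothetical bit positions within the output register. */
#define PIN_SCLK (1u << 0)
#define PIN_MOSI (1u << 1)

/* Shadow of the output register, plus a log of every write. */
static uint32_t out_reg;
static uint32_t pin_log[64];
static unsigned pin_log_len;

/* Stand-in for a direct write to the PRU's output register. */
static void set_pins(uint32_t value)
{
    out_reg = value;
    if (pin_log_len < sizeof pin_log / sizeof pin_log[0])
        pin_log[pin_log_len++] = value;
}

/*
 * Shift one byte out in SPI mode 0. Each bit is placed on MOSI with SCLK
 * low (this write is also the falling edge for the previous bit), then
 * SCLK is raised so the device samples MOSI on the rising edge.
 */
void spi_write_byte(uint8_t byte)
{
    for (int bit = 7; bit >= 0; bit--) {
        uint32_t mosi = ((byte >> bit) & 1u) ? PIN_MOSI : 0u;
        set_pins(mosi);             /* SCLK low, data valid */
        set_pins(mosi | PIN_SCLK);  /* rising edge: device samples MOSI */
    }
    set_pins(out_reg & ~PIN_SCLK);  /* leave the clock idle-low */
}
```

Each bit costs two register writes, so on the PRU the achievable clock rate is set by the cycle count of this loop plus whatever delays the target device requires.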
It is not possible to run Linux on the PRU, and it wouldn't make sense to do so even if you could. Linux runs on the main CPU and communicates with the PRU using interrupts or messages. With only 8KB of instruction RAM available, a bare-metal program in C or assembly makes the most sense. You could run a small realtime operating system (RTOS), but even that might be difficult.
TI provides a C compiler for the PRU, though it is not free software. A few years' worth of optimization work has gone into the compiler, so it generates "pretty good code" at this point, Birkett said. A GCC port is also available, though it lacks the optimizations that the TI compiler provides.
Linux's role is to load the firmware for the program into the PRU's instruction RAM, initialize the resources (e.g. memory, interrupts) for the device, and manage its execution. Meanwhile, Linux can continue doing whatever general-purpose processing it needs to.
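On later kernels, remoteproc exposes that load-and-start step through sysfs: writing a firmware file name (looked up by the kernel's firmware loader, typically under /lib/firmware) to /sys/class/remoteproc/remoteprocN/firmware, then writing "start" to the state attribute, boots the core, and writing "stop" halts it. That sysfs interface postdates the 2015 talk, and which remoteprocN corresponds to which PRU varies by board and kernel, so the directory is a parameter in this minimal sketch. The helper names and the firmware name below are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Write a string to dir/attr; returns 0 on success, -1 on failure. */
static int write_attr(const char *dir, const char *attr, const char *value)
{
    char path[512];
    size_t len = strlen(value);
    FILE *f;
    int rc = 0;
    int n = snprintf(path, sizeof path, "%s/%s", dir, attr);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fwrite(value, 1, len, f) != len)
        rc = -1;
    if (fclose(f) != 0)  /* sysfs may report write errors at close */
        rc = -1;
    return rc;
}

/*
 * Point a remoteproc instance at a firmware image and boot it.
 * The firmware attribute can only be changed while the core is offline.
 */
int rproc_boot(const char *rproc_dir, const char *firmware_name)
{
    if (write_attr(rproc_dir, "firmware", firmware_name) != 0)
        return -1;
    return write_attr(rproc_dir, "state", "start");
}

/* Halt a running remoteproc instance. */
int rproc_halt(const char *rproc_dir)
{
    return write_attr(rproc_dir, "state", "stop");
}
```

On a board, a call such as rproc_boot("/sys/class/remoteproc/remoteproc1", "am335x-pru0-fw") would need root privileges and an image with that (hypothetical) name in the firmware search path.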
Linux and the PRU communicate using interrupts via the remoteproc framework or with messages using rpmsg on top of virtio. Birkett noted that he had often heard kernel developers say that, where possible, existing facilities should be enhanced rather than new ones added. That is why remoteproc and rpmsg were chosen to be supported for the PRU.
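From user space, rpmsg traffic typically looks like ordinary file I/O on a character device. TI's PRU support has used an rpmsg_pru driver that creates devices such as /dev/rpmsg_pru30, and TI's example firmware echoes each message back; treat that device name as an assumption, since it comes from TI's driver rather than from anything described in the talk. This minimal sketch sends one message and reads one reply on already-open file descriptors, so a pipe can stand in for the device when trying it out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send msg on wfd, then read one reply of up to reply_len - 1 bytes from
 * rfd and NUL-terminate it. For a real device both descriptors would be
 * the same open() of the rpmsg character device. Returns the reply
 * length, or -1 on error.
 */
ssize_t rpmsg_roundtrip(int wfd, int rfd, const char *msg,
                        char *reply, size_t reply_len)
{
    size_t len = strlen(msg);
    ssize_t n;

    if (reply_len == 0)
        return -1;

    /* Assumed: the driver turns each write() into one message, so the
     * whole message is sent in a single call. */
    do {
        n = write(wfd, msg, len);
    } while (n < 0 && errno == EINTR);
    if (n < 0 || (size_t)n != len)
        return -1;

    do {
        n = read(rfd, reply, reply_len - 1);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return -1;

    reply[n] = '\0';
    return n;
}
```

With a pipe as both ends, the "reply" is simply the message itself, which mimics the echo behavior of TI's example firmware.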
As Birkett noted at the outset, the PRU was mentioned in several other talks throughout the conference. Since Dronecode was one of the themes at ELC this year, and the BeagleBone Black was a common platform for drones, the PRU came up frequently. It frees Linux up to do other tasks, such as computer vision processing, mapping, navigation, video streaming, and the like. Since the realtime needs for drones tend to be small and specialized, offloading them to hardware targeted for that kind of task seems to make a great deal of sense. Other use cases are undoubtedly out there as well.
[I would like to thank the Linux Foundation for travel support to San Jose for ELC.]
