
Realtime using the PRU

By Jake Edge
April 8, 2015

ELC 2015

Realtime applications on Linux are generally run on the PREEMPT_RT kernel, but Rob Birkett presented an alternative at the 2015 Embedded Linux Conference (ELC) in San Jose: using the programmable realtime unit (PRU) available on some ARM chips. It is, in fact, a popular alternative, as several of the Linux drone makers presenting at the conference were using the PRU to offload various realtime tasks from Linux. That was the main difference Birkett said he noticed from ELC Europe last October, where he was the only one talking about PRUs.

Birkett introduced himself as a firmware developer, "not a Linux kernel guy", working on Sitara ARM processors for TI. The PRU was specifically designed as a special-purpose RISC processor to help with realtime requirements by minimizing response latency. Beyond that, it is "really cool" what the PRU can do when you hook it up to Linux.

[Rob Birkett]

Realtime does not mean "ultra fast", he reminded attendees. Instead, it means that there is determinism in the system; that events will happen when the system designer thinks they will happen. If you need to get up for work at 6am every day or lose your job, he said, that is a realtime requirement at some level. Typically, though, realtime means more than just deterministic latency; it also means that the response time will be quite low.

But the PRU has not always existed, and we have been making realtime systems for years. Why does the PRU do determinism better than other options? That is a question he was setting out to answer in the talk, he said.

Realtime is typically only one element of the complex systems we are building these days. Getting realtime response requires a tradeoff, and what is usually given up is throughput. But what if you want both throughput and realtime response? Normally, you can't have both.

The AM335x system-on-chip (SoC) family (as found in the BeagleBone Black) has a Cortex-A8 CPU that is designed for throughput. It has a long, deep pipeline and multiple levels of memory and cache. But cache throws away determinism, he said. That leads to people using a 2GHz processor to solve a 200MHz problem, because they have to look at the worst-case latencies with a cold cache, nothing in the processor pipeline, and slow RAM access.

Enter the PRU. Typically, there are two PRU cores, each with its own dedicated instruction and data RAM, without any caching. There is a dedicated interconnect between the two PRUs, with some shared RAM, an interrupt controller, and some peripherals. The PRU does connect out to the rest of the system, but you lose determinism when you do so. Putting everything required for the realtime portion of the application inside the PRU "box" gives assurances on access times. In the PRU, access to the instruction RAM takes one cycle and the data and shared RAM can be accessed in three cycles.

Each PRU core is a 32-bit RISC processor that runs at 200MHz. There is no pipeline and instructions are executed in a single 5ns cycle. He showed the system diagram for one particular flavor of PRU, which had a scratchpad to move the entire register set between the PRU cores, an interrupt controller, some dedicated peripherals (e.g. a UART), and a fast I/O interface that has 30 general-purpose (GP) inputs and 32 GP outputs per PRU core.
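Those figures make timing budgets a matter of simple arithmetic. As a rough sketch (the helper names are illustrative and not from the talk), the following converts between PRU cycle counts and nanoseconds, assuming only the 200MHz clock described above:

```python
PRU_CLOCK_HZ = 200_000_000  # 200MHz PRU clock, as described in the talk


def cycles_to_ns(cycles: int) -> float:
    """Convert a PRU cycle count to nanoseconds."""
    return cycles * 1_000_000_000 / PRU_CLOCK_HZ


def ns_to_cycles(ns: int) -> int:
    """Number of whole PRU cycles that fit into a time budget in ns."""
    return ns * PRU_CLOCK_HZ // 1_000_000_000


print(cycles_to_ns(1))     # one instruction, or one instruction-RAM access
print(cycles_to_ns(3))     # one data- or shared-RAM access
print(ns_to_cycles(1000))  # cycles available in a 1 microsecond budget
```

Because there is no cache or pipeline to account for, a count like this is an exact figure rather than an average, which is the property the talk emphasized.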

The GPIO controllers have direct access to the pins, unlike other processors where there are multiple levels of controllers and other hardware between the processor and the pin. There are several different input modes including a 16-bit parallel capture. By way of comparison, his team wrote a small program to simply blink an LED attached to a GP pin for both the CPU and the PRU. Looking at the output on an oscilloscope, the 1GHz Cortex-A8 could transition the pin in 200ns, while the PRU could do it in 5ns.

There are lots of different things that can be done with such a device, Birkett said. For drones, one PRU is often used to handle the radio-control interface, while the other is used to drive the pulse-width modulation (PWM) for the motors. There are also dedicated peripherals as part of the PRU that can be used as an extra UART, timer, or PWM controller that is accessible from the Cortex-A8.
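To give a sense of how motor PWM maps onto PRU cycles, here is a minimal sketch. The 20ms period and 1.5ms pulse used in the example are conventional hobby radio-control values assumed for illustration; they were not given in the talk, and `pwm_cycle_counts` is a hypothetical helper.

```python
from typing import Tuple

PRU_CLOCK_HZ = 200_000_000  # 200MHz PRU clock


def pwm_cycle_counts(period_us: int, pulse_us: int) -> Tuple[int, int]:
    """Return (high_cycles, low_cycles) for one PWM period.

    A PRU firmware loop would hold the output pin high for high_cycles
    and low for low_cycles, counting one cycle per 5ns instruction.
    """
    if not 0 <= pulse_us <= period_us:
        raise ValueError("pulse width must lie within the period")
    cycles_per_us = PRU_CLOCK_HZ // 1_000_000  # 200 cycles per microsecond
    high = pulse_us * cycles_per_us
    low = (period_us - pulse_us) * cycles_per_us
    return high, low


# Assumed example: 20ms period with a 1.5ms pulse
print(pwm_cycle_counts(20_000, 1_500))
```

With 200 cycles per microsecond, the pulse width can in principle be adjusted in 5ns steps, far finer than a motor controller is likely to need.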

The PRU does not support asynchronous interrupts. Instead, it must poll the interrupt controller to determine if an interrupt has occurred. Polling is more deterministic; asynchronous interrupts can cause jitter in the execution time.
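The determinism argument for polling can be illustrated with a small model. This is only a sketch under stated assumptions: the firmware is imagined as a fixed-length loop that samples the interrupt controller once per iteration (at cycles 0, N, 2N, ...), and an event that arrives exactly on a poll is seen immediately. The function names are hypothetical.

```python
PRU_CYCLE_NS = 5  # one 200MHz cycle


def polling_latency_cycles(event_cycle: int, loop_cycles: int) -> int:
    """Cycles from an event arriving until the next poll observes it."""
    if loop_cycles <= 0:
        raise ValueError("loop must take at least one cycle")
    # Round the arrival time up to the next multiple of the loop length.
    next_poll = -(-event_cycle // loop_cycles) * loop_cycles
    return next_poll - event_cycle


def worst_case_latency_ns(loop_cycles: int) -> int:
    """Largest latency over every possible arrival point in the loop."""
    worst = max(polling_latency_cycles(t, loop_cycles)
                for t in range(loop_cycles))
    return worst * PRU_CYCLE_NS


print(worst_case_latency_ns(10))  # bound for an assumed 10-cycle loop
```

In this model the worst case depends only on the loop length, which the firmware author controls, so the bound is known before the code ever runs.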

So the PRU makes a great complement to a high-end core like the Cortex-A8, he said. PRUs are available in the AM335x and AM437x and planned for more SoCs in the future.

An audience member asked about support for I2C on the PRU. Birkett said that it is easily done in software on the PRU, as are SPI and other communication protocols. There are no open-source implementations yet, but there are plans to release code for those over time.

It is not possible to run Linux on the PRU—it doesn't make sense to do so even if you could. Linux will run on the main CPU and communicate with the PRU using interrupts or messages. There is only 8KB of instruction RAM available, so some kind of bare-metal stack in C or assembly makes the most sense. You could run a small realtime operating system (RTOS), but even that might be difficult.
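A quick calculation shows how tight that space is. The sketch below assumes that every PRU instruction occupies 32 bits, which was not stated in the talk; the function names are illustrative.

```python
IRAM_BYTES = 8 * 1024   # 8KB of instruction RAM, as stated in the talk
INSTRUCTION_BYTES = 4   # assumption: each PRU instruction is 32 bits wide


def max_instructions() -> int:
    """Upper bound on the number of instructions a firmware image can hold."""
    return IRAM_BYTES // INSTRUCTION_BYTES


def fits_in_iram(instruction_count: int) -> bool:
    """Check whether a program of the given length fits in instruction RAM."""
    return 0 <= instruction_count <= max_instructions()


print(max_instructions())
print(fits_in_iram(3000))
```

Under that assumption, the whole program, including any scheduler or library code, has to fit in about two thousand instructions.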

There is a C compiler for the PRU available to use, though it is not free software. There have been a few years' worth of work on optimization put into the compiler, so it generates "pretty good code" at this point, Birkett said. There is a GCC version available too, though it lacks the optimizations that the TI compiler provides.

Linux's role is to load the firmware for the program into the PRU's instruction RAM, initialize the resources (e.g. memory, interrupts) for the device, and manage its execution. Meanwhile, Linux can continue doing whatever general-purpose processing it needs to.
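On kernels where the remoteproc framework exposes its sysfs interface, loading and starting firmware from user space can look roughly like the sketch below. It relies on the `firmware` and `state` attributes of a remoteproc device; the function name is hypothetical, and the interface in TI's kernels at the time of the talk may have differed.

```python
from pathlib import Path


def start_remote_processor(rproc_dir: str, firmware_name: str) -> None:
    """Select a firmware image and start a remote processor via sysfs.

    rproc_dir is the device's sysfs directory; firmware_name is a file
    name that the kernel resolves through its firmware search path.
    """
    rproc = Path(rproc_dir)
    state_file = rproc / "state"
    # The firmware can only be changed while the processor is stopped.
    if state_file.read_text().strip() == "running":
        state_file.write_text("stop")
    (rproc / "firmware").write_text(firmware_name)
    state_file.write_text("start")
```

A call might look like `start_remote_processor("/sys/class/remoteproc/remoteproc1", "am335x-pru0-fw")`, run as root. Both the device index and the firmware file name here are assumptions that vary by board and kernel.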

Linux and the PRU communicate using interrupts via the remoteproc framework or with messages using rpmsg on top of virtio. Birkett noted that he had often heard kernel developers say that new features should not be added, but that existing facilities should be enhanced, if possible, instead. That is why remoteproc and rpmsg were chosen to be supported for the PRU.

As Birkett noted at the outset, the PRU was mentioned in several other talks throughout the conference. Since Dronecode was one of the themes at ELC this year, and the BeagleBone Black was a common platform for drones, the PRU came up frequently. It frees Linux up to do other tasks, such as computer vision processing, mapping, navigation, video streaming, and the like. Since the realtime needs for drones tend to be small and specialized, offloading them to hardware targeted for that kind of task seems to make a great deal of sense. Other use cases are undoubtedly out there as well.

[I would like to thank the Linux Foundation for travel support to San Jose for ELC.]

Index entries for this article
Conference: Embedded Linux Conference/2015



Realtime using the PRU

Posted Apr 9, 2015 17:28 UTC (Thu) by speedster1 (guest, #8143)

Thanks for the article -- this is one of the talks that I really wanted to see, but had to choose among 3 great topics running simultaneously!

Realtime using the PRU

Posted Apr 13, 2015 9:30 UTC (Mon) by petur (guest, #73362)

I like this a lot, but it needs beefier options for the PRU. Something like an M3-class controller with loads of IO and I2C/SPI.
Currently I see projects have 2 processors on board, one to handle all the realtime stuff, one to do the high-level stuff like communication, UI,... and for the ones I know about the PRU doesn't fit in.

Realtime using the PRU

Posted Apr 13, 2015 14:41 UTC (Mon) by Funcan (guest, #44209)

M3 class is far harder to do realtime stuff with. As mentioned in the article, the interrupts, caches, and the like that give you the beef play hell with the realtime response.

Realtime using the PRU

Posted Apr 13, 2015 15:05 UTC (Mon) by petur (guest, #73362)

M3 has no cache, and the interrupt controller gives you pretty good control over the stuff that matters to you. I wouldn't know what the problem would be to do hard real-time stuff with it, works for me (currently using in the form of an LPC175x).

Realtime using the PRU

Posted Apr 13, 2015 15:52 UTC (Mon) by pizza (subscriber, #46)

Also, the Cortex-M family takes 12 cycles from interrupt assertion to running the first instruction of the IRQ handler -- so if the programmer sets up the interrupt priorities properly, it's suitable for most realtime work.

ARM does have the Cortex-R family that is specifically targeting realtime applications. Most of the differences revolve around interrupt and fault handling to improve determinism, but it also includes performance counters so that you can accurately profile your code.

Realtime using the PRU

Posted Apr 16, 2015 17:53 UTC (Thu) by dfsmith (guest, #20302)

The PRU is designed for "extreme" real time, where the Arm M* is really for embedded controlling with looser timing tolerance.

For example, I used the 2 Sitara PRUs to capture a 16-bit+clock data stream at 48MHz (the PRU's clock is 4*50MHz, so it was semi-asynchronous) and feed it into the ARM A8 system DRAM for a camera image recognition application. That's the kind of real time we're talking for these units.

Realtime using the PRU

Posted Apr 16, 2015 19:36 UTC (Thu) by Russ.Dill@gmail.com (guest, #52805)

AM335x and AM437x also have an M3 if you want to experiment.


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds