Porting Linux to a new architecture

By Jake Edge
May 7, 2014

While it's certainly not an everyday occurrence, getting Linux running on a new CPU architecture needs to be done at times. To someone faced with that task, it may seem rather daunting—and it is—but, as Marta Rybczyńska described in her Embedded Linux Conference (ELC) talk, there are some fairly straightforward steps to follow. She shared those steps, along with many things that she and her Kalray colleagues learned as they ported Linux to the MPPA 256 processor.

When the word "porting" is used, it can mean one of three different things, she said. It can be a port to a new board with an already-supported processor on it. Or it can be a new processor from an existing, supported processor family. The third alternative is to port to a completely new architecture, as with the MPPA 256 (aka K1).

With a new architecture comes a new CPU instruction set. If there is a C compiler, as there was for her team, then you can recompile the existing (non-arch) kernel C code (hopefully, anyway). Any assembly pieces need to be rewritten. There will be a different memory map and possibly new peripherals. That requires configuring existing drivers to work in a new way or writing new drivers from scratch. Also, when people make the effort to create a new architecture, they don't do that just for fun, Rybczyńska said. There will be benefits to the new architecture, so there will be opportunities to optimize the existing system to take advantage of it.

There are several elements that are common to any port. First, you need build tools, such as GCC and binutils. Next, there is the kernel, both its core code and drivers. There are important user-space libraries that need to be ported, such as libc, libm, pthreads, etc. User-space applications come last. Most people start with BusyBox as the first application, then port other applications one by one.

Getting started

To get started, you have to learn about the new architecture, she said. The K1 is a massively multi-core processor with both high performance and high energy efficiency, she said. It has 256 cores that are arranged in groups of sixteen cores which share memory and an MMU. There are Network-on-Chip interfaces to communicate between the groups. Each core has the same very large instruction word (VLIW) instruction set, which can bundle up to five instructions to be executed in one cycle. The cores have advanced bitwise instructions, hardware loops, and a floating point unit (FPU). While the FPU is not particularly important for porting the kernel, it will be needed to port user-space code.

To begin, you create an empty directory (linux/arch/k1 in her case), but then you need to fill it, of course. The initial files needed are less than might be expected, Rybczyńska said. Code is needed first to configure the processor, then to handle the memory map, which includes configuring the zones and initializing the memory allocators. Handling processor mode changes is next up: interrupt and trap handlers, including the clock interrupt, need to be written, as does code to handle context switches. There is some device tree and Kconfig work to be done as well. Lastly, adding a console to get printk() output is quite useful.

To create that code, there are a couple of different routes. There is not that much documentation on this early boot code, so there is a tendency to copy and paste code from existing architectures. Kalray used several as templates along the way, including MicroBlaze, Blackfin, and OpenRISC. If code cannot be found to fit the new architecture, it will have to be written from scratch. That often requires reading other architecture manuals and code—Rybczyńska can read the assembly language for several architectures she has never actually used.

There is a tradeoff between writing assembly code vs. C code for the port. For the K1, the team opted for as much C code as possible because it is difficult to properly bundle multiple instructions into a single VLIW instruction by hand. GCC handles it well, though, so the K1 port uses compiler built-ins in preference to asm inline functions. She said that the K1 has less assembly code than any other architecture in the kernel.

Once that is all in place, at some point you will get the (sometimes dreaded) "Failed to execute /init" error message. This is actually a "big success", she said, as it means that the kernel has booted. Next up is porting an init, which requires a libc. For the K1, they ported uClibc, but there are other choices, of course. She suggested that the first versions of init be statically linked, so that no dynamic loader is required.

Porting a libc means that the kernel-to-user-space ABI needs to be nailed down. At program startup, which values will be in what registers? Where will the stack be located? And so on. Basically, it required work in both the kernel and libc "to make them work together". System calls will also need to be worked on. Setting the numbers for the calls along with determining how the arguments will be passed (registers? stack?) is needed. Signals will need some work as well, but if the early applications being ported don't use signals, only basic support needs to be added, which makes things much simpler.

Kalray created an instruction set simulator for the K1, which was helpful in debugging. The simulator can show every single instruction with the value in each register. It is "handy and fast", Rybczyńska said, and was a great help when doing the port.

Eventually, booting into the newly ported init will be possible. At that point, additional user-space executables are on the agenda. Again she suggested starting out with static binaries. Work on the dynamic loader required "lots of work on the compiler and binutils", at least for the K1. Also needed is porting or writing drivers for the main peripherals that will be used.

Testing

Rybczyńska stressed that testing is "easily forgotten", but is important to the process. When changes are made, you need to ensure you didn't break things that were already working. Her team started by trying to create unit tests from the kernel code, but determined that was hard to do. Instead, they created a "test init" that contained some basic tests of functionality. It is a "basic validation that all of the tools, libc, and the kernel are working correctly", she said.

Further testing of the kernel is required as well, of course. The "normal idea" is to write your own tests, she said, but it would take months just to create tests for all of the system calls. Instead, the K1 team used existing tests, especially those from the Linux Test Project (LTP). It is a "very active project" with "tests for nearly everything", she said; using LTP was much better than trying to write their own tests.

Continuing on is just a matter of activating new functionality (e.g. a new kernel subsystem, filesystem, or driver), fixing things that don't compile, then fixing any functionality that doesn't work. Test-driven development "worked very well for us".

As an example, she described the process undertaken to port strace, which she called a nice debugging tool that is much less verbose than the instruction set simulator. But strace uses the ptrace() system call and requires support for signals. Up until that point, there had not been a need to support signals. The ptrace() tests in LTP were run first, then strace was tried. It compiled easily, but didn't work as there were architecture-specific pieces of the ptrace() code that still needed to be implemented.

Supporting a new architecture requires new code to enable the special features of the chip. For Kalray, the symmetric multi-processing (SMP) and MMU code required a fair amount of time to design and implement. The K1 also has the Network-on-Chip (NoC) subsystem, which is brand new to the kernel. Supporting that took a lot of internal discussion to create something that worked correctly and performed reasonably. The NoC connects the groups of cores, so its performance is integral to the overall performance of the system.

Once the port matures, building a distribution may be next up. One way is to "do it yourself", which is "fine if you have three packages", Rybczyńska said. But if you have more packages than that, it becomes a lot less fun to do it that way. Kalray is currently using Buildroot, which was "easy to set up". The team is now looking at the Yocto Project as another possibility.

Lessons learned

The team learned a number of valuable lessons in doing the port. To start with, it is important to break the work up into stages. That allows you to see something working along the way, which indicates progress being made, but it also helps with debugging. "Test test test", she said, and do it right from the beginning. There are subtle bugs that can be introduced in the early going and, if you aren't testing, you won't catch them early enough to easily figure out where they were introduced.

Wherever possible, use generic functionality already provided by the kernel or other tools; don't roll your own unless you have to. Adhere to the kernel coding style from the outset. She suggested using panic() and exit() in lots of places, including putting it in every non-implemented function. That will help not to waste time debugging problems that aren't actually problems. Code that won't compile if the architecture is unknown should be preferred. If an application has architecture dependencies, failing to compile is much easier to diagnose than some strange failure.

Spend time developing advanced debugging techniques and tools. For example, they developed a visualization tool that showed kernel threads being activated during the boot process. Reading the documentation is important, as is reading the comments in the code. Her last tip was that reading code for other platforms is quite useful, as well.

With that, she answered a few questions from the audience. The port took about two months to get it to boot the first init, she said, the rest "takes much more time". The port is completely self-contained as there are no changes to the generic kernel. Her hope is to submit the code upstream as soon as possible, noting that being out of the mainline can lead to problems (as they encountered with a pointer type in the tty functions when upgrading to 3.8). While Linux is not shipping yet for the K1, it will be soon. The K1 is currently shipping with RTEMS, which was easier to port, thus it filled the operating system role while the Linux port was being completed, she said.

Slides [PDF] from Rybczyńska's talk are available on the ELC slides page.

Index entries for this article
Conference	Embedded Linux Conference/2014

Porting Linux to a new architecture

Posted May 8, 2014 2:56 UTC (Thu) by billygout (guest, #70918) [Link]

Very interesting read!

Porting Linux to a new architecture

Posted May 9, 2014 22:46 UTC (Fri) by ana (guest, #41598) [Link]

Very interesting read, thank you.