
LWN.net Weekly Edition for March 22, 2018

Welcome to the LWN.net Weekly Edition for March 22, 2018

This edition contains the following feature content:

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

A "runtime guard" for the kernel

By Jake Edge
March 21, 2018

While updating kernels frequently is generally considered a security best practice, there are many installations that are unable to do so for a variety of reasons. That means running with some number of known vulnerabilities (along with an unknown number of unknown vulnerabilities, of course), so some way to detect and stop exploits for those flaws may be desired. That is exactly what the Linux Kernel Runtime Guard (LKRG) is meant to do.

LKRG comes out of the Openwall project that is perhaps best known for its security-enhanced Linux distribution. Alexander Peslyak, or "Solar Designer", who is Openwall's founder and leader, is prominent in security circles as well. He announced LKRG at the end of January as "our most controversial project ever". The 0.0 release that was announced was "quite sloppy", Peslyak said in an LKRG 0.1 release announcement on February 9; principal developer Adam "pi3" Zabrocki cleaned things up and added some new features based on ten days of feedback.

At its core, LKRG is a loadable kernel module that tries to detect changes to the running kernel that would indicate that some kind of exploit is being used against it. Beyond that, it checks processes running on the system to look for unauthorized modifications to credentials of various sorts in order to prevent those changes from granting additional access—something that exploits will try to do. The initial LKRG announcement describes the goals this way:

While LKRG defeats many pre-existing exploits of Linux kernel vulnerabilities, and will likely defeat many future exploits (including of yet unknown vulnerabilities) that do not specifically attempt to bypass LKRG, it is bypassable by design (albeit sometimes at the expense of more complicated and/or less reliable exploits). Thus, it can be said that LKRG provides security through diversity, much like running an uncommon OS kernel would, yet without the usability drawbacks of actually running an uncommon OS.

As noted, LKRG can be bypassed, so it is really only another line of defense in a defense-in-depth strategy, rather than a panacea of any sort. In addition, it currently is in an experimental stage (as the version numbers might indicate), so it only logs any kernel modifications that it finds. The kernel is replete with various types of self-modifying code, from tracepoints and other debugging features to optimizations of various sorts, so protecting the integrity of the running kernel is not a straightforward task.

To track the running kernel, LKRG creates a database of hashes of various types of information about the system and the kernel running on it. It tracks the CPUs available and active in the system, along with the location and contents of their interrupt descriptor tables (IDTs) and model-specific registers (MSRs). Since the kernel may modify itself due to changes in the number of CPUs hotplugged into (or unplugged from) the system, LKRG must be ready to recalculate some of its hashes based on those events.

For the kernel, LKRG tracks the hashes of the .text section, the .rodata section (which should never change), and the exception table. Beyond that, each loaded module is tracked, including information like its struct module pointer, name, size and hash of its .text section, and some other module-specific information. The details of that are described on the LKRG wiki.
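The hash-database idea can be illustrated with a minimal sketch. It is not LKRG's implementation: the byte strings below are hypothetical stand-ins for kernel memory regions, SHA-256 is an assumed choice of hash, and the function names are invented for illustration.

```python
import hashlib


def snapshot(regions):
    """Return a baseline mapping of region name -> SHA-256 hex digest."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in regions.items()}


def validate(baseline, regions):
    """Return the sorted names of regions whose hash differs from the baseline."""
    current = snapshot(regions)
    return sorted(name for name in baseline
                  if current.get(name) != baseline[name])


# Hypothetical stand-ins for the tracked regions (not real kernel contents).
regions = {
    ".text": b"\x55\x48\x89\xe5\xc3",
    ".rodata": b"Linux version",
    "exception_table": b"\x00\x01\x02\x03",
}
baseline = snapshot(regions)

# Simulate an unexpected modification of one instruction byte.
regions[".text"] = b"\x55\x48\x89\xe5\x90"
changed = validate(baseline, regions)
```

A legitimate change, such as a CPU being hotplugged, would be handled in this model by calling `snapshot()` again to recompute the affected baseline entries.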

In order to detect modifications, the values stored need to be validated regularly. This is done via a number of mechanisms, starting with a timer that checks at regular intervals; the period can be set via the sysctl interface. It also runs the check whenever module-loading or CPU-hotplug activity is detected and can be triggered manually by way of another sysctl. Other events in the system (e.g. CPU idle, network activity, USB change, etc.) will trigger the validation, though only a certain percentage of the time to reduce the performance impact. For example, CPU idle will trigger validation 0.005% of the time while a USB change will do so 50% of the time.
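The percentage-based triggering might look roughly like the following sketch. Only the two probabilities come from the description above; the event names, the table, and both functions are hypothetical.

```python
import random

# Probabilities quoted in the article; the event names are invented here.
TRIGGER_PROBABILITY = {
    "cpu_idle": 0.00005,  # 0.005% of the time
    "usb_change": 0.5,    # 50% of the time
}


def should_validate(event, rng=random):
    """Decide whether this occurrence of `event` triggers an integrity check."""
    return rng.random() < TRIGGER_PROBABILITY[event]


def count_validations(event, occurrences, seed=0):
    """Count how many of `occurrences` events would trigger a check."""
    rng = random.Random(seed)  # seeded so that a run is repeatable
    return sum(should_validate(event, rng) for _ in range(occurrences))
```

Frequent events get a small probability so that validation cost stays bounded, while rare events such as a USB change can afford a much higher one.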

All of that is meant to protect the integrity of the running kernel itself, but exploits often target the processes running on the system in order to elevate privileges and the like; that information lives in the kernel's read-write memory. So LKRG also tracks a whole bunch of different attributes of each process and maintains its own task list that can be used to validate the kernel's list. If the two diverge, affected processes are killed; the intent is to do so before they can take advantage of the differences.

The tracking consists of task attributes like the address of the task_struct, process name and ID, the addresses of the cred and real_cred credential structures, the various user and group IDs associated with it, SELinux settings, and "secure computing" (seccomp) settings and configuration. Various other things are tracked currently (e.g. capabilities information) but not validated.

All of that information is validated every time certain system calls (e.g. setuid(), execve()) or other events happen in the system (e.g. when permissions are checked prior to opening a file). In addition, the process-list validation is done every time the kernel validation is run. All processes are validated each time, not just the one making the system call, and any discrepancy results in killing any process that has differences.
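As a rough model of the shadow task list, consider the sketch below. The `Creds` fields are a small, assumed subset of the attributes listed above, the function names are hypothetical, and processes missing from the shadow copy are simply ignored, which the real module may well treat differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creds:
    """A small, assumed subset of the per-process attributes being tracked."""
    uid: int
    gid: int
    seccomp_enabled: bool


def find_tampered(shadow, live):
    """Return PIDs whose live credentials no longer match the shadow copy."""
    return sorted(pid for pid, creds in live.items()
                  if pid in shadow and shadow[pid] != creds)


def authorized_change(shadow, live, pid, new_creds):
    """Legitimate path (e.g. a setuid() call): update both copies together."""
    live[pid] = new_creds
    shadow[pid] = new_creds


shadow = {100: Creds(1000, 1000, True), 200: Creds(1000, 1000, False)}
live = dict(shadow)

# A sanctioned credential change keeps the two lists in agreement.
authorized_change(shadow, live, 200, Creds(0, 0, False))

# A simulated exploit overwrites the uid in the live list only.
live[100] = Creds(0, 1000, True)

to_kill = find_tampered(shadow, live)
```

Every process is checked on each pass, not just the caller, which matches the behavior described above: any process whose two records diverge would be killed.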

The wiki page shows tests of LKRG to detect exploits of some known kernel vulnerabilities (e.g. CVE-2014-9322, CVE-2017-6074); both of those were detected, as were a few others. The performance impact has been measured in a rudimentary way: a system running LKRG 0.0 was about 6.5% slower building Openwall's John the Ripper password cracker. Performance optimizations have not been a focus yet, but: "We find this performance impact significant (especially for a security measure bypassable by design), and are likely to make adjustments to reduce it."

There are certain kinds of kernel vulnerabilities that LKRG cannot detect. If the exploit functions entirely in user space (perhaps by exploiting a kernel race condition like Dirty COW), it won't modify the parts of the kernel that are being tracked, and thus won't trigger LKRG. The home page describes it this way:

However, it wouldn't be expected to detect exploits of CVE-2016-5195 (Dirty COW) since those directly target the userspace even if via the kernel. While in case of Dirty COW the LKRG "bypass" happened due to the nature of the bug and this being the way to exploit it, it's also a way for future exploits to bypass LKRG by similarly directly targeting userspace. It remains to be seen whether such exploits become common (unlikely unless LKRG or similar become popular?) and what (negative?) effect on their reliability this will have (for kernel vulnerabilities where directly targeting the userspace isn't essential and possibly not straightforward).

LKRG is available for x86 and x86-64 and, because it is a kernel module rather than a set of patches, it will build for a wide variety of kernel versions. It can be built for the RHEL 7 kernel, which is based on 3.10, and it will also build for the mainline (4.15). The project has a mailing list for questions and support, though it is rather quiet; there are only a few postings from January and February at this point.

It is clearly a niche project and one that may not really find many users. For some installations, it could provide another level of defense, but it means those users are probably not keeping up with their kernel updates. Given that LKRG can be bypassed and that it certainly can't detect all kinds of kernel exploits, it may provide a false sense of security. But for organizations that carefully consider the threat model for LKRG and their own needs, there is value to be found in LKRG. Whether there is enough value to sustain a project (and perhaps allow Openwall to provide a non-free "LKRG Pro" version) remains to be seen.

Super long-term kernel support

By Jonathan Corbet
March 19, 2018

Some years ago, prominent community leaders doubted that even short-term stable maintenance of kernel releases was feasible. More recently, selecting an occasional kernel for a two-year maintenance cycle has become routine, and some kernels, such as 3.2 under the care of Ben Hutchings, have received constant maintenance for as much as six years. But even that sort of extended maintenance is not enough for some use cases, as Yoshitake Kobayashi explained in his Embedded Linux Conference talk. To meet those needs, the Civil Infrastructure Platform (CIP) project is setting out to maintain releases for a minimum of 20 years.

CIP, he said, is one of the most conservative projects out there, but also one of the most important. It is working to create a stable base layer for civil infrastructure systems. It is not trying to create a new distribution. Civilization runs on Linux. Infrastructure we all count on, including that dealing with transportation, power generation, and more, is Linux based. If those systems fail, we will have serious problems. But this kind of infrastructure runs on a different time scale than a typical Linux distribution. The development time required just to place such a system in service can approach two decades, and the system itself can then stay in service for 25-60 years.

The computing systems that support this infrastructure must thus continue to work for a long time. They must be based on "industrial-grade" software that is able to provide the required level of reliability, robustness, and security. But the systems supporting civil infrastructure also must be brought up to current technology levels. Until now, the long-term support needed to keep them running has been done by individual companies, with little in the way of shared effort. That has kept these systems functional, but it is an expensive approach that tends to lag behind the current state of the technology.

The way to do a better job, Kobayashi said, is to put together a collaborative framework that supports industrial-grade software while working with the upstream development communities as much as possible. That is the role that the CIP was created to fill. There are currently seven member companies supporting CIP, with Moxa being the latest addition. They are supporting the project by contributing directly to upstream projects and funding work that advances CIP's objectives.

CIP is currently focused on the creation of an open-source base layer consisting of a small number of components, including the kernel, the GNU C library, and BusyBox. Distributors will be able to build on this base as is needed, but CIP itself is starting small. The primary project at the moment is the creation of the super-long-term support (SLTS) kernel which, it is hoped, can be supported for at least ten years; as experience with extra long-term support grows, future kernels will have longer periods of support. The first SLTS kernel will be based on the 4.4 LTS release and will be maintained by Ben Hutchings; the 4.4.120-cip20 release came out on March 9.

For the most part, the CIP SLTS kernel will be based on vanilla 4.4, but there are some additions being made. The latest Meltdown and Spectre fixes are being backported to this kernel for example, as are some of the hardening patches from the Kernel Self-Protection Project. Support for some Siemens industrial-control boards is being added. Perhaps the most interesting enhancement, however, is the realtime preemption patch set, which is of interest for a number of the use cases targeted by the CIP project. CIP has joined the realtime preemption project as a member and is planning to take over the maintenance of the 4.4-rt kernel. The first SLTS kernel with realtime support was released in January.

In general, the project's policy will be to follow the upstream stable releases for as long as they are supported. Backports from newer kernels are explicitly allowed, but they must be in the mainline before being considered for addition to an SLTS kernel. New kernel versions will be released every four-to-six weeks. There is an explicit policy of non-support for out-of-tree drivers; distributors and users can add them, of course, but any bugs must be demonstrated in a pristine SLTS kernel before the CIP project will act on them.

A new major kernel release will be chosen for super-long-term support every two or three years. The project is currently thinking about which release will be the base for the next SLTS kernel; for obvious reasons, alignment with upstream LTS choices is important. There will be a meeting at the Japan Open Source Summit to make this decision.

There is some initial work on testing infrastructure based on the "board at desk" model; the testing framework is based on the kernelci.org infrastructure. Future work includes collaboration with other testing efforts, more frequent test coverage, and support for container deployment on SLTS-based systems. Debian has been chosen as the primary reference distribution for CIP systems, and all of the CIP core packages have been taken from Debian. As part of this effort, CIP is supporting the Debian-LTS effort at the platinum level.

The CIP core effort is working on the creation of installable images consisting of a small subset of Debian packages and the CIP SLTS kernel. This work can be found on GitLab. CIP is working with Debian to provide longer-term support of a subset of packages, to improve cross-compilation support, and to improve the sharing of DEP-5 license information.

In the longer term, CIP is looking toward IEC-62443 security certification. That is an ambitious goal and CIP can't get there by itself, but the project is working on documentation, test cases, and tools that will hopefully help with an eventual certification effort. Another issue that must be on the radar of any project like this is the year-2038 problem, which currently puts a hard limit on how long a Linux system can be supported. CIP is working with kernel and libc developers to push solutions forward in this area.
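The arithmetic behind the year-2038 limit is short enough to work through. The sketch below assumes the usual case of time stored as a signed 32-bit count of seconds since January 1, 1970 (UTC); the helper names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def last_representable_time():
    """Latest moment a signed 32-bit seconds-since-epoch counter can express."""
    return EPOCH + timedelta(seconds=INT32_MAX)


def wrap_int32(value):
    """Interpret `value` modulo 2**32 as a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


# One second past the limit, the counter wraps to a large negative number,
# which corresponds to a date in 1901 rather than 2038.
wrapped = wrap_int32(INT32_MAX + 1)
after_wrap = EPOCH + timedelta(seconds=wrapped)
```

For a system that is expected to stay in service for decades, that limit falls well within its working life.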

Someday CIP hopes to work more on functional safety issues and to come up with better solutions for long-term software updates. The project has just joined the EdgeX Foundry to explore what common ground may be found with that group. Clearly, the CIP project has a lot of issues on its plate; it seems likely that we will be hearing about this project for a long time.

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting your editor's travel to ELC.]

Two perspectives on the maintainer relationship

By Jonathan Corbet
March 20, 2018

Developers and maintainers of free-software projects are drawn from the same pool of people, and maintainers in one project are often developers in another, but there is still a certain amount of friction between the two groups. Maintainers depend on developers to contribute changes, but the two groups have a different set of incentives when it comes to reviewing and accepting those changes. Two talks at the 2018 Embedded Linux Conference shed some light on this relationship and how it can be made to work more smoothly.

A developer's view

Gilad Ben-Yossef is the developer of the TrustZone CryptoCell device driver, which is slated to move into the mainline in the 4.17 cycle after a period in the staging tree. Staging, he said, is a place for code that is "on probation", waiting to see if it can be brought up to the kernel's standards for mainline code. Getting the CryptoCell driver out of staging was a sometimes difficult and frustrating process but, at the end, he came to a surprising realization: during that process, 30% of the code had been removed from the driver with no loss in functionality. This was a good thing and a nice benefit from the kernel's code-review processes, but it was also discouraging in that 30% of the code he started with was worthless — or worse.

So where did all of this extra code come from? He identified seven different types of useless code, the first of which was described as "reinventing the wheel". Much of the code in the driver was reimplementing functionality that was already available elsewhere in the kernel. When looking at code, he said, developers should always be asking themselves whether the problem being solved is truly unique to the subsystem in question. If not, it's time to look at what other subsystems do; sometimes a useful API will be found in the process. At other times, of course, it falls on the developer to create that API.

Much of the code in the CryptoCell driver was there for backward compatibility, using what he called a "back to the future" approach where a lot of #ifdef blocks were employed to isolate kernel-version-specific code. The better approach, he found, was to simply base the driver on the current mainline kernel and to have that driver support all versions of the device in question. That driver can then be backported to older kernel versions if necessary, with help from the tools at backports.wiki.kernel.org. The process is mostly automated and nearly painless, and it allows a lot of code to be removed from the upstream driver.

Another source of code bloat is the use of the wrong API. The CryptoCell driver was using sysfs to export what was essentially debugging information, for example. Switching to debugfs accomplished the same thing with ¼ of the code.

The next problem was "duct tape". The crypto layer supports both synchronous and asynchronous APIs, and drivers can implement one or both of those interfaces. The CryptoCell driver supported the asynchronous API, which is a better fit to the hardware, but some of the more important users (such as the dm-verity target) expected the synchronous API. Supporting both in CryptoCell led to an "ugly and unstable" solution. The better solution was to convert dm-verity to the asynchronous API. The lesson here is to fix problems at their source rather than working around them in driver code.

The driver included a fair amount of "macro gymnastics" that were mostly the result of trying to implement a hardware abstraction layer. These layers have a bad reputation in the kernel community; they tend to lead to poor performance and driver code that is harder to maintain. The solution was to simply rip all of that stuff out. Whenever you see a lot of wrappers, he said, you should be asking whether they make sense.

"Zombie code" is surprisingly prevalent, despite the normal expectations that a maintained driver wouldn't have much unused code. Some of the code that was found had never been used. This code should always come out; removed code cannot be used against you, he said. This is especially important given the Spectre vulnerability, where even code that is never called can be made to execute speculatively.

Finally, "don't repeat yourself" is "programming 101", but unneeded repetition of code tends to happen anyway. Copy-and-paste is a way of life. Such code should be pulled together and generalized as needed; the result will be a much shorter driver.

While Ben-Yossef talked mostly about the ways in which he was able to reduce the size of the driver, the key point from the talk was something else. Taking out all of that code improved the quality of the driver considerably, and all of those benefits were a direct result of the process of upstreaming the driver into the mainline kernel. Upstreaming is not just the right thing to do; it has a huge positive effect on the quality of the code itself. It is a way to get the attention of a community of experts, all of whom are working to improve the quality of the kernel altogether. While the process was frustrating at times, it was all worthwhile in the end.

The maintainer's paradox

Tim Bird has been a member of the embedded Linux community for a long time, but he had not worked in a maintainer role until he took over responsibility for the Fuego project. That gave him a different perspective on how the community works that he shared in a keynote presentation.

As a maintainer, he is excited to see new contributions to the project show up on the mailing lists. He appreciates new ideas and new contributors. But he also approaches new contributions with a bit of fear and trepidation; a new patch set can create an instinctive "oh no" response. The problem is mostly a matter of finding the time to do a proper job of looking at each contribution. He wants to do it well, providing careful review and appropriate feedback, but doing that turns out to be a lot of work. He had set a goal for himself to respond to all patches within 24 hours; while that was a great goal, it was also totally unrealistic.

The experience of being a maintainer, he said, can be overwhelming; his experience has caused him to rethink the times that he has gotten annoyed with maintainers in the past. A patch contribution is a bit like getting a puppy; we all want one, but don't always think about how much work is involved.

The community, he said, likes to think that its decisions are based entirely on merit. In truth, there is also an important social element involved in working in the development community. Despite our efforts to just review code and judge it on its own merits, there are personalities involved. As a result, we see snippy answers and negative things happening in our communication channels. He referred to Daniel Vetter's recent talk as a description of how things can go wrong, while noting that he didn't agree with everything that was said there.

There are a few things we can do to help reduce the amount of negativity in our community, he said. We should call out negative communication when we see it. That is sometimes best done in private, but it can sometimes be necessary to do it publicly when things get especially bad. In the presence of continued problems, the best thing is often to route around persistent offenders. That is relatively easy to do in subsystems that have group maintainership, a model which not only gives contributors multiple paths to acceptance but also relieves the stress on the maintainers themselves.

Maintainers and developers should always listen carefully and make active efforts to clarify the message they are receiving. Many problems have simple miscommunication at their root, he said. Any developer can assist maintainers by answering questions when they are asked and helping other developers in general. And, finally, developers should become maintainers as well, so that they, too, can "enjoy this overwhelming feeling".

Bird closed with a plea to everybody in the room: find something to do in the community. There are roles for everybody out there, and there is no shortage of work to be done. In his 25 years of experience, he has never found anything quite like the kernel community. It is a place where anybody can contribute and create value for humanity as a whole.

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting your editor's travel to ELC.]

The Sound Open Firmware project launches

By Jonathan Corbet
March 21, 2018

It is an increasingly poorly kept secret that, underneath the hood of the components that most of us view as "hardware", there is a great deal of proprietary software. This code, written by anonymous developers, rarely sees the light of day; as a result, it tends to have all of the pathologies associated with software that nobody can either review or fix. The 2018 Embedded Linux Conference saw an announcement for a new project that, with luck, will change that situation, at least for one variety of hardware: audio devices.

Intel's Imad Sousou took to the stage at ELC to introduce a couple of new projects. One of them was inspired by the MinnowBoard project, which is working to create development boards based entirely on free software. That goal has been almost completely achieved, he said, but there is an exception in that the board requires a number of proprietary firmware components. The desire to fix that problem led to the creation of the Sound Open Firmware (SOF) project.

The objective behind SOF is to create open-source firmware for audio and video-related devices. This is, Sousou said, an area that has typically been dominated by proprietary software. This project has been jointly launched with Google; there are, he suggested, plenty of opportunities for contributors who might want to join this effort. The code is released under a combination of the BSD and GPLv2 licenses.

Sousou's talk stopped there, but a bit of digging can turn up some more information. The project has a page on the ALSA project's wiki describing where to find the software and how to get started with it. There is a software-development kit (SDK) to install; building the firmware also requires a cross-compiler for the Xtensa architecture. The SDK includes an emulator that can be used during the development process; the SOF web site also notes that "proprietary compilers and emulators" are available. Currently, only Xtensa-based digital signal processors are supported, but the project's intent is to eventually support a wider range of hardware.

The firmware itself is based on its own miniature kernel; it would appear that a new kernel was developed rather than adapting one of the (many) existing tiny kernels in circulation. It includes an earliest-deadline-first scheduler and supports basic concepts like memory allocation, interrupt handling, work queues, and more. Using that, the firmware runs an "audio task" whose job is to move data between the DMA buffers and the audio-processing components.
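For readers unfamiliar with earliest-deadline-first scheduling, here is a minimal sketch of the general technique. It is a non-preemptive, single-processor simulation and makes no claim about how SOF implements its scheduler; the function and the task names used with it are hypothetical.

```python
import heapq


def edf_schedule(tasks):
    """Non-preemptive earliest-deadline-first on one processor.

    tasks: list of (name, release_time, run_time, deadline) tuples.
    Returns a list of (name, start_time, finish_time) in execution order.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # ordered by release time
    ready = []  # min-heap of (deadline, name, run_time)
    now, i, order = 0, 0, []
    while i < len(pending) or ready:
        # Move every task that has been released by `now` into the ready heap.
        while i < len(pending) and pending[i][1] <= now:
            name, _, run_time, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, run_time))
            i += 1
        if not ready:
            now = pending[i][1]  # nothing runnable: idle until the next release
            continue
        # Run the ready task with the nearest deadline to completion.
        _, name, run_time = heapq.heappop(ready)
        order.append((name, now, now + run_time))
        now += run_time
    return order
```

Given `[("mixer", 0, 3, 10), ("dma_copy", 0, 2, 4), ("volume", 1, 1, 5)]`, the sketch runs `dma_copy` first because its deadline is nearest, then `volume`, then `mixer`.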

On the kernel side, SOF appears to present a semi-standard interface that all drivers can use, but there is still a significant amount of device-specific code required. The code to support SOF-based devices can be found in the ALSA system-on-chip tree (git://git.kernel.org/pub/scm/linux/kernel/git/lrg/asoc.git). It is not currently in the mainline; it also does not appear in linux-next, so it will almost certainly not make an appearance before the 4.18 development cycle. There is a mailing list with a fair amount of traffic; almost all of it comes from Intel addresses, which is unsurprising for a project that has only just been opened up to the world.

This project is in its infancy (despite the nearly 18 months of history on the mailing list), but it has the potential to make things better in a number of ways. Manufacturers that participate in it may end up with higher-quality firmware with less effort. Developers interested in doing new things with audio hardware should find SOF-based devices to be a good starting point. And, if the project succeeds, we'll all have systems running a bit more free software at the lowest levels, which seems like a good thing.

(For the curious, the other project announced in this talk was the ACRN hypervisor. It is designed for safety-critical systems and appears to support many of the features found in systems like seL4.)

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting your editor's travel to ELC.]

The strange story of the ARM Meltdown-fix backport

By Jonathan Corbet
March 15, 2018
Alex Shi's posting of a patch series backporting a set of Meltdown fixes for the arm64 architecture to the 4.9 kernel might seem like a normal exercise in making important security fixes available on older kernels. But this case raised a couple of interesting questions about why this backport should be accepted into the long-term-support kernels — and a couple of equally interesting answers, one of which was rather better received than the other.

The Meltdown vulnerability is most prominent in the x86 world, but it is not an Intel-only problem; some (but not all) 64-bit ARM processors suffer from it as well. The answer to Meltdown is the same in the ARM world as it is for x86 processors: kernel page-table isolation (KPTI), though the details of its implementation necessarily differ. The arm64 KPTI patches entered the mainline during the 4.16 merge window. ARM-based systems notoriously run older kernels, though, so it is natural to want to protect those kernels from these vulnerabilities as well.

When Shi posted the 4.9 backport, stable-kernel maintainer Greg Kroah-Hartman responded with a pair of questions: why has a separate backport been done when the Android Common kernel tree already contains the Meltdown work, and what sort of testing has been done on this backport? In both cases, the answer illustrated some interesting aspects of how the ARM vendor ecosystem works.

Android Common and LTS kernels

The Android Common kernels are maintained by Google as part of the Android Open-Source Project; they are meant to serve as a base for vendors to use when creating their device-specific kernels. These kernels start with the long-term support (LTS) kernels, but then add a number of Android-specific features, including the energy-aware scheduling work, features that haven't made it into the mainline for a number of reasons, and more. They also contain backports of important features and fixes, including the Meltdown fixes.

The Meltdown-fix backport was quite a bit of work, and it has gone through extensive testing in the Android kernel. Kroah-Hartman worried that the new backport may not have all of the necessary pieces or have been as extensively validated as the Android work; as such, it may not be something that should appear in the LTS kernels. The analogous effort for x86 should not be an example to follow, he said:

Yes, we did a horrid hack for the x86 backports (with the known issues that it has, and people seem to keep ignoring, which is crazy), and I would suggest NOT doing that same type of hack for ARM, but go grab a tree that we all know to work correctly if you are stuck with these old kernels!

The problem with this idea is that not every ARM system is running Android, and pulling from the Android kernel will not work for vendors whose kernels are closer to the mainline. As Mark Brown put it:

While that's a very large part of ARM ecosystem it's not all of it, there are also chip vendors and system integrators who have made deliberate choices to minimize out of tree code just as we've been encouraging them to.

Those vendors would like to have a long-term supported version of the Meltdown mitigations that does not require dragging in all of the other changes that accumulate in the Android kernels. As Brown pointed out, there are increasing numbers of vendors that are doing what the community has been asking for years and staying closer to the mainline. Not providing a proper backport of these important fixes could be seen as breaking the promise that the community has made: run the officially supported stable kernels and you will get the fixes for significant problems.

There is, thus, a reasonable argument to be made that a proper set of backports for the Meltdown fixes should find its way into the LTS kernels. One little problem remains, though: a proper backport should be known to actually work.

Testing deemed optional

Shi's response to Kroah-Hartman's question about testing was, in its entirety: "Oh, I have no A73/A75 cpu, so I can not reproduce meltdown bug." Reproducing the bug on the A73 would be a bit of a challenge, since that processor does not suffer from Meltdown, but A75 does, so asking for testing results on that CPU does not seem entirely out of line. When Kroah-Hartman repeated his request for testing, though, Ard Biesheuvel responded:

If ARM Ltd. issues recommendations regarding what firmware PSCI methods to call when doing a context switch, or which barrier instruction to issue in certain circumstances, they do so because a certain class of hardware may require it in some cases. It is really not up to me to go find some exploit code on GitHub, run it before and after applying the patch and conclude that the problem is fixed. Instead, what I should do is confirm that the changes result in the recommended actions to be taken at the appropriate times.

Upon receipt of that message, Kroah-Hartman dropped the patch series entirely, complaining that: "I can't believe we are having the argument of 'Test that your patches actually work'". He later added that if the developers working on the backport don't have both the hardware and the exploit code, "then someone is doing something seriously wrong". He urged them to complain to ARM Ltd. to get that problem fixed.

At that point, the conversation stopped. Whether the testing problem is on its way toward a solution has not been revealed. It does seem right that the fixes should be merged into the LTS kernels; otherwise the promises that the community has made regarding those kernels will start to look hollow. But the vendors depending on the LTS kernels also have a right to fixes that somebody has actually bothered to test; anybody who has worked in system software for any period of time knows that just checking for adherence to a specification is no guarantee of a working solution.

Porting Fedora to RISC-V

March 20, 2018

This article was contributed by Richard W.M. Jones

In my previous article, I gave an introduction to the open architecture of RISC-V. This article looks at how I and a small team of Fedora users ported a large part of the Fedora package set to RISC-V. It was a daunting task, especially with no real hardware or existing infrastructure to build on, but we were able to get there in a part-time effort over a year and a half or so.

How to bootstrap

The first question I'm usually asked is: How do you bootstrap Linux on a new architecture? We were lucky that most of the seriously hard work (adding support to the kernel and the compiler) was already done by the RISC-V Foundation about two years ago. However, there's still the rather large problem that a .riscv64.rpm cannot be built using rpmbuild on an x86-64 host, because Fedora's RPM does not support cross-compilation, so we must use a Fedora/RISC-V machine. That led to another problem, of course: Fedora/RISC-V didn't exist.

The solution is to take a small Fedora x86-64 chroot environment, and remove all of the x86-64 binaries and libraries from it (everything else — data files, config files, directories and whatever else — can stay). After that, build a GCC cross-compiler that will run on the x86-64 host, cross-compile popular GNU tools and libraries for RV64GC (RISC-V 64-bit general purpose with compressed instructions) and install them into the chroot environment in place of the x86 programs that were just deleted. The last step is to mechanically construct a disk image from this chroot environment and to boot it under QEMU. After many attempts at that you will have a Fedora/RISC-V environment that can run rpmbuild. Build a few hundred RPMs, tediously, while manually resolving build problems and circular build dependencies, until there is enough to build a pristine disk image composed entirely of RPMs.
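The first step above, finding the x86-64 binaries and libraries in the chroot while leaving data and config files alone, can be sketched by reading each file's ELF header. This is only an illustration, not the tooling the porting team used; the function names elf_machine and find_x86_64_binaries are hypothetical. The constants are the e_machine values defined by the ELF specification (62 for x86-64, 243 for RISC-V).

```python
import os
import struct

# e_machine values from the ELF specification.
EM_X86_64 = 62
EM_RISCV = 243


def elf_machine(path):
    """Return the ELF e_machine value for path, or None if it is not a readable ELF file."""
    try:
        with open(path, "rb") as f:
            header = f.read(20)
    except OSError:
        return None
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 of the header (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is the 16-bit field at offset 18, right after e_ident and e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return machine


def find_x86_64_binaries(chroot):
    """Yield regular files under chroot that are x86-64 ELF objects (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(chroot):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files (FIFOs, device nodes); only regular files are read.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if elf_machine(path) == EM_X86_64:
                yield path
```

Calling list(find_x86_64_binaries("/path/to/chroot")) would give the set of files to replace with cross-compiled RV64GC equivalents; the same elf_machine check against EM_RISCV could later confirm that nothing was missed.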

All of that does not complete the process by any means. Next up is to use an autobuilder to crunch several times through the 20,000+ packages that make up Fedora, again manually fixing compile problems and build dependencies. All of this is done on a $4,000 16-core Intel Xeon system with a pile of SSDs — and without a hint of irony.
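The reason an autobuilder needs several passes can be shown with a small sketch: on each pass, only packages whose build dependencies already exist can be built, and anything caught in a dependency cycle is left over for manual work. This is not the project's actual autobuilder; plan_build_passes and all of the package names below are hypothetical, and real builds also fail for reasons a dependency graph cannot capture.

```python
def plan_build_passes(build_requires, already_built):
    """Group packages into passes; each pass only needs packages built earlier.

    build_requires maps a package name to the set of packages it needs at
    build time. Returns (passes, stuck), where stuck lists packages caught in
    a dependency cycle or waiting on something that can never be built.
    """
    built = set(already_built)
    pending = {pkg: set(deps) for pkg, deps in build_requires.items()
               if pkg not in built}
    passes = []
    while pending:
        # A package is ready once every build dependency is already available.
        ready = sorted(pkg for pkg, deps in pending.items() if deps <= built)
        if not ready:
            break  # no progress: the remainder needs manual intervention
        passes.append(ready)
        built.update(ready)
        for pkg in ready:
            del pending[pkg]
    return passes, sorted(pending)


# Hypothetical package set: "docgen" and "docgen-plugin" require each other.
example = {
    "libalpha": {"base"},
    "libbeta": {"base"},
    "tool": {"libalpha", "libbeta"},
    "docgen": {"docgen-plugin"},
    "docgen-plugin": {"docgen"},
}
passes, stuck = plan_build_passes(example, already_built={"base"})
print(passes)  # [['libalpha', 'libbeta'], ['tool']]
print(stuck)   # ['docgen', 'docgen-plugin']
```

The leftover "stuck" list corresponds to the circular build dependencies mentioned above, which typically have to be broken by hand, for example by building one package once with a feature disabled.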

After months of effort, over two-thirds of all Fedora packages are done and the rest is really a matter of waiting and fixing problems as they come up.

David Abdurachmanov, Stefan O'Rear, I, and others have been through this whole process three times. We started the effort in September 2016 and got most of Fedora done by Christmas. But glibc for RISC-V was not stable at that point and our work was broken by changes to the glibc ABI, which meant we had to re-bootstrap everything from scratch. The recent release of glibc 2.27 with stable ABI support for RISC-V means that our current bootstrap attempt will likely be the final one.

Freedom

Fedora's motto is "Freedom, Friends, Features, First". RISC-V may offer the first chance for real "Freedom" all the way down to the hardware. Like Intel and AMD processors, it has a "Machine Mode" (similar to Intel System Management Mode) running below the operating system but, unlike those, the RISC-V Machine Mode is all open source. Although it's likely that the highest performance chips will be proprietary, there will be a variety of open options, both for full custom ASICs and FPGAs, so where you really need assurance that the CPU is as designed and the freedom to tinker, you can have it.

Right now, you can put an open RISC-V design such as the 64-bit Rocket core onto a reasonably priced FPGA, such as the Nexys 4 DDR, and with a certain amount of effort run Fedora on it. This is a fun learning project, and gives you a reasonable amount of assurance. Unfortunately the Xilinx chip and toolchain used on the Nexys are highly proprietary. In the future it is hoped that Project IceStorm will reverse-engineer a large enough FPGA to enable a 64-bit core to be developed using a completely free toolchain (currently only 32-bit embedded RISC-V cores can be run that way).

In a few months to a year there should be a selection of development boards with fully open cores (unfortunately mixed with proprietary memory interfaces and I/O) that will run Fedora out of the box.

Fedora/RISC-V will only be around as long as it stays relevant, so that raises a question: Will RISC-V succeed in the long term? I think it is assured a place as a research platform. CPU researchers always want something they can just download and use without licensing hassles; something that is simple to implement and integrate. There have already been (by my count) six research chips based on RISC-V that have actually been manufactured.

Another use case is manufacturers that want cores for specialized roles, but don't want to pay license fees to ARM or others. As mentioned in the previous article, Western Digital is planning to ship a billion RISC-V cores per year in its hard drives. Although most of the targets will be in the embedded space, having an off-the-shelf Linux distribution available may be helpful in higher-end designs such as in-car displays, single-purpose devices like fitness bands, heating controls, and satellite navigation systems — or even no-name phones and tablets.

My particular interest is in servers, and I think that realistically RISC-V is a very long way off; there are also many opportunities to mess things up along the way. Servers require multi-year or decades-long commitments to standards, specifications, ABIs, and the enterprise; it remains to be seen if RISC-V will provide that.

To try out Fedora on RISC-V, I recommend looking at the Fedora/RISC-V wiki page and grabbing bootable Fedora/RISC-V disk images that can be run under QEMU 2.12 on any Linux machine. Look at the readme.txt file for information on using those images with QEMU. The project has an IRC channel — #fedora-riscv on freenode. There are no mailing lists or special development forks because the aim is to get everything upstream and into the ordinary Fedora dist-git repositories. The aim is no less than feature parity with x86-64 and other Fedora architectures, eventually, and to provide downloadable disk images or installers for RISC-V hardware as it becomes available — turning RISC-V into a first-class Fedora architecture.

Page editor: Jonathan Corbet

Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds