Linux in mixed-criticality systems

By Jonathan Corbet
December 13, 2018

The Linux kernel is generally seen as a poor fit for safety-critical systems; it was never designed to provide realtime response guarantees or to be certifiable for such uses. But the systems that can be used in such settings lack the features needed to support complex applications. This problem is often solved by deploying a mix of computers running different operating systems. But what if you want to support a mixture of tasks, some safety-critical and some not, on the same system? At a talk given at LinuxLab 2018, Claudio Scordino described an effort to support this type of mixed-criticality system.

For the moment, this work is focused on automotive systems, which have a bunch of non-critical tasks (user interaction and displaying multimedia, for example) and critical tasks (such as autonomous driving and engine control). These tasks can be (and often are) handled with independent computers running different operating systems, but there is a lot of interest in combining these computers into one. The result, should this effort be successful, would be a system that is both cheaper and more flexible.

One way of doing this would be to turn Linux into a fully realtime system. In the past, dual-kernel approaches such as RTLinux, RTAI, and Xenomai have been developed toward that end. More recently, the PREEMPT-RT patches have seen the most attention, though Scordino described the result as "soft realtime". The problem with all of these systems is certification for use in safety-critical settings, which is hard (if not impossible) for a kernel as large as Linux. In addition, regulations can prevent its use anyway; European automotive regulations do not allow the use of a shared system for both non-critical tasks and engine control, for example.

So attention has shifted to an alternative approach: resource partitioning that can create a hard separation between tasks on a single platform. Modern processors support this under most common architectures. If one of these systems could be successfully partitioned, it would become possible to use Linux for non-critical code and a certified operating system for the rest. This is the focus of the Hercules project, which has been funded by the European Union.

For the safety-critical side of the system, Hercules has chosen Erika Enterprise, a system that is licensed under GPLv2+. It has been designed explicitly for automotive electronic control units, and carries a number of the relevant certifications. Erika is used in production in some cars now; it supports a range of CPUs and can run under a variety of hypervisors.

On the hypervisor side, the system of choice is Jailhouse, which is also available under GPLv2. Jailhouse, too, has been designed for safety-critical applications with certification in mind; to that end, the project has a goal of not exceeding 10,000 lines of code on any architecture that it supports. The plan is to use Jailhouse to run realtime, safety-critical tasks on multicore platforms alongside Linux. It should be able to provide strong and clean isolation between the two while performing at "bare-metal levels".

Jailhouse has the concept of a "root cell" which, while being in control of the system as a whole, is not in full control of the hardware it is running on. The root cell will be running Linux. Other "cells" can run (as an "inmate") any kernel that has Jailhouse support; unlike full virtualization systems, Jailhouse is not able to run unmodified kernels. There is no scheduling built into Jailhouse; each core in the system is given over fully to one system for its use. There is no overcommitting of hardware, and no hardware emulation. Memory is partitioned between the cells, with some set aside for Jailhouse itself.
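
The static partitioning described above can be illustrated with a small sketch. This is not Jailhouse code or its configuration format; the `Cell` class, the cell names, and the address ranges are all hypothetical, and deliberately shared regions (such as those used for inter-cell communication) are ignored. It only shows the rule that no core and no byte of RAM may be claimed by two cells:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical description of a cell: the cores and RAM it owns."""
    name: str
    cpus: frozenset      # core numbers given over entirely to this cell
    mem_start: int       # first physical address of the cell's RAM
    mem_size: int        # size of that region in bytes

    @property
    def mem_end(self):
        return self.mem_start + self.mem_size


def find_conflicts(cells):
    """Return descriptions of any core or RAM range claimed by two cells."""
    conflicts = []
    for i, a in enumerate(cells):
        for b in cells[i + 1:]:
            shared = a.cpus & b.cpus
            if shared:
                conflicts.append(
                    f"{a.name} and {b.name} share CPUs {sorted(shared)}")
            # Two half-open ranges overlap if each starts before the other ends.
            if a.mem_start < b.mem_end and b.mem_start < a.mem_end:
                conflicts.append(f"{a.name} and {b.name} overlap in memory")
    return conflicts


# Made-up layout: three cores and 768MB for Linux, one core and 256MB for the RTOS.
root = Cell("linux-root", frozenset({0, 1, 2}), 0x0000_0000, 0x3000_0000)
rtos = Cell("rtos-inmate", frozenset({3}), 0x3000_0000, 0x1000_0000)
print(find_conflicts([root, rtos]))
```

Because nothing is overcommitted, a check like this can be done once, before any cell starts; there is no scheduler that would need to re-evaluate it at run time.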

Linux systems with Jailhouse support have a special device (/dev/jailhouse) used for configuration of the root cell and the loading of inmate systems into the other cells. It uses a rather long and intimidating configuration file, written as a set of C data-structure definitions, that fully describes the hardware and its partitioning; this file can be automatically generated on x86 systems, but must be written by hand for Arm systems.

Cells in Jailhouse are isolated from each other, but they are still likely to need to communicate; Jailhouse provides a virtual PCI device for that purpose. There is no multicast capability, which is a bit of a shortcoming, so the Hercules developers have added their own communications library on top. It provides both blocking and non-blocking calls and dynamic message sizes; this code should be released soon.

One of the biggest problems that needs to be solved is avoiding interference between the cells, which can happen even with hard partitioning. Memory bandwidth and cache space can be particularly problematic. One solution for cache contention is to use cache coloring — assigning virtual addresses so that each cell uses a different portion of the cache. Another approach is to use the system's performance counters to monitor the use of memory bandwidth and cache space; tasks can then be throttled if need be to keep them within their limits. Coscheduling, wherein processes are scheduled so as to avoid contending with each other for memory, is also under development. This code, too, is expected to be released soon.
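
As a rough illustration of cache coloring, the sketch below assumes a physically indexed, set-associative cache, so that a page's "color" is determined by its physical page frame number. The cache geometry (2MB, 16-way), the split of colors between cells, and the function names are assumptions made for the example; they do not come from the Hercules code.

```python
PAGE_SIZE = 4096


def num_colors(cache_size, ways, page_size=PAGE_SIZE):
    """Number of page colors: how many pages fit in one cache way."""
    way_size = cache_size // ways
    return max(1, way_size // page_size)


def page_color(phys_addr, colors, page_size=PAGE_SIZE):
    """Color of the page containing phys_addr."""
    return (phys_addr // page_size) % colors


def pages_for_cell(first_pfn, count, allowed_colors, colors):
    """Pick `count` page frame numbers, starting at first_pfn, whose color
    belongs to allowed_colors; pages of other colors are skipped."""
    if not allowed_colors:
        raise ValueError("a cell needs at least one color")
    picked = []
    pfn = first_pfn
    while len(picked) < count:
        if pfn % colors in allowed_colors:
            picked.append(pfn)
        pfn += 1
    return picked


# Hypothetical split: colors 0-23 for Linux, colors 24-31 for the RTOS cell.
colors = num_colors(2 * 1024 * 1024, 16)
rtos_colors = set(range(24, 32))
rtos_pages = pages_for_cell(0, 4, rtos_colors, colors)
print(colors, rtos_pages)
```

Pages with different colors map to disjoint groups of cache sets, so a cell restricted to its own colors cannot evict another cell's lines from that cache. The cost is that each cell can use only a fraction of the cache and of the page frames.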

As the session ended, Thomas Gleixner observed from the audience that Hercules is "a nice fairy tale", an embodiment of the design that everybody seems to want. He also said that it is "a pipe dream", though, for a simple reason: there are no CPUs large enough to run this kind of system that have been certified for safety-critical tasks. Without a certified CPU, the system as a whole cannot be certified. Scordino responded that the vendors are working on this problem. Once they have a solution, it would appear that Hercules will be ready to run on it.

[Thanks to LinuxLab and to the Linux Foundation, LWN's travel sponsor, for supporting my travel to the event.]

Linux in mixed-criticality systems

Posted Dec 13, 2018 18:39 UTC (Thu) by nopsled (guest, #129072) [Link] (2 responses)

It would also be nice to have some infrastructure for proper Priority Inheritance in the kernel as part of this, apart from the already present rt_mutex. The last time I heard about that from the Realtime effort, associating the interrupt ID with the kernel thread was described as the hard part. The seL4 developers recently published a paper on using the scheduling context as capabilities as part of their time protection effort, and on being able to transfer the client's time slice to the server to achieve that (supporting cases where a server could become unscheduled completely if it happens to have no scCaps, and then transitioning back from that passive state on reception of scCaps from a client). IDK how that would work under Linux, and if trying to map them to file descriptors would be an acceptable approach. Not only can this be useful for system calls, it will also be useful for userspace itself, or for IPC systems that want to bill clients for the method calls they make, or charge someone's request over another's. Binder currently uses nice values to emulate this, which is, at best, a hack.

Linux in mixed-criticality systems

Posted Dec 13, 2018 20:55 UTC (Thu) by smurf (subscriber, #17840) [Link]

Hmm. I fail to see what PI has to do with this article. Presumably, a hard-RT system that runs on Jailhouse will unequivocally own any hardware it controls, and will not talk to Linux on any mission-critical code path. So why should the Linux kernel's ability, or not, to do "proper Priority Inheritance" matter here?

Linux in mixed-criticality systems

Posted Dec 14, 2018 19:41 UTC (Fri) by glenn (subscriber, #102223) [Link]

The SCHED_DEADLINE developers have been pushing forward on a set of WIP patches from Peter Zijlstra to revamp PI to use a bandwidth-server-based approach to PI: https://www.spinics.net/lists/linux-rt-users/msg19573.html

This is good for mixed-criticality systems, because it helps isolate potential side-effects of PI to the threads that share resources. Independent threads are generally isolated from the effects of bandwidth-server-based PI employed by other threads (this is not the case with simple PI). This helps realize the "freedom from interference" requirement of some safety standards (e.g., the automotive ISO 26262 standard).

Folks have used real-time Linux in mission-critical applications. SpaceX has been open about their heavy use of Linux on their rockets: https://lwn.net/Articles/540368/

Linux in mixed-criticality systems

Posted Dec 14, 2018 5:45 UTC (Fri) by marcH (subscriber, #57642) [Link] (21 responses)

Comparing the price of an embedded and dedicated microprocessor with the price of a car: what's the point?

Let's not even get into the price of insurance and human life.

Linux in mixed-criticality systems

Posted Dec 14, 2018 8:32 UTC (Fri) by wsy (subscriber, #121706) [Link]

I agree. Making a life-critical system with a physical connection to the Internet is absolutely a bad idea.

Linux in mixed-criticality systems

Posted Dec 14, 2018 9:27 UTC (Fri) by nim-nim (subscriber, #34454) [Link] (8 responses)

A car is an expensive thing to produce, but the automobile industry is very competitive, so margins are minuscule (IIRC, profit is a few hundred €/$ for a car that sells for tens of thousands). It works because volumes are high, and the car industry's real earnings come from repair and loans to buyers.

So basically, consolidating the car information systems can easily result in savings at least equal to the profit the manufacturer currently makes on the car as a whole.

Linux in mixed-criticality systems

Posted Dec 14, 2018 15:18 UTC (Fri) by marcH (subscriber, #57642) [Link] (7 responses)

Just like any other type of product: it depends. The margins are low for low-end cars (the ones manufacturers want to stop selling, see recent GM layoffs for instance) but they're not low for trucks and SUVs, nor for the ones with... a gazillion electronic gadgets. We have all heard countless stories of buyers negotiating harder and for longer and paying thousands of dollars less than the next guy for the same car.

Linux in mixed-criticality systems

Posted Dec 15, 2018 10:30 UTC (Sat) by drag (guest, #31333) [Link] (6 responses)

Manufacturers are dependent on car dealerships to push their products onto the buying public. Dealerships depend heavily on post-sale support/maintenance/parts/repair and financing for a huge portion of their profits.

So there is a conflict of interest there. If manufacturers produce 'open' technology that is easy and simple for anybody to repair, provide parts for, and support then they risk alienating the sales goons. So they design vehicles specifically to be anti-serviceable and keep diagnostic tools and software extremely proprietary.

As the technology improves cars should be becoming simpler and easier to repair. The technology and engineering that goes into modern vehicles is incredible, very expensive, and hugely difficult... but the actual execution is still really basic. Hall effect sensors, fuel injectors, temperature sensors, O2 sensors, fuel meters, fuel pumps are all late 1970's early 1980's technology, just highly evolved. An effective modern fuel injection system is much simpler and easier to support and repair than an old carbureted system. Parts are cheaper to manufacture on a large scale, too, which is one of the major reasons why cars moved away from the old tech in the first place.

So, yes, piling on tons of gadgets and features to otherwise basic form of transportation is a big way they can help plump up the profits of their dealer networks.

As is tying computer systems together into big massive proprietary lumps and protecting access and tools required to diagnose with DRM in order to trigger DMCA legislation and making it illegal, or at least extremely difficult, for people to produce third party diagnostic and support tools.

https://www.wired.com/2015/04/dmca-ownership-john-deere/

> General Motors told the Copyright Office that proponents of copyright reform mistakenly “conflate ownership of a vehicle with ownership of the underlying computer software in a vehicle.”

Because of that sort of crap I am extremely wary of any attempt to combine entertainment and vehicle management systems into one big lump. Besides the fact that it's just poor engineering. I just don't trust these people.

Next thing you know they will have to have all sorts of backdoors and encrypted control over firmware that prevents user serviceability because of 'security'. Can't have somebody hack your bluetooth speaker and then update the firmware for your anti-lock brake system... right? Right?

For tying these systems together in a secure manner it should be very possible to set up one-way communication so that information and logging and other things can flow from vehicle management into the display and networking computers. The protocols should be documented and 'best effort' standards should be established so that third party tools and software can be used to reduce the cost of ownership and increase the value of vehicles for the customers.

I have hopes for companies like Tesla to move the industry away from car dealerships and such things. And I think that things will improve provided the government resists the temptation to strengthen IP protection legislation for the sake of these major corporations (which are all already extremely damaging at all levels of society). But until that happens I really am going to stick with buying the most basic cars possible.

Linux in mixed-criticality systems

Posted Dec 18, 2018 13:28 UTC (Tue) by nilsmeyer (guest, #122604) [Link]

Great comment. I think adding a lot of software to embedded devices also drastically increases the loss in value, because of course you can't replace it once it stops receiving updates - although this may be discovered as an additional revenue stream instead of trying to get people to buy new cars.

There really is no good reason for an in-vehicle entertainment system providing anything beyond sound or display output and perhaps networking, since passengers usually have a powerful device like a smartphone or tablet with them which they replace more often than the car. The same also goes for airplanes.

I use the train a lot here in Germany, and while the railway company gets a lot wrong in the software / digitalization realm, at least they resisted the temptation to outfit the trains with an entertainment system like you have on some airlines. Instead you can log in to the on-board wifi, over which they also offer a streaming service (presumably with lots of content cached on the train), since people bring their own devices anyways.

> So, yes, piling on tons of gadgets and features to otherwise basic form of transportation is a big way they can help plump up the profits of their dealer networks.

You can diversify a lot in the realm of sales here by offering choices so no two car buys are similar and then upselling the hell out of it - "want the soundsystem? You also have to get the undercoating." I never personally bought a car but from what I hear it's a very unpleasant experience, especially if you don't like to haggle.

Linux in mixed-criticality systems

Posted Dec 20, 2018 0:40 UTC (Thu) by rahvin (guest, #16953) [Link] (3 responses)

You should not trust them and there are significant risks to the convergence model. Look no further than what's happened with Toyota's unintended acceleration issues after they moved to drive by wire. A code error created a situation where a car would accelerate indefinitely, and the transmission would lock the shifter so it couldn't be slipped into neutral, and the result was a bunch of fatalities. Toyota is going to pay out a LOT of profits by the time all those lawsuits are over.

And the remarkable thing about these suits was just how long it took and how much legal effort was expended just to prove it. IIRC it took almost 3 years of legal work before a set of lawyers were able to get discovery granted to the source code to the systems that caused the problem and a proper analysis was done.

With the move to drive by wire systems we need government regulations and code quality and testing requirements that protect life safety, possibly even a requirement for a switch that turns the whole drive system off while preserving braking and steering, or some other failsafe. One of the reasons cars were so bloody reliable from 1990 to about 2008 and didn't have these types of issues was that everything had previously been designed as single controllers and ASICs for every function, so that a computer failure would only take out a single system, not the whole drivetrain. This convergence is very scary to me, and that fear was vindicated by the finding that the Toyota acceleration issue was a code quality issue. Life safety needs very high standards and multiple failsafes.

I design roadways for a living and I'm legally required to consider safety before any other variable at the risk of my possible incarceration for failure to do so. I believe vehicle source code should be held to a similar standard.

Linux in mixed-criticality systems

Posted Dec 21, 2018 7:35 UTC (Fri) by murukesh (subscriber, #97031) [Link] (2 responses)

Do you have some reading material on this? I looked through https://en.wikipedia.org/wiki/Sudden_unintended_accelerat... and that article seems to claim that the problem was not with electronics or code, but with the pedal design (and the floor mat). Or are you describing another SUA incident?

Linux in mixed-criticality systems

Posted Jan 3, 2019 16:02 UTC (Thu) by sdalley (subscriber, #18550) [Link] (1 responses)

Lots of gripping stuff for the embedded software engineer here. I found it compulsive reading.

https://embeddedgurus.com/barr-code/2013/10/an-update-on-...
http://www.safetyresearch.net/Library/Bookout_v_Toyota_Ba...

Toyota's Unintended Acceleration

Posted Jan 6, 2019 1:02 UTC (Sun) by marcH (subscriber, #57642) [Link]

Fascinating, thanks.

Whether these bugs are real or not and no matter who "wins" this eventually, Toyota and others should already have learned a lesson the hard way: every time human life is at stake legal costs alone far outweigh any development cost saved by sloppy programming. While a rather low bar it's still a somewhat "good" bit of news.

People cope with getting screwed over again and again by $BIGCORP and the 0.1% but funny enough there's always this line that you really can't cross: they don't like to... die.

Now rebooting my smartphone to work around some memory leak and other random issues...

Linux in mixed-criticality systems

Posted Dec 20, 2018 20:12 UTC (Thu) by kamil (guest, #3802) [Link]

> I have hopes for companies like Tesla

Well, those hopes may be misplaced. Tesla, innovative though it is, has a well-deserved reputation of being extremely proprietary. There are plenty of stories flying around about the difficulties of reverse engineering what its cars do internally, etc.

Heck, recently they announced that they will stop working with third-party body shops and that they want all such repairs to be handled in-house. Compared to Tesla, traditional car manufacturers are a model of openness...

Linux in mixed-criticality systems

Posted Dec 14, 2018 11:32 UTC (Fri) by smurf (subscriber, #17840) [Link]

There are reasons besides shaving a few $$ off the cost of a car why you'd like tighter integration between safety critical systems and outside-accessible code.

The typical car (i.e. basically anything that's mid-range or better) consists of a veritable heap of embedded processors with no way to log anything substantial if/when something goes wrong, and with no way to update them from the outside. Tesla excepted …

Linux in mixed-criticality systems

Posted Dec 14, 2018 12:02 UTC (Fri) by nhippi (subscriber, #34640) [Link] (8 responses)

OTOH having a separate microcontroller for everything means a lot more wires and connectors, which are usually the places of faults on modern cars. Putting things on the same die might actually make things more reliable.

Internet connectivity and cloud services OTOH are much bigger risks.

Linux in mixed-criticality systems

Posted Dec 14, 2018 15:11 UTC (Fri) by marcH (subscriber, #57642) [Link] (7 responses)

> OTOH having a separate microcontroller for everything means a lot more wires and connectors,

Not for everything; for safety-critical systems. Cars used to be certified and pretty safe before micro controllers so there can't be that many safety-critical micro controllers in a car today.

Reducing the number of micro controllers and wires is great and all and sure you also want to add some partitioning and "soft real-time" there but that's different.

Linux in mixed-criticality systems

Posted Dec 14, 2018 15:24 UTC (Fri) by marcH (subscriber, #57642) [Link] (1 responses)

By the way safety critical ethernet already exists and is already in use in planes for instance. Reducing the number of wires doesn't require reducing the number of microcontrollers, those are two different problems with several orders of magnitude difference in complexity.

Linux in mixed-criticality systems

Posted Dec 14, 2018 17:01 UTC (Fri) by ibukanov (subscriber, #3942) [Link]

Older planes still use the ARINC 429 protocol, which is a one-way serial link where one source can talk to multiple clients. But since it is used only in aviation, the boards are really expensive. It is cheaper just to use Ethernet and require special routers whose software is formally verified.

Linux in mixed-criticality systems

Posted Dec 14, 2018 16:54 UTC (Fri) by smurf (subscriber, #17840) [Link] (3 responses)

> Cars used to be certified and pretty safe before micro controllers

Yeah, but that was before brake assistants and airbags and automatic transmissions and mandatory seatbelt warnings and central locks and theft protection and whatnot. All of these, and more, either require µCs outright or are more expensive if you build them without one.

Today? No way. Even less way if you want a car that doesn't burn gasoline.

Linux in mixed-criticality systems

Posted Dec 14, 2018 17:33 UTC (Fri) by marcH (subscriber, #57642) [Link] (2 responses)

mandatory != safety critical

Central locks and rear cameras are "real-time" (for some definition of it) but really not safety critical.

Linux in mixed-criticality systems

Posted Dec 20, 2018 6:26 UTC (Thu) by JdGordy (subscriber, #70103) [Link] (1 responses)

Pretty sure there is a requirement for rear cameras to turn on within ~2s of power, and quite likely there are actual safety-critical requirements on central locking (if fitted, obviously) around how it works in accidents.

Linux in mixed-criticality systems

Posted Dec 20, 2018 6:39 UTC (Thu) by marcH (subscriber, #57642) [Link]

My car doesn't have a rear camera, so it must be critically unsafe.

Check what it takes to open a damaged car door.

ABS when fitted is a good example considering its job is to ... release the brakes. Cruise control and any other sort of self driving feature are other obvious ones. Can't wait for the fun of seeing these interact with security holes and other bugs in the media player. As usual the only winners will be the lawyers.

Linux in mixed-criticality systems

Posted Dec 14, 2018 17:38 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

Safety-critical microcontrollers in production automobiles have (with initially narrow market penetration) been with us for nearly fifty years, in the form of anti-lock braking systems.

Linux in mixed-criticality systems

Posted Dec 14, 2018 15:43 UTC (Fri) by marcH (subscriber, #57642) [Link]

We all here know our skills are in short supply and we've worked with people who lack some of the most basic software/hardware engineering practices yet are assigned business- (not safety-) critical tasks anyway. Our industry is well known for its... "youth".

You'd think car makers are paranoid about safety and you'd be right. The problem is: that doesn't magically make them able to attract and hire good computer professionals (bar super rare exceptions like Tesla).

This is where I find the security "circus" extremely valuable: hacking a phone is one thing, hacking a car makes much bigger headlines and shows the emperor is naked.

I've also heard a couple software engineering horror stories from the inside and it's actually not pretty.

Please consider this presentation for what it is: a very interesting but *long term research* project, and hope none of it shows up in your car tomorrow.

Linux in mixed-criticality systems

Posted Dec 14, 2018 16:13 UTC (Fri) by shemminger (subscriber, #5739) [Link]

Part of the problem is that many of these systems are one-off solutions that never get updated.
When the jailhouse is broken, and it will be, the car will be an open target. Because of the long lead times for development, the regulations, and the fork-from-upstream-five-years-ago model, these kinds of systems are doomed. Having two processors gives some hardware isolation for the restricted software engineering process.

Linux in mixed-criticality systems

Posted Dec 18, 2018 6:31 UTC (Tue) by alison (subscriber, #63752) [Link] (2 responses)

> Memory bandwidth and cache space can be particularly problematic.
> One solution for cache contention is to use cache coloring — assigning
> virtual addresses so that each cell uses a different portion of the cache.

I've read that ARM and ARM64 don't support cache coloring. If x86_64 is the only arch with this feature then the applicability to embedded use cases may be limited.

If anyone knows contrary information, please speak up!

Linux in mixed-criticality systems

Posted Dec 18, 2018 16:15 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (1 responses)

ARMv6 places some restrictions on page coloring, and actually requires it to an extent when memory is aliased to avoid duplicate entries in the cache. The cache is *virtually* indexed, however, so cache coloring on ARMv6 requires control over the virtual addresses. ARMv7 and ARMv8 both specify physically-indexed and physically-tagged data caches and should have no restrictions on coloring.

You can find more information on the ARMv6 and ARMv7 cache architecture here: https://community.arm.com/processors/b/blog/posts/page-co...

Linux in mixed-criticality systems

Posted Dec 18, 2018 16:31 UTC (Tue) by alison (subscriber, #63752) [Link]

Thanks, nybble41!

Linux in mixed-criticality systems

Posted Dec 24, 2018 4:46 UTC (Mon) by tiffang (guest, #48653) [Link]

So is the major problem of such mixed systems that no chip companies yet provide a hardware-level separation solution, such as separate cache structures and memory buses between CPUs?
What other shortcomings might come from the hardware?

Linux in mixed-criticality systems

Posted Jan 9, 2019 6:46 UTC (Wed) by robbe (guest, #16131) [Link]

> Jailhouse has the concept of a "root cell" which, while being in control of the system
> as a whole, is not in full control of the hardware it is running on. The root cell will be
> running Linux.

This design ensures that the powers-that-be will have to lock down this Linux system to keep any safety guarantees. Is this an accident?

A more user-friendly design would keep Linux out of the Trusted Computing Base, and therefore able to be replaced without jeopardising overall system safety.

Linux in mixed-criticality systems

Posted Feb 24, 2022 2:49 UTC (Thu) by jiaming (guest, #156804) [Link]

Thanks a lot to the author of this article. I think using cache coloring technology on Jailhouse is very meaningful work.