The future of realtime Linux in doubt

Posted Jul 10, 2014 13:34 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624)
In reply to: The future of realtime Linux in doubt by marcH
Parent article: The future of realtime Linux in doubt

You are quite correct, it is very simple.

In fact, your definition is so simple that it excludes the possibility of any real-world system being a real-time system. After all, if you show me a real-time system, I can show you a hammer that will cause it to miss its deadlines. Of course, if you don't like hammers, there are any number of other implements and natural disasters that will do the trick.

So if you need a real-world system that meets latency specifications of some sort, you will need a somewhat more robust definition of "real time". On the other hand, if you don't need your solutions to work in the real world, carry on as before!



The future of realtime Linux in doubt

Posted Jul 10, 2014 14:48 UTC (Thu) by dgm (subscriber, #49227) [Link] (1 responses)

Come on, this is unfair. If we are going to use the hammer argument, even pure mathematics cannot guarantee anything.

And if you don't agree I can reach for a bigger hammer.

The future of realtime Linux in doubt

Posted Jul 10, 2014 16:22 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Indeed, it is entirely unfair. However, fairness is an interesting human concept which, though valuable where applicable, is not necessarily relevant to real-time computing. Which is why you have to be so careful when attempting to apply pure mathematics to the real world, as so many people have learned the hard way.

As to your bigger hammer, if my guess as to its intended use is correct, then you are going to have to get in line. And it is a very long line. :-)

The future of realtime Linux in doubt

Posted Jul 10, 2014 15:05 UTC (Thu) by jch (guest, #51929) [Link] (1 responses)

> I can show you a hammer that will cause it to miss its deadlines

After contact with the hammer, the system ceases having the realtime property.

> So if you need a real-world system that meets latency specifications of some sort, you will need a somewhat more robust definition of "real time".

No, the realtime property is only guaranteed under a given set of conditions (correct hardware, stable power supply, lack of contact with a hammer). This is no different from e.g. a filesystem only guaranteeing consistency if the hardware doesn't lie about its buffers being flushed to stable storage.

--jch

The future of realtime Linux in doubt

Posted Jul 10, 2014 15:52 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

By adding those qualifiers, you are starting to add a bit of robustness to the definition. Keep moving in that direction, and you might eventually arrive at a real-world definition of real time.

The future of realtime Linux in doubt

Posted Jul 10, 2014 20:06 UTC (Thu) by marcH (subscriber, #57642) [Link] (1 responses)

> After all, if you show me a real-time system, I can show you a hammer that will cause it to miss its deadlines. Of course, if you don't like hammers, there are any number of other implements and natural disasters that will do the trick.

Yeah, sure: if you drive far too fast or if your brake pads are worn out then the real-time software in your ABS will not stop you from crashing.

More seriously: the definition of real-time software is obviously only about *software* guarantees. At the end of the day your entire system is only as good/safe/secure as its weakest part.

The future of realtime Linux in doubt

Posted Jul 11, 2014 12:44 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Qualifiers such as the condition of the brake pads really are a good first step towards a real-world-worthy definition of real time.

And I completely agree that real time is a property of the entire system and its environment, not just of the software. "Hard real time" software doesn't help much unless the rest of the system and the environment are also amenable to hard real time.

The future of realtime Linux in doubt

Posted Jul 11, 2014 8:41 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)

> So if you need a real-world system that meets latency specifications of some sort, you will need a somewhat more robust definition of "real time".

Okay. Let's change the subject of the definition. Make it the definition of "a real-time software system".

At which point your hammer is simply "defined out of existence". :-)

"real time" exists only in maths, anyways, so defining hardware out of the equation isn't really a big thing. The software is real-time when run on appropriate hardware. Damaging the hardware promptly takes it out of spec and invalidates the maths...

Cheers,
Wol

The future of realtime Linux in doubt

Posted Jul 11, 2014 12:31 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Indeed, adding those qualifiers is a good first step towards a real-world-ready definition of "real time".

Although I agree that math can be used to approximate some important behaviors of the real world, it is quite easy to take things too far. In particular, any attempt to define a real-world hammer out of existence is unlikely to end well. ;-)

The future of realtime Linux in doubt

Posted Jul 11, 2014 12:17 UTC (Fri) by ortalo (guest, #4654) [Link] (13 responses)

I once added to that deadline-oriented definition of hard realtime (to which I fully agree) the requirement that missing the deadline necessitates a recovery procedure.
My idea was to make the distinction between purely best effort systems (where the distinction between normal and realtime-specific systems is sometimes pretty slim) and more sophisticated environments where a lot of infrastructure is in place for providing guarantees to the user and means for activating recovery (or failure-attenuating) code.

IIRC, we were discussing the problem of cascading scheduling constraints in those hard realtime systems where users define the deadline incorrectly because they do not allow enough time for potential failure-management procedures to execute (think of the functions associated with a self-destruct procedure in a rocket, for example).

Overall, my comment is that, yep, you speak of realtime when there are deadlines (definitive ones); but the real fun starts when you think about what your software has to do once these deadlines have been missed and the world is starting to collapse but there is still a chance to avoid it... I guess this is where you can really start to speak about hard realtime. ;-)

The future of realtime Linux in doubt

Posted Jul 11, 2014 12:25 UTC (Fri) by ortalo (guest, #4654) [Link] (3 responses)

BTW, thinking back to my literature classes at school, maybe we can propose a nice new terminology for realtime:
- dramatic realtime: strong deadlines, but still hope if you miss them, and a lot of complex software to manage the outcome;
- tragic realtime: strong deadlines, no hope if you miss them, just do your best and live your remaining moments honorably - your own definition of honor matters a lot.

As an exercise, I'll let you propose definitions for: comic realtime, poetic realtime, theatrical realtime, etc. ;-))

The future of realtime Linux in doubt

Posted Jul 11, 2014 12:52 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

I agree completely with your parent post: Dealing with failures is often quite hard, and so I have no argument with your point about failure handling being the really hard part of hard real time.

Dramatic realtime? Tragic realtime? Comic realtime? Poetic realtime? Well, there have been some times when a bit of comic realtime would have been quite welcome. And some might argue that many of these comments, no doubt including some of my own, constitute theatrical realtime. ;-)

The future of realtime Linux in doubt

Posted Jul 13, 2014 3:17 UTC (Sun) by nevets (subscriber, #11875) [Link] (1 responses)

This reminds me of a time we talked about "Diamond hard", "Ruby hard" and, oh yeah, Microsoft's "Feather hard" real-time systems.

http://lwn.net/Articles/143323/

The future of realtime Linux in doubt

Posted Jul 13, 2014 14:06 UTC (Sun) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Indeed! ;-) ;-) ;-)

But it is really hard to believe that article appeared nine years ago!

The future of realtime Linux in doubt

Posted Jul 11, 2014 14:00 UTC (Fri) by roblucid (guest, #48964) [Link] (8 responses)

Wouldn't it be better to initiate the recovery procedure when the assumptions that mean you WILL meet the deadline are no longer true?
If I had a car engine which tried to recover after missing a deadline which meant damage, I'd be pretty annoyed when it turned itself off to avoid further problems. Or say, brake fluid pressure is low: best not to allow acceleration, but warn and put the hazard lights on when travelling at speed.

Much better would be to see that it might miss the deadline and for the system to take avoiding action, so it meets a goal, perhaps with degraded performance.

A processor might normally run down-clocked in a power-saving frequency state. If a process which required 1ms of CPU time every 100ms according to some worst-case analysis was in 'danger', then the scheduler engaging a turbo mode 10ms before expiry and running that task as a priority provides the CPU time without reducing other tasks' resources.

Presumably it's possible to have hardware normally use interrupts but fall back to polling of hardware registers, for instance.
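
For concreteness, here is a minimal user-space sketch of that boost idea, assuming the usual cpufreq sysfs knob and POSIX scheduling calls; the path, the priority value and the 10ms margin are illustrative only, not from any real design:

    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Switch the cpufreq governor; assumes the common sysfs interface. */
    static int set_governor(const char *gov)
    {
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "w");

        if (!f)
            return -1;
        fputs(gov, f);
        return fclose(f);
    }

    /*
     * Called by a watchdog ~10ms before the 100ms period expires if the
     * task has not yet received its 1ms of CPU time: leave the power-saving
     * state and run the task ahead of everything else (needs CAP_SYS_NICE).
     */
    static void boost_task(pid_t task)
    {
        struct sched_param sp = { .sched_priority = 80 };

        set_governor("performance");
        sched_setscheduler(task, SCHED_FIFO, &sp);
    }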

The future of realtime Linux in doubt

Posted Jul 11, 2014 20:18 UTC (Fri) by dlang (guest, #313) [Link] (1 responses)

no, just set your deadlines so that if you miss them, you still have time to implement the recovery before things get too bad.

you aren't going to be able to predict that you will miss the deadline with any sort of reliability.

The future of realtime Linux in doubt

Posted Jul 13, 2014 16:28 UTC (Sun) by roblucid (guest, #48964) [Link]

I think that's effectively saying the same thing, using soft sub-goals which mitigate a "miss". By definition of "hard" RT, being allowed to miss a deadline makes the system no longer "hard" but "soft", so I ruled out this strategy as not meeting spec.

The idea of an "up-clocking" strategy to increase resources "on demand at a cost in power" was to mitigate the inherent indeterminism of modern CPUs.

I considered how you can make the case for being able to meet "hard" deadlines, and assumed any "hard" RT program that risks paging from disk, waits on an unpredictable event, or, as in the Mars Pathfinder case, blocks on a lower-priority process due to inversion is inherently "broken" and thus not "hard" RT.
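
(As an aside, that Pathfinder-style inversion is exactly what priority-inheritance locking is meant to prevent; a minimal pthread sketch, with the helper name being my own invention:)

    #include <pthread.h>

    /*
     * Initialise a mutex that boosts a low-priority holder to the priority
     * of the highest-priority waiter, so a middle-priority task cannot
     * starve it -- the Pathfinder failure mode.
     */
    static int init_pi_mutex(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;
        int ret;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        ret = pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return ret;
    }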

This came out of considering a conversation in late '89 with a colleague who developed RT control systems. Hard RT just loved polling because of its predictability and simplicity, never mind the performance disadvantages. That seems to run counter to the philosophy of Linux, which values performance over predictability or simplicity.

A "fast path" which conserves power, but falls back to brute force (polling of registers, etc.), might be a viable hybrid strategy.
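
Roughly, and only as a sketch under assumed interfaces (the device fd, the mapped status register and the ready bit are placeholders): block on the interrupt for most of the budget, then burn the rest busy-polling.

    #include <poll.h>
    #include <stdint.h>

    /*
     * Fast path: sleep on the device interrupt (via its fd) for most of the
     * deadline budget.  Slow path: busy-poll the memory-mapped status
     * register for the remainder -- predictable, but it costs power.
     */
    static void wait_for_ready(int dev_fd, volatile uint32_t *status, int budget_ms)
    {
        struct pollfd pfd = { .fd = dev_fd, .events = POLLIN };

        if (poll(&pfd, 1, budget_ms - 1) > 0)
            return;                 /* interrupt arrived in time */
        while (!(*status & 0x1))
            ;                       /* brute force until the ready bit sets */
    }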

The future of realtime Linux in doubt

Posted Jul 12, 2014 13:40 UTC (Sat) by ianmcc (subscriber, #88379) [Link] (5 responses)

There are examples of that. I can't immediately point to any links, but IIRC it was a car (a BMW?) where the engine spontaneously shut down on the motorway, for some relatively trivial reason. The driver made it out alive. But it brings up a good point: even with 'hard' real-time, coping with a failure mode is very important. And if you can cope adequately with a failure, are you really 'hard' real-time anymore?

I think, going into the future, where even simple microcontrollers have pretty substantial CPU power, the issue of 'hard' real time is less important than robustness under failure conditions. The surrounding hardware has some failure rate too, so (1) the software needs to cope with that as best it can, and (2) there is no point going to extreme lengths to engineer software that has a failure rate of X if the failure rate of the installation as a whole is 100000*X.

The future of realtime Linux in doubt

Posted Jul 13, 2014 4:26 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (1 responses)

Sounds like a problem I had with my Jeep. There's a sensor which detects where the camshaft is, to know when to fire the right spark plug. The wire from it shorted on the engine block, and rather than firing willy-nilly (and destroying some pistons and/or chambers), it just stopped firing, which basically shut the vehicle off. Granted, there was probably very little ECU involvement here (it is a 1989 after all), but failure modes are important.

The future of realtime Linux in doubt

Posted Jul 15, 2014 15:11 UTC (Tue) by Wol (subscriber, #4433) [Link]

Or like my Vectra ...

The cambelt fell off!!! Which was a known failure mode :-( the fact it wrecked the engine was a minor point ... I was on a motorway doing 70. Fortunately, late at night, there was no traffic so getting onto the hard shoulder wasn't hard. But if it had been daytime and busy ...

Cheers,
Wol

The future of realtime Linux in doubt

Posted Jul 13, 2014 15:40 UTC (Sun) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

It can get much worse.

The more-reliable software might consume more memory. Adding more memory will degrade the overall system reliability, perhaps even to the point that the system containing more memory and more-reliable software is less reliable than the original system. As the old saying goes: "Be careful, it is a real world out there!"

The future of realtime Linux in doubt

Posted Jul 13, 2014 16:34 UTC (Sun) by roblucid (guest, #48964) [Link] (1 responses)

That's like the first twin-engined planes... unfortunately they relied on the increased engine power, so they became LESS reliable, as the chances of a failure were doubled.

With enough "more" memory, though, you can add things like ECC and fault-tolerant techniques such as, say, triple channels with independent implementations allowing a "vote"; then you gain greater reliability, like modern planes which may tolerate multiple engine failures.
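
A two-out-of-three voter over the independent channels is about as small as fault-tolerant code gets; a sketch (the result type and the recovery policy are placeholders):

    #include <stdint.h>

    /*
     * Majority vote over three independently computed results: any single
     * faulty channel is outvoted; if all three disagree, report failure so
     * a recovery procedure can take over.
     */
    static int vote3(uint32_t a, uint32_t b, uint32_t c, uint32_t *out)
    {
        if (a == b || a == c) {
            *out = a;
            return 0;
        }
        if (b == c) {
            *out = b;
            return 0;
        }
        return -1;      /* no majority */
    }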

The future of realtime Linux in doubt

Posted Jul 16, 2014 19:19 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Agreed, at least assuming that you were not already using ECC and triple modular redundancy. If you are already using one of these techniques, then adding more hardware can still increase the failure rate, though hopefully at a much smaller rate than for systems not using these techniques. Of course, adding triple modular redundancy is not a magic wand -- it adds more code, which of course can add complexity and thus more bugs. :-(

The future of realtime Linux in doubt

Posted Jul 11, 2014 19:37 UTC (Fri) by marcH (subscriber, #57642) [Link] (5 responses)

> your definition is so simple that it excludes the possibility of any real-world system being a real-time system.

Not sure why you assume that defining the software forbids defining the rest of the system... strange.

The future of realtime Linux in doubt

Posted Jul 13, 2014 14:01 UTC (Sun) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (4 responses)

> Not sure why you assume that defining the software forbids defining the rest of the system... strange.

Nice try, but you missed the mark.

The problem is that as you push towards the limits of the system's capabilities, you must be increasingly careful about when and how you apply abstraction. Out at the limits, little things that might otherwise be confined to a given piece of the system start making themselves felt globally. Therefore, out at the limits, premature abstraction is the root of all evil.

In short, at the extremes, you don't have the luxury of defining the software independently of the rest of the system.

One saving grace is that today's hardware can trivially meet requirements that were out at the limits a few decades ago. So the limits have moved out considerably over that time. However, the global awareness required at the limits remains unchanged.

The future of realtime Linux in doubt

Posted Jul 13, 2014 16:46 UTC (Sun) by roblucid (guest, #48964) [Link] (1 responses)

The overhead of abstraction is simply a factor to consider, so long as it's predictable.

The cost of slightly more performant hardware may be a wise investment if it reduces system complexity and developer time and improves QA of the result.

If "all you care about" is meeting deadlines, sailing close to the limits seems perverse and more of a "soft" RT thing, where errors in assumptions are tolerable, e.g. network sound streaming, where no-one dies if it fails in unusual circumstances.

Having seen the specs and done a requirements analysis for a nuclear power station project, I can assure you squeezing the most out of the HW was the last thing desired. I suspect (I left the project for another opportunity) any error would have been on the side of over-provisioning, at the cost of too great an increase in system complexity, making it hard to analyse.

The future of realtime Linux in doubt

Posted Jul 16, 2014 19:11 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Overprovisioning the hardware does make a lot of sense when feasible.

But if you require deadlines down in the tens of microseconds, overprovisioning may not help much. Last I heard, nuclear power stations had much longer deadlines, but you would know better than I.

The future of realtime Linux in doubt

Posted Jul 15, 2014 21:37 UTC (Tue) by marcH (subscriber, #57642) [Link] (1 responses)

> In short, at the extremes, you don't have the luxury of defining the software independently of the rest of the system.

So, why do you go there?

On life-critical systems (or... weapon systems) you do have this "luxury". Except for the name; I guess it's rather called "certification requirement" or something similar.

The future of realtime Linux in doubt

Posted Jul 16, 2014 19:13 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

If you do have to go there, then the certifications will of course be limited to those that can be obtained for the system as a whole.

