
The future of realtime Linux in doubt

Posted Jul 11, 2014 14:00 UTC (Fri) by roblucid (guest, #48964)
In reply to: The future of realtime Linux in doubt by ortalo
Parent article: The future of realtime Linux in doubt

Wouldn't it be better to initiate the recovery procedure when the assumptions that guarantee you WILL meet the deadline are no longer true?
If I had a car engine that tried to recover after missing a deadline which meant damage, I'd be pretty annoyed when it turned itself off to avoid further problems. Or say the brake fluid pressure is low: best not to allow acceleration, but warn and put the hazard lights on when travelling at speed.

Much better would be for the system to see that it might miss the deadline and take avoiding action, so it still meets the goal, perhaps with degraded performance.

A processor might normally run down-clocked in a power-saving frequency state. If a process which, according to some worst-case analysis, requires 1ms of CPU time every 100ms is in "danger", then the scheduler engaging a turbo mode 10ms before expiry and running that task as a priority provides the CPU time without reducing other tasks' resources.
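
(A minimal sketch of declaring that kind of reservation to the kernel, using the SCHED_DEADLINE class merged in 3.14; the "engage turbo 10ms before expiry" part is hypothetical, the stock scheduler only guarantees the reserved runtime:)

/* Sketch only: reserve 1ms of CPU every 100ms with SCHED_DEADLINE.
 * The attribute struct is spelled out locally (mirroring the kernel's
 * struct sched_attr layout) because glibc does not wrap sched_setattr(). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/types.h>

struct dl_sched_attr {
	__u32 size;
	__u32 sched_policy;
	__u64 sched_flags;
	__s32 sched_nice;
	__u32 sched_priority;
	__u64 sched_runtime;	/* worst-case CPU time per period (ns) */
	__u64 sched_deadline;	/* relative deadline (ns) */
	__u64 sched_period;	/* activation period (ns) */
};

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

int main(void)
{
	struct dl_sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = SCHED_DEADLINE;
	attr.sched_runtime  =   1 * 1000 * 1000;	/* 1ms   */
	attr.sched_deadline = 100 * 1000 * 1000;	/* 100ms */
	attr.sched_period   = 100 * 1000 * 1000;	/* 100ms */

	if (syscall(SYS_sched_setattr, 0, &attr, 0))
		perror("sched_setattr");

	for (;;) {
		/* ... do the 1ms of work for this period here ... */
		sched_yield();	/* done until the next period */
	}
	return 0;
}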

Presumably it's possible to have hardware normally use interrupts but fall back to polling of hardware registers, for instance.



The future of realtime Linux in doubt

Posted Jul 11, 2014 20:18 UTC (Fri) by dlang (guest, #313) [Link] (1 responses)

no, just set your deadlines so that if you miss them, you still have time to implement the recovery before things get too bad.
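
(A sketch with made-up numbers: if physical damage only starts after 100ms and recovery takes at most 20ms, the deadline the software schedules against is 80ms, so even a miss leaves the recovery budget intact:)

#include <stdio.h>
#include <time.h>

/* Hypothetical figures: damage starts at 100ms, recovery needs at
 * most 20ms, so the deadline the software aims for is 80ms. */
#define PHYSICAL_LIMIT_NS	100000000LL
#define RECOVERY_WCET_NS	 20000000LL
#define SOFT_DEADLINE_NS	(PHYSICAL_LIMIT_NS - RECOVERY_WCET_NS)

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	long long start = now_ns();

	/* ... the real control work would run here ... */

	if (now_ns() - start > SOFT_DEADLINE_NS)
		printf("missed the 80ms deadline; 20ms left to recover\n");
	return 0;
}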

you aren't going to be able to predict that you will miss the deadline with any sort of reliability.

The future of realtime Linux in doubt

Posted Jul 13, 2014 16:28 UTC (Sun) by roblucid (guest, #48964) [Link]

I think that's effectively saying the same thing: using soft sub-goals which mitigate a "miss". By definition of "hard" RT, being allowed to miss a deadline makes the system no longer "hard" but "soft", so I ruled out this strategy as not meeting the spec.

The idea of an "up-clocking" strategy to increase resources "on demand at cost of power" was to mitigate the inherent indeterminism of modern CPUs.

I considered how you can make a case for being able to meet "hard" deadlines, and assumed any "hard" RT program that risks paging from disk, waits on an unpredictable event, or, as in the Mars Pathfinder case, blocks on a lower-priority process due to inversion is inherently "broken" and thus not "hard" RT.
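
(To make that concrete: the usual defences against those two failure modes are locking memory so nothing can page, and using priority-inheritance mutexes so a low-priority lock holder gets boosted rather than starving a high-priority waiter, which is what bit Pathfinder. A minimal sketch, compile with -pthread:)

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

static pthread_mutex_t shared_lock;

int main(void)
{
	pthread_mutexattr_t attr;

	/* Pin current and future pages in RAM: no paging from disk. */
	if (mlockall(MCL_CURRENT | MCL_FUTURE))
		perror("mlockall");

	pthread_mutexattr_init(&attr);
	/* The holder inherits the priority of the highest-priority
	 * thread blocked on the mutex, avoiding unbounded inversion. */
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&shared_lock, &attr);
	pthread_mutexattr_destroy(&attr);

	/* ... realtime threads sharing data would take shared_lock ... */

	pthread_mutex_destroy(&shared_lock);
	return 0;
}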

This came out of considering a conversation in late '89 with a colleague who developed RT control systems. Hard RT just loved polling because of its predictability and simplicity, never mind the performance disadvantages. That seems to run counter to the philosophy of Linux, which values performance over predictability or simplicity.

A "fast path" which conserves power, but falls back to brute force, polling of registers etc, might be a viable hybrid strategy.

The future of realtime Linux in doubt

Posted Jul 12, 2014 13:40 UTC (Sat) by ianmcc (subscriber, #88379) [Link] (5 responses)

There are examples of that. I can't immediately point to links, but IIRC it was a car (a BMW?) whose engine spontaneously shut down on the motorway for some relatively trivial reason. The driver made it out alive. But it brings up a good point: even with 'hard' real-time, coping with a failure mode is very important. And if you can cope adequately with a failure, are you really 'hard' real-time anymore?

I think that, going into the future, where even simple microcontrollers have pretty substantial CPU power, the issue of 'hard' real time is less important than robustness under failure conditions. The surrounding hardware has some failure rate too, so (1) the software needs to cope with that as best it can, and (2) there is no point going to extreme lengths to engineer software with a failure rate of X if the failure rate of the installation as a whole is 100000*X.

The future of realtime Linux in doubt

Posted Jul 13, 2014 4:26 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (1 responses)

Sounds like a problem I had with my Jeep. There's a sensor which detects where the camshaft is, to know when to fire the right spark plug. The wire from it shorted on the engine block and, rather than firing willy-nilly (and destroying some pistons and/or chambers), it just stopped firing, which basically shuts the vehicle off. Granted, there was probably very little ECU involvement here (it is a 1989, after all), but failure modes are important.

The future of realtime Linux in doubt

Posted Jul 15, 2014 15:11 UTC (Tue) by Wol (subscriber, #4433) [Link]

Or like my Vectra ...

The cambelt fell off!!! Which was a known failure mode :-( The fact that it wrecked the engine was a minor point ... I was on a motorway doing 70. Fortunately it was late at night and there was no traffic, so getting onto the hard shoulder wasn't hard. But if it had been daytime and busy ...

Cheers,
Wol

The future of realtime Linux in doubt

Posted Jul 13, 2014 15:40 UTC (Sun) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

It can get much worse.

The more-reliable software might consume more memory. Adding more memory will degrade the overall system reliability, perhaps even to the point that the system containing more memory and more-reliable software is less reliable than the original system. As the old saying goes: "Be careful, it is a real world out there!"

The future of realtime Linux in doubt

Posted Jul 13, 2014 16:34 UTC (Sun) by roblucid (guest, #48964) [Link] (1 responses)

That's like the first twin-engined planes... unfortunately they relied on the increased engine power, so they became LESS reliable, as the chances of a failure were doubled.

With enough "more" memory though, things like ECC and a fault tolerant technique like say triple channel with independent implementations allowing a "vote", then you gain greater reliability, like with modern planes which may tolerate multiple engine failures.

The future of realtime Linux in doubt

Posted Jul 16, 2014 19:19 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Agreed, at least assuming that you were not already using ECC and triple modular redundancy. If you are already using one of these techniques, then adding more hardware can still increase the failure rate, though hopefully at a much smaller rate than for systems not using these techniques. Of course, adding triple modular redundancy is not a magic wand -- it adds more code, which of course can add complexity and thus more bugs. :-(

