Another Hard real time Linux

Posted Feb 19, 2009 10:12 UTC (Thu) by i3839 (guest, #31386)
In reply to: Another Hard real time Linux by razb
Parent article: Interview: the return of the realtime preemption tree

> Correct, but it is limited only in the following ways:
> 1. No accessing *vmalloc* space *directly*. You can access any
> kmalloc'ed address directly, and access vmalloc'ed space by walking
> the pages. What I mean is that you can access everything.
> 2. Unable to kmalloc.
> 3. Unable to free memory (for example, kfree).

What's dangerous about accessing vmalloc'ed space directly if it's pinned? Or did I misunderstand?

> You can access any facility in the kernel. You can send or receive
> packets, and I do it successfully on both AMD and Intel machines.

Though those facilities may not access vmalloc space directly, nor allocate/free memory? Seems very fragile, because you can't know if they will in the future (assuming you audited all the code that may be executed by those facilities, which is a lot of tricky work).

How can you send and receive packets if you can't allocate the space needed for them? Not with the standard networking stack, can you?

> gettimeofday is not a timer, it is a clock. Try to schedule a task to
> run T microseconds from now and you will see skew, and the more tasks
> there are, the more it will skew.

Right, totally different, sorry. But you only run one task, so the timer is just a more efficient way of not doing anything in the meantime?
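
To make the clock-versus-timer distinction concrete, here is a minimal user-space sketch (not offsched code) of the usual way to keep a periodic task from accumulating skew: each wakeup is computed as an absolute deadline on CLOCK_MONOTONIC, instead of sleeping for a relative interval after the work finishes. The helper names `timespec_add_ns` and `run_periodic` are made up for illustration; `clock_gettime` and `clock_nanosleep` are the standard POSIX calls. This only removes cumulative drift; it does nothing about the latency the scheduler adds to each individual wakeup.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *ts by ns nanoseconds (ns >= 0), keeping tv_nsec in [0, 1e9). */
void timespec_add_ns(struct timespec *ts, long ns)
{
    assert(ts != NULL && ns >= 0);
    ts->tv_sec += ns / NSEC_PER_SEC;
    ts->tv_nsec += ns % NSEC_PER_SEC;
    if (ts->tv_nsec >= NSEC_PER_SEC) {
        ts->tv_nsec -= NSEC_PER_SEC;
        ts->tv_sec += 1;
    }
}

/*
 * Call work() `count` times, one call per period. Deadlines are
 * start + period, start + 2*period, ... so a late wakeup does not
 * push back the deadlines that follow it.
 * Returns 0 on success, -1 on error.
 */
int run_periodic(long period_ns, int count, void (*work)(void))
{
    struct timespec next;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < count; i++) {
        int rc;

        timespec_add_ns(&next, period_ns);
        do {
            /* Returns an error number directly, it does not set errno. */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR); /* same absolute deadline, so just retry */
        if (rc != 0)
            return -1;
        work();
    }
    return 0;
}
```

A relative sleep of `period_ns` after each `work()` call would instead add the run time of `work()` plus the wakeup latency to every period, which is one way to get the growing skew described above.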

> Even with NAPI your system may get jammed, and worst of all, even by
> unrelated traffic. Offsched suggests another approach: containing
> incoming traffic to one or more cores. This way cpu0, the main
> operating system processor, will not be at risk.

This is a generic problem: Any (user or kernel) process can use too many resources, slowing down the machine as a whole. Offsched doesn't solve that at all, except for some explicit kernel cases which are 'ported' to offsched, which is a lot of work.

Realtime preemption, on the other hand, tries to solve this problem in a more generic way.

And moving networking to offsched may contain the damage to one core, but it doesn't solve the real problem, e.g. sshing into the box doesn't work quicker or better in any way. If the NIC generates more packets than can be handled, the right solution is to drop some early. Basically what you always do in an overload situation: Don't try to do everything, drop some stuff.
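
As an illustration of "drop some early", here is a minimal sketch of a bounded receive queue that refuses new packets once it is full instead of letting the backlog grow. It is a toy under stated assumptions: `struct rx_queue`, `RXQ_CAPACITY` and the `rxq_*` functions are invented for this example, plain integers stand in for packet buffers, and it is single-threaded. It is not how NAPI or any real driver is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RXQ_CAPACITY 4 /* hypothetical, kept tiny for illustration */

struct rx_queue {
    int slots[RXQ_CAPACITY]; /* packet ids stand in for real buffers */
    size_t head;             /* index of the oldest queued packet */
    size_t count;            /* number of packets currently queued */
    unsigned long dropped;   /* packets rejected because the queue was full */
};

void rxq_init(struct rx_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    q->dropped = 0;
}

/* Queue a packet, or drop it immediately if there is no room. */
bool rxq_enqueue(struct rx_queue *q, int pkt)
{
    assert(q != NULL);
    if (q->count == RXQ_CAPACITY) {
        q->dropped++; /* overload: shed the work before spending time on it */
        return false;
    }
    q->slots[(q->head + q->count) % RXQ_CAPACITY] = pkt;
    q->count++;
    return true;
}

/* Take the oldest packet; returns false if the queue is empty. */
bool rxq_dequeue(struct rx_queue *q, int *pkt)
{
    assert(q != NULL && pkt != NULL);
    if (q->count == 0)
        return false;
    *pkt = q->slots[q->head];
    q->head = (q->head + 1) % RXQ_CAPACITY;
    q->count--;
    return true;
}
```

The point of the fixed capacity is that the cost of an overload is bounded: excess packets cost one comparison and a counter increment each, and the `dropped` counter tells you afterwards how much was shed.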

Now the nasty thing is that it's hard to see the difference between a DoS and just a very high load.

Besides, handling the network packets with all cores instead of one may be the difference between being DoSed and just slowed down.

> You cannot run user space with interrupts disabled. So you probably
> meant kernel space, and it will look something like this:

Bad wording on my part, sorry. No, I meant that all interrupt handlers are executed on other cores than the "special" one, and the few that would happen anyway are disabled semi-permanently. (The scheduling clock can be disabled because an RT task is running and no involuntary scheduling should happen. Easier now with dynticks though.)

Basically, turn the special kernel task running on that core into a special user space task running on that core, or at least add that as an option. Add some special syscalls or character drivers to do the more esoteric stuff and voilà, all done.
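
A rough sketch of the user-space half of that idea, using only existing Linux calls: pin the calling task to one core with `sched_setaffinity` and request the SCHED_FIFO policy with `sched_setscheduler`. The function names `single_cpu_mask` and `pin_and_make_rt` are hypothetical. The sketch assumes the core has already been cleared of other work by separate means (for example the `isolcpus` boot parameter and the `/proc/irq/*/smp_affinity` files), and it does not touch the RCU issue discussed further down.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for cpu_set_t, CPU_* macros and sched_setaffinity */
#endif
#include <assert.h>
#include <sched.h>
#include <string.h>

/* Fill *set so that it contains exactly one CPU. */
void single_cpu_mask(cpu_set_t *set, int cpu)
{
    assert(set != NULL && cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(set);
    CPU_SET(cpu, set);
}

/*
 * Pin the calling task to `cpu` and switch it to SCHED_FIFO at the
 * given priority. Returns 0 on success, -1 with errno set otherwise
 * (SCHED_FIFO normally needs root or CAP_SYS_NICE).
 */
int pin_and_make_rt(int cpu, int rt_priority)
{
    cpu_set_t set;
    struct sched_param sp;

    single_cpu_mask(&set, cpu);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = rt_priority;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

A caller would typically also lock its memory with `mlockall` and pre-fault its buffers before entering the time-critical loop, for the same reason offsched pre-allocates everything.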

> but you will fail.
> A processor must pass through a quiescent state; if you try it, you will
> have RCU starvation, and I have been there... :) One of my papers
> explains that.

This problem is still there though. But it seems like a minor adjustment to RCU to teach it that some cores should be ignored, or to keep track of whether some cores did any RCU stuff at all (perhaps it already does that now, I didn't check).

All in all, what you more or less have is a standard Linux kernel beside a special mini-RT-OS running on a separate core. Only, you extend the current kernel to include the functionality of that RT-OS, and use other bits and pieces of the kernel when convenient. This is better than a totally separate RT-OS, but still comes with the disadvantages of one: it is very limited, and communication with the rest of the system is tricky. If done well it's a small step forwards, but why not think bigger and try to solve the tougher problems?

Another Hard real time Linux

Posted Feb 20, 2009 22:19 UTC (Fri) by razb (guest, #43424)

> Another Hard real time Linux
> [Kernel] Posted Feb 19, 2009 10:12 UTC (Thu) by i3839
>
>> Correct, but it is limited only in the following ways:
>> 1. No accessing *vmalloc* space *directly*. You can access any
>> kmalloc'ed address directly, and access vmalloc'ed space by walking
>> the pages. What I mean is that you can access everything.
>> 2. Unable to kmalloc.
>> 3. Unable to free memory (for example, kfree).
>
> What's dangerous about accessing vmalloced space directly if it's
> pinned? Or did I misunderstand?
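vmalloc mappings are entered into the kernel master page table, in the
VMALLOC area. When the processor's MMU then accesses such a page through
a page table that has not yet picked up the mapping, it faults, and the
fault handler copies the entry over. But, hey, offsched cannot fault.
kmalloc pages are statically mapped and do not require faults.
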
>> You can access any facility in the kernel. You can send or receive
>> packets, and I do it successfully on both AMD and Intel machines.
>
> Though those facilities may not access vmalloc space directly, nor
> allocate/free memory? Seems very fragile, because you can't know if they
> will in the future (assuming you audited all the code that may be
> executed by those facilities, which is a lot of tricky work).
vmalloc memory is rarely used. It is used in audio drivers and for
loading modules, which is no more than an annoyance.

> How can you send and receive packets if you can't allocate the space
> needed for them? Not with the standard networking stack, can you?
Recv: offsched is used only for packet parsing. Once parsing is done,
the packet is either handed to the kernel or dropped.
Send: pre-allocate everything you need.
I am using a private UDP stack. UDP is not a big deal.
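
For readers wondering what "pre-allocate" can look like in practice, here is a minimal sketch of a fixed pool of transmit buffers. All storage is reserved up front, and getting or returning a buffer is just a push or pop on an array of pointers, so nothing in the hot path calls an allocator. The names `struct tx_pool`, `POOL_BUFS`, `BUF_SIZE` and the `tx_pool_*` functions are invented for this example; this is a guess at the general technique, not the actual offsched code, and it is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BUFS 8   /* hypothetical number of buffers */
#define BUF_SIZE 2048 /* hypothetical size, enough for one Ethernet frame */

struct tx_pool {
    unsigned char bufs[POOL_BUFS][BUF_SIZE]; /* storage reserved up front */
    unsigned char *free_list[POOL_BUFS];     /* stack of unused buffers */
    size_t nfree;                            /* entries on the stack */
};

/* Put every buffer on the free stack. Call once, before the RT path runs. */
void tx_pool_init(struct tx_pool *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_BUFS; i++)
        p->free_list[i] = p->bufs[i];
    p->nfree = POOL_BUFS;
}

/* Take a buffer in O(1); returns NULL when the pool is exhausted. */
unsigned char *tx_pool_get(struct tx_pool *p)
{
    assert(p != NULL);
    if (p->nfree == 0)
        return NULL;
    return p->free_list[--p->nfree];
}

/* Hand a buffer back to the pool in O(1). */
void tx_pool_put(struct tx_pool *p, unsigned char *buf)
{
    assert(p != NULL && buf != NULL && p->nfree < POOL_BUFS);
    p->free_list[p->nfree++] = buf;
}
```

Returning a buffer to such a pool is not a kfree, so it stays within the restrictions listed at the top of this comment. The cost is that the pool size must be chosen in advance and an exhausted pool has to be handled, typically by dropping the packet.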

>> gettimeofday is not a timer, it is a clock. Try to schedule a task to
>> run T microseconds from now and you will see skew, and the more tasks
>> there are, the more it will skew.
>
> Right, totally different, sorry. But you only run one task, so the timer
> is just a more efficient way of not doing anything in the meantime?
Only one task? Why not have both receive and transmit? And why do you
think an OS processor is fully utilized?
Benchmarks show a speedup of 2.8 on an 8-core machine.

>> Even with NAPI your system may get jammed, and worst of all, even by
>> unrelated traffic. Offsched suggests another approach: containing
>> incoming traffic to one or more cores. This way cpu0, the main
>> operating system processor, will not be at risk.
>
> This is a generic problem: Any (user or kernel) process can use too many
> resources, slowing down the machine as a whole. Offsched doesn't solve
> that at all, except for some explicit kernel cases which are 'ported' to
> offsched, which is a lot of work.
With NAPI, incoming traffic can consume the entire system's computation
power; with offsched it cannot. I decided to call this the offsched
containment concept.
Yes, it is a lot of work, unfortunately. Currently I do not know how
much work it would take to bring up a TCP stack in offsched context. Do
you know of a good RT TCP stack?
Also, the 80-20 rule suggests that 20% of the code handles 80% of the
cases, so I may find myself fixing only 20% of the TCP code. Much
depends on whether offsched ever reaches mainline.

> Realtime preemption, on the other hand, tries to solve this problem in a
> more generic way.

> And moving networking to offsched may contain the damage to one core,
> but it doesn't solve the real problem, e.g. sshing into the box doesn't
> work quicker or better in any way. If the NIC generates more packets
> than can be handled, the right solution is to drop some early. Basically
> what you always do in an overload situation: Don't try to do everything,
> drop some stuff.
Why a single NIC? Many appliances, if not most, ship with an
administration interface and a public interface.
The public one is the exposed interface. If it is under attack, the
entire system is under attack, especially in a world of 10G interfaces.
In offsched, we assign OFFSCHED-NAPI to the 10G interface...

> Now the nasty thing is that it's hard to see the difference between a
> DoS and just a very high load.
>
> Besides, handling the network packets with all cores instead of one may
> be the difference between being DoSed and just slowed down.
Who says only a single OFFSCHED core is used?

>> You cannot run user space with interrupts disabled. So you probably
>> meant kernel space, and it will look something like this:
>
> Bad wording on my part, sorry. No, I meant that all interrupt handlers
> are executed on other cores than the "special" one, and the few that
> would happen anyway are disabled semi-permanently. (The scheduling clock
> can be disabled because an RT task is running and no involuntary
> scheduling should happen. Easier now with dynticks though.)
This is soft real time. User space cannot do hard real time: you can
never guarantee meeting deadlines because you are in ring 3. If you want
to use a high-priority kernel thread instead, you will probably
pre-allocate memory anyway (well, I do). So? Better to use offsched.
It is a good idea, though: why not wrap the offsched timer with
clockevents? Thanks.

> Basically, turn the special kernel task running on that core into a
> special user space task running on that core, or at least add that as
> an option. Add some special syscalls or character drivers to do the
> more esoteric stuff and voilà, all done.
>> but you will fail.
>> A processor must pass through a quiescent state; if you try it, you
>> will have RCU starvation, and I have been there... :) One of my papers
>> explains that.
>
> This problem is still there though. But it seems like a minor adjustment
> to RCU to teach it that some cores should be ignored, or to keep track
> of whether some cores did any RCU stuff at all (perhaps it already does
> that now, I didn't check).
>
> All in all, what you more or less have is a standard Linux kernel beside
> a special mini-RT-OS running on a separate core. Only, you extend the
> current kernel to include the functionality of that RT-OS, and use other
> bits and pieces of the kernel when convenient. This is better than a
> totally separate RT-OS, but still comes with the disadvantages of one:
> it is very limited, and communication with the rest of the system is
> tricky. If done well it's a small step forwards, but why not think
> bigger and try to solve the tougher problems?
Correct. I decided to call it a "hybrid system", because you enjoy both
the stability of a Linux server and OFFSCHED. If A is the size of your
software and B is the size of the real-time code, B/A is likely to be
small. Why mess with a big RT system for such a small fraction?
You are more than welcome to suggest other strategies.