Another Hard real time Linux


Posted Feb 19, 2009 6:44 UTC (Thu) by i3839 (guest, #31386)
In reply to: Another Hard real time Linux by razb
Parent article: Interview: the return of the realtime preemption tree

If I understood your idea correctly, you basically just run some very limited kernel code on a dedicated core with all unrelated interrupts etc. disabled?

That seems so limited that it doesn't have much practical use. The biggest problems are that it can't run user code and that any communication with other cores easily breaks the real-time guarantee.

Examples:

- 1us accurate timer.
Standard gettimeofday gives me that. The system has plenty of very accurate timers; the problem is transferring that info fast enough to where it's needed.

- Firewall/routing/etc. offloading.
This is totally unrelated to real-time. Basically it wastes one whole core on doing that instead of letting that core also do other things, and adds extra communication overhead between cores/subsystems (you still need to get the packets from somewhere and tell which ones go where, etc.). It seems the same can be achieved by pinning the NIC interrupts to one core and giving all network-related stuff the highest priority.
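For reference, pinning a NIC's interrupts to one core is done on Linux by writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. A minimal sketch in C, assuming a machine with at most 32 CPUs; the helper names are made up for illustration, and the IRQ number would have to be looked up in /proc/interrupts:

```c
#include <stdio.h>

/* Format the hex CPU bitmask that /proc/irq/<irq>/smp_affinity expects
 * for a single CPU. This sketch only handles CPUs 0..31.
 * Returns 0 on success, -1 on a bad argument or a too-small buffer. */
int irq_affinity_mask(int cpu, char *buf, size_t len)
{
    if (cpu < 0 || cpu > 31 || buf == NULL)
        return -1;
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write the mask for `cpu` to /proc/irq/<irq>/smp_affinity.
 * Needs root. Returns 0 on success, -1 on any failure. */
int pin_irq_to_cpu(int irq, int cpu)
{
    char path[64], mask[16];

    if (irq < 0 || irq_affinity_mask(cpu, mask, sizeof mask) != 0)
        return -1;
    snprintf(path, sizeof path, "/proc/irq/%d/smp_affinity", irq);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", mask) > 0;
    /* Buffered write errors may only show up at fclose(). */
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

This only moves the interrupt handling; the priority of the network processing itself would still have to be raised separately.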

You basically replace standard processes with very limited kernel code running on a dedicated core. I don't say this is a bad idea in itself, but for this to make sense you want to have many (independent, low-power) cores. I suspect that PC hardware isn't very suitable for this, because too much is shared between cores. It probably makes more sense for embedded systems, but even there it's questionable because of the kernel-code-only limitation.

What's the advantage of offsched compared to running a user space process at real-time priority pinned on a core with interrupts disabled?

Or in other words, what problem does your approach solve?



Another Hard real time Linux

Posted Feb 19, 2009 8:34 UTC (Thu) by razb (guest, #43424)

>If I understood your idea correctly, you basically just run some very limited kernel code on a dedicated core with all unrelated interrupts etc. disabled?

Correct. But the only limitations are:
1. You cannot access ***vmalloc*** space ***directly***. You can access any kmalloc'ed address directly, and you can access vmalloc'ed space by walking the pages. What I mean is that you can still reach everything.
2. You cannot kmalloc.
3. You cannot free memory (for example, kfree).

>- 1us accurate timer.
>Standard gettimeofday gives me that. The system has plenty very accurate timers, the problem is transferring that info fast enough to where it's needed.
gettimeofday is not a timer, it is a clock. Try to schedule a task to run T microseconds from now and you will see skew, and the more tasks there are, the more it skews.
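To illustrate the clock-versus-timer distinction from user space: reading a clock and then sleeping for a relative interval accumulates drift on every iteration, while sleeping until an absolute deadline does not. The following is only a sketch using POSIX clock_nanosleep with TIMER_ABSTIME; the function names are made up, and it does nothing about scheduling latency, which is the part offsched is aiming at.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute deadline by ns nanoseconds (ns >= 0),
 * keeping tv_nsec in the range [0, NSEC_PER_SEC). */
void deadline_advance(struct timespec *t, long ns)
{
    assert(ns >= 0 && t->tv_nsec >= 0 && t->tv_nsec < NSEC_PER_SEC);
    t->tv_sec  += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Call work() `count` times, once per period_ns. Each wake-up is an
 * absolute point on CLOCK_MONOTONIC, so a late wake-up does not push
 * back the following deadlines. Returns 0 on success, -1 on error. */
int run_periodic(long period_ns, int count, void (*work)(void))
{
    struct timespec next;

    if (period_ns <= 0 || clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < count; i++) {
        int rc;

        deadline_advance(&next, period_ns);
        /* clock_nanosleep returns an error number directly; retry
         * when a signal interrupts the sleep. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                                 &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;
        if (work != NULL)
            work();
    }
    return 0;
}
```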

>- Firewall/routing/etc. offloading.
>This is totally real-time unrelated. Basically it wastes one whole core on doing that instead of letting that core do also other things, and adds extra communication overhead between cores/subsystems (still need to get the packets from somewhere and tell which ones go where etc). It seems the same can be achieved by pinning the NIC interrupts to one core and giving all network related stuff highest priority.

First, you are correct: it is unrelated to real time. offsched is not just for real-time use, but for many other things. Having heavy incoming traffic means you will probably enable NAPI, and NAPI disables incoming interrupts to reduce interrupt overhead. Even with NAPI your system may get jammed, and worst of all, it may get jammed by unrelated traffic. offsched suggests another approach: containing incoming traffic on one or more cores. This way cpu0, the main operating system processor, will not be at risk.

Also, regarding the waste of processors, again you are correct; but offsched is not meant to be used on your laptop. It is meant for appliances with several cores, which unfortunately never achieve linear speed-up.

>You basically replace standard processes with very limited kernel code running on dedicated core. I don't say this is a bad idea in itself, but for this to make sense you want to have many (independent, low power) cores. I suspect that PC hardware isn't very suitable for this, because too much is shared by cores. It probably makes more sense for embedded systems, but even there it's questionable because of the kernel code only limitation.
You can access any facility in the kernel. You can send or receive packets, and I do so successfully on AMD and Intel machines.

>What's the advantage of offsched compared to running a user space process at real-time priority pinned on a core with interrupts disabled?
You cannot run user space with interrupts disabled, so you probably meant kernel space, and it would look something like this:
    cli
    foo()
    sti
But this will fail. A processor must pass through a quiescent state; if you try this, you will get RCU starvation, and I have been there... :). One of my papers explains that.

>Or in other words, what problem does your approach solve?
I merely suggest a different approach to real time and security for machines with several cores or hyper-threading.
I am using offsched on my appliances for network work.

Another Hard real time Linux

Posted Feb 19, 2009 10:12 UTC (Thu) by i3839 (guest, #31386)

> correct. but it is limited only to:
> 1. accessing ***vmalloc**** space ***directly*** . You can access any
> kmalloc'ed address directly , and access vmalloc'ed space by walking
> on the pages. what I mean is that you can access everything.
> 2. unable to kmalloc
> 3. unable to free memory. ( For example : kfree ).

What's dangerous about accessing vmalloced space directly if it's pinned? Or did I misunderstand?

> You can access any facility in the kernel. you can send or receive
> packets. and I do it on AMD-Intel machines successfully.

Though those facilities may not access vmalloc space directly, nor allocate/free memory? Seems very fragile, because you can't know if they will in the future (assuming you audited all the code that may be executed by those facilities, which is a lot of tricky work).

How can you send and receive packets if you can't allocate the space needed for them? Not with the standard networking stack, can you?

> gettimeofday is not a timer, it is a clock. try and schedule a task to
> be run T microseconds from now, you will skew, and the more tasks, it
> will skew more.

Right, totally different, sorry. But you only run one task, so the timer is just a more efficient way of not doing anything in the meantime?

> even with NAPI you may get your system to be jammed, and worst of all
> even with unrelated traffic, offsched suggests another approach of
> containing incoming traffic to a single or more cores. This way cpu0,
> the main operating system processor, will not be at risk.

This is a generic problem: Any (user or kernel) process can use too many resources, slowing down the machine as a whole. Offsched doesn't solve that at all, except for some explicit kernel cases which are 'ported' to offsched, which is a lot of work.

Realtime preemption, on the other hand, tries to solve this problem in a more generic way.

And moving networking to offsched may contain the damage to one core, but it doesn't solve the real problem, e.g. sshing into the box doesn't work quicker or better in any way. If the NIC generates more packets than can be handled, the right solution is to drop some early. Basically what you always do in an overload situation: Don't try to do everything, drop some stuff.
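As a rough illustration of "drop some early": a fixed-size receive queue that refuses new packets once it is full and only counts the drop, so the cost of overload stays bounded. This is a single-threaded sketch with made-up names and an arbitrary capacity, not how any particular driver does it.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8   /* illustrative capacity; must be a power of two */

struct pkt_ring {
    const void *slot[RING_SLOTS];
    uint32_t head;      /* total packets accepted so far */
    uint32_t tail;      /* total packets consumed so far */
    uint64_t dropped;   /* packets refused because the ring was full */
};

/* Try to queue a packet. When the ring is full the packet is dropped
 * immediately: O(1) work, no allocation. Returns 0 if queued, -1 if
 * dropped. */
int ring_push(struct pkt_ring *r, const void *pkt)
{
    if (r->head - r->tail == RING_SLOTS) {
        r->dropped++;
        return -1;
    }
    r->slot[r->head % RING_SLOTS] = pkt;
    r->head++;
    return 0;
}

/* Take the oldest queued packet, or NULL if the ring is empty. */
const void *ring_pop(struct pkt_ring *r)
{
    if (r->head == r->tail)
        return NULL;
    const void *pkt = r->slot[r->tail % RING_SLOTS];
    r->tail++;
    return pkt;
}
```

The drop counter is the only signal available here, and it looks the same for a DoS and for a legitimate burst.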

Now the nasty thing is that it's hard to see the difference between a DoS and just a very high load.

Besides, handling the network packets with all cores instead of one may be the difference between being DoSed and just slowed down.

> you cannot run user space with interrupts disabled. So you probably
> meant kernel space, and it will look something like this:

Bad wording on my part, sorry. No, I meant that all interrupt handlers are executed on other cores than the "special" one, and the few that would happen anyway are disabled semi-permanently. (The scheduling clock can be disabled because a rt task is running and no involuntary scheduling should happen. Easier now with dynticks though.)

Basically moving the special kernel task running on that core to a special user space task running on that core. Or at least add it as an option. Add some special syscalls or character drivers to do the more esoteric stuff and voila, all done.
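The user-space half of that setup can be sketched with standard Linux calls: sched_setaffinity to pin the task, sched_setscheduler for SCHED_FIFO, and mlockall so page faults don't get in the way. The function names below are made up, and moving the interrupts away from that core and stopping the scheduling clock are separate steps this does not cover.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/mman.h>

/* Return the lowest-numbered CPU the calling thread may run on,
 * or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno set by the kernel). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Switch to SCHED_FIFO at the given priority and lock all current and
 * future pages in RAM. Typically requires root or the CAP_SYS_NICE and
 * CAP_IPC_LOCK capabilities. Returns 0 on success, -1 on error. */
int make_realtime(int priority)
{
    struct sched_param sp = { .sched_priority = priority };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -1;
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

A task would call pin_to_cpu() and make_realtime() once at startup, before entering its main loop.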

> but you will fail.
> a processor must walk trough a quiescent state ; if you try it, you will
> have RCU starvation, and I have been there... :) . one of my papers
> explains that.

This problem is still there though. But it seems like a minor adjustment to RCU to teach it that some cores should be ignored, or to keep track if some cores did any RCU stuff at all (perhaps it already does that now, didn't check).
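A toy model of what I mean, in plain C: a grace period ends once every CPU has passed through a quiescent state since it started, so one core that never reports stalls it forever, unless that core is explicitly excluded. All names here are invented; this is not the kernel's RCU implementation, and excluding a core is only safe if it never holds RCU-protected references.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 4   /* illustrative CPU count */

struct toy_rcu {
    uint64_t qs_count[NCPUS];  /* bumped at each quiescent state */
    uint64_t snapshot[NCPUS];  /* counts when the grace period began */
    uint32_t ignore_mask;      /* bit n set: CPU n is not waited for */
};

/* CPU `cpu` reports that it passed through a quiescent state. */
void toy_rcu_quiescent(struct toy_rcu *r, int cpu)
{
    r->qs_count[cpu]++;
}

/* Begin a grace period by recording every CPU's current count. */
void toy_rcu_start_gp(struct toy_rcu *r)
{
    for (int c = 0; c < NCPUS; c++)
        r->snapshot[c] = r->qs_count[c];
}

/* The grace period is over once every CPU that is not ignored has
 * reported at least one quiescent state since it began. */
bool toy_rcu_gp_done(const struct toy_rcu *r)
{
    for (int c = 0; c < NCPUS; c++) {
        if (r->ignore_mask & (1u << c))
            continue;
        if (r->qs_count[c] == r->snapshot[c])
            return false;
    }
    return true;
}
```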

All in all, what you more or less have is a standard Linux kernel alongside a special mini-RT-OS running on a separate core. Only, you extend the current kernel to include the functionality of that RT-OS, and use other bits and pieces of the kernel when convenient. This is better than a totally separate RT-OS, but it still comes with the disadvantages of one: it is very limited, and communication with the rest of the system is tricky. If done well it's a small step forwards, but why not think bigger and try to solve the tougher problems?

Another Hard real time Linux

Posted Feb 20, 2009 22:19 UTC (Fri) by razb (guest, #43424)

> Another Hard real time Linux
> [Kernel] Posted Feb 19, 2009 10:12 UTC (Thu) by i3839
>
>> correct. but it is limited only to:
>> 1. accessing ***vmalloc**** space ***directly*** . You can access any
>> kmalloc'ed address directly , and access vmalloc'ed space by walking
>> on the pages. what I mean is that you can access everything.
>> 2. unable to kmalloc
>> 3. unable to free memory. ( For example : kfree ).
>
> What's dangerous about accessing vmalloced space directly if it's
> pinned? Or did I misunderstand?
vmalloc pages are entered into the kernel master page table in the
VMALLOC area. When the processor's MMU tries to access these pages, it
faults. But, hey, offsched cannot fault.
kmalloc pages are statically mapped and do not require faults.
>> You can access any facility in the kernel. you can send or receive
>> packets. and I do it on AMD-Intel machines successfully.
>
> Though those facilities may not access vmalloc space directly, nor
> allocate/free memory? Seems very fragile, because you can't know if they
> will in the future (assuming you audited all the code that may be
> executed by those facilities, which is a lot of tricky work).
vmalloc memory is rarely used. It is used in audio drivers and for
loading modules, which is no more than an annoyance.

> How can you send and receive packets if you can't allocate the space
> needed for them? Not with the standard networking stack, can you?
Recv: offsched is used merely for packet parsing. Once parsing is done,
the packet is either handed to the kernel or dropped.
Send: pre-allocate all you need.
I am using a private UDP stack. UDP is not a big deal.
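"Pre-allocate all you need" can be sketched as a fixed pool of buffers set up once, with O(1) get and put and no allocator calls afterwards. This is a plain C illustration with made-up names and sizes, single-threaded, and not the actual offsched code:

```c
#include <stddef.h>

#define POOL_BUFS 4      /* illustrative pool size */
#define BUF_SIZE  2048   /* illustrative buffer size in bytes */

struct buf {
    struct buf *next;               /* free-list link */
    unsigned char data[BUF_SIZE];   /* packet payload area */
};

struct buf_pool {
    struct buf storage[POOL_BUFS];  /* all memory reserved up front */
    struct buf *free_list;
};

/* Thread every buffer onto the free list. Call once, before the
 * time-critical path starts. */
void pool_init(struct buf_pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BUFS; i++) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* Take a buffer from the pool, or NULL if the pool is exhausted.
 * Never allocates. */
struct buf *pool_get(struct buf_pool *p)
{
    struct buf *b = p->free_list;

    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Return a buffer to the pool. Never frees. */
void pool_put(struct buf_pool *p, struct buf *b)
{
    b->next = p->free_list;
    p->free_list = b;
}
```

When the pool runs dry the sender has to wait or drop, which is the trade-off for never calling the allocator.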

>> gettimeofday is not a timer, it is a clock. try and schedule a task to
>> be run T microseconds from now, you will skew, and the more tasks, it
>> will skew more.
>
> Right, totally different, sorry. But you only run one task, so the timer
> is just a more efficient way of not doing anything in the meantime?
Only one task? Why not have both receive and transmit? Why do you think
an OS processor is fully utilized?
Benchmarks show a speed-up of 2.8 on an 8-core machine.
>> even with NAPI you may get your system to be jammed, and worst of all
>> even with unrelated traffic, offsched suggests another approach of
>> containing incoming traffic to a single or more cores. This way cpu0,
>> the main operating system processor, will not be at risk.
>
> This is a generic problem: Any (user or kernel) process can use too many
> resources, slowing down the machine as a whole. Offsched doesn't solve
> that at all, except for some explicit kernel cases which are 'ported' to
> offsched, which is a lot of work.
With NAPI we consume the entire system's computation power; with offsched we don't. I decided to call this the offsched containment concept.
Yes, it is a lot of work, unfortunately. Currently I do not know how
much work it would be to bring up a TCP stack in offsched context. Do you know of a good RT TCP stack?
Also, the 80-20 rule suggests that 20% of the code can handle 80% of the
cases, so I may find myself fixing only 20% of the TCP code. It very much depends on whether offsched ever reaches mainline.
> realtime preemption, on the other hand, tries to solve this problem in a
> more generic way.

> And moving networking to offsched may contain the damage to one core,
> but it doesn't solve the real problem, e.g. sshing into the box doesn't
> work quicker or better in any way. If the NIC generates more packets
> than can be handled, the right solution is to drop some early. Basically
> what you always do in an overload situation: Don't try to do everything,
> drop some stuff.
Why a single NIC? Many appliances, if not most, are shipped with an
administration interface and a public interface.
The public one is the exposed interface. If it is under attack, the entire
system is under attack, especially in a world of 10G interfaces.
In offsched, we assign OFFSCHED-NAPI to the 10G interface....
> Now the nasty thing is that it's hard to see the difference between a
> DoS and just a very high load.
>
> Besides, handling the network packets with all cores instead of one may
> be the difference between being DoSed and just slowed down.
Who says only a single OFFSCHED core is used?
>> you cannot run user space with interrupts disabled. So you probably
>> meant kernel space, and it will look something like this:
>
> Bad wording on my part, sorry. No, I meant that all interrupt handlers
> are executed on other cores than the "special" one, and the few that
> would happen anyway are disabled semi-permanently. (The scheduling clock
> can be disabled because a rt task is running and no involuntary
> scheduling should happen. Easier now with dynticks though.)
This is soft real time. User space cannot do hard real time: you can
never guarantee meeting deadlines because you are in ring 3. If you want to use a high-priority kernel thread, you will probably pre-allocate memory (well... I do). So? Better to use offsched.
It is a good idea; why not wrap the offsched timer with clockevents?
Thanks.
> Basically moving the special kernel task running on that core to a
> special user space task running on that core. Or at least add it as an
> option. Add some special syscalls or character drivers to do the more
> esoteric stuff and voila, all done.
>> but you will fail.
>> a processor must walk trough a quiescent state ; if you try it, you will
>> have RCU starvation, and I have been there... :) . one of my papers
>> explains that.
>
> This problem is still there though. But it seems like a minor adjustment
> to RCU to teach it that some cores should be ignored, or to keep track
> if some cores did any RCU stuff at all (perhaps it already does that
> now, didn't check).
>
> All in all what you more or less have is standard Linux kernel besides a
> special mini-RT-OS, running on a separate core. Only, you extend the
> current kernel to include the functionality of that RT-OS, and use other
> bits and pieces of the kernel when convenient. This is better than a
> totally separate RT-OS, but still comes with the disadvantages of one:
> Very limited and communication with the rest of the system is tricky. If
> done well it's a small step forwards, but why not think bigger and try
> to solve the tougher problems?
Correct. I decided to call it a "hybrid system", because you
enjoy the stability of a Linux server together with OFFSCHED. If A is the size of
your software, and B is the size of the real-time code, B/A is likely
to be small. Why mess with a big RT system for such a small fraction?
You are more than welcome to suggest other strategies.

Another Hard real time Linux

Posted Feb 20, 2009 13:25 UTC (Fri) by saffroy (subscriber, #43999)

Another approach is to use a real-time hypervisor: you can have real-time scheduling, (almost) full access to the bare metal, and even (more or less) friendly APIs to communicate with the other OS. You can even have a full-featured RTOS running there.

BTW, is it reasonable to imagine the RT-preempt tree running kvm running an RTOS?

Another Hard real time Linux

Posted Feb 20, 2009 22:26 UTC (Fri) by razb (guest, #43424)

> Another Hard real time Linux
> [Kernel] Posted Feb 20, 2009 13:25 UTC (Fri) by saffroy
>
> Another approach is to use a real-time hypervisor: you can have
> real-time scheduling, (almost) full access to the bare-metal, and even
> (more or less) friendly APIs to communicate with the other OS. You can
> even have a full-featured RTOS running there.
Funny you mention it. I actually thought of using this technology to get a solution for single-CPU machines. But it turned out that hyper-threading is good enough for offsched, so I did not try it. But I very much agree, we do not utilize the machines enough.
> BTW, is it reasonable to imagine the RT-preempt tree running kvm running
> a RTOS ?
Don't know.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds