From: Evgeniy Polyakov <firstname.lastname@example.org>
Subject: [Announce] New netchannels implementation. Userspace network stack.
Date: Fri, 20 Oct 2006 13:53:05 +0400
Netchannel is a pure bridge between low-level hardware and the user, without
any special protocol processing involved between them.
Users are not limited to userspace only - I will use this netchannel
infrastructure for a fast NAT implementation, which is a purely kernelspace
user. (It is possible to create NAT in userspace, but the price of crossing
the kernel boundary is too high for a task that only needs to change some
fields in the header and recalculate the checksum.)
The userspace network stack is another user of the new netchannel subsystem.
The current netchannel version supports data transfer using copy_*_user().
One could ask how this differs from netfilter's queue target.
There are three differences (read: advantages):
* it does not depend on netfilter (and thus does not introduce its slow path)
* it is very scalable, since it uses neither hash tables nor lists
* it does not depend on netfilter (and thus does not introduce its slow path).
Yes, again: taking a NAT implementation into account would add a dependency
on connection tracking, which the existing netchannels implementation does
not need.
It is also much smaller and more scalable than tun/tap devices.
And some other small advantages: the possibility of zero-copy sending and
receiving using the network allocator's facilities (not implemented in the
current version of netchannels); it is very small; there are no locks in the
very short fast path (except RCU and the skb queue linking lock, which is
held for 5 operations); and so on...
There are also some limitations: only one packet can be obtained per read
from a netchannel's file descriptor (it would be possible to extend this to
read several packets, but for now I leave it as is), and it is IPv4 only
(I'm lazy and only implemented tree comparison functions for IPv4 addresses).
The first user of the netchannel subsystem is the userspace network stack,
which supports:
* TCP/UDP sending and receiving.
* Timestamp, window scaling, MSS TCP options.
* Slow start and congestion control.
* Route table (including static ARP cache).
* Socket-like interface.
* IP and ethernet processing code.
* Complete retransmit algorithm.
* Fast retransmit support.
* Support for TCP listen state (point-to-point mode only, i.e. no new data
channels are created when a new client connects; instead the state is changed
according to the protocol, so the TCP state moves to ESTABLISHED).
* Support for the new netchannels interface.
A speed/CPU-usage graph for the socket code (which uses epoll and send/recv)
is attached.
At the same 100 Mbit speed, CPU usage for netchannels and the userspace
network stack is about 2-3 times lower than for sockets when sending and
receiving small packets (128 bytes).
There is very strange behaviour of the userspace time() function: if it is
used actively, the result is an extremely high kernel load, and the following
functions start to appear at the top of the profiles:
* get_offset_pmtmr() - 25%, second position, even higher than sysenter_past_esp().
* do_gettimeofday() - 0.6%, 4th place.
* delay_pmtmr() - 0.29%, 11th place.
First place is held by poll_idle().
The testing system, which runs either the netchannel or the socket tests,
has the following CPU:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
with 1GB of RAM and an e100 network adapter, on Linux 2.6.17-rc3.
The main (vanilla) system is an amd64 box with 1GB of RAM and an 8169 gigabit
adapter on Linux 2.6.18-1.2200.fc5; the software is either netcat dumping
data into /dev/null or a sendfile-based server.
All sources are available on the project homepages:
1. Netchannels subsystem.
2. Userspace network stack.
3. Network allocator.
If you have read up to here, then I want you to know that the advertisement
is over. Thanks again.