LWN: Comments on "NAPI polling in kernel threads"
https://lwn.net/Articles/833840/

Comments posted to the LWN article "NAPI polling in kernel threads".

immibis — https://lwn.net/Articles/835865/ — Mon, 02 Nov 2020 17:00:56 +0000

That sounds exactly like NOHZ_FULL.

amworsley — https://lwn.net/Articles/835060/ — Sat, 24 Oct 2020 01:39:45 +0000

About time. Having packets processed in high-priority software IRQs is a gift to those who want to carry out denial-of-service attacks against CPU-limited embedded processors.

ncm — https://lwn.net/Articles/834314/ — Wed, 14 Oct 2020 22:38:17 +0000

Agreed, I don't have any direct experience with DPDK. I have used Onload/ef_vi for Solarflare hardware (sold by Xilinx, maybe soon AMD), libexanic for Exablaze hardware (sold by Cisco now), and Napatech. I have studied Netronome, which enables running eBPF on a packet before it hits host memory, so that the packet can be dropped at that stage.

Each has its own idiosyncratic filtering configuration and ring-buffer layout. ExaNIC is unusual in delivering packets 120 bytes at a time, enabling partial processing while the rest of the packet is still coming in.

There are various accommodations for use in VMs, which I have not experimented with.

Keeping the kernel's greedy fingers off of my cores is one of the harder parts of the job. It means lots of custom boot-parameter incantations, making deployment to somebody else's equipment a chore. It would be much, much better if the process could simply tell the kernel, "I will not be doing any more system calls, please leave my core completely alone from this point", and have that stick. Such a process does all its subsequent work entirely via mapped memory.
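The "custom boot-parameter incantations" ncm mentions typically combine CPU isolation, tick suppression, and RCU-callback offloading. As a rough illustration only (core numbers invented; the right set of flags varies by kernel version and workload), a command line reserving cores 2-5 for such a process might look like:

```
# Keep the scheduler, the timer tick, RCU callbacks, and default IRQ
# routing away from cores 2-5 (illustrative values, not a recipe)
isolcpus=2-5 nohz_full=2-5 rcu_nocbs=2-5 irqaffinity=0-1
```

Even with all of these, occasional housekeeping work can still land on the isolated cores, which is exactly the gap the "leave my core completely alone" request described above would close.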
wkudla — https://lwn.net/Articles/834192/ — Tue, 13 Oct 2020 20:59:25 +0000

I don't think they are referring to DPDK. It's rather about solutions such as Solarflare NICs and kernel bypass with OpenOnload. It's extremely popular in fintech and other latency-sensitive fields.

I can't wait to get rid of softirqs from my critical CPUs. Those and tasklets are a nightmare when you're trying to reduce platform jitter to the minimum.

marcH — https://lwn.net/Articles/834101/ — Tue, 13 Oct 2020 03:15:56 +0000

You mean like DPDK?

Any reason not to mention any specific example(s)?

nevets — https://lwn.net/Articles/834087/ — Mon, 12 Oct 2020 20:08:15 +0000

My slides are here: https://blog.linuxplumbersconf.org/2009/slides/Steven-Rostedt-network-thread-irqs.pdf

nevets — https://lwn.net/Articles/834086/ — Mon, 12 Oct 2020 20:01:45 +0000

Exactly! I proposed this work back in 2009 at Linux Plumbers. My idea was to call it "ENAPI", for "Even-Newer API".

https://blog.linuxplumbersconf.org/ocw/proposals/53

I may even be able to find my slides somewhere. There was a lot of skepticism about this approach (even from Eric Dumazet), but, as with threaded interrupts in general, I was confident that this would sooner or later be something that non-RT folks would want.

darwi — https://lwn.net/Articles/834008/ — Mon, 12 Oct 2020 03:40:47 +0000

> I am aware, but NAPI is used only in networking AFAIK.

Yes, of course. My point was that RT runs almost all softirqs in kthread/task context, not just NAPI. Thus, RT handles the generic case (almost all softirqs), while the patch set mentioned in the article only handles one of its special cases (NAPI).

alison — https://lwn.net/Articles/833992/ — Sun, 11 Oct 2020 22:16:57 +0000

> Softirqs are still used in a large number of places beyond networking.

I am aware, but NAPI is used only in networking AFAIK. Thanks for saying "softirqs" rather than "software interrupts": ugh!

darwi — https://lwn.net/Articles/833980/ — Sun, 11 Oct 2020 17:46:10 +0000

> Indeed, it makes one wonder if a new implementation is needed.

Softirqs are still used in a large number of places beyond networking. See the full list, enum *_SOFTIRQ, in include/linux/interrupt.h.
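For reference, the list darwi points to looked like this at the time (include/linux/interrupt.h, circa v5.9; the exact entries vary across kernel versions):

```c
/* The softirq types; the two NET_* entries are where NAPI polling
 * has traditionally run. */
enum
{
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	IRQ_POLL_SOFTIRQ,
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,	/* RCU is kept last */

	NR_SOFTIRQS
};
```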
itsmycpu — https://lwn.net/Articles/833973/ — Sun, 11 Oct 2020 12:29:00 +0000

It sounds like it can be toggled through sysfs. So much better than having to (re)compile the kernel. :)

tpo — https://lwn.net/Articles/833952/ — Sat, 10 Oct 2020 15:41:59 +0000

Wow, an excellent article about fundamental mechanisms and concepts and how they are evolving. So much appreciated <3 !

ncm — https://lwn.net/Articles/833940/ — Sat, 10 Oct 2020 08:40:15 +0000

Those of us who care most about performance and minimizing overhead are using "kernel-bypass" libraries with NICs that dump incoming packets into a ring buffer mapped into user-space memory. The kernel driver for such a NIC sets up filter criteria programmed into registers in an ASIC or FPGA on the NIC, then leaves it to run freely, DMAing incoming packets sequentially into the ring buffer, interspersed with annotations such as length, timestamp, and checksum, and updating an atomic shared index/pointer when each packet is ready.

The user program just needs to poll for updates to this index, and then finish all its work on a packet before it gets overwritten, as little as a few ms later. That work might be just to copy the packet to a bigger ring buffer for other processes to look at under more-relaxed time constraints.

The kernel driver watches its own mapping of such a ring buffer, and copies packets that processes have expressed interest in out to regular buffers to be delivered, to be processed according to the TCP protocol (e.g. to send acknowledgments), or to be run past the firewall first.

alison — https://lwn.net/Articles/833938/ — Sat, 10 Oct 2020 04:30:05 +0000

> this would also be very helpful in making !RT and RT kernels closer...

Indeed, it makes one wonder if a new implementation is needed.

I suppose we'll call this solution the NNAPI.

darwi — https://lwn.net/Articles/833925/ — Fri, 09 Oct 2020 20:54:55 +0000

> Once NAPI polling moves to its own kernel thread, it becomes much more visible and subject to administrator control. A kernel thread can have its priority changed, and it can be bound to a specific set of CPUs; that allows the administrator to adjust how that work is done in relation to the system's user-space workload. Meanwhile, the CPU scheduler will have a better understanding of how much CPU time NAPI polling requires... Time spent handling software interrupts, instead, is nearly invisible to the scheduler.

IMHO, this would also be very helpful in making !RT and RT kernels closer...

PREEMPT_RT already runs softirqs in their own kernel threads, so that they can be prioritized and not affect random victims and real-time threads. Maybe, soon, all mainline kernels will be like RT in that regard ;-)
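The administrator control described in that quote maps onto ordinary tooling once NAPI runs in threads. As a sketch (the per-device sysfs knob shown is the one that was eventually merged in Linux 5.12; device name, PID, priority, and CPU list are all invented for illustration):

```
# Switch eth0's NAPI processing into kernel threads
echo 1 > /sys/class/net/eth0/threaded

# The poll threads appear as normal kernel threads named napi/<dev>-<id>
pgrep -a napi
# 1234 napi/eth0-8193        (example output)

# ...so they can be prioritized and pinned like any other thread
chrt -f -p 50 1234           # SCHED_FIFO priority 50
taskset -cp 2-3 1234         # bind to CPUs 2-3
```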
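To make ncm's kernel-bypass description above concrete, here is a minimal sketch of the user-space side of such a receive ring. Everything in it is invented for illustration (slot layout, names, sizes); real vendor libraries differ considerably, but the shape is the same: the producer publishes a write index after each packet's DMA completes, and the application spins on that index and must finish with a slot before the producer wraps around.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 4096                 /* power of two, so masking works */

/* Hypothetical slot layout; real NICs prepend their own descriptor
 * carrying length, timestamp, checksum, and so on. */
struct slot {
	uint32_t len;
	uint64_t timestamp_ns;
	uint8_t  payload[2048];
};

/* Shared with the NIC/driver, e.g. via mmap() of device memory. */
struct ring {
	_Atomic uint64_t write_idx;     /* advanced by the producer */
	struct slot slots[RING_SLOTS];
};

/* Busy-poll loop: handle every packet between our private cursor and
 * the producer's published index.  handle() must finish before the
 * producer laps us and overwrites the slot. */
static void poll_ring(struct ring *r, void (*handle)(const struct slot *))
{
	uint64_t read_idx = atomic_load_explicit(&r->write_idx,
						 memory_order_acquire);
	for (;;) {
		uint64_t wr = atomic_load_explicit(&r->write_idx,
						   memory_order_acquire);
		while (read_idx != wr) {
			handle(&r->slots[read_idx & (RING_SLOTS - 1)]);
			read_idx++;
		}
		/* No new packets: keep spinning.  A real consumer would
		 * also detect overruns (wr - read_idx > RING_SLOTS). */
	}
}
```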