
OpenSSL 3.2.0 released

Posted Nov 29, 2023 15:09 UTC (Wed) by wtarreau (subscriber, #51152)
In reply to: OpenSSL 3.2.0 released by DemiMarie
Parent article: OpenSSL 3.2.0 released

The cost of running in userland is the overhead of syscalls per packet. And sendmmsg() or recvmmsg() will not change much, since every buffer must have its address checked, which also comes with a cost. An alternate approach would be to make it run entirely in userland with a userland driver such as DPDK or using AF_PACKET or AF_XDP etc. In this case you retrieve batches of packets and don't need to recheck their individual addresses.

Regarding the choice of language, that's interesting. Low levels like this require cross-references between many elements (packets, ack ranges, rx buffers etc), so writing them in overly strict languages would either require a lot of unsafe sections (hence no benefit) or require inventing a complex and expensive model to track everything. Given that the first concern about packet-based protocols like this is the processing cost, a more complex and more expensive implementation could very possibly become its own security issue by being easier to DoS. Such a design must not be neglected at all and there is absolutely no room for compromise between performance and safety here: you need to have both, even if the second one is only guaranteed by the developer.



OpenSSL 3.2.0 released

Posted Nov 29, 2023 15:45 UTC (Wed) by paulj (subscriber, #341) [Link] (6 responses)

There is also GSO (Generic Segmentation Offload, what I brainfartingly referred to as "overload" in a sibling comment) - you can send many packets' worth of data with /1/ sendmsg call and the kernel sends that as a series of packets to the same destination. You can combine it with sendmmsg to send trains of packets to multiple destinations. Worth a good bit according to Cloudflare. You can also control the pacing (to an extent) with SO_TXTIME / SCM_TXTIME - you can specify the launch time for each message (before GSO) - which may be important for congestion control.

DPDK is not an option at all for many use-cases (mobile, shared servers, containers, VMs, etc.) - it is also an energy muncher in the typical busy-loop, poll-driven use. I think it is meant to support interrupts now though; I have not kept up with how well that works. (??)

I agree on the language thing.

OpenSSL 3.2.0 released

Posted Nov 29, 2023 16:54 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (5 responses)

For GSO we intend to study it. I'm not much convinced for now, I suspect it could add more complexity on the sender side to send perfectly aligned packets so that the stack cuts them on the correct boundaries. But that's still on the todo list.

DPDK is not interesting for regular servers, but network equipment vendors (DDoS protection, load balancers etc) need to cram the highest possible performance in a single device and they already use that extensively.

DPDK uses

Posted Nov 29, 2023 18:49 UTC (Wed) by DemiMarie (subscriber, #164188) [Link] (1 responses)

Does HAProxy Technologies use DPDK in its commercial products?

DPDK uses

Posted Dec 1, 2023 6:06 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

> Does HAProxy Technologies use DPDK in its commercial products?

No. We gave it a try 10 years ago for anti-DDoS stuff and we found that it was much more efficient to implement it early in the regular driver (hence the NDIV framework I created back then, presentation here: https://kernel-recipes.org/en/2014/ndiv-a-low-overhead-network-traffic-diverter/ ). Recently we ported it to XDP, losing a few optimizations, but apparently recent updates should allow us to recover them. And that way we don't have to maintain our patches to these drivers anymore.

The reason why solutions like netmap and DPDK are not interesting in our case is that we still want to use the NIC as a regular one. With these frameworks, you lose the NIC from the system so it's up to the application to forward packets in and out using a much slower API (we tried). DPDK is very interesting when you process 100% of the NIC's traffic inside the DPDK application, and for TCP you'd need to use one of the available TCP stacks. But I still prefer to use the kernel's stack for TCP, as it's fast, reliable and proven. It's already possible for us to forward 40 GbE of L7 TLS traffic on an outdated 8th gen 4-core desktop CPU, and 100 GbE on an outdated 8-core one. DPDK would allow us to use even smaller CPUs but there's no point doing this: those who need such levels of traffic are not seeking to save $50 on the CPU to reuse an old machine that will cost much more on the electricity bill! Thus when you use the right device for the job, for L7 proxying these frameworks do not bring benefits.

OpenSSL 3.2.0 released

Posted Dec 1, 2023 11:53 UTC (Fri) by paulj (subscriber, #341) [Link] (2 responses)

On UDP GSO, you control the packet size - the kernel does not arbitrarily chop your buffer into whatever packets (that wouldn't work for the reason you give). You specify the packet size, either via a socket option on the socket, or via a per-call cmsg when you send your msg. See "Optimizing UDP for content delivery: GSO, pacing and zerocopy" for an example.

The packets then all have to be the same size, but that's the common case when sending trains of max-size packets.
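
For illustration, the per-call cmsg variant might look like the sketch below. It is written in Python only for brevity (the same calls exist in C), and it is an assumption-laden sketch rather than anyone's production code: `UDP_SEGMENT = 103` is copied by hand from Linux's `<linux/udp.h>` because Python's `socket` module may not export it, and `gso_ancdata` and `send_train` are hypothetical helper names. It also relies on the kernel accepting a shorter final segment, which is how Linux UDP GSO behaves.

```python
import socket
import struct

# Assumption: value taken from Linux's <linux/udp.h>; defined by hand
# because Python's socket module may not export it.
UDP_SEGMENT = 103
SOL_UDP = getattr(socket, "SOL_UDP", 17)  # IPPROTO_UDP level


def gso_ancdata(seg_size):
    """Build the per-call cmsg telling the kernel where to cut the buffer."""
    if not 0 < seg_size <= 0xFFFF:
        raise ValueError("segment size must fit in an unsigned 16-bit value")
    # The kernel reads the payload as a native-endian u16.
    return [(SOL_UDP, UDP_SEGMENT, struct.pack("H", seg_size))]


def send_train(sock, dest, datagrams):
    """Send a train of datagrams to one destination with a single sendmsg().

    Every datagram except the last must have the same length; the last
    one may be shorter.
    """
    if not datagrams:
        raise ValueError("nothing to send")
    seg_size = len(datagrams[0])
    if any(len(d) != seg_size for d in datagrams[:-1]):
        raise ValueError("all but the last datagram must be the same size")
    if not 0 < len(datagrams[-1]) <= seg_size:
        raise ValueError("last datagram must be non-empty and not larger")
    # One "super-message": the kernel splits it every seg_size bytes.
    return sock.sendmsg([b"".join(datagrams)], gso_ancdata(seg_size), 0, dest)
```

The socket-option variant would instead be a one-off `sock.setsockopt(SOL_UDP, UDP_SEGMENT, seg_size)`, after which plain sends on that socket are cut at that size.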

So basically with GSO + sendmmsg you can send:

Time t_1:
- burst x_1 to dest x
- burst y_1 to dest y
<etc>
Time t_2:
- burst x_2 to x
- <etc>
....
Time t_n:
<etc>

You can send a CWND worth of packet trains to many destinations, with the packets to each destination correctly spaced out into smaller bursts to be network congestion-control friendly. All in 1 syscall.
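
The schedule above can be sketched as a small planner. This only computes which bursts go out at which time; it does not send anything. On Linux each `Burst` would presumably become one sendmmsg() entry carrying a UDP_SEGMENT cmsg (packet size) and an SCM_TXTIME cmsg (launch time). Python's standard library has no sendmmsg() wrapper, so the actual call is left out. `Burst`, `plan_trains` and the fixed `gap_ns` spacing are all hypothetical; a real sender would take the spacing from its congestion controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Burst:
    launch_ns: int  # intended launch time (would go in an SCM_TXTIME cmsg)
    dest: str       # destination identifier
    length: int     # bytes in this GSO super-message
    seg_size: int   # packet size (would go in a UDP_SEGMENT cmsg)


def plan_trains(cwnd_bytes, seg_size, segs_per_burst, start_ns, gap_ns):
    """Split each destination's CWND into paced bursts.

    cwnd_bytes maps destination -> bytes it may send now. Returns all
    bursts ordered by launch time, i.e. the t_1, t_2, ... t_n layout.
    """
    burst_bytes = seg_size * segs_per_burst
    plan = []
    for dest, remaining in cwnd_bytes.items():
        t = start_ns
        while remaining > 0:
            length = min(burst_bytes, remaining)
            plan.append(Burst(t, dest, length, seg_size))
            remaining -= length
            t += gap_ns
    # Stable sort: destinations keep their relative order within a time slot.
    plan.sort(key=lambda b: b.launch_ns)
    return plan


if __name__ == "__main__":
    for burst in plan_trains({"x": 3000, "y": 1200}, 1200, 2, 0, 1000):
        print(burst)
```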

OpenSSL 3.2.0 released

Posted Dec 1, 2023 11:56 UTC (Fri) by paulj (subscriber, #341) [Link] (1 responses)

Oh, and to be clear, only packets in the same burst (i.e., same "super-"message) must have the same size - you can set the GSO packet size in the cmsg for the msg that is to be split via GSO.

OpenSSL 3.2.0 released

Posted Dec 1, 2023 13:09 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

Thanks for the summary and the pointer, I've looked at the doc from Alex and Eric and it's pretty clear on how to proceed. This will definitely encourage us to start to experiment with it soon ;-)

OpenSSL 3.2.0 released

Posted Nov 29, 2023 18:47 UTC (Wed) by DemiMarie (subscriber, #164188) [Link] (2 responses)

Cloudflare’s QUIC implementation is written in Rust and powers their edge network, so I’m not concerned about Rust being too slow.

OpenSSL 3.2.0 released

Posted Nov 30, 2023 0:25 UTC (Thu) by wahern (guest, #37304) [Link] (1 responses)

quiche seems to conveniently omit implementing the layers that require the complex reference graphs: "The application is responsible for providing I/O (e.g. sockets handling) as well as an event loop with support for timers." IOW, quiche exposes an interface for processing and producing packets for individual connections. The actual process I/O (blocking, non-blocking, aggregating with sendmmsg, etc) as well as global connection bookkeeping (e.g. indexing connection state) is left up to the application.

This actually seems like a solid example of how to best make use of Rust's strengths, admitting some of its deficits as a standalone language or for writing soup-to-nuts frameworks.

OpenSSL 3.2.0 released

Posted Dec 1, 2023 11:35 UTC (Fri) by paulj (subscriber, #341) [Link]

This is a very common pattern for user-space network protocol libraries (ones that want to be widely used anyway).

You want to avoid them doing the actual I/O, you want to avoid coding them to any specific event library. So they generally end up having 2 sets of interfaces: a) The direct API the user calls into the library with, to supply inbound packets, trigger timing events, etc.; b) The indirect API by which the library calls out to and outputs its work back to the user, e.g. to send packets, to set up a timer event, etc - i.e. a set of callbacks the user supplies in setup, using the direct API.
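
A toy version of that two-interface shape, as a rough sketch: `EchoProtocol` and its method names are made up for illustration and do not correspond to the API of quiche, LsQuic or any other real library. The engine never touches a socket or an event loop; the application feeds it packets and timeouts, and receives work back through the callbacks it registered.

```python
from typing import Callable, List, Optional, Tuple

Addr = Tuple[str, int]


class EchoProtocol:
    """Toy protocol engine with no sockets and no event loop of its own."""

    def __init__(self,
                 send_packet: Callable[[bytes, Addr], None],
                 set_timer: Callable[[float], None]) -> None:
        # (b) Indirect API: callbacks supplied by the application at setup.
        self._send_packet = send_packet
        self._set_timer = set_timer
        self._last_peer: Optional[Addr] = None

    # (a) Direct API: the application calls in with packets and timeouts.
    def on_packet(self, data: bytes, peer: Addr, now: float) -> None:
        self._last_peer = peer
        self._send_packet(b"ack:" + data, peer)
        self._set_timer(now + 1.0)  # ask to be woken up one second later

    def on_timeout(self, now: float) -> None:
        if self._last_peer is not None:
            self._send_packet(b"ping", self._last_peer)
            self._set_timer(now + 1.0)


if __name__ == "__main__":
    # The "application" side: here it only records what the library asks for.
    sent: List[Tuple[bytes, Addr]] = []
    timers: List[float] = []
    proto = EchoProtocol(lambda d, a: sent.append((d, a)), timers.append)
    proto.on_packet(b"hi", ("192.0.2.1", 4433), now=10.0)
    proto.on_timeout(now=11.0)
    print(sent, timers)
```

Because the callbacks are plain callables, the same engine can sit behind blocking sockets, an epoll loop or a test harness without modification.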

Google Quiche (yay, multiple projects in the QUIC space have the same name!) and LsQuic have the same pattern.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds