OpenSSL 3.2.0 released

[Posted November 26, 2023 by corbet]

OpenSSL 3.2.0 has been released. New features include client-side QUIC support, a number of new cryptographic algorithms, support for TCP fast open, TLS certificate compression, and more.

OpenSSL 3.2.0 released

Posted Nov 26, 2023 21:40 UTC (Sun) by Kamilion (subscriber, #42576) [Link] (43 responses)

Forgive my memory, but wasn't OpenSSL3's big pain-in-the-side the *removal* of a number of cryptographic algorithms? Now we're back to adding crypto-legos again? I'm all-in for adopting ed25519, but this is a little on the ridiculous side. "It only took two point releases" and all the fallout from the OpenSSL1 -> 3 transition is still making enough noise on my mailing list Geiger counters for me to consider it radioactive.

Oh well, at least it released "Early enough" for it to have some consideration to land in Noble.

OpenSSL 3.2.0 released

Posted Nov 26, 2023 21:46 UTC (Sun) by NightMonkey (subscriber, #23051) [Link] (1 responses)

I get that no one likes too much administrative work, but cryptography is not a stale branch of math or science, and neither is computer science and hardware development, so I think we should expect ongoing changes, additions, removals and updates. :)

Change is the one thing that is a safe bet. Cheers.

OpenSSL 3.2.0 released

Posted Nov 26, 2023 22:18 UTC (Sun) by Kamilion (subscriber, #42576) [Link]

Change, I'm not against, in the least.

It's more the fact that the express stated intent of the transition to 3.x was to reduce the combinatorial explosion of stack-your-own-lego-towers.

Hopefully these added primitives are assemblies that 'work well', and not just a tub of legos where you just can't seem to find another red brick when you need it.

OpenSSL 3.2.0 released

Posted Nov 27, 2023 9:50 UTC (Mon) by DemiMarie (subscriber, #164188) [Link] (40 responses)

OpenSSL 3 has horrible lock contention problems, and I don’t think anyone intends to use its QUIC implementation.

OpenSSL 3.2.0 released

Posted Nov 27, 2023 13:30 UTC (Mon) by paulj (subscriber, #341) [Link] (39 responses)

I'm guessing the QUIC support is there for s_client support? There to help debug, not for any serious (performance, etc.) use?

OpenSSL 3.2.0 released

Posted Nov 27, 2023 18:55 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (38 responses)

The thing is that the team in charge of openssl wastes a huge amount of time working on a useless QUIC client (single stream last time I checked) instead of spending the time recovering the massive performance they've lost from 1.1.1. We're speaking of ratios up to 400x in certain environments!

And all those using QUIC (which provides a reliable transport layer over datagrams, i.e. does not compare at all with TLS) already use either their own implementation or available ones. It takes about 3 man-years to write your own QUIC stack and even when you end up there you still don't have the best of the world, let alone a trivially portable library. So it will take a huge amount of time before they can reach something barely usable by a tool such as Curl.

Then there is the problem of performance. One reason there are so many implemlentations is that different products have different constraints (threading, event loops, malloc vs no-malloc etc). There's no way a one-size-fits-all lib will be usable anywhere performance matters, so that's another reason for the uselessness of this development. And if you count the fact that those in charge of the architecture are the same who accidentally divided the lib's performance by 400 without even noticing (since until a few months ago they were not even testing their changes), you cannot expect anything that you would even recommend to your ennemies.

No really that's a pure waste of time and energy and during this time the support for the last working openssl version (1.1.1) is fading away. Fortunately there are others like wolfssl and aws-lc who are trying to provide working alternatives but the situation is still too cahotic for anyone to be able to claim "have fun with your toy, guys, we don't need openssl anymore".

My personal feeling is that they probably needed to present new features to justify salaries to the foundation, because "oops we messed up and we need one year to fix all this" doesn't sell well, nor does "we'll finally merge the quictls patchset that we've been wiping our asses with for 4 years", which was THE very important thing to do.

Before the QUIC fiasco, we really intended to dedicate some manpower to help them work on the performance issues. But for what given that this lib is now dead, on purpose by its maintainers ? No QUIC, no future. It's sad and it's a mess as there are so many projects relying on it... (well, on its API).

OpenSSL 3.2.0 released

Posted Nov 28, 2023 11:04 UTC (Tue) by paulj (subscriber, #341) [Link] (37 responses)

Huh, wow. Some simple client that was just good enough for s_client for debug stuff, ok possibly (though, why not use a library?). But, if they put a lot of energy into this with the idea that it would be used by others for real stuff, as part of the OpenSSL lib? Weird :).

FWIW, wrt QUIC and performance. QUIC is possibly misnamed... Certainly, there are major implementations that are essentially atrocious on performance if used server side (OK on the client). Which I think is why so many cloud companies have had to go and write their own implementation. The protocol also has a couple of places where it allows too much, meaning supporting this leads to a bit of extra complexity and (on one certain thing) a /lot/ more worst-case performance overheads, when compared to TCP.

OpenSSL 3.2.0 released

Posted Nov 28, 2023 11:06 UTC (Tue) by DemiMarie (subscriber, #164188) [Link] (16 responses)

Which implementations are you referring to, and why is their performance so bad?

OpenSSL 3.2.0 released

Posted Nov 28, 2023 11:27 UTC (Tue) by paulj (subscriber, #341) [Link] (15 responses)

Public Google QUIC ("quiche" - not to be confused with other implementation also called "quiche"). It's used by Chrome, and is fine there. It has some bad performance issues in ACK processing, which affect server use.

OpenSSL 3.2.0 released

Posted Nov 28, 2023 11:47 UTC (Tue) by paulj (subscriber, #341) [Link] (14 responses)

Oh, and on "allowing too much" - ACKs in QUIC have an unlimited number of ACK ranges (other than the general bounds on what QUIC frames can hold). This implies using general purpose range tracking data-structures. Which come with much higher overheads compared to a data-structure to track a known, fixed set of ranges. Additionally, QUIC also uses packet numbers and ACKs packet numbers. Which is a double edged sword - it fixes some issues TCP has (no ambiguity on which sent packet is being ACKed), but it adds costs (you now have an extra level of indirection between what your ACKs are ACKing and your data-stream, which you must efficiently manage).

This is a major reason why QUIC server side code costs a lot more than TCP on ACK processing. And it's very easy to make a small change and blow your performance out by 100x.

I havn't seen clear evidence that the unbounded ACK ranges of QUIC have given it an improvement on congestion control, to make up for the data-structure costs. Most people who measure find HTTP/QUIC to have lower throughput than HTTP/TCP. Usually they look to low-level UDP I/O optimisations to regain performance (UDP GRO, mmsg, etc.), but I think the protocol itself has some complications that require extra care to get performance. Which not all implementations have.

LsQuic is pretty fast though - much faster than Google Quiche - server side. Though, still not as efficient as kernel TCP.

OpenSSL 3.2.0 released

Posted Nov 29, 2023 12:35 UTC (Wed) by DemiMarie (subscriber, #164188) [Link] (12 responses)

Would a kernel-mode QUIC implementation solve the performance problems? I would be opposed to one written in C due to security concerns, but one written in Rust would be quite interesting, especially with sendfile and io_uring integration.

OpenSSL 3.2.0 released

Posted Nov 29, 2023 14:07 UTC (Wed) by paulj (subscriber, #341) [Link]

The performance issues around the protocol features that require data-structures with more overhead to track, compared to TCP - those can not be fixed by a kernel implementation. I.e., unbounded ACK ranges, and the added indirection between the packet numbers in ACKs and the data byte ranges.

You just need an implementation that is /super/ careful about both the /design/ and implementation of the state tracking code to handle those things.

If you have that, how much difference could a kernel implementation make? I don't know. Personally, I think with io_uring and segment overload there isn't much difference to be had on I/O costs.

One place a kernel implementation can win is that (at least some) handling of ACKs can be done in soft-IRQs, from any user context - no need to switch to the specific user context, or switch from kernel to user mode. So that would probably get some performance benefit - not just in CPU, but, e.g., also in terms of reduced jitter in sending ACKs for received data, which would improve the congestion control behaviour of the protocol and potentially squeeze out a little bit more performance from the network.

But hard to say. ;)

OpenSSL 3.2.0 released

Posted Nov 29, 2023 15:09 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (10 responses)

The cost of running in userland is the overhead of syscalls per packet. And sendmmsg() or recvmmsg() will not change much, since every buffer must have its address checked, which also comes with a cost. An alternate approach would be to make it run entirely in userland with a userland driver such as DPDK or using AF_PACKET or AF_XDP etc. In this case you retrieve batches of packets and don't need to recheck their individual addresses.

Regarding the choice of language, that's interesting. Low levels like this require cross-references between many elements (packets, ack ranges, rx buffers etc) so writing them in too strict languages either would require a lot of unsafe sections (hence no benefit) or invent a complex and expensive model to track everything. Given that the first concern about packet-based protocols like this is the processing cost, a more complex and more expensive implementation could very possibly become its own security issue by being easier to DoS. Such a design must not be neglected at all and there is absolutely no room for compromise between performance and safety here, you need to have both, even if the second one is only guaranteed by the developer.

OpenSSL 3.2.0 released

Posted Nov 29, 2023 15:45 UTC (Wed) by paulj (subscriber, #341) [Link] (6 responses)

There is also GSO segment offload (what I brainfartingly referred to as "overload" in sibling comment) - you can send many packets worth of data with /1/ sendmsg call and the kernel sends that as a series of packets to the same destination. You can combine with sendmmsg to send trains of packets to multiple destinations. Worth a good bit according to Cloudflare. You can also control the pacing (to an extent) with SO_TXTIME / SCM_TXTIME - you can specify the launch time for each message (before GSO) - may be important for congestion control.

DPDK is not an option at all for many use-cases (mobile, shared servers, containers, VMs, etc..) - also an energy muncher in the typical busy-loop, poll-driven use. I think it is meant to support interrupts now though, not kept up with how well that works. (??)

I agree on the language thing.

OpenSSL 3.2.0 released

Posted Nov 29, 2023 16:54 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (5 responses)

For GSO we intend to study it. I'm not much convinced for now, I suspect it could add more complexity on the sender side to send perfectly aligned packets so that the stack cuts them on the correct boundaries. But that's still on the todo list.

DPDK is not interesting for regular servers, but network equipment vendors (DDoS protection, load balancers etc) need to cram the highest possible performance in a single device and they already use that extensively.

DPDK uses

Posted Nov 29, 2023 18:49 UTC (Wed) by DemiMarie (subscriber, #164188) [Link] (1 responses)

Does HAProxy Technologies use DPDK in its commercial products?

DPDK uses

Posted Dec 1, 2023 6:06 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

> Does HAProxy Technologies use DPDK in its commercial products?

No. We gave it a try 10 years ago for anti-ddos stuff and we found that it was much more efficient to implement it early in the regular driver (hence the NDIV framework I created by then, presentation here: https://kernel-recipes.org/en/2014/ndiv-a-low-overhead-network-traffic-diverter/ ). Recently we ported it to XDP, losing a few optimizations but apparently recent updates should allow us to recover them. And that way we don't have to maintain our patches to these drivers anymore.

The reason why solutions like netmap and DPDK are not interesting in our case is that we still want to use the NIC as a regular one. With these frameworks, you lose the NIC from the system so it's up to the application to forward packets in and out using a much slower API (we tried). DPDK is very interesting when you process 100% of the NIC's traffic inside the DPDK application, and for TCP you'd need to use one of the available TCP stacks. But I still prefer to the use kernel's stack for TCP, as it's fast, reliable and proven. It's already possible for us to forward 40 GbE of L7 TLS traffic on an outdated 8th gen 4-core desktop CPU, and 100 GbE on an outdated 8-core one. DPDK would allow us to use even smaller CPUs but there's no point doing this, those who need such levels of traffic are not seeking to save $50 on the CPU to reuse an old machine that will cost much more on the electricity bill! Thus when you use the right device for the job, for L7 proxying these frameworks do not bring benefits.

OpenSSL 3.2.0 released

Posted Dec 1, 2023 11:53 UTC (Fri) by paulj (subscriber, #341) [Link] (2 responses)

On UDP GSO, you control the packet size - the kernel does not arbitrarily chop your buffer into whatever packets (that wouldn't work for the reason you give). You specify the packet size, either via a socket option on the socket, or a per-call cmsg on when you send your msg. See "Optimizing UDP for content delivery: GSO, pacing and zerocopy" for an example.

The packets then all have to be same size, but that's the common case when sending trains of max-size packets.

So basically with GSO + sendmmsg you can send:

Time t_1:
- burst x_1 to dest x
- burst y_1 to dest y
<etc>
Time t_2:
- burst x_2 to x
- <etc>
....
Time t_n:
<etc>

You can send a CWND worth of packet trains to many destinations, with the packets to each destination correctly spaced out into smaller bursts to be network congestion-control friendly. All in 1 syscall.

OpenSSL 3.2.0 released

Posted Dec 1, 2023 11:56 UTC (Fri) by paulj (subscriber, #341) [Link] (1 responses)

Oh, and to be clear, only packets in the same burst (i.e., same "super-"message) must have the same size - you can set the GSO packet size in the cmsg for the msg that is to be split via GSO.

OpenSSL 3.2.0 released

Posted Dec 1, 2023 13:09 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

Thanks for the summary and the pointer, I've looked at the doc from Alex and Eric and it's pretty clear on how to proceed. This will definitely encourage us to start to experiment with it soon ;-)

OpenSSL 3.2.0 released

Posted Nov 29, 2023 18:47 UTC (Wed) by DemiMarie (subscriber, #164188) [Link] (2 responses)

Cloudflare’s QUIC implementation is written in Rust and powers their edge network, so I’m not concerned about Rust being too slow.

OpenSSL 3.2.0 released

Posted Nov 30, 2023 0:25 UTC (Thu) by wahern (guest, #37304) [Link] (1 responses)

quiche seems to conveniently omit implementing the layers that require the complex reference graphs: "The application is responsible for providing I/O (e.g. sockets handling) as well as an event loop with support for timers." IOW, quiche exposes an interface for processing and producing packets for individual connections. The actual process I/O (blocking, non-blocking, aggregating with sendmmsg, etc) as well as global connection book keeping (e.g. indexing connection state) is left up to the application.

This actually seems like a solid example of how to best make use of Rust's strengths, admitting some of its deficits as a standalone language or for writing soup-to-nuts frameworks.

OpenSSL 3.2.0 released

Posted Dec 1, 2023 11:35 UTC (Fri) by paulj (subscriber, #341) [Link]

This is a very common pattern for user-space network protocol libraries (ones that want to be widely used anyway).

You want to avoid them doing the actual I/O, you want to avoid coding them to any specific event library. So they generally end up having 2 sets of interfaces: a) The direct API the user calls into the library with, to supply inbound packets, trigger timing events, etc.; b) The indirect API by which the library calls out to and outputs it's work back to the user, e.g. to send packets, to setup a timer event, etc - i.e. a set of callbacks the user supplies in setup, using the direct API.

Google Quiche (yay, multiple projects in the QUIC space have the same name!) and LsQuic have the same pattern.

OpenSSL 3.2.0 released

Posted Nov 29, 2023 16:09 UTC (Wed) by paulj (subscriber, #341) [Link]

There is maybe a simple tweak that can cut gQuiche's ACK processing costs by perhaps 20% to 50%... Just validating that.

However, that stack was over an order of magnitude higher in CPU costs compared to the likes of LsQuic, and scales worse. So even if I rerun those comparisons with that tweak, gQuiche will still not be very good (ACK intensive side - i.e. server side).

QUIC permissiveness

Posted Nov 28, 2023 11:07 UTC (Tue) by DemiMarie (subscriber, #164188) [Link] (12 responses)

Where does QUIC allow too much?

QUIC permissiveness

Posted Nov 28, 2023 11:36 UTC (Tue) by paulj (subscriber, #341) [Link] (11 responses)

QUIC specifies 2 ways to identify connections - by 4-tuple, or by negotiated "Connection IDs" (CIDs). So your demux path has to be able to switch between 4-tuple or CIDs to route packets through to the appropriate state for connection-packet processing. Further, the CIDs are negotiated, rotated and retired by other means, so you have to keep track of all that and get it right (and some implementations just have convoluted quirks in their code as a result). Further, there is an implicit binding between CID and 4-tuple, if using CIDs, which must also be tracked and acted on if changes are sensed.

The point of all this complexity is to try prevent middle-boxes correlating paths to connections, to try enhance privacy. Which is something it can not at all guarantee or do well. If you have this level of privacy requirements, you need Tor - not this network-visible-CID rotating stuff in QUIC, which is giving you only minimal privacy guarantees, at best.

QUIC permissiveness

Posted Nov 28, 2023 11:52 UTC (Tue) by farnz (subscriber, #17727) [Link] (10 responses)

I was under the impression that the goal is not to enhance privacy specifically; rather, it's to prevent the situation we see in TCP, where cheap middle-boxes drop or modify packets that don't conform to their idea of what TCP "should" look like, making things like MPTCP a pain to design and deploy. This has effects like middle-boxes changing parts of the packet to get specific congestion control behaviour out of TCP on the assumption that a specific algorithm is in use, making it hard to change the algorithm.

Thus, it doesn't need to be a privacy guarantee - it just needs to be work that the end-points can do, but that middle-boxes will struggle to do, so that middle-boxes don't interfere with QUIC expecting certain behaviours from it.

QUIC permissiveness

Posted Nov 28, 2023 12:58 UTC (Tue) by paulj (subscriber, #341) [Link] (9 responses)

Keeping as much of the packet state encrypted as possible prevents midlde-boxes from having ideas about state.

The CID is there to give a packet demux ID independent of the 4-tuple - so a connection can survive a NAT change. "Ah, but that means a middle-box could know that different end-points were in fact the same!" - so the CID rotation stuff is added. Except an observer in the middle will get to see the old CID on the changed 4-tuple anyway - before the end-points see. Rotating CIDs there after doesn't give any great privacy benefit.

There isn't any need to change it in QUIC now that it's there, but I also think it was... slightly over complicated for minimal benefit.

QUIC permissiveness

Posted Nov 28, 2023 19:40 UTC (Tue) by riking (guest, #95706) [Link] (3 responses)

And don't forget that they added the spin bit back in, which explicitly allows middleboxes to measure E2E latency. More complexity!

QUIC permissiveness

Posted Nov 29, 2023 13:58 UTC (Wed) by paulj (subscriber, #341) [Link] (2 responses)

Spin bit is kind of neat. Not too complicated to support. Though, not entirely accurate either.

I'm in 2 minds about the loss of insight into performance of transport flows with QUIC. With TCP you can capture and make nice sequence graphs showing exactly what's going on from a network POV. With QUIC, that is lost - unless you have the private key. Which a network operator will not have, and which even the application owner generally will not retrospectively have. It's a real shame to lose that insight. On the other hand, it's nice to make the transport opaque.

Does QUIC have the balance right? I don't know.

QUIC permissiveness

Posted Nov 29, 2023 14:32 UTC (Wed) by farnz (subscriber, #17727) [Link] (1 responses)

The network operators have demonstrated that if they have the private key, they will misuse it (as they have misused the ability to tamper with TCP traffic beyond port numbers changing in a NAPT). It's just a shame that applications don't make it easy to record the private keys for retrospective analysis by the owner of an endpoint.

QUIC permissiveness

Posted Nov 29, 2023 15:02 UTC (Wed) by paulj (subscriber, #341) [Link]

You don't need to give operators the private key to let them see information. E.g. the QUIC header is not encrypted (but is part of the MAC, so can not be tampered with) - that's why they can see the spin bit.

Might it be better to give the network a bit more and higher-quality information about the congestion-related state of the flow, so network operators could debug problems?... maybe.

QUIC permissiveness

Posted Nov 29, 2023 15:15 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (4 responses)

The purpose of the multiple CIDs are only to break the tracking during a connection migration: let's say you start from 4G and you migrate to the local WiFi or conversely. Someone elsewhere on the path seeing the two connection flows has no useful information to correlate them (IP:ports differ, connection IDs have rotated), so for example nothing allows to figure it's the same person for both connections, hence it's not really possible to geolocate one client by correlating its multiple source IPs. Beyond that it doesn't bring much.

QUIC permissiveness

Posted Nov 29, 2023 16:13 UTC (Wed) by paulj (subscriber, #341) [Link] (3 responses)

Thing is, you have a shared secret - negotiated in crypto setup. You could also just use a derivation function to allow both sides to calculate the same set of N currently valid CIDs, and retirement can be automatically done via a sliding window.

No real need to have all this machinery to send messages to update and retire CIDs.

QUIC permissiveness

Posted Nov 29, 2023 16:51 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (2 responses)

Here it allows the client to decide when to switch. A sliding window would not exactly, unless the client indicates which key number it's using. But that info would need to pass one way or another and be detectable, I suspect that's the reason why it was not adopted. With that said, the machinery is not *that* complex, the server produces N CIDs that it's willing to recognize, the client is free to drop them if it does not intend to use them. It's not exactly something terribly difficult.

QUIC permissiveness

Posted Nov 29, 2023 16:59 UTC (Wed) by paulj (subscriber, #341) [Link] (1 responses)

You just update the window when the other side uses a new CID. (Maintain 1 pool for each side, as is done now). Yeah, the explicit message based stuff is not terribly difficult. But it's a bunch of extra pointless code on top of that (and at least some implementations are already using a derivation function from the shared secret to generate their CIDs!). Anyway, just a nit.

My real gripe though is that QUIC /also/ requires to support no-CID, 4-tuple. So 2 distinct ways required to demux the incoming packet and match up to and validate to the connection state (and /both/ ways require 4-tuple lookup and validation, just differently). Just... annoying.

QUIC permissiveness

Posted Nov 29, 2023 17:06 UTC (Wed) by paulj (subscriber, #341) [Link]

Oh, one other annoyance in QUIC. A network protocol, created in the 21st C, and the framing does /not/ have a length field. (And QUIC has variable length encoding!). So if you add a frame type, any tooling like wireshark will fail to parse a packet once that new frame type is there. Any other frames past that that it could recognise - tough.

Really annoying.

OpenSSL 3.2.0 released

Posted Nov 29, 2023 14:59 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (6 responses)

> Huh, wow. Some simple client that was just good enough for s_client for debug stuff, ok possibly (though, why not use a library?).

You just don't debug anything protocol-wise with a single stream in QUIC, and that library is totally irrelevant to the rest of the ecosystem for not even being interoperable (hence testable) with anything else. Other libs such as ngtcp2 which have been there from the beginning and have adapted to their users' needs are well-tested and permit everything and much more than s_client could ever do. Even some command-line tools such as picoquicdemo are more relevant since relying on a proven and testable implementation.

> But, if they put a lot of energy into this with the idea that it would be used by others for real stuff, as part of the OpenSSL lib? Weird :).

Except they have exactly zero idea what their users want. Their users already expressed that 4.5 years ago, en masse, these people were just thanked by a middle finger. The simple fact of saying "ok we showed in 3.x how we could fail when touching stuff we have no skills on, but instead of trying to fix it we'll now switch to something completely new for us, a QUIC implementation that you don't want" is a great indicator that they have no care for their users' needs. All main HTTP implementations now have their own QUIC stack so anything that could be done by the openssl team now will just be limited to s_client and nothing else. Pure waste of time, effort, energy and trust.

> Certainly, there are major implementations that are essentially atrocious on performance if used server side (OK on the client).

Oh it's possible, but not all of them. We managed to pull 260 Gbps out of haproxy's QUIC stack (22M pkt/s both directions) on a perfectly standard Linux network stack. That's not bad at all and at least it scales well!

> Which I think is why so many cloud companies have had to go and write their own implementation.

No that's not the reason at all, the protocol was designed so that it runs entirely in userland to speed up protocol version upgrade deployments, and as such, there is no uniform API to implement it the best way for each implementation. Depending on your event model you'll use a set of totally different mechanisms and that's perfectly fine. From this point it becomes difficult to both uniformize everything and keep performance, however some of the stacks that offload all the painful stuff for you are still usable, but with some necessary overhead.

> The protocol also has a couple of places where it allows too much, meaning supporting this leads to a bit of extra complexity and (on one certain thing) a /lot/ more worst-case performance overheads, when compared to TCP.

Absolutely, and that's precisely one of the reasons why this should only be implemented by those who have enough time to grow transport-layer skills and become experts on the matter, instead of being developed as yet-another activity by a crypto team who thought they could play with a new toy (and worse, let people believe this will eventually be usable).

OpenSSL 3.2.0 released

Posted Nov 29, 2023 17:16 UTC (Wed) by paulj (subscriber, #341) [Link] (2 responses)

260 Gb/s? Chapeau. ;)

I guess API could be another reason too. The (google) Quiche API looks OK, and I don't see an obstacle to integrating it with other main loops (it has its own abstraction). The code is a bit sprawling, as it common with C++ code-bases that like to use inheritance. But the slowness is a major deal-breaker from my perspective.

YMMV. ;)

OpenSSL 3.2.0 released

Posted Nov 29, 2023 18:53 UTC (Wed) by DemiMarie (subscriber, #164188) [Link] (1 responses)

I don’t think gQuiche is meant for serious server use. As you mentioned, it is used in clients and works fine there, and there are other stacks that are much faster on the server end.

OpenSSL 3.2.0 released

Posted Nov 30, 2023 10:32 UTC (Thu) by paulj (subscriber, #341) [Link]

Hard to know for sure, but my reading of talks Google have given on QUIC in their production use over the years is they probably used Quiche on the server side too. And they had the obvious performance issues, which they have optimised away to a large extent (still not as good as TCP in the last performance comparisons they've made available) - I assume optimised Quiche, rather than write a complete new QUIC.

There is some (very) low hanging fruit in Quiche wrt gaining performance. I'm surprised no one has fixed it. Hopefully I'll be allowed to send a patch at some point. ;)

OpenSSL 3.2.0 released

Posted Dec 18, 2023 17:50 UTC (Mon) by starox (subscriber, #168285) [Link] (2 responses)

We managed to pull 260 Gbps out of haproxy's QUIC stack (22M pkt/s both directions) on a perfectly standard Linux network stack

There is something that I don't understand. Doing simple math, it gives packet sizes in the GBytes order of magnitude. Is this really packets per seconds or "application connection streams" per second ?

OpenSSL 3.2.0 released

Posted Dec 18, 2023 20:06 UTC (Mon) by pizza (subscriber, #46) [Link] (1 responses)

260Gbps == 260000Mbps

260000Mbps / 22Mpps = ~11818 bits per packet = 1477 bytes per packet, which is a rounding error from the 1472 byte max payload of a TCP+UDP packet.

OpenSSL 3.2.0 released

Posted Dec 18, 2023 20:25 UTC (Mon) by starox (subscriber, #168285) [Link]

I ought to write it down to see that I forget one million factor ... Sorry for the noise.

Thanks !