OpenSSL 3.2.0 released
Posted Nov 26, 2023 21:40 UTC (Sun)
by Kamilion (subscriber, #42576)
[Link] (43 responses)
Oh well, at least it released "Early enough" for it to have some consideration to land in Noble.
Posted Nov 26, 2023 21:46 UTC (Sun)
by NightMonkey (subscriber, #23051)
[Link] (1 responses)
Change is the one thing that is a safe bet. Cheers.
Posted Nov 26, 2023 22:18 UTC (Sun)
by Kamilion (subscriber, #42576)
[Link]
It's more the fact that the expressly stated intent of the transition to 3.x was to reduce the combinatorial explosion of stack-your-own-lego-towers.
Hopefully these added primitives are assemblies that 'work well', and not just a tub of legos where you just can't seem to find another red brick when you need it.
Posted Nov 27, 2023 9:50 UTC (Mon)
by DemiMarie (subscriber, #164188)
[Link] (40 responses)
Posted Nov 27, 2023 13:30 UTC (Mon)
by paulj (subscriber, #341)
[Link] (39 responses)
Posted Nov 27, 2023 18:55 UTC (Mon)
by wtarreau (subscriber, #51152)
[Link] (38 responses)
And all those using QUIC (which provides a reliable transport layer over datagrams, i.e. does not compare at all with TLS) already use either their own implementation or available ones. It takes about 3 man-years to write your own QUIC stack, and even when you get there you still don't have the best in the world, let alone a trivially portable library. So it will take a huge amount of time before they can reach something barely usable by a tool such as Curl.
Then there is the problem of performance. One reason there are so many implementations is that different products have different constraints (threading, event loops, malloc vs no-malloc etc). There's no way a one-size-fits-all lib will be usable anywhere performance matters, so that's another reason for the uselessness of this development. And if you count the fact that those in charge of the architecture are the same people who accidentally divided the lib's performance by 400 without even noticing (since until a few months ago they were not even testing their changes), you cannot expect anything that you would even recommend to your enemies.
No, really, that's a pure waste of time and energy, and during this time the support for the last working openssl version (1.1.1) is fading away. Fortunately there are others like wolfssl and aws-lc who are trying to provide working alternatives, but the situation is still too chaotic for anyone to be able to claim "have fun with your toy, guys, we don't need openssl anymore".
My personal feeling is that they probably needed to present new features to justify salaries to the foundation, because "oops we messed up and we need one year to fix all this" doesn't sell well, nor does "we'll finally merge the quictls patchset that we've been wiping our asses with for 4 years", which was THE very important thing to do.
Before the QUIC fiasco, we really intended to dedicate some manpower to help them work on the performance issues. But what for, given that this lib is now dead, on purpose, by its maintainers? No QUIC, no future. It's sad and it's a mess, as there are so many projects relying on it... (well, on its API).
Posted Nov 28, 2023 11:04 UTC (Tue)
by paulj (subscriber, #341)
[Link] (37 responses)
FWIW, wrt QUIC and performance: QUIC is possibly misnamed... Certainly, there are major implementations that are essentially atrocious on performance if used server side (OK on the client). Which I think is why so many cloud companies have had to go and write their own implementations. The protocol also has a couple of places where it allows too much, meaning supporting them leads to a bit of extra complexity and (on one particular thing) a /lot/ more worst-case performance overhead, when compared to TCP.
Posted Nov 28, 2023 11:06 UTC (Tue)
by DemiMarie (subscriber, #164188)
[Link] (16 responses)
Posted Nov 28, 2023 11:27 UTC (Tue)
by paulj (subscriber, #341)
[Link] (15 responses)
Posted Nov 28, 2023 11:47 UTC (Tue)
by paulj (subscriber, #341)
[Link] (14 responses)
This is a major reason why QUIC server side code costs a lot more than TCP on ACK processing. And it's very easy to make a small change and blow your performance out by 100x.
I haven't seen clear evidence that the unbounded ACK ranges of QUIC have given it an improvement on congestion control, to make up for the data-structure costs. Most people who measure find HTTP/QUIC to have lower throughput than HTTP/TCP. Usually they look to low-level UDP I/O optimisations to regain performance (UDP GRO, mmsg, etc.), but I think the protocol itself has some complications that require extra care to get performance. Which not all implementations have.
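As a rough illustration of where those data-structure costs come from, here is a minimal sketch of my own (not taken from any real stack) of the receiver-side state behind ACK frames. Unlike TCP's SACK, which is bounded to a few blocks, QUIC's list of acknowledged packet-number ranges has no protocol-imposed limit, so the implementation has to bound it itself or a peer sending sparse packet numbers can make it grow without limit:

    /* Illustrative only: track received packet-number ranges for
     * building ACK frames. Highest range is kept at the head. */
    #include <stdint.h>
    #include <stdlib.h>

    struct ack_range {
        uint64_t lo, hi;            /* inclusive packet-number range */
        struct ack_range *next;     /* next (lower) range */
    };

    /* Record receipt of packet 'pn'. (Simplified: arrivals that fill
     * gaps in older ranges are ignored here.) */
    static void ack_track(struct ack_range **head, uint64_t pn)
    {
        struct ack_range *r = *head;

        if (r && pn <= r->hi)
            return;                 /* duplicate or gap-filler: skipped */
        if (r && pn == r->hi + 1) {
            r->hi = pn;             /* fast path: in-order packet */
            return;
        }
        /* out-of-order: a new range -- this is the growth a careful
         * implementation must cap */
        r = malloc(sizeof(*r));
        if (!r)
            return;
        r->lo = r->hi = pn;
        r->next = *head;
        *head = r;
    }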
LsQuic is pretty fast though - much faster than Google Quiche - server side. Though, still not as efficient as kernel TCP.
Posted Nov 29, 2023 12:35 UTC (Wed)
by DemiMarie (subscriber, #164188)
[Link] (12 responses)
Posted Nov 29, 2023 14:07 UTC (Wed)
by paulj (subscriber, #341)
[Link]
You just need an implementation that is /super/ careful about both the /design/ and implementation of the state tracking code to handle those things.
If you have that, how much difference could a kernel implementation make? I don't know. Personally, I think with io_uring and segment offload there isn't much difference to be had on I/O costs.
One place a kernel implementation can win is that (at least some) handling of ACKs can be done in soft-IRQs, from any user context - no need to switch to the specific user context, or switch from kernel to user mode. So that would probably get some performance benefit - not just in CPU, but, e.g., also in terms of reduced jitter in sending ACKs for received data, which would improve the congestion control behaviour of the protocol and potentially squeeze out a little bit more performance from the network.
But hard to say. ;)
Posted Nov 29, 2023 15:09 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (10 responses)
Regarding the choice of language, that's interesting. Low levels like this require cross-references between many elements (packets, ack ranges, rx buffers etc), so writing them in overly strict languages would either require a lot of unsafe sections (hence no benefit) or mean inventing a complex and expensive model to track everything. Given that the first concern about packet-based protocols like this is the processing cost, a more complex and more expensive implementation could very possibly become its own security issue by being easier to DoS. Such a design must not be neglected at all, and there is absolutely no room for compromise between performance and safety here: you need to have both, even if the second one is only guaranteed by the developer.
Posted Nov 29, 2023 15:45 UTC (Wed)
by paulj (subscriber, #341)
[Link] (6 responses)
DPDK is not an option at all for many use-cases (mobile, shared servers, containers, VMs, etc.) - it's also an energy muncher in the typical busy-loop, poll-driven use. I think it is meant to support interrupts now, though I haven't kept up with how well that works.
I agree on the language thing.
Posted Nov 29, 2023 16:54 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (5 responses)
DPDK is not interesting for regular servers, but network equipment vendors (DDoS protection, load balancers etc) need to cram the highest possible performance in a single device and they already use that extensively.
Posted Nov 29, 2023 18:49 UTC (Wed)
by DemiMarie (subscriber, #164188)
[Link] (1 responses)
Posted Dec 1, 2023 6:06 UTC (Fri)
by wtarreau (subscriber, #51152)
[Link]
No. We gave it a try 10 years ago for anti-ddos stuff and we found that it was much more efficient to implement it early in the regular driver (hence the NDIV framework I created back then, presentation here: https://kernel-recipes.org/en/2014/ndiv-a-low-overhead-network-traffic-diverter/ ). Recently we ported it to XDP, losing a few optimizations, but apparently recent updates should allow us to recover them. And that way we don't have to maintain our patches to these drivers anymore.
The reason why solutions like netmap and DPDK are not interesting in our case is that we still want to use the NIC as a regular one. With these frameworks, you lose the NIC from the system, so it's up to the application to forward packets in and out using a much slower API (we tried). DPDK is very interesting when you process 100% of the NIC's traffic inside the DPDK application, and for TCP you'd need to use one of the available TCP stacks. But I still prefer to use the kernel's stack for TCP, as it's fast, reliable and proven. It's already possible for us to forward 40 GbE of L7 TLS traffic on an outdated 8th-gen 4-core desktop CPU, and 100 GbE on an outdated 8-core one. DPDK would allow us to use even smaller CPUs, but there's no point doing this: those who need such levels of traffic are not seeking to save $50 on the CPU to reuse an old machine that will cost much more on the electricity bill! Thus when you use the right device for the job, for L7 proxying these frameworks do not bring benefits.
Posted Dec 1, 2023 11:53 UTC (Fri)
by paulj (subscriber, #341)
[Link] (2 responses)
The packets then all have to be the same size, but that's the common case when sending trains of max-size packets.
So basically with GSO + sendmmsg you can send:

Time t_1:
- burst x_1 to dest x
- burst y_1 to dest y
<etc>
Time t_2:
- burst x_2 to x
- <etc>
....
Time t_n:
<etc>

You can send a CWND worth of packet trains to many destinations, with the packets to each destination correctly spaced out into smaller bursts to be network congestion-control friendly. All in 1 syscall.
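A hedged sketch of what that looks like with the Linux APIs (buffer sizes and helper names here are just examples): each mmsghdr carries one burst as a single large buffer, and a UDP_SEGMENT cmsg tells the kernel how to slice that buffer into wire-size packets:

    /* Sketch, assuming Linux >= 4.18 and IPv4; 'dests' and 'bursts'
     * are prepared by the caller; error handling trimmed. */
    #define _GNU_SOURCE
    #include <netinet/in.h>
    #include <netinet/udp.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #ifndef UDP_SEGMENT
    #define UDP_SEGMENT 103         /* from linux/udp.h */
    #endif

    #define MAX_BURSTS 64
    #define SEG_SIZE   1200         /* per-packet UDP payload (example) */

    static int send_bursts(int fd, struct sockaddr_in *dests,
                           struct iovec *bursts, unsigned int n)
    {
        struct mmsghdr msgs[MAX_BURSTS];
        union {                     /* one cmsg buffer per burst */
            char buf[CMSG_SPACE(sizeof(uint16_t))];
            struct cmsghdr align;
        } ctrl[MAX_BURSTS];

        for (unsigned int i = 0; i < n; i++) {
            memset(&msgs[i], 0, sizeof(msgs[i]));
            msgs[i].msg_hdr.msg_name       = &dests[i];
            msgs[i].msg_hdr.msg_namelen    = sizeof(dests[i]);
            msgs[i].msg_hdr.msg_iov        = &bursts[i]; /* one big buffer */
            msgs[i].msg_hdr.msg_iovlen     = 1;
            msgs[i].msg_hdr.msg_control    = ctrl[i].buf;
            msgs[i].msg_hdr.msg_controllen = sizeof(ctrl[i].buf);

            /* ask the kernel to segment the buffer into SEG_SIZE
             * packets (UDP GSO) */
            struct cmsghdr *cm = CMSG_FIRSTHDR(&msgs[i].msg_hdr);
            cm->cmsg_level = IPPROTO_UDP;   /* == SOL_UDP */
            cm->cmsg_type  = UDP_SEGMENT;
            cm->cmsg_len   = CMSG_LEN(sizeof(uint16_t));
            memcpy(CMSG_DATA(cm), &(uint16_t){ SEG_SIZE },
                   sizeof(uint16_t));
        }
        /* the whole schedule above, in a single syscall */
        return sendmmsg(fd, msgs, n, 0);
    }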
Posted Dec 1, 2023 11:56 UTC (Fri)
by paulj (subscriber, #341)
[Link] (1 responses)
Posted Dec 1, 2023 13:09 UTC (Fri)
by wtarreau (subscriber, #51152)
[Link]
Posted Nov 29, 2023 18:47 UTC (Wed)
by DemiMarie (subscriber, #164188)
[Link] (2 responses)
Cloudflare’s QUIC implementation is written in Rust and powers their edge network, so I’m not concerned about Rust being too slow.
Posted Nov 30, 2023 0:25 UTC (Thu)
by wahern (guest, #37304)
[Link] (1 responses)
This actually seems like a solid example of how to best make use of Rust's strengths, admitting some of its deficits as a standalone language or for writing soup-to-nuts frameworks.
Posted Dec 1, 2023 11:35 UTC (Fri)
by paulj (subscriber, #341)
[Link]
You want to avoid them doing the actual I/O, and you want to avoid coding them to any specific event library. So they generally end up having 2 sets of interfaces: a) the direct API the user calls into the library with, to supply inbound packets, trigger timing events, etc.; b) the indirect API by which the library calls out to and outputs its work back to the user, e.g. to send packets, to set up a timer event, etc. - i.e. a set of callbacks the user supplies in setup, using the direct API.
Google Quiche (yay, multiple projects in the QUIC space have the same name!) and LsQuic have the same pattern.
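In header form, the pattern looks roughly like this (a sketch with invented names, not the API of any real QUIC library):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct quic_conn quic_conn;   /* opaque connection state */

    /* b) the indirect API: callbacks the library uses to hand its
     *    work back to the application */
    typedef struct quic_callbacks {
        void (*send_packet)(void *ctx, const uint8_t *buf, size_t len);
        void (*set_timer)(void *ctx, uint64_t expiry_ms);
    } quic_callbacks;

    /* a) the direct API: the application feeds the library */
    quic_conn *quic_conn_new(const quic_callbacks *cbs, void *ctx);
    void quic_conn_recv(quic_conn *c, const uint8_t *pkt, size_t len);
    void quic_conn_timeout(quic_conn *c, uint64_t now_ms);

The application does all actual I/O and timer scheduling in its own event loop and calls the direct API; the library never touches a socket, so it can sit behind epoll, io_uring, or anything else.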
Posted Nov 29, 2023 16:09 UTC (Wed)
by paulj (subscriber, #341)
[Link]
However, that stack was over an order of magnitude higher in CPU costs compared to the likes of LsQuic, and scales worse. So even if I rerun those comparisons with that tweak, gQuiche will still not be very good (ACK intensive side - i.e. server side).
Posted Nov 28, 2023 11:07 UTC (Tue)
by DemiMarie (subscriber, #164188)
[Link] (12 responses)
Posted Nov 28, 2023 11:36 UTC (Tue)
by paulj (subscriber, #341)
[Link] (11 responses)
The point of all this complexity is to try to prevent middle-boxes correlating paths to connections, to try to enhance privacy. Which is something it cannot guarantee or do well at all. If you have this level of privacy requirements, you need Tor - not this network-visible-CID rotating stuff in QUIC, which is giving you only minimal privacy guarantees, at best.
Posted Nov 28, 2023 11:52 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (10 responses)
I was under the impression that the goal is not to enhance privacy specifically; rather, it's to prevent the situation we see in TCP, where cheap middle-boxes drop or modify packets that don't conform to their idea of what TCP "should" look like, making things like MPTCP a pain to design and deploy. This has effects like middle-boxes changing parts of the packet to get specific congestion control behaviour out of TCP on the assumption that a specific algorithm is in use, making it hard to change the algorithm.
Thus, it doesn't need to be a privacy guarantee - it just needs to be work that the end-points can do, but that middle-boxes will struggle to do, so that middle-boxes don't interfere with QUIC expecting certain behaviours from it.
Posted Nov 28, 2023 12:58 UTC (Tue)
by paulj (subscriber, #341)
[Link] (9 responses)
The CID is there to give a packet demux ID independent of the 4-tuple - so a connection can survive a NAT change. "Ah, but that means a middle-box could know that different end-points were in fact the same!" - so the CID rotation stuff is added. Except an observer in the middle will get to see the old CID on the changed 4-tuple anyway - before the end-points see it. Rotating CIDs thereafter doesn't give any great privacy benefit.
There isn't any need to change it in QUIC now that it's there, but I also think it was... slightly over complicated for minimal benefit.
Posted Nov 28, 2023 19:40 UTC (Tue)
by riking (guest, #95706)
[Link] (3 responses)
Posted Nov 29, 2023 13:58 UTC (Wed)
by paulj (subscriber, #341)
[Link] (2 responses)
I'm in 2 minds about the loss of insight into performance of transport flows with QUIC. With TCP you can capture and make nice sequence graphs showing exactly what's going on from a network POV. With QUIC, that is lost - unless you have the private key. Which a network operator will not have, and which even the application owner generally will not retrospectively have. It's a real shame to lose that insight. On the other hand, it's nice to make the transport opaque.
Does QUIC have the balance right? I don't know.
Posted Nov 29, 2023 14:32 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (1 responses)
The network operators have demonstrated that if they have the private key, they will misuse it (as they have misused the ability to tamper with TCP traffic beyond port numbers changing in a NAPT). It's just a shame that applications don't make it easy to record the private keys for retrospective analysis by the owner of an endpoint.
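For what it's worth, OpenSSL does expose a hook for logging the per-connection secrets needed for exactly this kind of retrospective decryption; applications just rarely wire it up. A minimal sketch (the log path is only an example):

    /* Register OpenSSL's keylog callback so TLS secrets are written
     * in the NSS key-log format, which Wireshark can use to decrypt
     * a packet capture after the fact. */
    #include <openssl/ssl.h>
    #include <stdio.h>

    static void keylog_cb(const SSL *ssl, const char *line)
    {
        (void)ssl;
        FILE *f = fopen("/tmp/sslkeylog.txt", "a"); /* example path */
        if (f) {
            fprintf(f, "%s\n", line);
            fclose(f);
        }
    }

    /* at context setup time:
     *     SSL_CTX_set_keylog_callback(ctx, keylog_cb);
     */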
Posted Nov 29, 2023 15:02 UTC (Wed)
by paulj (subscriber, #341)
[Link]
Might it be better to give the network a bit more and higher-quality information about the congestion-related state of the flow, so network operators could debug problems?... maybe.
Posted Nov 29, 2023 15:15 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (4 responses)
Posted Nov 29, 2023 16:13 UTC (Wed)
by paulj (subscriber, #341)
[Link] (3 responses)
No real need to have all this machinery to send messages to update and retire CIDs.
Posted Nov 29, 2023 16:51 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (2 responses)
Posted Nov 29, 2023 16:59 UTC (Wed)
by paulj (subscriber, #341)
[Link] (1 responses)
My real gripe though is that QUIC /also/ requires supporting no-CID, 4-tuple-only demux. So 2 distinct ways are required to demux the incoming packet and match it up to and validate it against the connection state (and /both/ ways require 4-tuple lookup and validation, just differently). Just... annoying.
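Very roughly, the two required demux paths look like this (a sketch; all the helpers are hypothetical):

    /* Packets with a zero-length DCID can only be matched by the UDP
     * 4-tuple; packets carrying a DCID are matched by CID, but the
     * 4-tuple still has to be checked to detect and validate path
     * changes. */
    #include <stddef.h>
    #include <stdint.h>

    struct four_tuple;              /* src/dst address and port */
    struct conn;                    /* per-connection state */

    extern struct conn *lookup_by_cid(const uint8_t *dcid, size_t len);
    extern struct conn *lookup_by_tuple(const struct four_tuple *t);
    extern int tuple_matches(const struct conn *c,
                             const struct four_tuple *t);
    extern void handle_migration(struct conn *c,
                                 const struct four_tuple *t);

    static struct conn *demux(const uint8_t *dcid, size_t dcid_len,
                              const struct four_tuple *t)
    {
        struct conn *c = dcid_len ? lookup_by_cid(dcid, dcid_len)
                                  : lookup_by_tuple(t);
        if (c && !tuple_matches(c, t))
            handle_migration(c, t); /* path change: must be validated */
        return c;
    }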
Posted Nov 29, 2023 17:06 UTC (Wed)
by paulj (subscriber, #341)
[Link]
Really annoying.
Posted Nov 29, 2023 14:59 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (6 responses)
You just don't debug anything protocol-wise with a single stream in QUIC, and that library is totally irrelevant to the rest of the ecosystem since it isn't even interoperable with (hence testable against) anything else. Other libs such as ngtcp2, which have been there from the beginning and have adapted to their users' needs, are well-tested and permit everything and much more than s_client could ever do. Even some command-line tools such as picoquicdemo are more relevant, since they rely on a proven and testable implementation.
> But, if they put a lot of energy into this with the idea that it would be used by others for real stuff, as part of the OpenSSL lib? Weird :).
Except they have exactly zero idea what their users want. Their users already expressed that 4.5 years ago, en masse, and these people were just thanked with a middle finger. The simple fact of saying "ok, we showed in 3.x how we could fail when touching stuff we have no skills in, but instead of trying to fix it we'll now switch to something completely new for us, a QUIC implementation that you don't want" is a great indicator that they have no care for their users' needs. All main HTTP implementations now have their own QUIC stack, so anything that could be done by the openssl team now will just be limited to s_client and nothing else. Pure waste of time, effort, energy and trust.
> Certainly, there are major implementations that are essentially atrocious on performance if used server side (OK on the client).
Oh it's possible, but not all of them. We managed to pull 260 Gbps out of haproxy's QUIC stack (22M pkt/s both directions) on a perfectly standard Linux network stack. That's not bad at all and at least it scales well!
> Which I think is why so many cloud companies have had to go and write their own implementation.
No, that's not the reason at all. The protocol was designed so that it runs entirely in userland to speed up the deployment of protocol version upgrades, and as such there is no uniform API: each implementation does it the way that best fits its own design. Depending on your event model you'll use a set of totally different mechanisms, and that's perfectly fine. From this point it becomes difficult to both unify everything and keep performance; however, some of the stacks that offload all the painful stuff for you are still usable, but with some necessary overhead.
> The protocol also has a couple of places where it allows too much, meaning supporting this leads to a bit of extra complexity and (on one certain thing) a /lot/ more worst-case performance overheads, when compared to TCP.
Absolutely, and that's precisely one of the reasons why this should only be implemented by those who have enough time to grow transport-layer skills and become experts on the matter, instead of being developed as yet-another activity by a crypto team who thought they could play with a new toy (and worse, let people believe this will eventually be usable).
Posted Nov 29, 2023 17:16 UTC (Wed)
by paulj (subscriber, #341)
[Link] (2 responses)
I guess API could be another reason too. The (google) Quiche API looks OK, and I don't see an obstacle to integrating it with other main loops (it has its own abstraction). The code is a bit sprawling, as is common with C++ code-bases that like to use inheritance. But the slowness is a major deal-breaker from my perspective.
YMMV. ;)
Posted Nov 29, 2023 18:53 UTC (Wed)
by DemiMarie (subscriber, #164188)
[Link] (1 responses)
Posted Nov 30, 2023 10:32 UTC (Thu)
by paulj (subscriber, #341)
[Link]
There is some (very) low hanging fruit in Quiche wrt gaining performance. I'm surprised no one has fixed it. Hopefully I'll be allowed to send a patch at some point. ;)
Posted Dec 18, 2023 17:50 UTC (Mon)
by starox (subscriber, #168285)
[Link] (2 responses)
> We managed to pull 260 Gbps out of haproxy's QUIC stack (22M pkt/s both directions) on a perfectly standard Linux network stack

There is something that I don't understand.
Doing simple math, it gives packet sizes in the GBytes order of magnitude.
Is this really packets per second or "application connection streams" per second?
Posted Dec 18, 2023 20:06 UTC (Mon)
by pizza (subscriber, #46)
[Link] (1 responses)
260000 Mbps / 22 Mpps = ~11818 bits per packet = 1477 bytes per packet, which is within a rounding error of the 1472-byte max payload of an IP+UDP packet.
Posted Dec 18, 2023 20:25 UTC (Mon)
by starox (subscriber, #168285)
[Link]
Thanks!