
Moving past TCP in the data center, part 1

Moving past TCP in the data center, part 1

Posted Nov 2, 2022 4:13 UTC (Wed) by willy (subscriber, #9762)
In reply to: Moving past TCP in the data center, part 1 by Cyberax
Parent article: Moving past TCP in the data center, part 1

A flow-control-free protocol is not exactly a new idea. Here's IL from plan9 (circa 1993; the "4th edition" in the URL is misleading): http://doc.cat-v.org/plan_9/4th_edition/papers/il/

They still saw an advantage to maintaining a connection in order to manage reliable service. I don't know that was the right choice, but I'm looking forward to reading about Homa's design decisions.



Moving past TCP in the data center, part 1

Posted Nov 2, 2022 4:50 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

Even Homa is excessive for most cases. Way back when I worked in a large cloud company, we did some experiments with UDP-based transport for RPC.

The idea was dead simple: just throw away EVERYTHING.

The network layer simply used fixed-size jumbo Ethernet frames and a request could contain up to 4 of them. The sender simply sent them one by one. No retransmissions or buffering, so the code was dead simple.

The receiver simply reassembled the frames into a buffer. Since the frame sizes were fixed, only one data copy was necessary (or none, if a request fit into one packet). No NAKs, ACKs, or anything else were needed - just a simple timer to discard the data and fail the request in case of a timeout due to a lost packet.

Everything else was handled at the upper levels. A packet loss, for example, simply got translated into a timeout error from the RPC layer. Instead of a network-level retry, the regular retry policy for service calls was used. It worked surprisingly well in experiments, and actually had the very nice property of ensuring that network congestion pressure rapidly propagates upstream.
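A minimal sketch of what such a no-retransmit transport could look like, assuming a small per-frame header of (request id, frame index, frame count); the header layout, frame size, and helper names are illustrative guesses, not the original implementation:

```python
import socket
import struct

FRAME_SIZE = 1024   # stand-in for a fixed-size jumbo-frame payload
MAX_FRAMES = 4      # a request spans at most four frames
TIMEOUT = 0.2       # discard partial requests after this long

def send_request(sock, addr, req_id, payload):
    """Split the payload into fixed-size frames and fire them off.

    No retransmissions, no buffering: each frame is sent exactly once.
    """
    frames = [payload[i:i + FRAME_SIZE]
              for i in range(0, len(payload), FRAME_SIZE)] or [b""]
    assert len(frames) <= MAX_FRAMES
    for seq, frame in enumerate(frames):
        # hypothetical header: request id, frame index, total frame count
        header = struct.pack("!IBB", req_id, seq, len(frames))
        sock.sendto(header + frame, addr)

def recv_request(sock):
    """Reassemble frames into one buffer; a lost frame just times out.

    Returns the payload, or None on timeout - the RPC layer's normal
    retry policy is expected to handle the failure.
    """
    sock.settimeout(TIMEOUT)
    parts, total = {}, None
    try:
        while total is None or len(parts) < total:
            data, _ = sock.recvfrom(FRAME_SIZE + 6)
            req_id, seq, total = struct.unpack("!IBB", data[:6])
            parts[seq] = data[6:]
    except socket.timeout:
        return None
    return b"".join(parts[i] for i in range(total))
```

Since the frames are fixed-size and carry their index, the receiver can place each one directly into the right slot of the reassembly buffer, which is where the "only one copy" property comes from.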

Moving past TCP in the data center, part 1

Posted Nov 2, 2022 9:21 UTC (Wed) by amboar (subscriber, #55307) [Link]

This is roughly the strategy used by the DMTF's MCTP protocol for intra-platform communication between devices. It's kinda impressive in its simplicity and effectiveness.

Moving past TCP in the data center, part 1

Posted Nov 2, 2022 12:36 UTC (Wed) by paulj (subscriber, #341) [Link] (3 responses)

Sending back-to-back frames - at a low enough level to ensure no other station could try to send in between - was a trick SGI used in IRIX NFS to get performance. I think I remember reading this in comments on LWN before, perhaps in a reply to you mentioning the same thing. :)

NFS was, of course, based on RPC over UDP too.

Moving past TCP in the data center, part 1

Posted Nov 2, 2022 21:12 UTC (Wed) by amarao (guest, #87073) [Link] (2 responses)

I remember a time when people said that UDP for NFS is bad and we should switch to TCP. Has something changed?

Moving past TCP in the data center, part 1

Posted Nov 2, 2022 21:32 UTC (Wed) by joib (subscriber, #8541) [Link]

https://www.man7.org/linux/man-pages/man5/nfs.5.html#TRAN... has a discussion of the pitfalls of NFS over UDP.

It should also be noted that current Linux kernels no longer support NFS over UDP at all.

Moving past TCP in the data center, part 1

Posted Nov 3, 2022 9:52 UTC (Thu) by paulj (subscriber, #341) [Link]

NFS switched to TCP /ages/ ago. In the beginning (and for quite a while), though, NFS ran its RPC protocol over UDP - without any real flow control, and with poor loss handling. TCP provides flow control and better loss handling, though still not great, because of the impedance mismatch of TCP knowing the data only as a byte stream and not being able to take the application's message boundaries into consideration, as per this talk.
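The impedance mismatch comes from TCP delivering an undifferentiated byte stream, so the RPC layer has to re-impose message boundaries itself - typically with a length prefix, as RPC-over-TCP's record marking does. A minimal sketch of that pattern (helper names hypothetical, simple 4-byte length prefix rather than the real record-marking format):

```python
import struct

def frame(msg: bytes) -> bytes:
    """Prepend a 4-byte length so the receiver can recover boundaries."""
    return struct.pack("!I", len(msg)) + msg

def deframe(stream: bytes) -> list[bytes]:
    """Split a received byte stream back into complete messages.

    A message whose bytes have not all arrived yet is simply left
    in the stream - TCP itself gives no hint where messages end.
    """
    msgs, off = [], 0
    while off + 4 <= len(stream):
        (n,) = struct.unpack_from("!I", stream, off)
        if off + 4 + n > len(stream):
            break  # partial message still in flight
        msgs.append(stream[off + 4:off + 4 + n])
        off += 4 + n
    return msgs
```

The cost the talk alludes to is that TCP schedules and retransmits in terms of bytes and segments, with no idea that byte 4097 starts a new, independent message that could have been delivered without waiting.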

Moving past TCP in the data center, part 1

Posted Nov 10, 2022 1:06 UTC (Thu) by TheJosh (guest, #162094) [Link]

Did this ever get used in production?


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds