Should this be implemented in endpoints at all?
Posted Jan 3, 2025 17:34 UTC (Fri) by john_ousterhout (guest, #175303)
In reply to: Should this be implemented in endpoints at all? by buck
Parent article: The Homa network protocol
Posted Jan 3, 2025 22:29 UTC (Fri)
by bvanassche (subscriber, #90104)
Infiniband has just about all of the performance problems of TCP when it comes to congestion control etc. The only advantage of Infiniband is that people like Mellanox built really nice NICs for it that bypass the kernel.
Is there any scientific paper that backs the above statement about congestion? Multiple papers have been published about how to handle congestion in datacenter RDMA networks. Two examples:
Posted Jan 6, 2025 17:10 UTC (Mon)
by paulj (subscriber, #341)
Congestion control is a bit easier on low-hop, tightly controlled networks (i.e., DCs), but even there it is not solved. Fairness across different kinds of congestion control (CC) in particular is a bitch, as is fairness across flows with very different RTTs and/or BDPs. E.g., a congestion controller might work great when competing with low-latency, fast connections (i.e., intra-DC), but have fairness issues when competing with flows that have different properties, like a much higher RTT (e.g., cross-region DC to DC). It's clearly not an easy problem at all.
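To make the RTT-fairness point concrete, here is a toy sketch (my own illustration, not something from the thread): two Reno-style AIMD flows share one bottleneck and see synchronized losses. Everything in it is an assumption for illustration: tick-based time, capacity measured in packets per tick, and the hypothetical helper name `simulate_aimd`. Real datacenter schemes (DCTCP, the various RDMA congestion controllers) behave differently.

```python
def simulate_aimd(rtts, capacity, steps):
    """Toy synchronized-loss AIMD model (hypothetical, for illustration).

    Each flow adds one packet to its window per RTT and halves its window
    whenever the summed sending rate exceeds the bottleneck capacity.
    Returns each flow's fraction of the total delivered traffic.
    """
    cwnd = [1.0] * len(rtts)
    delivered = [0.0] * len(rtts)
    for _ in range(steps):
        # Sending rate of each flow in packets per tick: window / RTT.
        rates = [w / r for w, r in zip(cwnd, rtts)]
        if sum(rates) > capacity:
            # Congestion: every flow halves its window at the same time.
            cwnd = [w / 2 for w in cwnd]
            rates = [w / r for w, r in zip(cwnd, rtts)]
        for i, (rate, rtt) in enumerate(zip(rates, rtts)):
            delivered[i] += rate
            # Additive increase: +1 packet per RTT, spread evenly over ticks.
            cwnd[i] += 1.0 / rtt
    total = sum(delivered)
    return [d / total for d in delivered]


if __name__ == "__main__":
    # A short-RTT "intra-DC" flow vs. a 10x longer-RTT "cross-region" flow.
    print(simulate_aimd([1, 10], capacity=100, steps=10_000))
```

In this model, each flow's rate grows as 1/RTT² between losses, so with RTTs of 1 and 10 ticks the short-RTT flow ends up with roughly 99% of the bottleneck, while two flows with equal RTTs split it evenly.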
Posted Jan 4, 2025 6:35 UTC (Sat)
by buck (subscriber, #55985)
Your reply was, by contrast, most gracious (I say as someone who has no emotional attachment to Infiniband design [grin]).
But since I can't withdraw my comment (I think), I at least fixed the Subject of this reply to reflect what my provocative question really was. (I certainly didn't mean to exclude NICs as an implementation target, which are probably considered, by anybody's definition, part of an "endpoint", or maybe even an "endpoint" in their own right, if they are smart NICs/"DPUs".)
That said, if you are being gracious enough to give your code away, it's not my business to question what use the rest of the world may make of it. Clearly the world has found plenty of use for Raft, Tcl, etc.