Why not over the Internet?

Posted Nov 10, 2022 9:29 UTC (Thu) by epa (subscriber, #39769)
Parent article: Moving past TCP in the data center, part 2

"Homa is not suitable for wide-area networks (WANs)". Why not? If there is retrying at the application level it ought to work well enough.

Posted Nov 10, 2022 11:55 UTC (Thu) by james (subscriber, #1325)

As I read it, a very low round-trip time is a key part of the assumptions.

> When a message needs to be sent to a receiver, the sender can transmit a few "unscheduled packets" immediately, but additional "scheduled packets" must each wait for an explicit "grant" from the receiver

which requires one round trip for each RPC larger than "a few" packets, then at least another one for the rest of the data and the response. If you have a suitable protocol running over TCP, the connection will already be set up and you can send all the data at once.
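To make the round-trip cost concrete, here is a rough sketch in Python. It is a toy model, not Homa's actual behaviour: the function names, the 5-packet unscheduled limit, and the 10 µs and 50 ms round-trip times are assumptions chosen for illustration, and it ignores serialization time and any pipelining of grants.

```python
def one_way_trips_for_request(message_packets: int, unscheduled_packets: int) -> int:
    """Count one-way network traversals until the last request packet arrives.

    Assumption: a message that fits in the unscheduled allowance needs one
    traversal; anything larger needs three (first packets out, grant back,
    scheduled packets out).
    """
    if message_packets <= unscheduled_packets:
        return 1
    return 3


def request_latency_us(message_packets: int, unscheduled_packets: int, rtt_us: int) -> int:
    """Latency in microseconds, taking one traversal as half a round trip."""
    return one_way_trips_for_request(message_packets, unscheduled_packets) * rtt_us // 2


UNSCHEDULED = 5          # hypothetical value for "a few" packets
DATACENTER_RTT_US = 10   # assumed data-center round trip
WAN_RTT_US = 50_000      # assumed wide-area round trip (50 ms)

print(request_latency_us(100, UNSCHEDULED, DATACENTER_RTT_US))  # 15
print(request_latency_us(100, UNSCHEDULED, WAN_RTT_US))         # 75000
print(request_latency_us(3, UNSCHEDULED, WAN_RTT_US))           # 25000
```

Under these assumptions the extra round trip costs 10 µs in the data center but 50 ms over the WAN, which is the point being made above.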

Also, I suspect that

> A receiver can choose not to send grants if it detects congestion in its top-of-rack (TOR) switch; it can pause or slow the grants until that condition subsides

embeds the assumption that this is where congestion will occur.

Posted Nov 10, 2022 14:25 UTC (Thu) by joib (subscriber, #8541)

> > When a message needs to be sent to a receiver, the sender can transmit a few "unscheduled packets" immediately, but additional "scheduled packets" must each wait for an explicit "grant" from the receiver

Isn't this similar to the problem of window size scaling in TCP? I.e., in a network with a big bandwidth-delay product, you need a bigger window in TCP, or more 'unscheduled packets' and more in-flight 'grants' in Homa?
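As a back-of-the-envelope illustration of the bandwidth-delay product, the sketch below computes how many packets must be in flight to keep a link busy. The link speeds, round-trip times, and 1500-byte packet size are assumed example values, not figures from the article or from Homa.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes (bandwidth times round-trip time)."""
    return bandwidth_bits_per_s * rtt_us // (8 * 1_000_000)


def packets_to_fill_pipe(bandwidth_bits_per_s: int, rtt_us: int,
                         packet_bytes: int = 1500) -> int:
    """Packets that must be in flight to cover one round trip, rounded up."""
    bdp = bdp_bytes(bandwidth_bits_per_s, rtt_us)
    return -(-bdp // packet_bytes)  # integer ceiling division


# Assumed data-center link: 10 Gbit/s, 10 microsecond RTT
print(bdp_bytes(10_000_000_000, 10))               # 12500
print(packets_to_fill_pipe(10_000_000_000, 10))    # 9

# Assumed WAN path: 1 Gbit/s, 50 ms RTT
print(bdp_bytes(1_000_000_000, 50_000))            # 6250000
print(packets_to_fill_pipe(1_000_000_000, 50_000)) # 4167
```

With these numbers "a few" packets roughly covers the data-center case, while the WAN case would need thousands of packets in flight.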

> the assumption that this is where congestion will occur.

That might actually be an issue, yes. If the idea is to not use packet drops as a congestion signal, how is congestion somewhere in the middle of the path detected? I guess you would need some kind of BBR-style, timing-based congestion control?

Posted Nov 10, 2022 17:03 UTC (Thu) by james (subscriber, #1325)

It's somewhat similar, but the difference is that TCP as it is used today has long-lived connections to which you can attach things like window size.

From the article:

> There is no long-lived connection state stored by Homa; once an RPC completes, all of its state is removed

including anything like window size. The next RPC (larger than a "few packets") has to discover that all over again.

Or so I understand.
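A toy way to see the difference, under heavy assumptions: count how many RPCs in a sequence have to pay a "discovery" round trip. The function name, the 5-packet limit, and the RPC sizes are hypothetical, and real TCP stacks are more subtle (for example, they may shrink the window again after an idle period).

```python
from typing import Iterable


def discovery_round_trips(rpc_sizes: Iterable[int], unscheduled_packets: int,
                          keeps_state: bool) -> int:
    """Count RPCs that stall once to learn how much they may send.

    keeps_state=True models a long-lived connection that remembers what it
    learned; keeps_state=False models state being discarded after each RPC.
    """
    stalls = 0
    learned = False
    for size in rpc_sizes:
        if size <= unscheduled_packets:
            continue  # small RPC: everything is sent immediately
        if keeps_state and learned:
            continue  # reuse what an earlier RPC already discovered
        stalls += 1
        learned = True
    return stalls


sizes = [100, 3, 200, 50]  # hypothetical RPC sizes in packets
print(discovery_round_trips(sizes, 5, keeps_state=True))   # 1
print(discovery_round_trips(sizes, 5, keeps_state=False))  # 3
```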

Posted Nov 10, 2022 19:22 UTC (Thu) by NYKevin (subscriber, #129325)

> That might actually be an issue, yes. If the idea is to not use packet drops as a congestion signal, how is congestion somewhere in the middle of the path detected? I guess you would need some kind of BBR-style, timing-based congestion control?

According to https://homa-transport.atlassian.net/wiki/spaces/HOMA/pages/262171/Concerns+Raised+About+Homa, this should not be a problem if different packets can be routed differently from each other.

In theory, you could still run out of bandwidth for the entire network overall, but that probably requires a much higher volume than is typical of a TCP-driven network, and should likely be dealt with using some combination of explicit provisioning and QoS (both of which are far more suitable to a closed DC-type network than to an open WAN).

Posted Nov 10, 2022 12:03 UTC (Thu) by Sesse (subscriber, #53779)

Random guesses (I haven't read the Homa paper): Receiver-initiated control doesn't work well when time scales are milliseconds and not microseconds (you very quickly run out of credits). Lack of encryption. More packet loss requires more sophisticated retransmit.

Posted Nov 10, 2022 12:11 UTC (Thu) by k3ninho (subscriber, #50375)

On a WAN, where each link goes and what routes are available is an unknown. Datacentre cabling is very controlled and very specific, so you can also make assumptions about maximal link capability when designing a protocol for it. Maybe the summary is that TCP survives the worst link-failure and link-capacity cases, whereas Homa aims to make the most of known-good links in known-good configurations.

K3n.