IPv6 segment routing

Posted May 20, 2017 9:22 UTC (Sat) by paulj (subscriber, #341)
In reply to: IPv6 segment routing by Cyberax
Parent article: IPv6 segment routing

> MPLS (and ipv6sr) allows you to work around that - you can have a central "route planning" system that uploads the full routing information into your edge routers, and they basically encode it in every packet, so the rest of the network can stay dumb and stateless.

The rest of the network can't stay stateless, though. For the transmitting node to specify exactly the route it wants, it needs a lot of information about the state of the network (otherwise there's not much source routing to do). That state must be gathered and distributed by the network itself, which requires holding state. Nor can the network abstract or summarise that information if source routing is to remain useful, so the bigger the network, the worse this scales.
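A minimal sketch of the point above, under assumptions of my own: the edge node holds a complete topology map (the hypothetical `Topology` dict of node to links, each with a local egress port and a cost) and runs an ordinary Dijkstra shortest-path search to turn it into the per-hop list of egress ports it would then encode in the packet. Nothing here is taken from a real SR or MPLS implementation; it only shows that the route computation needs the whole map as input.

```python
import heapq

# Hypothetical topology as the edge node must know it: for every node, its
# links as (neighbour, local egress port, link cost). Gathering and flooding
# this map is the network-wide state under discussion.
Topology = dict[str, list[tuple[str, int, int]]]


def compute_source_route(topo: Topology, src: str, dst: str) -> list[int]:
    """Return the egress port to use at each hop from src to dst (Dijkstra)."""
    dist = {src: 0}
    prev: dict[str, tuple[str, int]] = {}  # node -> (previous node, port used there)
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, port, cost in topo.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = (node, port)
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        raise ValueError(f"no path from {src} to {dst}")
    # Walk back from dst, collecting the egress port chosen at each hop.
    ports = []
    node = dst
    while node != src:
        node, port = prev[node]
        ports.append(port)
    return list(reversed(ports))
```

For a four-node map where A-B-C-D is cheaper than A-B-D or A-C-D, the result is the ports taken at A, B and C in that order. Every edge node doing this needs its own up-to-date copy of the map.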

Source routing is powerful, but it comes with its own set of problems. As is often the case in networking and computer systems, we tend to oscillate back and forth and revisit things. What is old becomes hot again. Distributed routing! Centralised! Next-hop routing! Source-routing! This has been going on for many decades... ;)

IPv6 segment routing

Posted Dec 21, 2017 3:16 UTC (Thu) by immibis (subscriber, #105511)

Exactly... the *rest* of the network (i.e. not the edge) can stay stateless.

IPv6 segment routing

Posted Dec 18, 2020 12:20 UTC (Fri) by paulj (subscriber, #341)

How does the network communicate the information about its state - as required to compute source-specified routes through it - to the edges, without holding state?

IPv6 segment routing

Posted Dec 18, 2020 20:38 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

In MPLS you have a stack of labels attached to every packet. Every router pops a label from the stack and takes some action based on it.

You can literally encode all the route information in labels that are interpreted like this: "send the packet to port 8, pop label, send the packet to port 2, pop label, ...". And all this encoding can be done on the edge router so the inner routers don't have to know anything.
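The label-stack scheme described above can be sketched roughly as follows. This is a simplification, not real MPLS: labels are taken literally as port numbers, as in the comment's example, instead of being looked up in a per-router label table, and `Packet`, `encode_at_edge` and `core_forward` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    labels: list[int] = field(default_factory=list)  # top of stack is labels[0]


def encode_at_edge(payload: bytes, ports: list[int]) -> Packet:
    """Edge router: one label per hop, first hop on top of the stack."""
    return Packet(payload, labels=list(ports))


def core_forward(pkt: Packet, num_ports: int) -> int:
    """Core router: pop the top label and return it as the egress port.

    The only state this needs is the set of valid local ports.
    """
    if not pkt.labels:
        raise ValueError("empty label stack")
    port = pkt.labels.pop(0)
    if not 0 <= port < num_ports:
        raise ValueError(f"label {port} does not name a local port")
    return port
```

For example, `encode_at_edge(b"data", [8, 2])` yields a packet whose first `core_forward(pkt, 16)` call returns 8 and whose second returns 2, leaving the stack empty.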

As you said, the problem is that "port 2" might not actually be working, so all the traffic is going to be blackholed. This is typically solved by having a separate control plane network that is used to communicate the state of individual routers. It can physically run on top of the same media, just with special labels that instruct the routers to do something like "decapsulate packet, and punt it to the CPU". The CPU then runs a classic IS-IS/OSPF stack and marks the outgoing packets with the same "decapsulate and punt" label.
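One way to picture the in-band control plane described above, as a hypothetical sketch: a reserved label value (`PUNT_LABEL`, chosen arbitrarily here within the 20-bit MPLS label range) tells a core router to strip the label and hand the payload to its CPU, where the IS-IS/OSPF process would consume it; any other label is treated as an egress port as before. How a given platform actually identifies punted traffic is vendor-specific and not modelled here, and the class and field names are illustrative only.

```python
from collections import deque

# Hypothetical reserved label meaning "decapsulate and punt to the CPU".
# 1_000_000 fits in the 20-bit MPLS label field but is otherwise arbitrary.
PUNT_LABEL = 1_000_000


class CoreRouter:
    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        # Payloads handed to the local control-plane process (e.g. IS-IS/OSPF).
        self.cpu_queue: deque[bytes] = deque()
        # Per-port transmit queues of (remaining label stack, payload).
        self.tx: dict[int, list[tuple[list[int], bytes]]] = {
            port: [] for port in range(num_ports)
        }
        self.dropped = 0

    def receive(self, labels: list[int], payload: bytes) -> None:
        """Act on the top label only; the rest of the stack is carried along."""
        if not labels:
            self.dropped += 1
            return
        top, rest = labels[0], labels[1:]
        if top == PUNT_LABEL:
            # Control-plane traffic: strip the label and punt to the CPU.
            self.cpu_queue.append(payload)
        elif 0 <= top < self.num_ports:
            # Data-plane traffic: pop the label and queue it on that port.
            self.tx[top].append((rest, payload))
        else:
            # Label names no local port: nothing sensible to do but drop.
            self.dropped += 1
```

The data-plane path stays as dumb as before; all the routing-protocol work lands in whatever consumes `cpu_queue`.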

One large cloud provider has a fully separate network for the control plane; it doesn't even use the same physical hardware, to avoid any possibility of customer data flows interfering with the control plane.

IPv6 segment routing

Posted Dec 18, 2020 22:43 UTC (Fri) by paulj (subscriber, #341)

Yes, I'm familiar with the forwarding plane.

How do you build a distributed control plane for it that allows the 'core' to be dumb and at least to scale well (i.e., the state grows at a less-than-linear rate relative to growth in the number of nodes), if not to be stateless (which, to my thinking, implies constant state at each node, regardless of the size of the network)?

OSPF/IS-IS don't scale up arbitrarily. They need O(N log N + M) entries in each flooding domain, for N nodes with M links, and a further O(N log N) amount of state for vectored destinations outside of flooding domains. RSVP-TE scales much worse again.
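To make the scaling argument concrete, a rough numeric illustration: the first function plugs numbers into the O(N log N + M) bound quoted above with every constant factor set to 1, and the second models a core node that, per the source-routing argument, only needs one rule per local port. The 4-links-per-node average and the 32-port box are assumptions picked for illustration, not measurements; only the shape of the growth means anything.

```python
import math


def link_state_entries(n_nodes: int, n_links: int) -> float:
    """Illustrative per-node state in one flooding domain: N log2 N + M,
    i.e. the O(N log N + M) bound with all constants set to 1."""
    return n_nodes * math.log2(n_nodes) + n_links


def label_popping_core_entries(n_ports: int) -> int:
    """A core node that only pops labels: one rule per local port, independent of N."""
    return n_ports


ASSUMED_LINKS_PER_NODE = 4
ASSUMED_PORTS = 32

for n in (1_000, 10_000, 100_000):
    m = ASSUMED_LINKS_PER_NODE * n
    print(f"N={n:>7}: link-state ~{link_state_entries(n, m):>12,.0f}, "
          f"label-popping core {label_popping_core_entries(ASSUMED_PORTS)}")
```

Doubling N more than doubles the first figure while the second stays flat. The open question in this thread is whether that N log N term has actually gone away or has only moved from the core to the edges and the control plane.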

If you know how to make source-routing scale well, you should write a Ph.D. on it - or start a company.

IPv6 segment routing

Posted Dec 18, 2020 23:04 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

> How do you build a distributed control plane for it that allows the 'core' to be dumb and at least to scale well (i.e., the state grows at a less-than-linear rate relative to growth in the number of nodes), if not to be stateless (which, to my thinking, implies constant state at each node, regardless of the size of the network)?

Individual data-plane routers just need a handful of rules (in the extreme, equal to the number of ports).

> OSPF/IS-IS don't scale up arbitrarily. They need O(N log N + M) entries in each flooding domain, for N nodes with M links, and a further O(N log N) amount of state for vectored destinations outside of flooding domains. RSVP-TE scales much worse again.

You need OSPF/IS-IS only for the control-plane nodes, and you don't need many of them: perhaps tens of thousands, even for extremely large networks (e.g. an Amazon AWS region). That amount of state can easily be managed through OSPF/IS-IS flooding over gigabit-range links.
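A back-of-the-envelope check of the flooding claim above, with made-up but plausible inputs: assume each control-plane node originates about 1,500 bytes of link-state data (an assumption; the real figure depends on node degree and protocol details) and that the whole database is sent once over a single 1 Gbit/s link, ignoring protocol overhead, pacing and retransmission. The function and constant names are illustrative.

```python
def flood_time_seconds(n_nodes: int, bytes_per_node: int, link_bps: float) -> float:
    """Seconds to send every node's link-state data once over one link
    (no overhead, pacing or retransmission modelled)."""
    return n_nodes * bytes_per_node * 8 / link_bps


def database_megabytes(n_nodes: int, bytes_per_node: int) -> float:
    """Total link-state database size held by each control-plane node."""
    return n_nodes * bytes_per_node / 1e6


N_NODES = 50_000        # "tens of thousands" of control-plane nodes
BYTES_PER_NODE = 1_500  # assumed link-state data per node
LINK_BPS = 1e9          # gigabit-range link

print(f"database ~{database_megabytes(N_NODES, BYTES_PER_NODE):.0f} MB, "
      f"full flood ~{flood_time_seconds(N_NODES, BYTES_PER_NODE, LINK_BPS):.2f} s")
```

With these inputs the database comes to about 75 MB and a full flood takes about 0.6 s. The sketch does not model SPF computation or flooding churn during failures.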

You also don't need to do that in hardware (so no worries about TCAM capacity); purely software routing is fine.

> If you know how to make source-routing scale well, you should write a Ph.D. on it - or start a company.

I did look into it, but it's hard. You might have several dozen very large potential customers, and it's a very hard market to get into. Every customer will require a lot of custom integration with their systems, and for a startup that's just not feasible.