Decentralized SDN, Democracy Net!

Zartbot
Aug 21, 2020

Software-defined networking (SDN) technology is an approach to network management that enables dynamic, programmatically efficient network configuration in order to improve network performance and monitoring.

Almost all commercial SDN systems have a centralized controller. It runs like an autocratic leader with a strong, top-down leadership approach. The autocratic style also brings a faster decision-making process, which benefits network orchestration and maintenance. Since only one controller at the top is responsible for all major decisions, things tend to move more quickly in this type of environment.

But at large scale, this kind of centralized, autocratic approach hits a performance bottleneck. Even if the controller scales out as a cluster, it may work well in a campus LAN or data center network but fail in an SD-WAN scenario, where an unstable control channel, high latency, and link failures can leave the network configuration inconsistent.

Privacy concerns and WAN bandwidth limits mean the controller cannot retrieve much telemetry data from the environment, and it is very hard for one controller to make decisions for hundreds of thousands of edge devices.

Distributed consensus algorithms (e.g. Paxos/Raft) give us a different approach to designing and evolving next-generation SDN, especially for SD-WAN. It is more like a democratic society that gives each network element a choice, but also defines some laws for them to follow.

Devices are much smarter than before, smart enough to form such a democratic community. Just think of packet routing working like a smartphone paired with a self-driving car.

An idea was born in my mind: what if we install VPN software on our endpoints and abstract each one as a router linecard? Then, with a routing protocol and segment routing over the internet, we can build a new SDN solution that is more democratic and more efficient, can't we?

So we started using etcd to build a prototype called “Ruta”, which sounds almost the same as “Router”. It has fabric nodes in the public cloud to provide more stable connectivity than the plain internet, since cloud providers may optimize their WAN link bandwidth.

At the same time, the linecard comes in different types: it could be VPN software on a host, a CNI plugin for containers, or embedded software on a legacy router or wireless access point.

A routing protocol based on a distributed KV store is designed to handle this situation. You may ask, why not BGP? BGP converges more slowly, and the fundamental problem for a routing protocol is a distributed consistency problem. Internet bandwidth is far greater than it was three decades ago, so we can encode routing messages with gRPC rather than hand-write highly compressed TLVs. That also makes it much easier to extend the protocol in the future.
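To make the trade-off concrete, here is a minimal sketch that encodes the same route update twice: once as a hand-packed TLV and once as a self-describing structured message. JSON stands in for the gRPC/protobuf encoding so the example needs only the standard library. The field names (`prefix`, `next_hop`, `latency_ms`) and the TLV type codes are assumptions for illustration, not Ruta's actual schema.

```python
import json
import struct
from dataclasses import dataclass, asdict


@dataclass
class RouteUpdate:
    prefix: str
    next_hop: str
    latency_ms: int


# Hypothetical TLV type codes, one per field.
T_PREFIX, T_NEXT_HOP, T_LATENCY = 1, 2, 3


def encode_tlv(u: RouteUpdate) -> bytes:
    """Pack each field as 1-byte type, 2-byte length, then the value."""
    out = b""
    for t, v in ((T_PREFIX, u.prefix.encode()),
                 (T_NEXT_HOP, u.next_hop.encode()),
                 (T_LATENCY, struct.pack("!I", u.latency_ms))):
        out += struct.pack("!BH", t, len(v)) + v
    return out


def decode_tlv(data: bytes) -> RouteUpdate:
    """Walk the TLV stream; every new field needs a new type code here."""
    fields, offset = {}, 0
    while offset < len(data):
        t, length = struct.unpack_from("!BH", data, offset)
        offset += 3
        fields[t] = data[offset:offset + length]
        offset += length
    return RouteUpdate(
        prefix=fields[T_PREFIX].decode(),
        next_hop=fields[T_NEXT_HOP].decode(),
        latency_ms=struct.unpack("!I", fields[T_LATENCY])[0],
    )


def encode_structured(u: RouteUpdate) -> bytes:
    """Self-describing encoding (JSON as a stand-in for protobuf)."""
    return json.dumps(asdict(u)).encode()


def decode_structured(data: bytes) -> RouteUpdate:
    return RouteUpdate(**json.loads(data))


update = RouteUpdate("10.1.0.0/16", "node-tokyo", 42)
tlv = encode_tlv(update)
structured = encode_structured(update)
```

The TLV form is smaller on the wire, but the structured form carries its own field names, so adding a field is a schema change rather than a new parser branch.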

We use etcd as our KV store, and its nodes are placed at some core network nodes. Each network element can act as an etcd proxy to help other devices connect to the etcd store via a link-local address.

With this approach, the devices can construct a network by themselves. We also defined link discovery and a link-state protocol on top of etcd service discovery. All link performance can be measured by a TWAMP-like protocol and distributed through etcd.
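One way to picture link state living in a KV store is sketched below. `TinyKV` is an in-memory stand-in that mimics only two behaviors assumed here, prefix reads and keys that expire unless refreshed; it is not the etcd client API. The key layout `/ruta/links/<src>/<dst>` and the measurement fields are hypothetical.

```python
import json


class TinyKV:
    """In-memory stand-in for a KV store with prefix reads and key TTLs."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)

    def get_prefix(self, prefix, now):
        # Return only keys under the prefix that have not expired yet.
        return {k: v for k, (v, exp) in sorted(self._data.items())
                if k.startswith(prefix) and exp > now}


def publish_link(kv, src, dst, latency_ms, loss_pct, now, ttl=10):
    """A node announces a measured link; the entry expires if not refreshed."""
    key = f"/ruta/links/{src}/{dst}"
    value = json.dumps({"latency_ms": latency_ms, "loss_pct": loss_pct})
    kv.put(key, value, ttl, now)


def read_link_state(kv, now):
    """Any node rebuilds the link-state table from a single prefix read."""
    links = {}
    for key, value in kv.get_prefix("/ruta/links/", now).items():
        _, _, _, src, dst = key.split("/")
        links[(src, dst)] = json.loads(value)
    return links


kv = TinyKV()
publish_link(kv, "sanjose", "tokyo", 105.0, 0.1, now=0)
publish_link(kv, "tokyo", "shanghai", 35.0, 0.0, now=0)
publish_link(kv, "sanjose", "tokyo", 98.0, 0.1, now=8)  # refreshed measurement
state_at_9 = read_link_state(kv, now=9)
state_at_12 = read_link_state(kv, now=12)  # the unrefreshed link has expired
```

Letting entries expire means a node that goes silent drops out of the topology without anyone having to delete its links.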

Finally, every linecard node learns the link state of the entire network.
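Once a node holds the full link state, it can compute paths locally. The sketch below uses Dijkstra's algorithm with latency as the cost. This is an assumption about how a node might choose a path, not a description of Ruta's actual selection logic, and the latency figures are made up for illustration.

```python
import heapq


def best_path(links, src, dst):
    """Return (path, total_latency) for the lowest-latency path, or (None, inf)."""
    graph = {}
    for (a, b), latency in links.items():
        graph.setdefault(a, []).append((b, latency))
        graph.setdefault(b, []).append((a, latency))  # assume symmetric links

    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))

    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


# Hypothetical measured latencies in milliseconds.
links = {
    ("frankfurt", "shanghai"): 260.0,
    ("frankfurt", "dubai"): 95.0,
    ("dubai", "hongkong"): 60.0,
    ("hongkong", "shanghai"): 30.0,
}
path, total = best_path(links, "frankfurt", "shanghai")
```

With these numbers the three-hop detour through Dubai and Hong Kong beats the direct link, which is the kind of decision each linecard can make without asking a controller.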

For overlay routing, we leverage a BGP-EVPN-based approach and publish Type-2 and Type-5 route updates to etcd, with the next hop pointing to a node name. We also use an SRoU locator to describe the node.
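A rough sketch of how such overlay routes could be resolved follows. Host routes (Type-2 style) are tried first, then the longest matching prefix (Type-5 style), and the resulting node name is mapped to its locator. The key layout, the locator format, and the addresses are all hypothetical; the source does not give them.

```python
import ipaddress

# Hypothetical snapshot of overlay keys as a node might see them in the KV store.
overlay = {
    "/ruta/evpn/type2/10.1.1.10": {"mac": "02:00:00:00:01:0a",
                                   "next_hop": "node-tokyo"},
    "/ruta/evpn/type5/10.1.0.0/16": {"next_hop": "node-shanghai"},
    "/ruta/evpn/type5/0.0.0.0/0": {"next_hop": "node-sanjose"},
}

# Hypothetical locators: where each named node can be reached over UDP.
locators = {
    "node-tokyo": "198.51.100.10:6000",
    "node-shanghai": "203.0.113.7:6000",
    "node-sanjose": "192.0.2.1:6000",
}

TYPE5_PREFIX = "/ruta/evpn/type5/"


def resolve(dst_ip):
    """Return (node_name, locator) for a destination IP, or None if no route."""
    # 1. Exact host route (Type-2 style).
    host_key = f"/ruta/evpn/type2/{dst_ip}"
    if host_key in overlay:
        node = overlay[host_key]["next_hop"]
        return node, locators[node]

    # 2. Longest-prefix match over IP prefix routes (Type-5 style).
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for key, route in overlay.items():
        if not key.startswith(TYPE5_PREFIX):
            continue
        net = ipaddress.ip_network(key[len(TYPE5_PREFIX):])
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, route["next_hop"])
    if best is None:
        return None
    return best[1], locators[best[1]]
```

Pointing the next hop at a node name rather than an address keeps the routes valid when a node's public address or NAT mapping changes; only its locator entry needs updating.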

Network operators can also define laws that represent their intent in this system. Policies are distributed through the KV store, so we do not need YANG models, gNMI, etc. to provision the devices. All devices just sync the laws and follow them. More democratic?
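As an illustration of "sync the laws and follow them", the sketch below treats a law as a small JSON document that each node applies to its own candidate paths. The law fields (`max_latency_ms`, `avoid_transit`) and the candidate paths are invented for this example; the source does not describe the policy format.

```python
import json


def allowed(path, latency_ms, law):
    """Check one candidate path against the synced law."""
    if latency_ms > law.get("max_latency_ms", float("inf")):
        return False
    # Only intermediate hops count as transit; endpoints are exempt.
    if set(path[1:-1]) & set(law.get("avoid_transit", [])):
        return False
    return True


def choose(candidates, law):
    """Pick the lowest-latency path that obeys the law, or None."""
    legal = [(lat, path) for path, lat in candidates if allowed(path, lat, law)]
    return min(legal)[1] if legal else None


# A law as it might be stored in the KV store (hypothetical fields).
law = json.loads('{"max_latency_ms": 200, "avoid_transit": ["moscow"]}')

# Candidate paths with hypothetical latencies in milliseconds.
candidates = [
    (["frankfurt", "moscow", "shanghai"], 150.0),
    (["frankfurt", "dubai", "shanghai"], 180.0),
    (["frankfurt", "shanghai"], 260.0),
]
chosen = choose(candidates, law)
```

The operator states a constraint once, and every node enforces it locally on its own decisions, with no per-device configuration push.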

That is all about the control plane design. Meanwhile, how do we select a route over the internet? VXLAN with NSH can have a complex encoding scheme, and SRv6 does not support IPv4. Some RFCs describe MPLS over UDP, or SR with 24-bit MPLS labels over UDP, but they do not support NAT.

We created an encapsulation called SRoU (Segment Routing over UDP).

It follows SRv6 consistently and supports both IPv4 and IPv6. If we say MPLS is a Layer 2.5 protocol, then SRv6 may be a Layer 3.5 protocol, and SRoU is a Layer 4.5 protocol.
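The source does not give the SRoU wire format, so the following is only a sketch of what an SRv6-like header carried inside a UDP payload could look like. It borrows two conventions from the SRv6 segment routing header: the segment list is stored in reverse order, and a Segments Left index points at the active segment. The field sizes, the per-segment address-family byte, and the per-segment UDP port are assumptions made here to accommodate both IPv4 and IPv6.

```python
import ipaddress
import struct


def encode_srou(segments, payload):
    """segments: list of (ip_string, udp_port) in travel order."""
    n = len(segments)
    if not 1 <= n <= 255:
        raise ValueError("need between 1 and 255 segments")
    # segments_left and last_entry, one byte each (hypothetical layout).
    header = struct.pack("!BB", n - 1, n - 1)
    for ip, port in reversed(segments):  # stored in reverse, as in the SRH
        addr = ipaddress.ip_address(ip)
        header += struct.pack("!B", addr.version)  # 4 or 6
        header += addr.packed                      # 4 or 16 bytes
        header += struct.pack("!H", port)
    return header + payload


def decode_srou(packet):
    """Return (segments_left, stored_segments, payload)."""
    segments_left, last_entry = struct.unpack_from("!BB", packet, 0)
    offset = 2
    stored = []
    for _ in range(last_entry + 1):
        version = packet[offset]
        size = 4 if version == 4 else 16
        addr = ipaddress.ip_address(packet[offset + 1:offset + 1 + size])
        (port,) = struct.unpack_from("!H", packet, offset + 1 + size)
        stored.append((str(addr), port))
        offset += 1 + size + 2
    return segments_left, stored, packet[offset:]


# Documentation-range addresses; the ports are arbitrary.
segments = [("198.51.100.10", 6000), ("2001:db8::1", 6000), ("203.0.113.7", 6000)]
packet = encode_srou(segments, b"hello")
left, stored, payload = decode_srou(packet)
active = stored[left]  # the segment the packet should visit next
```

Because the whole header sits inside the UDP payload, intermediate NAT devices see an ordinary UDP datagram, and IPv4 and IPv6 segments can be mixed in one path.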

It gets huge benefits from SRv6, especially its network programming capabilities. We can define End.X to support VPN services and replace VXLAN. Even more, an endpoint may use a socket to control its own path!

Imagine you are a developer of Webex or Zoom: you could set options directly on your UDP session to enforce the path!
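The source does not show the socket-level API, so the sketch below only simulates the idea in memory: the application picks an explicit list of nodes, and each relay advances the active segment until the packet is delivered. The packet layout, the function names, and the node names are hypothetical.

```python
def make_packet(path, payload):
    """Build a packet whose segment list is stored in reverse, SRv6-style."""
    return {
        "segments": list(reversed(path)),
        "segments_left": len(path) - 1,
        "payload": payload,
    }


def relay(node, packet):
    """Process a packet at `node`.

    Returns (next_node, packet), or (None, packet) when this node is the
    final segment and the payload should be delivered locally.
    """
    active = packet["segments"][packet["segments_left"]]
    if active != node:
        raise ValueError(f"{node} is not the active segment ({active})")
    if packet["segments_left"] == 0:
        return None, packet
    remaining = packet["segments_left"] - 1
    forwarded = dict(packet, segments_left=remaining)
    return forwarded["segments"][remaining], forwarded


def send(path, payload):
    """Walk the packet along the application-chosen path."""
    packet = make_packet(path, payload)
    node, visited = path[0], []
    while node is not None:
        visited.append(node)
        node, packet = relay(node, packet)
    return visited, packet["payload"]


# The application, not the network, decides which relays the media stream uses.
visited, delivered = send(["frankfurt", "dubai", "hongkong"], b"media-frame")
```

In a real deployment the same choice would presumably be expressed per UDP session, so a media application could steer each stream along the path that suits it.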

SRoU was initially designed to give segment routing capability to userspace software and to interwork with QUIC, providing extreme flexibility for the next-generation internet.

We’ve built a working prototype and deployed it in 20 public cloud availability zones all over the world (San Jose/Virginia, Frankfurt, London, Moscow, Dubai, Shanghai, Hong Kong, Tokyo, Sydney).

Now the linecards can select the best route by themselves, which gives me access to any node in the world with less than 200 ms latency!

In the future, this kind of architecture could be integrated with SDP to provide the next-generation network system.

Endpoints are much smarter than before: we have smartphones and smart NICs, and everything is smarter than it was three decades ago. Why not build a democratic network system rather than a centralized, autocratic one?
