Evolving Data Center Network from Cloud-Ready to Cloud-Native

Parantap Lahiri, VP, Network and Datacenter Engineering, eBay

Monolithic applications running on dedicated servers have evolved into cloud-native microservices that scale freely on the cloud. However, in the enterprise space, more often than not, ‘networking’ has acted as a necessary evil that slows down progress and results in unplanned outages. Interestingly enough, networks can quickly turn from a liability into an asset if used properly. Let’s take a deeper look.

Networking in the enterprise domain grew up to provide connectivity between office buildings and access to the Internet and to corporate services within data centers. Since many enterprise applications came as third-party software, physical networks had to facilitate and enforce segmentation and security needs. The networks were complex, inconsistent, and fragile, with heavy dependence on in-house support staff as well as dedicated engineers from vendors.

Other than configuration inconsistency and failures due to lack of change rigor, the primary factor that contributed to this fragility was the inherent weakness of protocols like “spanning-tree” that were used to ensure a loop-free forwarding path for switching domains. Layer 2 switching domains were needed to support VLANs (Virtual Local Area Networks), which have been an integral part of most enterprise networks for purposes such as ensuring IP mobility and placing firewalls as the default gateways. These domains frequently suffered from broadcast storms that melted the networks whenever a loop was created. Moreover, loop-free requirements produced topologies that resulted in congestion.

Fast-forwarding a little, when companies moved into delivering online services, many of the same enterprise networks were used to host those services. For online services that are expected to be available at least 99.99% of the time, these enterprise network designs have been a fundamental misfit.
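To put the 99.99% figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `downtime_budget_minutes` and the choice of a 365-day or 30-day window are assumptions made here, not something the article specifies.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the maximum downtime, in minutes, that still meets the target.

    availability_pct is the availability target as a percentage (e.g. 99.99).
    period_days is the measurement window; 365 days is an assumption here.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    # Yearly and 30-day budgets for a 99.99% target.
    print(round(downtime_budget_minutes(99.99), 2))
    print(round(downtime_budget_minutes(99.99, period_days=30), 2))
```

Under these assumptions, a 99.99% target leaves roughly 52.6 minutes of downtime per year, or a little over 4 minutes in a 30-day window. A single prolonged broadcast storm could consume that entire budget.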

Now, to take this discussion one level deeper, let’s analyze the impact of a typical enterprise network on cloud-ready workloads first and then on cloud-native workloads. Broadly speaking, cloud-ready workloads are monolithic applications that are packed into virtual machines (VMs) as opposed to the original bare metal servers.
In environments with a strong dependency on their incumbent physical network, such cloud-ready VMs have simply slid in to replace physical servers. In many cases, the VMs simply connect to the VLANs, and the overall online service depends on security and load-balancing services delivered through the underlay network. There have been continuous efforts to use network, firewall, and load-balancing automation to cater to the agility needs of such services.

However, cloud-native workloads are challenging this very premise. By definition, cloud-native workloads are formed out of applications decomposed into microservices. These microservices are automatically deployed, scaled, and managed as containers through orchestration systems such as Kubernetes. Now the question becomes: in an environment with a heavy dependency on incumbent physical network services, should cloud-native applications take any dependency on the services provided by the underlay network, or should technology leaders act cautiously and ensure that those dependencies are properly decoupled?

To get deeper into this discussion, it is important to understand how cloud-native workloads and orchestration systems have evolved. Orchestration systems, along with other approaches such as service mesh, are continually advancing to take care of many application needs beyond the simple placement of containers on appropriate nodes.

Firstly, they are facilitating a much more granular implementation of security controls that goes beyond simple enforcement on protocol types and ports, and secondly, they are facilitating advanced capabilities to balance and distribute workload sessions. These capabilities are implemented on the servers themselves in a highly scaled-out and well-managed way, enabling end-to-end encryption in many cases. So bringing some of those complexities back into the underlay network is somewhat redundant. To put it more bluntly, the traditional enterprise network controls should get out of the cloud-native way. They can definitely manage the legacy environments in a brownfield situation and enforce some base-layer security controls, but that’s where they should draw the line.

Now, to discuss the actual needs of cloud-native services: the workloads, along with supporting services like Hadoop, AI/ML, and distributed storage, ideally want unlimited server-to-server east-west capacity from the network. They also need quick correlation of network issues with application issues while diagnosing service impairments.

Thus, in order to cater to the needs of cloud-native applications, many modern cloud-scale data centers have built dense-mesh Layer 3 routed networks using mainly simple and standardized protocols like BGP (Border Gateway Protocol). Typically the platforms used in these networks are based on commoditized chipsets that provide high bandwidth at very reasonable cost points. Automation focus is placed on producing consistent build standards along with automated isolation and remediation of network degradation. Technology leaders have been guarding these networks against taking on unnecessary complexity to ensure that each domain delivers the right services for the right reasons. Interestingly, building the high-volume interconnect capacity has impacted the networking budget favorably and made the network a lot more robust.
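As a rough illustration of what a dense mesh buys in east-west capacity, the sketch below models a hypothetical two-tier leaf-spine fabric. The article does not describe a specific topology, so the structure and every number here (spine count, port counts, link speeds) are assumptions chosen only for the arithmetic. It assumes each leaf switch has one uplink to every spine, so the number of equal-cost paths between two leaves equals the number of spines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafSpineFabric:
    """Hypothetical two-tier fabric; all fields are illustrative assumptions."""
    spines: int               # number of spine switches
    uplink_gbps: float        # speed of each leaf-to-spine uplink
    server_ports: int         # server-facing ports per leaf
    server_port_gbps: float   # speed of each server-facing port

    def equal_cost_paths(self) -> int:
        # With one uplink from each leaf to every spine, traffic between
        # two leaves can be spread over one path per spine.
        return self.spines

    def oversubscription(self) -> float:
        # Ratio of server-facing capacity to uplink capacity on one leaf.
        # 1.0 means servers can, in principle, all send at line rate.
        downlink = self.server_ports * self.server_port_gbps
        uplink = self.spines * self.uplink_gbps
        return downlink / uplink


if __name__ == "__main__":
    small = LeafSpineFabric(spines=4, uplink_gbps=100, server_ports=32, server_port_gbps=25)
    dense = LeafSpineFabric(spines=8, uplink_gbps=100, server_ports=32, server_port_gbps=25)
    print(small.equal_cost_paths(), small.oversubscription())
    print(dense.equal_cost_paths(), dense.oversubscription())
```

In this hypothetical example, doubling the spines from four to eight doubles the equal-cost paths between any two leaves and brings the oversubscription ratio from 2.0 down to 1.0. That is the sense in which a denser routed mesh approaches the "unlimited" east-west capacity such workloads want.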

To summarize, the journey to cloud-native infrastructure entails not only decomposing applications into microservices running on containers but also looking at the infrastructure in a holistic way. Entrusting the orchestration systems to manage the granular security and session controls and letting the underlay network provide speeds and feeds could be the winning formula.