4 September 2020

Network infrastructure planning

The modern, interoperable DC - Part 2: EVPN as a universal solution for VM, container and BMS networking (video)


This video is a part of our series "The modern, interoperable DC", which walks you through connectivity between different types of resources.

In Part 1 we guided you through a solution for DC connectivity based on a combination of FRR, unnumbered BGP (IPv4 with IPv6 link-local next hops) and eBPF.

Now it’s time for Part 2!

  • In this video we will continue to enhance our DC with additional features made possible by open standards.

  • We will show how to leverage a BGP router running on servers to provide layer 2 connectivity between heterogeneous resources, such as virtual machines, containers and bare metal servers (both legacy and FRR-based).

  • You will also learn about CNI (Container Network Interface) and how it can be integrated with FRR in order to automatically advertise information about IPs and MACs of newly created containers.

  • The speakers will also explain:

    • What VXLAN tunnels are and how they carry layer 2 traffic
    • What an Ethernet VPN is and how it can be useful in the DC
    • How to provide multi-tenancy through the use of VRFs and tunnelling protocols
    • How to interconnect VMs, containers, BMS and other resources through an IP fabric
  • We will present a demo showing how the solution works in practice.

The source code, topology and configurations used during this presentation are available in the GitHub repo.

Hello and welcome to today's webinar. For those who have missed the previous part: my name is Adam, and my name is Jerzy, and we both work at CodiLime. Today we'll introduce you to the so-called Ethernet VPN and how it can be used to interconnect different resources in the data center.

But first, a few words about CodiLime. CodiLime was founded in 2011 and now we have more than 200 people on board. We are no longer a startup, that's obvious, but we try to keep its spirit: a culture of agility, innovation and adaptability. Most of our team is located in Warsaw and Gdansk; however, we work with clients in six different time zones, so part of our team is located there as well.

With all that said, we often work with data centers that are deployed in a spine-and-leaf fashion. That experience got us thinking: is that architecture final? Can it be improved? What can be done to speed up deployments, speed up change and speed up things overall? The previous presentation and today's are all about that. A lot of technologies will be used during this presentation and demo. However, do not be afraid: they are not new. They are open source and well known. We are just connecting them all together to create something better, something newer.

Right. So let's start by revisiting the set of mechanisms that we introduced during the first webinar, as this is something that we will build upon and extend today. What we did is we used the so-called IPv6 link-local addresses, allowing us to automatically assign IP addresses to the interfaces interconnecting the networking devices, and also to the interfaces between the leaf switches and the servers. This is especially important in the case of large data centers with thousands of connections and thousands of servers, because this IP addressing does not have to be done manually. Now, even though we are using IPv6 addressing, we can still achieve native IPv4 connectivity between the servers. So we've got the whole stack: we can use IPv6 and we can use IPv4.

Thanks to using IPv6, we also get the possibility to use the Neighbor Discovery protocol, which is part of this standard. Thanks to it, we can automatically establish BGP sessions between, for example, our servers and the networking devices, so we do not need to manually specify the IP addresses of peers in the BGP configuration. Now, thanks to using a dynamic routing protocol, we can achieve load balancing when forwarding traffic through our data center: we've got multiple paths to the destination IP, and these paths can be used simultaneously to achieve better throughput. And if one of the interfaces or one of the devices goes down, the failover will be really fast, because we've introduced Bidirectional Forwarding Detection (BFD) into this topology, which allows for sub-second failovers.

The choice of BGP is very intentional here, because it is a very scalable and flexible protocol. It allows for route filtering and route aggregation, which further increases scalability, especially when we have thousands of IP addresses and thousands of servers. But it is also a prerequisite for some additional features, such as Ethernet VPN, and this is something that we are going to talk about in today's webinar. You might know that for typical layer two communication in a data center, traditionally VLANs were used, so we needed to stretch VLANs throughout the DC. However, here we are using an IP fabric.
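For reference, a minimal FRR configuration sketch of that Part 1 setup could look as follows. The interface names `swp1`/`swp2`, the AS number and the advertised prefix are our assumptions for illustration, not values from the webinar:

```text
! /etc/frr/frr.conf -- minimal sketch of the Part 1 building blocks
router bgp 65001
 ! BGP unnumbered: peer over the interface itself, using IPv6
 ! link-local next hops discovered via the Neighbor Discovery protocol
 neighbor swp1 interface remote-as external
 neighbor swp2 interface remote-as external
 ! Bidirectional Forwarding Detection for sub-second failover
 neighbor swp1 bfd
 neighbor swp2 bfd
 address-family ipv4 unicast
  ! advertise this server's loopback (example prefix)
  network 10.0.1.11/32
 exit-address-family
```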
In order to provide layer two connectivity through an IP fabric, we are going to use so-called VXLAN tunneling, where Ethernet VPN will allow us easier configuration and some additional benefits which we will talk about in a second. Now, this layer two connectivity can be used to provide connections between some IP addresses configured on physical interfaces. It might be addresses configured on some logical interfaces in the Linux operating system. It might be some MAC addresses configured on containers or on virtual machines. So basically we can interconnect heterogeneous resources and provide this layer two connectivity between them, and at the same time we can also provide multitenancy. Multitenancy, shortly speaking, is the ability to separate services of different customers, even if they are running on the same server, on the same operating system, and even if the services are communicating over the same physical network.

We'll show a working demo of the solution we are going to talk about today. And at the end of this presentation, we'll have a short Q&A session. So if you have any questions during the presentation, please use the YouTube chat and we'll try to answer them at the end of this webinar. So, this is the agenda, and we hope that you will enjoy this presentation and find it interesting.

Let's start by introducing the two main actors that will play a major role in this demo and this presentation: the VXLAN and the EVPN. VXLAN stands for Virtual Extensible LAN. It's a network virtualization technology that attempts to address the scalability problems associated with large deployments. It uses a MAC-in-UDP encapsulation technique: it encapsulates layer 2 frames inside UDP datagrams. VXLAN endpoints, which terminate VXLAN tunnels, may be physical or virtual ports; in Linux, they are known as VXLAN interfaces. One implementation of VXLAN in a software switch on the operating system is Open vSwitch. It's a fine example, but we'll concentrate on using plain bridges in the Linux operating system, along with FRR and VXLAN interfaces.

In Linux, a VXLAN can be configured in three ways: as a point-to-point interface with a local and a remote IP address connecting both sides; as a point-to-multipoint interface with a local address, where remote addresses are discovered using multicast; or as a point-to-multipoint interface where all other endpoints are discovered and advertised via the BGP protocol or via some [SDN controller](https://codilime.com/glossary/sdn-controller). Since the solution with EVPN and the BGP protocol is the most flexible one and scales fine to 2000 nodes, we'll use EVPN inside the FRR software in our demo and in our presentation.

Now let's talk about the Ethernet VPN. The Ethernet VPN allows you to group and connect VXLANs and extend layer two resources, layer two bridges, over a layer three network. The EVPN is used for signaling; for transport you can use VXLAN or MPLS. However, to avoid building an MPLS data plane and exchanging MPLS label information, we'll use VXLAN, which requires only a working IP fabric below. The EVPN allows us to stretch, as I said before, the layer two connectivity and provide segmentation and isolation just like VLAN, but without the limitations of traditional networks such as Spanning Tree. We no longer have to use it to counter layer two loops.
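To make this concrete, here is roughly how each of those three variants is created with the `ip` tool in Linux. This is a sketch; all interface names, VNIs and addresses are example values:

```bash
# 1) Point-to-point: local and remote tunnel endpoints given explicitly
ip link add vxlan100 type vxlan id 100 dstport 4789 \
    local 10.0.0.1 remote 10.0.0.2

# 2) Point-to-multipoint: remote endpoints discovered via multicast
ip link add vxlan101 type vxlan id 101 dstport 4789 \
    local 10.0.0.1 group 239.1.1.1 dev eth0

# 3) Point-to-multipoint with EVPN: no remote/group configured; the
#    forwarding entries are installed by BGP (e.g. FRR), so data-plane
#    MAC learning is switched off
ip link add vxlan102 type vxlan id 102 dstport 4789 \
    local 10.0.0.1 nolearning
```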
There won't be active/passive links, and the [BUM traffic](https://codilime.com/blog/infographic-bum-traffic-in-the-l2-and-l3-domains) will be limited to as little as possible. Also, the EVPN has some additional features, such as MAC/IP advertisement or MAC mobility. As soon as a resource appears on the network, the rest of the nodes, the rest of the tunnel termination endpoints, will be informed immediately via the BGP protocol. So there won't be flooding of unknown unicast traffic: the endpoint address will be well known. Also, as soon as an endpoint moves and its MAC address appears on a different machine, the data plane will be updated using the BGP protocol.

There is also layer three advertisement, just like in a layer three VPN, but we are still using EVPN for both layer two and layer three, just using different route types. For the exchange of layer three information, for layer three routing, we use type 5 prefixes, and we can advertise a /24 prefix, for example, into a different segment, into different VRFs, to have routing. Last but not least, BGP EVPN is an open standard: it is not limited to proprietary vendors or proprietary equipment, and it can be run on Linux. One example of a daemon supporting BGP EVPN is FRR, the Free Range Routing daemon. That daemon can be placed on machines that are hosting virtual machines, containers or Kubernetes, or are bare metal servers themselves. This will all be shown during the demo.

OK, so let's take a look at how EVPN can work in practice, here on this example topology. We've got three different servers and a firewall, and we want these resources to be able to communicate at layer two. Now, the networking devices are configured as an IP fabric, so only layer three connectivity is possible between them. So, in order to provide these layer two connections for our servers and the firewall, we will use VXLAN tunneling, which will be configured on the leaf one and leaf two switches. What we'll also need is the BGP protocol with EVPN enabled. Thanks to it, each switch will automatically advertise information about the MAC addresses it has learned, and will also advertise information on how to reach these MAC addresses, meaning which VXLAN tunnel to use in order to do that.

What we need to do in the configuration of the leaf devices is to associate these interfaces with specific VXLANs, with specific virtual network identifiers. So, port one is assigned to VXLAN one and port two is assigned to VXLAN two, and the same on the other device. When we have this configuration, we can see how the MAC address learning and connectivity work. For the MAC address learning, we've got server three, and let's assume that it needs to send traffic to the firewall on the interface with MAC E. It will create a new Ethernet frame with source MAC address C and destination MAC address E. This frame will be forwarded to leaf two and will arrive on port one, and leaf two will perform standard Ethernet switch learning. It will take a look at the incoming frame and write down in the switching table the source MAC address and the interface through which this MAC address is reachable. Now, before forwarding this frame to the destination, EVPN will also take a look at the switching table and it will notice: all right, I've got a new MAC address, MAC address C.
I need to advertise it to all other BGP peers that are also running EVPN. So, it will take this MAC address and, in our case, advertise it to leaf one. In the case of a large data center, this MAC address would be advertised to hundreds, sometimes thousands of devices that are also running EVPN. Leaf one, upon receiving this advertisement, will take a look at it and see: I've got a new MAC address, MAC address C. It will be put into the switching table, and the next-hop interface in the direction of the destination will be the logical VXLAN one tunnel interface that leads to leaf two. So this is the way that new MAC addresses are advertised throughout the data center.

Now, for the forwarding of traffic through the VXLAN tunnels, let's say server one needs to communicate with server three. Again, a new Ethernet frame is created, with source MAC address A and destination MAC address C, and this frame is forwarded to the leaf one switch. It will take a look at the switching table and see: I need to forward packets to MAC address C, so the frame needs to be encapsulated within the VXLAN one tunnel and forwarded to leaf two. So, as an IP packet with UDP and VXLAN headers, it is forwarded to the leaf two switch, where it is de-encapsulated. The original frame is taken out of the VXLAN encapsulation, and the leaf two switch will check the destination MAC of this Ethernet frame, again using the switching table. We've got MAC address C reachable through port one, so this is exactly the interface through which the original frame will be forwarded to server three. So, this concept is actually quite simple as far as what we've seen on the slide. However, the capabilities that it gives us are quite far-reaching, and this is what we want to show you in the following examples.

All right. Before, we mentioned that VXLAN tunneling and EVPN can also be run in software, so not only on the networking devices but also, for example, on the Linux operating system. In our case, in order to do that, we are using FRR, because it supports [data plane](https://codilime.com/glossary/data-plane/) integration, meaning that it can manipulate Linux's bridge tables and routing tables. Now, let's take a look for a moment at server three. We can see that server three has one physical interface connecting it to the networking switch, but it also has some logical interfaces: the bridge one and bridge two interfaces. In Linux, there are several types of logical interfaces that can be configured. In the case of bridge interfaces, it is like putting a network segment inside a single server, inside a single operating system, where layer two forwarding can be done. Now, it is also possible to assign an IP address to a logical interface, including bridge interfaces. So, here we've got IP address B and IP address H on the second interface. These IP addresses can communicate with each other, at least by default, so we ought to be able to ping one IP address from the other. However, in multitenancy scenarios we would actually require these bridges and these IP addresses to be separated, because, for example, one of the IP addresses might be used to run a service of one customer and the second might be running a service of another customer, and they should not be able to communicate with each other. In the case of Linux, there are several ways to achieve this functionality.
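On an FRR-based endpoint, the learning and advertisement described above can be observed with a few commands along these lines. This is a sketch: the bridge name is an example, and the exact output format depends on the FRR version:

```bash
# locally learned MACs plus the ones installed by EVPN
# (EVPN-installed entries point at the VXLAN interface)
bridge fdb show br bridge1

# type-2 (MAC/IP) EVPN routes exchanged over BGP
vtysh -c 'show bgp l2vpn evpn route'

# FRR's per-VNI view of local and remote MAC addresses
vtysh -c 'show evpn mac vni all'
```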
What we did here is we used so-called VRFs, which stands for Virtual Routing and Forwarding. Basically, creating a VRF means creating additional routing and switching tables. So, here we've got bridge one, which is assigned to VRF one, and that means that bridge one can now only communicate with other interfaces, be they logical or physical, that are also connected to the same VRF. It is no longer able to communicate with interfaces located in the global routing table or in other VRFs. The same functionality is also available in most modern data center switches, where we can assign, for example, physical interface port one to VRF one and interface port two to VRF two. Thanks to that, server one and server two would also not be able to communicate with each other, so this multitenancy can also be provided easily by the networking devices.

It is also supported on the VXLAN tunnels. If we use different virtual network identifiers, then the network segments in the overlay topology that we create are also separated from each other, at least by default. So, this is OK for us. Now the only thing that is left is to also have EVPN sustain this separation, sustain this multitenancy. It does support it, and in order to do that, it uses so-called route targets. This is a feature of the BGP protocol, and basically it means adding some extra values to the MAC and IP information that is being advertised by the EVPN protocol. So, for example, here we've got server three again, with bridge one, and this bridge one interface has MAC address B associated with it. This MAC address will be put by the operating system into the switching table for VRF one, and EVPN will notice: all right, I've got a new MAC address that I need to advertise. However, it will be advertised with the route target value that is associated with this VRF one routing and switching table. This information will be sent to the leaf one switch, and leaf one will take a look at the advertisement and also make sure to note the route target, because this will determine which switching table it will put the information into. Here we see that MAC address B is entered into the switching table for VRF one, and the destination is the VXLAN tunnel interface leading to server three.

All right. Thanks to this configuration we achieve a topology, an environment, where we have server one with an IP address configured on a physical interface, which can communicate at layer two and layer three with an IP address configured on some logical interface on server three. Another server, server two, is able to communicate with an IP address configured on the second interface of server three. However, these interfaces are separated from each other using VRFs, so we've got this multitenancy sustained here as well. One last thing to notice on this slide is that leaf two does not take part in the VXLAN tunneling and doesn't need to run EVPN. In truth, if most of our servers ran FRR, then we wouldn't need many, or any, networking devices that support EVPN or VXLANs, depending on the scenario. So running EVPN and VXLANs in software is something that can decrease the cost of the solution.

OK, so you might be thinking that this EVPN must be very complicated, and that creating these VRFs on Linux is also probably hard work. But in truth, when you get the hang of it, it is not that hard, and there are not many commands needed to configure it.
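To give a feel for how route targets are tied to a VNI in FRR, the relevant configuration could look roughly like this. It is a sketch: the AS number and route-target values are assumptions, not values from the webinar:

```text
router bgp 65001
 address-family l2vpn evpn
  ! per-VNI route targets: only peers importing 65001:1 will put
  ! these MAC/IP advertisements into their VRF one tables
  vni 1
   route-target import 65001:1
   route-target export 65001:1
  exit-vni
 exit-address-family
```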
So for example, here we've got a server where we want to create a new bridge interface with some IP address assigned to it. In order to do that, we issue two commands in Linux: we create a new interface with the name bridge one and the type network bridge, and we assign an IP address to it. Now we want to separate this interface from all other IP addresses and interfaces that might be configured on the same server, so we create VRF one and put the bridge interface into it. These are the three commands: create the new VRF, enable it, and associate the bridge one interface with VRF one.

As the next step, we want to allow communication between whatever is connected to bridge one and, through the VXLAN tunnel, all the other MAC and IP addresses that are also connected to VNI one, and in the opposite direction: everything that is coming from the VXLAN one tunnel to the server will be sent to the bridge one interface. In order to do that, we create the VXLAN interface with virtual network identifier one and the standard UDP destination port for VXLANs. We also make sure to specify that the local endpoint will be at the IP address 10.0.1.11, so the endpoint of the VXLAN is at the loopback interface. We enable this new interface and we attach it to the bridge one logical interface. That is it when it comes to the configuration of interfaces in Linux.

The last thing that we need to do is to enable advertisements using EVPN. Here we assume that we are starting from the place where we left off at the last webinar: the FRR daemon is up and running and the BGP protocol is enabled. What we do is we just enable the EVPN address family, create the configuration for VNI one, where we specify appropriate route targets for our VRF, and then we enable the EVPN advertisements. So, in truth, this is quite a few commands, but not a lot (a consolidated sketch follows at the end of this section). And keep in mind that this kind of configuration can be simplified and automated using scripting, so it is not that hard to do at all.

OK, so we have Linux bridges connected and assigned to the VRFs. So now let's move to virtual machines. We can connect them too, and there are two ways to do so. Each virtual machine in Linux, most frequently, is associated with a tap interface, the layer two interface
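As promised, here is a consolidated sketch of the configuration sequence described above, in one place. The bridge address, VRF table number, AS number and route targets are example values, with 10.0.1.11 standing in for the loopback address mentioned in the talk:

```bash
# bridge with an IP address
ip link add bridge1 type bridge
ip addr add 192.0.2.1/24 dev bridge1      # example address "B"
ip link set bridge1 up

# VRF for tenant separation: create, enable, attach the bridge
ip link add vrf1 type vrf table 100
ip link set vrf1 up
ip link set bridge1 master vrf1

# VXLAN interface: VNI 1, standard UDP port, endpoint on the loopback,
# MAC learning left to the EVPN control plane
ip link add vxlan1 type vxlan id 1 dstport 4789 local 10.0.1.11 nolearning
ip link set vxlan1 up
ip link set vxlan1 master bridge1

# enable EVPN advertisements in FRR (per-VNI route targets as sketched
# earlier); assumes the BGP session setup from the previous webinar
vtysh << 'EOF'
configure terminal
router bgp 65001
 address-family l2vpn evpn
  advertise-all-vni
 exit-address-family
EOF
```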