29 September 2020

Low-level programming

P4-programmable smartNIC controlled by ONOS (video)

65 minutes reading

The traditional NIC (Network Interface Card) is a relatively simple device equipped with Ethernet interface(s) and used to enable connectivity between machines. SmartNICs, by contrast, offer much more sophisticated capabilities and allow you to perform advanced operations on packets. SmartNICs are thus perfectly suited to optimize network performance in a data center, especially those that are programmable and offer the computing resources required.

Consider the example of VNF (Virtual Network Function) offloading. The idea is to execute common network functions—a firewall, a NAT or a load balancer—directly on a smartNIC rather than as a virtual appliance (a virtual machine or a container) deployed on the server.

SmartNICs with P4 support are a particularly compelling solution in this context. They make it possible to express how the network dataplane has to process packets using P4 language, which is gaining in popularity in the networking industry and is considered the next step in the evolution of SDN.

An important aspect is how to effectively control the P4 smartNIC at runtime. What you need is control plane software, which can be based on an SDN controller, for example. The video below walks you through how we integrated the smartNIC with the open-source ONOS controller using a dedicated smartNIC proxy developed expressly for the purpose by CodiLime’s R&D. We present a VNF offloading use case in practice by providing a custom P4 implementation of an example network function (a firewall) and loading it onto the smartNIC.

In fact, the scope of our PoC is wider than only P4-related aspects. We show how a heterogeneous dataplane—P4 smartNIC with emulated OVS-based leaf-spine fabric—can be controlled by a single SDN solution (ONOS) using different protocols. We also enhanced ONOS webUI and CLI with additional features related to our firewall use case. To the best of our knowledge, this is among the first such PoCs, if not the first.

Hi, my name is Paweł, and my name is Artur. Welcome to our webinar. Today we would like to present the solution we have developed in CodiLime's R&D department: a P4-programmable smartNIC controlled by the ONOS controller.

So before we start, let me say a few words about our company. CodiLime is a networking industry expert. We've been providing cloud and network expertise to top global networking hardware and software providers and telecoms since 2011. Currently, we have more than 200 network, software and DevOps engineers on board. And what is important regarding the topic of this webinar, we do have experience in integrating smartNIC devices with third-party management or control plane software, including SDN controllers. And we are also experts in the DPDK and P4 development for those areas. We have also successfully delivered a number of such projects already for top tech companies in Silicon Valley.

OK, so I think it's high time to get started with the content we have prepared for you today. So we'll start with just a few words on technologies or components that are important to our solution. Next, we'll give you a brief overview of our PoC architecture and then we'll discuss its software components one by one. We'll also present how the entire environment is configured and how it can be set up in an optimized way. And finally, we'll show you a demo and then there will be time for a Q&A session, as we usually do during our webinars. So let's start.

As I mentioned, the essence of our solution is controlling a P4-programmable smartNIC with the ONOS SDN controller. To the best of our knowledge, this is among the first such PoCs, if not the first. So before we start discussing the detailed architecture of our proof of concept and its internal components, we'd like to give you a very brief introduction to what P4 is all about, which smartNIC we used and how P4 is handled in ONOS in general.

OK, so what is P4? If I had to answer this question with just one statement, I would say that P4 is a way to define how the data plane of a given network device is to process packets. This concept is based on the same general principle that is used in CPU, GPU or DSP programming, where we write high-level programs using a domain-specific language. Next, we compile the code and it is executed by the given domain-specific processor. So how do we build a network system according to the same paradigm?

Basically, by doing the same thing. But it is absolutely crucial that we start using programmable chips. In addition, we need a framework to describe what network functions we want to perform and how the overall data plane processing pipeline should be organized. And this is where the P4 language helps us to express that in a standardized way. So when your P4 programs are ready, you just compile them and load the executable files into the chip, and that's all.

P4 stands for Programming Protocol-independent Packet Processors. It comes from the paper published in 2013 where the idea of the P4 language was originally described. The syntax of the language is similar to C, I would say, but P4 as such is less flexible than C. For example, some constructs that are present in C, like loops, are not available in P4. So far, two versions of the language have been released. The first one is called P4_14 and the second one is known as P4_16.

What is very important is that the P4_16 specification introduces the so-called architectural model, which defines all the function blocks that exist in a given data plane. And here on the slide, you can find different examples of architectural models. In general, each of those models is defined as a so-called pipeline within which the incoming packets will be processed, and such a pipeline contains different blocks. Some of them are fully programmable, and these are called programmable blocks; others are fixed functions, meaning that P4 programmers do not have any control over them. So now let's have a look at how to create a P4 program and apply it to the target device.

So, the first step is to check which architecture model our device complies with, because we need to know which blocks are actually programmable. Then we can write the P4 code and, once it is ready, compile it using the compiler, which is usually provided by the device manufacturer. The compiler generates binaries which are loaded onto the device, and from then on all the objects defined in the P4 code are present in the data plane.

Now, to effectively talk to this data plane, we need control plane software. For this purpose, we can use, for example, an open-source SDN controller. In our case, the data plane is a smartNIC and the control plane is based on ONOS. A couple of words on ONOS. It's a well-known open-source controller developed by the Open Networking Foundation. I would say it has a typical architecture, with a core part where the ONOS services are implemented. Those services provide a kind of inventory of currently connected devices, hosts, links, etc. They also provide an overview of the current network topology and store all the match-action rules installed in the devices. Those subsystems provide APIs, which can in turn be consumed by ONOS applications. Some of these applications are offered out of the box within a given ONOS release.

But since ONOS is a modular platform, you can write a totally new app on your own, extending the ONOS capabilities with new functions. In the southbound part, you've got a collection of drivers and protocol libraries you can use to communicate with particular devices. One of them is P4Runtime, which is a protocol used to control a P4-defined data plane. In fact, P4Runtime is an API specification based on so-called protocol buffers, which provide a method for serializing structured data in an efficient way. It was designed with simplicity and performance in mind. Data structures are called messages and they are defined in so-called .proto files. P4Runtime uses gRPC, an open-source RPC system, as its transport. Its main role is to manage the way a client and a server interact.

And what's important, it natively supports a lot of nice features, such as security, authentication mechanisms and bidirectional streaming. What is more, using protobufs and gRPC allows you to automatically generate the required code for both the client and the server in many different languages, like Java, C++, Python, Go, etc. And just to give you an example, without any deep dive, since we don't have time for this today: this is what a P4Runtime WriteRequest message may look like. Here it is presented in the so-called protobuf text format. We can see how an example table defined in some P4 program can be populated with sample data using this message. ONF has been developing a number of projects where P4 is used.

However, those implementations are dedicated to multiport physical switches, like a DC fabric switch, for example. What we would like to focus on today is the smartNIC as a device onto which P4 programs are loaded. Unlike a traditional network interface, smartNICs usually offer some level of programmability, meaning that you can define how they will perform operations on packets. Devices that support P4 can be an especially interesting option. You might want to read our blog posts where we analyze such solutions.

All right. One of the possible scenarios where we can employ a smartNIC is so-called VNF offloading. This can be an interesting use case when we think about data center deployments for which you would like to optimize network performance in general. And this is what we want to show you today. Which smartNIC did we use for that purpose? We took a solution offering native support for P4: a Netronome Agilio card equipped with two 10GbE interfaces and with the NFP chip on board.

The latest version of the SDK declares support for the P4Runtime protocol, but it is not actually fully supported. Therefore another protocol, which is based on Thrift, is the preferred way to control this card at runtime. We'll talk about that later on.

Now, I would like to briefly discuss the high-level architecture of our PoC that we have developed in CodiLime's R&D department. So, the main data plane component here is the smartNIC, of course, that is installed in one of our bare metal servers. What we want to achieve is to offload some network functions to be executed directly on the smartNIC. It can be a firewall, it can be a load balancer, it can be NAT, et cetera.

For the first round, we have prepared an example P4 implementation of the firewall. But as I said, this is only one option, because other functions can be executed as well once the required P4 implementations have been prepared. And as you can see, the scope of our PoC is in fact wider than only P4 itself, because apart from the smartNIC we have set up, on the other bare metal server, a kind of emulated leaf-spine fabric based on OVS software switches. All the components above the smartNIC and the emulated leaf-spine fabric are controlled by the same SDN controller, which is ONOS in our case. What we want to show is that a single SDN controller can control a heterogeneous data plane environment using multiple protocols.

So we are using OpenFlow for the OVS switches, and to control the smartNIC data plane we have developed a dedicated proxy that converts the P4Runtime protocol to the Thrift protocol, which is understood by the smartNIC, as mentioned earlier. Now let's discuss the different elements one by one, starting with the P4 program. Okay. I would like to present our P4 code to you to make you familiar with the details of our implementation and with the P4 language itself. As was said before, P4_16 is a universal language which can be executed on different types of devices. The capabilities of those devices are described by the P4 architecture shipped by the manufacturer. Here we see the V1Model, which is one of the reference architectures provided by P4.org. It can also be used with our Agilio smartNIC.

This model is composed of the parser, five control blocks and one fixed-function block. The parser and deparser are responsible for unpacking and packing packets in the appropriate way. The checksum-related blocks can be used to verify and update the checksum when packet headers are manipulated. The ingress and egress blocks are used to execute the processing logic. The parser is responsible for extracting previously defined packet headers. Header extraction is necessary to allow the P4 program to operate on that data. The parser block is quite limited and it mainly allows us to make use of IF statements.
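To make the parser's role more concrete, here is a minimal Python sketch of the same idea: extract the Ethernet header, then conditionally extract IPv4 based on the EtherType. This is only an illustration, not the P4 code from the PoC; the function name, the dictionary layout and the set of extracted fields are assumptions made for the example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETH_LEN = 14        # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_MIN_LEN = 20   # IPv4 header without options


def _mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def _ip(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)


def parse(frame: bytes) -> dict:
    """Extract known headers from a raw frame, similar in spirit to a P4 parser."""
    headers = {}
    if len(frame) < ETH_LEN:
        return headers  # too short to contain an Ethernet header

    dst, src, ether_type = struct.unpack("!6s6sH", frame[:ETH_LEN])
    headers["ethernet"] = {
        "dst_addr": _mac(dst),
        "src_addr": _mac(src),
        "ether_type": ether_type,
    }

    # Conditional transition: only IPv4 payloads are parsed further.
    if ether_type != ETHERTYPE_IPV4 or len(frame) < ETH_LEN + IPV4_MIN_LEN:
        return headers

    fields = struct.unpack("!BBHHHBBH4s4s", frame[ETH_LEN:ETH_LEN + IPV4_MIN_LEN])
    headers["ipv4"] = {
        "ttl": fields[5],
        "protocol": fields[6],
        "src_addr": _ip(fields[8]),
        "dst_addr": _ip(fields[9]),
    }
    return headers
```

Only the headers extracted here would be visible to the later stages, which is why a field has to be parsed before a table can match on it.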

In fact, the entire logic of our firewall sits in the ingress block. At its heart are two tables. The first one is the forwarding table. The second one is the actual firewall table. Our forwarding mechanism is quite simple, and it differentiates packets only by input port value. Then, it forwards the packet to the given output port or drops it. We had to do this because in P4 there is no default behavior and everything needs to be defined by ourselves. The firewall table contains all the fields which we use for matching, of course, like the protocol from the IPv4 header or the source address from the Ethernet header. For a given packet, our table can perform one of two defined actions: allow or drop. Actually, there is a third action here, no action, but it is only there to serve as the default.
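The two-table ingress logic described above can be modeled in a few lines of Python. This is a rough sketch under several assumptions, not the PoC's P4 code: the port numbers are made up, the forwarding table is applied before the firewall table, a rule field left as None is treated as a wildcard, and a firewall miss (the default no-action case) leaves the forwarding decision unchanged.

```python
from dataclasses import dataclass
from typing import Optional

# Forwarding table: input port -> output port (hypothetical port numbers).
# A port that is missing from the table means the packet is dropped.
FORWARDING = {0: 1, 1: 0}


@dataclass(frozen=True)
class FirewallRule:
    action: str                        # "allow" or "drop"
    ip_protocol: Optional[int] = None  # None acts as a wildcard (assumption)
    eth_src: Optional[str] = None      # None acts as a wildcard (assumption)

    def matches(self, pkt: dict) -> bool:
        if self.ip_protocol is not None and pkt.get("ip_protocol") != self.ip_protocol:
            return False
        if self.eth_src is not None and pkt.get("eth_src") != self.eth_src:
            return False
        return True


def ingress(pkt: dict, in_port: int, rules: list) -> Optional[int]:
    """Return the output port for the packet, or None if it is dropped."""
    out_port = FORWARDING.get(in_port)
    if out_port is None:
        return None  # forwarding miss: nothing else is defined, so drop

    for rule in rules:
        if rule.matches(pkt):
            return out_port if rule.action == "allow" else None

    return out_port  # firewall miss: default no-action, keep forwarding decision
```

With a single rule such as FirewallRule("drop", ip_protocol=17), UDP packets arriving on a known port are dropped while all other traffic is forwarded to the paired port.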

The last part is the deparser block, which is responsible for reassembling packets. During the build phase, the P4 code is compiled into three artifacts which are relevant to the Agilio smartNIC: a Netronome binary file, a design file and a P4Info file. The P4Info file is a definition of our data plane implementation, and it is subsequently used by the P4Runtime client. The next element of our PoC architecture I would like to discuss is our custom firewall control application. It is composed of many different parts, which are presented in the tree diagram. Starting from the top, we can distinguish two basic elements of our firewall application: the pipeconf and the control application.

First, let's talk about the pipeconf. In ONOS, pipeconf is a service to manage the configuration of protocol-independent pipelines. That definition actually comes from the ONOS documentation. Going down, we divide the pipeconf package into three components: behaviors, models and extensions. This division is imposed by the way a pipeconf object is created in the code. Here we have an example from our codebase of how we create the pipeconf object with the previously mentioned elements. Our implementation of the pipeline interpreter acts as the behavior component.

The task of the behavior part is to map ONOS internal data structures to P4-defined data fields. For example, we need to tell ONOS that a field named protocol from the IPv4 structure, defined in our P4 program of course, corresponds to IPv4 ProtoField in ONOS internal structures. But passing only the field name is not enough. Actually, ONOS talks to the smartNIC via the P4Runtime protocol, which uses identifiers instead of names. This is where the model part of the pipeconf object comes in. It loads the P4Info file and automatically allows us to use P4 names in the ONOS code. Those names are mapped to the identifiers during communication with a P4-defined data plane. Here is an example of this. We can see that the table t_firewall is translated to this ID. The same goes for one of the fields. In this part, it's translated to ID 1. This definition also contains the name of this field, its size and its match type.
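The name-to-identifier mapping described above can be illustrated with a small Python sketch. The dictionary below only mimics the general shape of a P4Info file (tables with a preamble and match fields, plus actions). All numeric IDs, the field names, the bit widths and the match types are hypothetical; only the table name t_firewall and the fact that one field maps to ID 1 come from the talk.

```python
# Simplified, hypothetical stand-in for the contents of a P4Info file.
P4INFO = {
    "tables": [
        {
            "preamble": {"id": 0x02000001, "name": "t_firewall"},
            "match_fields": [
                {"id": 1, "name": "ipv4.protocol", "bitwidth": 8, "match_type": "TERNARY"},
                {"id": 2, "name": "ethernet.src_addr", "bitwidth": 48, "match_type": "TERNARY"},
            ],
        }
    ],
    "actions": [
        {"preamble": {"id": 0x01000001, "name": "allow"}},
        {"preamble": {"id": 0x01000002, "name": "drop"}},
    ],
}


class P4InfoIndex:
    """Resolve P4 names to the numeric IDs that are sent on the wire."""

    def __init__(self, p4info: dict):
        self._tables = {t["preamble"]["name"]: t for t in p4info["tables"]}
        self._actions = {a["preamble"]["name"]: a["preamble"]["id"] for a in p4info["actions"]}

    def table_id(self, table: str) -> int:
        return self._tables[table]["preamble"]["id"]

    def match_field(self, table: str, field: str) -> dict:
        for mf in self._tables[table]["match_fields"]:
            if mf["name"] == field:
                return mf
        raise KeyError(f"{table} has no match field {field}")

    def action_id(self, action: str) -> int:
        return self._actions[action]
```

Control code can then keep using readable names, for example P4InfoIndex(P4INFO).match_field("t_firewall", "ipv4.protocol")["id"], and let the index supply the identifier, size and match type.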

Actions are also mapped to IDs. The last part is the extensions of the pipeconf object. Here, we determine which files specific to a given data plane have to be used in our pipeconf object. We have discussed the structure of the pipeconf package. Now let's take a look at the control application, which is the control plane part of the firewall application. In ONOS, you can write an application which is pipeline-aware, meaning that the application is dedicated to a given data plane. Our control application contains a firewall service to manage firewall devices and rules, and also a fully featured user interface. The components of our graphical user interface are based on Angular and integrated with the native ONOS interface. Here is an example of our firewall web UI.

Also, a firewall-related command line interface is integrated with the ONOS environment. It extends the native ONOS CLI with some firewall-specific commands. Here are examples of this CLI and its usage. The firewall service is the actual logic executor of our application. Its main task is to manage and apply rules for P4-defined devices. The last element of our PoC software stack which I would like to discuss is the smartNIC proxy. Our proxy acts as an adaptation layer for the Agilio smartNIC runtime interface. To better understand why we decided to build this adapter, let's start from the beginning. Normally, ONOS communicates with P4-capable devices via the P4Runtime protocol.

As we mentioned earlier, the Agilio smartNIC manufacturer declares support for P4Runtime as well, in addition to the legacy Thrift-based protocol. So these are the two ways to control the smartNIC at runtime. Since both ends support P4Runtime, our natural choice was to use it for communication between the data plane and the control plane. But after the first try, it turned out we were not able to do this. We realized that ONOS supports the newest version of the P4Runtime protocol, while the RTE (runtime environment) for our smartNIC was using a very early P4Runtime version.

And those two versions were not compatible with each other. Our next step was to downgrade the version of P4Runtime in ONOS, but it didn't work either. The RTE which we got from Netronome seems to have some limitations. For example, it was not possible to use ternary matching. Also, the packet-out mechanism was not implemented. As you can see, it was not possible to make it work according to our expectations. So we started to think about how to deal with it. There is a Thrift-based protocol supported by the smartNIC, but ONOS doesn't support it. So, we made a decision to implement a smartNIC proxy. ONOS populates the firewall tables using P4Runtime messages, which are accepted by the proxy's gRPC server. After that, the data is unpacked, repacked using the Thrift-based protocol and sent to the smartNIC. OK, so maybe a couple of words about the architecture of our PoC.
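The unpack-and-repack step performed by the proxy can be sketched roughly as follows. This is not the actual proxy code and it does not use the real gRPC or Thrift APIs: the incoming update is a plain dictionary loosely shaped like a P4Runtime table entry, the outgoing rule format is a hypothetical name-based structure, and all IDs and names are made up for illustration.

```python
# Hypothetical ID-to-name tables, as they could be derived from a P4Info file.
TABLES = {0x02000001: "t_firewall"}
MATCH_FIELDS = {
    (0x02000001, 1): "ipv4.protocol",
    (0x02000001, 2): "ethernet.src_addr",
}
ACTIONS = {0x01000001: "allow", 0x01000002: "drop"}


def translate_update(update: dict) -> dict:
    """Turn an ID-based table update into a name-based rule for the card."""
    entry = update["table_entry"]
    table_id = entry["table_id"]

    # Resolve every numeric match field ID to its name and decode the value.
    match = {}
    for m in entry["match"]:
        field_name = MATCH_FIELDS[(table_id, m["field_id"])]
        match[field_name] = int.from_bytes(m["value"], "big")

    return {
        "op": update["type"].lower(),   # e.g. "insert" or "delete"
        "table": TABLES[table_id],
        "match": match,
        "action": ACTIONS[entry["action_id"]],
    }


example_update = {
    "type": "INSERT",
    "table_entry": {
        "table_id": 0x02000001,
        "match": [{"field_id": 1, "value": b"\x06"}],  # IPv4 protocol 6 (TCP)
        "action_id": 0x01000002,
    },
}
```

In a real proxy, the resulting rule would then be handed to a Thrift client talking to the card; that part is omitted here because it depends on the vendor's interface definition.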

OK, so as I mentioned earlier, the main data plane component is a P4-programmable smartNIC. But we also have a kind of leaf-spine fabric which is based on Open vSwitch instances, and this is created using the mininet utility. Both the smartNIC and the leaf-spine fabric are controlled by ONOS, but using different protocols. Additionally, there are emulated hosts connected to the data plane network, just to generate and receive real traffic within the data plane. So we've got three end-user instances and two hosts acting as application servers.

All the emulated hosts and the mininet with the leaf-spine topology are running in Docker containers. To interconnect all these components, common Linux networking techniques and objects are used, such as linux-bridge instances or veth links. And this is what it looks like on a physical level. The demo is deployed on two bare metal servers. The first hosts the smartNIC equipped with 10G Ethernet interfaces. On the second one, all other data plane and control plane components are set up. The general idea of the demo is that end users, depicted here as Host A, Host B and Host C, will try to access services available on the application servers that you can see here. In order to do that, the traffic first needs to go through the OVS-based leaf-spine fabric that end users' co