Open Programmable Infrastructure - a common standard for DPU/IPU-like devices

With the extreme growth of data processed every day in data centers, network performance is constantly being challenged and improved. In this reality, hardware acceleration and offloading have become more popular, which makes network interface cards evolve even faster. Hardware vendors build increasingly specialized devices called SmartNICs and DPUs/IPUs or, more recently, superNICs. Unfortunately, the usual problem with young technologies is that each vendor develops its own standards and interfaces, leading to lower flexibility and vendor lock-in. To address this issue, major companies in the field, such as Nvidia, Intel, Marvell, F5, and others, started cooperating within a Linux Foundation project to create a shared, open-source standard for managing DPU/IPU-like devices. As a result, the Open Programmable Infrastructure (OPI) project was created.

OPI focuses on three main regions: lifecycle provisioning, API, and development environment. This article briefly explains each one of them.

Lifecycle provisioning

As the technology evolves from NICs through SmartNICs to DPUs/IPUs, the devices become systems in their own right rather than mere peripherals. This requires monitoring and managing the entire lifecycle of the device and its system, from boot-up to shutdown. To address this, OPI adopts multiple existing standards and defines its own solutions where needed.

Fig.1: Generalized example of a new system architecture

Provisioning

OPI adopts Secure Zero Touch Provisioning (SZTP, RFC 8572). Like classic Zero Touch Provisioning, SZTP allows a device to be configured automatically, without any manual intervention. On power-up, the device obtains an IP address from a DHCP server, downloads the required images and configuration files from the network, and installs or updates its software automatically. For additional security, SZTP adds a bootstrap server, which is used for mutual authentication between the device and the network and requires the provisioned device to have preinstalled certificates and cryptographic keys. During bootstrap, the device must authenticate itself to the server, and it also validates the authenticity of the network in which it is deployed. This prevents unauthorized use of the device in case of theft or misdelivery. Finally, all files containing device configuration must be cryptographically signed by the server and checked for authenticity by the device. To satisfy all of these requirements, OPI implements a reference SZTP client that runs on DPUs/IPUs.
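
One concrete check in this flow is image verification: RFC 8572 lets the onboarding information list hash values for the boot image, which the device compares against what it downloaded before installing anything. The Python sketch below illustrates that step only. It assumes the onboarding information has already been fetched and authenticated and decoded into a dict whose layout loosely mirrors the RFC's boot-image node; the exact identity string, the helper names, and the "every supported hash must match" policy are assumptions of this sketch, not code from OPI's reference client.

```python
import hashlib
import hmac

# Hash-algorithm identities this sketch understands, keyed by the identity
# name without its YANG module prefix. Only SHA-256 is handled here.
SUPPORTED_HASHES = {"sha-256": hashlib.sha256}


def parse_hex_string(value: str) -> bytes:
    """Decode a YANG hex-string such as 'ab:cd:ef' into raw bytes."""
    return bytes.fromhex(value.replace(":", ""))


def verify_boot_image(image: bytes, boot_image: dict) -> bool:
    """Check a downloaded image against the image-verification entries.

    `boot_image` is assumed to be the decoded boot-image node of onboarding
    information that the caller has already authenticated. Conservative
    policy (an assumption of this sketch): at least one supported hash must
    be present, and every supported hash must match.
    """
    checked = 0
    for entry in boot_image.get("image-verification", []):
        algorithm = entry["hash-algorithm"].split(":")[-1]  # strip module prefix
        hash_ctor = SUPPORTED_HASHES.get(algorithm)
        if hash_ctor is None:
            continue  # unknown algorithm: skip it
        checked += 1
        expected = parse_hex_string(entry["hash-value"])
        # Constant-time comparison of the computed and expected digests.
        if not hmac.compare_digest(hash_ctor(image).digest(), expected):
            return False
    return checked > 0


# Usage with a made-up image and made-up onboarding data.
image = b"hypothetical DPU software image"
info = {
    "os-name": "example-os",
    "os-version": "1.0",
    "download-uri": ["https://example.com/images/example-os-1.0.img"],
    "image-verification": [{
        "hash-algorithm": "ietf-sztp-conveyed-info:sha-256",
        "hash-value": ":".join(f"{b:02x}" for b in hashlib.sha256(image).digest()),
    }],
}
print(verify_boot_image(image, info))         # True
print(verify_boot_image(image + b"!", info))  # False
```

Rejecting the image when no supported hash is listed is a deliberately strict choice: a device that cannot verify an image should not install it.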

Inventory

For inventory queries, there are two separate scenarios. In the first, applications running inside the DPU/IPU need local access to inventory information. Here, OPI adopts an existing standard, DMTF DMI/SMBIOS, which is a well-known format for presenting management information. Only a subset of the standard is relevant to DPU/IPU-like devices, so the OPI team developed the smbios-validation-tool, which can be used to test compliance with the specification. In the second scenario, applications outside the device need access to the inventory over the network. For this case, OPI defines a gRPC-based API and provides opi-smbios-bridge, a simple container with a gRPC server inside, intended to run on every standard-compliant DPU/IPU. On request, the server retrieves the local management information and returns it, so a remote gRPC client receives the SMBIOS data over the network.
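
To make the local scenario more concrete, here is a minimal Python sketch that walks a raw SMBIOS structure table (on Linux, the kernel exposes one at `/sys/firmware/dmi/tables/DMI`) and extracts the System Information (Type 1) strings. It follows the structure layout from the DMTF SMBIOS specification: a 4-byte header, a formatted area, then a string set terminated by two NUL bytes. It is an illustration only, not the code behind smbios-validation-tool or opi-smbios-bridge, and the table in the usage lines is synthetic.

```python
import struct
from typing import Optional

END_OF_TABLE = 127  # SMBIOS Type 127 marks the end of the table


def parse_smbios(table: bytes) -> list:
    """Split a raw SMBIOS table into structures with their string sets."""
    structures = []
    offset = 0
    while offset + 4 <= len(table):
        # Header: type (BYTE), formatted-area length (BYTE), handle (WORD, LE)
        stype, length, handle = struct.unpack_from("<BBH", table, offset)
        formatted = table[offset:offset + length]
        # The string set follows the formatted area and ends with two NULs.
        end = table.index(b"\x00\x00", offset + length)
        raw = table[offset + length:end]
        strings = [s.decode("ascii", "replace") for s in raw.split(b"\x00")] if raw else []
        structures.append({"type": stype, "handle": handle,
                           "formatted": formatted, "strings": strings})
        offset = end + 2
        if stype == END_OF_TABLE:
            break
    return structures


def get_string(structure: dict, field_offset: int) -> Optional[str]:
    """Resolve a STRING field: a 1-based index into the string set (0 = none)."""
    index = structure["formatted"][field_offset]
    if index == 0 or index > len(structure["strings"]):
        return None
    return structure["strings"][index - 1]


def system_info(table: bytes) -> dict:
    """Extract Type 1 (System Information) string fields at offsets 04h-07h."""
    for s in parse_smbios(table):
        if s["type"] == 1:
            return {"manufacturer": get_string(s, 0x04),
                    "product": get_string(s, 0x05),
                    "version": get_string(s, 0x06),
                    "serial": get_string(s, 0x07)}
    return {}


# Synthetic table: one Type 1 structure followed by the end-of-table marker.
type1 = struct.pack("<BBHBBBB", 1, 0x08, 0x0001, 1, 2, 3, 4) + b"ACME\x00DPU-X\x001.0\x00SN123\x00\x00"
end_marker = struct.pack("<BBH", END_OF_TABLE, 4, 0xFFFF) + b"\x00\x00"
table = type1 + end_marker
print(system_info(table))
# {'manufacturer': 'ACME', 'product': 'DPU-X', 'version': '1.0', 'serial': 'SN123'}
```

In the remote scenario, a bridge like opi-smbios-bridge performs a comparable read on the device and returns the result through gRPC instead of printing it.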

Monitoring & Telemetry

Another aspect of system deployment is monitoring and telemetry. When many different services run across the network, it is essential to stream metrics, traces, and logs in a vendor-agnostic way to a central location, where the data is presented with Grafana or similar tools. OPI adopts the OpenTelemetry (OTEL) standard for this. The OPI repository includes an example OTEL collector to demonstrate the solution, but in general, the OPI standard mandates only the OTEL specification; SDK and collector implementations are left to users. However, OPI recommends deploying an OTEL collector on every device, as close to where the data is generated as possible, and another, aggregating collector outside the device in the central location.
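
To show what "vendor-agnostic" looks like on the wire, the Python sketch below builds a single gauge data point in the OTLP/JSON encoding and includes a helper that could post it to a collector over OTLP/HTTP. It assumes the on-device collector exposes an OTLP/HTTP receiver on its default port 4318 at `/v1/metrics`. The metric name, unit, scope name, and attribute values are hypothetical, and in practice an OTEL SDK would normally produce this payload rather than hand-written code.

```python
import json
import time
import urllib.request


def gauge_payload(service: str, host: str, name: str, unit: str, value: float) -> dict:
    """Build an OTLP/JSON metrics export request containing one gauge point."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
                {"key": "host.name", "value": {"stringValue": host}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "dpu-telemetry-sketch"},  # hypothetical scope
                "metrics": [{
                    "name": name,
                    "unit": unit,
                    "gauge": {"dataPoints": [{
                        "asDouble": value,
                        # 64-bit integers are encoded as decimal strings in JSON
                        "timeUnixNano": str(time.time_ns()),
                    }]},
                }],
            }],
        }]
    }


def export(payload: dict, endpoint: str = "http://localhost:4318/v1/metrics") -> int:
    """POST the payload to a collector's OTLP/HTTP receiver; return HTTP status."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical metric from a DPU: temperature in degrees Celsius (UCUM "Cel").
payload = gauge_payload("dpu-agent", "dpu-01", "dpu.temperature", "Cel", 42.5)
print(json.dumps(payload, indent=2))
# export(payload)  # requires a running collector
```

Following OPI's recommendation, `endpoint` would point at the collector running on the same device, which then forwards the data to the central aggregating collector.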

API

The API region focuses on configuring services that run on DPU/IPU-like devices. OPI identifies four use cases (Storage, Network, Security, and AI/ML) and defines interfaces for them. Every API is gRPC-based, which allows remote configuration. On top of that, an additional API gateway handles authentication and monitoring, so these functions do not have to be implemented in each specific API.
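
The benefit of such a gateway is that cross-cutting concerns are written once. A minimal, framework-free Python sketch of the idea follows: every call passes through a single authentication check and a call counter before being dispatched to a per-domain handler. All names here (`ApiGateway`, the token, the `storage` and `network` handlers) are hypothetical, and OPI's real gateway fronts gRPC services rather than Python functions.

```python
from collections import Counter
from collections.abc import Callable

Handler = Callable[[dict], dict]


class AuthError(Exception):
    """Raised when a request carries an unknown token."""


class ApiGateway:
    """Hypothetical gateway: one place for authentication and monitoring."""

    def __init__(self, valid_tokens: set):
        self._tokens = set(valid_tokens)
        self._services: dict = {}
        self.calls = Counter()  # per-service call counts, a stand-in for monitoring

    def register(self, name: str, handler: Handler) -> None:
        """Attach a domain-specific API (storage, network, ...) behind the gateway."""
        self._services[name] = handler

    def call(self, token: str, service: str, request: dict) -> dict:
        if token not in self._tokens:  # authentication happens only here
            raise AuthError("invalid token")
        handler = self._services[service]  # KeyError for unknown services
        self.calls[service] += 1  # monitoring happens only here
        return handler(request)


# The individual handlers contain no auth or metrics code of their own.
gw = ApiGateway({"example-token"})
gw.register("storage", lambda req: {"created": req["name"]})
gw.register("network", lambda req: {"routes": len(req.get("routes", []))})
print(gw.call("example-token", "storage", {"name": "vol0"}))  # {'created': 'vol0'}
print(gw.calls)  # Counter({'storage': 1})
```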

OPI's API region

Storage

In the storage use case, OPI defines three sets of APIs: front end, middle end, and back end. The front end is responsible for configuring emulated devices exposed to the host. Currently, OPI defines front-end APIs for emulated NVMe, virtio-blk, virtio-fs, and virtio-scsi devices. The back end is responsible for configuring network-facing functions, that is, all configuration related to the connection to the actual drives, which can be accessed in several ways, such as NVMe over Fabrics (NVMe-oF). At the time of writing, OPI defines back-end APIs for NVMe, aio, null, and malloc volumes. Finally, the middle end is responsible for configuring services between the other two; for example, data received from the host may need to be compressed or encrypted before being sent to the network. Currently, OPI defines encryption and QoS services. The OPI repository also contains a few reference storage implementations that demonstrate the functionality.
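
The three layers can be pictured as a chain of objects, each referring to the layer below it by name. The Python sketch below models such a chain and resolves what a host-visible device ultimately maps to. The class and field names are hypothetical simplifications for illustration, not the OPI storage protobuf messages, and the NQN in the usage lines is made up.

```python
from dataclasses import dataclass


@dataclass
class BackendVolume:       # back end: connection to the actual drive
    name: str
    kind: str              # e.g. "nvme" (NVMe-oF), "aio", "null", "malloc"
    target: str


@dataclass
class MiddleEndVolume:     # middle end: service applied between host and network
    name: str
    service: str           # e.g. "encryption" or "qos"
    lower_ref: str         # name of another middle-end volume or a back-end volume


@dataclass
class FrontendDevice:      # front end: emulated device exposed to the host
    name: str
    kind: str              # e.g. "nvme", "virtio-blk"
    volume_ref: str


class StorageConfig:
    def __init__(self):
        self.backends, self.middle, self.frontends = {}, {}, {}

    def add(self, obj) -> None:
        table = {BackendVolume: self.backends, MiddleEndVolume: self.middle,
                 FrontendDevice: self.frontends}[type(obj)]
        table[obj.name] = obj

    def resolve(self, frontend_name: str) -> list:
        """Walk front end -> middle-end services (if any) -> back end."""
        dev = self.frontends[frontend_name]
        path = [f"{dev.kind}:{dev.name}"]
        ref, seen = dev.volume_ref, set()
        while ref in self.middle:  # middle-end services may be stacked
            if ref in seen:
                raise ValueError(f"reference cycle at {ref}")
            seen.add(ref)
            mid = self.middle[ref]
            path.append(f"{mid.service}:{mid.name}")
            ref = mid.lower_ref
        backend = self.backends[ref]  # KeyError if the chain is broken
        path.append(f"{backend.kind}:{backend.name} -> {backend.target}")
        return path


cfg = StorageConfig()
cfg.add(BackendVolume("remote0", "nvme", "nqn.2016-06.io.example:subsys1"))
cfg.add(MiddleEndVolume("crypt0", "encryption", "remote0"))
cfg.add(FrontendDevice("blk0", "virtio-blk", "crypt0"))
print(cfg.resolve("blk0"))
# ['virtio-blk:blk0', 'encryption:crypt0', 'nvme:remote0 -> nqn.2016-06.io.example:subsys1']
```

Keeping each layer as a separate object means, for example, that encryption can be inserted or removed without touching the front-end device definition.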

Network

In the network use case, OPI identifies three major areas: cloud, telco, and Kubernetes. Currently, only the cloud API is defined; it allows a DPU/IPU-like device to be configured as a kind of SDN switch/router. The two remaining APIs have not been defined yet, but the OPI API working group is doing conceptual work with the aim of delivering them soon. The set of areas is not closed, and more areas will probably be defined later.
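
One core task of an SDN switch/router is choosing a next hop for each destination. The Python sketch below shows a longest-prefix-match lookup over a per-VPC route table using the standard `ipaddress` module. The `RouteTable` class and the next-hop names are hypothetical; the sketch only illustrates the kind of state a cloud API configures and is not a model of the OPI cloud API objects or of how a DPU performs lookups in hardware.

```python
import ipaddress
from typing import Optional


class RouteTable:
    """Per-VPC routes with longest-prefix-match lookup (linear scan for clarity)."""

    def __init__(self):
        self._routes = []  # list of (network, next_hop) pairs

    def add_route(self, prefix: str, next_hop: str) -> None:
        self._routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, address: str) -> Optional[str]:
        addr = ipaddress.ip_address(address)
        best = None
        for network, next_hop in self._routes:
            # Mixed IPv4/IPv6 membership tests simply return False.
            if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
                best = (network, next_hop)
        return best[1] if best else None


vpc_a = RouteTable()
vpc_a.add_route("0.0.0.0/0", "internet-gw")
vpc_a.add_route("10.0.0.0/16", "local")
vpc_a.add_route("10.0.5.0/24", "tunnel-to-site-b")
print(vpc_a.lookup("10.0.5.7"))  # tunnel-to-site-b
print(vpc_a.lookup("10.0.9.1"))  # local
print(vpc_a.lookup("8.8.8.8"))   # internet-gw
```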

Security

The security area focuses on offloading security capabilities such as TLS, IPsec, cryptographic operations, policies, and filters to DPU/IPU-like devices. Currently, only the IPsec API is defined, but the OPI repository also contains a strongSwan-based reference solution that shows how to use the API.
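
With IPsec offload, the device must decide for each outgoing packet which Security Association (SA), if any, protects it. The Python sketch below illustrates that selector-matching step with simplified traffic selectors (a local and a remote subnet), in the spirit of an IPsec security policy lookup. The `SecurityAssociation` fields, the first-match rule, and the SPI value are assumptions for illustration; they do not reflect the OPI IPsec API messages or the strongSwan reference solution, and no actual encryption is performed.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecurityAssociation:
    spi: int         # Security Parameter Index
    local_ts: str    # local traffic selector (subnet)
    remote_ts: str   # remote traffic selector (subnet)
    cipher: str      # label only, e.g. "aes-gcm-256"; no crypto is done here

    def matches(self, src: str, dst: str) -> bool:
        return (ipaddress.ip_address(src) in ipaddress.ip_network(self.local_ts)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(self.remote_ts))


def select_sa(sas: list, src: str, dst: str) -> Optional[SecurityAssociation]:
    """Return the first SA whose selectors match the packet, else None."""
    for sa in sas:
        if sa.matches(src, dst):
            return sa
    return None  # no SA: the packet is handled by other policy (bypass or drop)


sas = [SecurityAssociation(0x1001, "192.168.1.0/24", "10.20.0.0/16", "aes-gcm-256")]
print(hex(select_sa(sas, "192.168.1.10", "10.20.3.4").spi))  # 0x1001
print(select_sa(sas, "192.168.1.10", "172.16.0.1"))          # None
```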

AI/ML

The AI/ML use case comes from customer requests, but it has not been designed yet; at the time of writing, not even conceptual work has been done in this area.

Development environment

OPI also provides a virtualized development environment: a set of containers orchestrated with Docker Compose. All the pieces described above can be built, run, and tested in this environment. This approach also makes it possible to build hybrid environments in which hardware and simulated components cooperate. In addition, the OPI community is building a lab with real hardware, so hopefully we will see more advanced demonstrations of OPI on real DPU/IPU devices.

OPI's virtualized development environment

Conclusion

We can observe growing interest in hardware acceleration and offloading as the technology evolves. From standard NICs, through SmartNICs, to DPUs/IPUs and superNICs, vendors create increasingly complex devices with wider configuration possibilities. At the same time, customers struggle with multiple different APIs and vendor lock-in. As a result, the industry has seen a significant movement towards standardization, and the OPI project is a perfect example of this. With hardware and projects like the one described in this article developing in parallel, we can hope for a bright future for accelerated networks.

Andrzej Tkaczyk

Software Engineer

Andrzej is a Software Engineer at CodiLime, working mainly with DPDK, P4Lang, network drivers, and traffic offloading. He is also experienced in Linux Kernel Module programming and Netlink. In his free time, Andrzej likes choral singing and playing chess.
