At CodiLime, our R&D team is always looking for creative ways to approach the challenges that modern infrastructures face, whether it’s finding smarter ways to handle IoT traffic, accelerating network functions with hardware, or simplifying control over complex environments.
In this article, we’re excited to share the fruits of our research with four of our latest Proofs of Concept (PoCs). Each PoC addresses a unique challenge: optimizing MQTT communication with hardware offloading, accelerating packet filtering through FPGA-based eBPF offloading, simplifying packet generation with SmartNICs programmed in P4, and unifying control over programmable devices with ONOS. Together, these projects show how hardware acceleration and programmable devices can deliver real benefits to real-world infrastructures.
Hardware TCP Offloading for MQTT Protocol Acceleration
Overview of the Proof of Concept
This Proof of Concept (PoC) explores using hardware TCP offloading to improve the MQTT protocol, a widely used communication protocol in IoT systems. MQTT operates atop TCP, facilitating message exchanges between publishers, brokers, and subscribers.
The PoC aims to enhance the efficiency of MQTT operations by leveraging SmartNIC-based offloading for specific broker tasks, including packet distribution and subscriber matching. This approach significantly reduces latency and CPU usage on the host system.
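To make "subscriber matching" concrete, here is a minimal sketch of the work a broker does for every published message: comparing the topic name against each subscription filter, including the standard MQTT wildcards `+` (one level) and `#` (all remaining levels). The function names and the subscription table are ours, chosen for illustration; the sketch skips special cases such as topics starting with `$` and does not represent the code that ran on the SmartNIC.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter.

    '+' matches exactly one topic level; '#' matches the parent level and
    everything below it, and is only valid as the last filter level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only in the final position.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # the filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level mismatch
    return len(filter_levels) == len(topic_levels)


def match_subscribers(subscriptions: dict[str, list[str]], topic: str) -> list[str]:
    """Return the sorted client IDs whose filters match the published topic."""
    return sorted(
        client
        for client, filters in subscriptions.items()
        if any(topic_matches(f, topic) for f in filters)
    )


# Hypothetical subscription table: client ID -> list of topic filters.
subscriptions = {
    "dashboard": ["sensors/#"],
    "thermostat": ["sensors/+/temperature"],
    "logger": ["alerts/#"],
}
print(match_subscribers(subscriptions, "sensors/kitchen/temperature"))
# prints ['dashboard', 'thermostat']
```

Because this loop runs once per message and per subscriber, it is a natural candidate for moving off the host CPU.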
Key Benefits
Reduced Latency
By offloading packet distribution and topic matching to a dedicated protocol offload system, the PoC achieved up to a threefold decrease in MQTT message distribution time.
Optimized CPU Usage
The solution lowers the computational burden on the host CPU by isolating intensive tasks on hardware acceleration platforms, freeing resources for other operations.
Scalability
The offloading architecture demonstrated robust performance even under increased loads, showcasing its scalability for large-scale IoT deployments.
Ease of Integration
Leveraging tools like the NVIDIA BlueField DPU and a lightweight TCP/IP stack, the implementation retains compatibility with existing broker applications without requiring source code modifications.
Implementation
The implementation of this Proof of Concept evolved through three significant iterations, each building upon the previous to address challenges and improve performance. Initially, the approach focused on TCP monitoring, where the protocol offload intercepted packets between the broker and its clients. To maintain synchronization, sequence number translation was introduced, ensuring that the offload could handle packet distribution without disrupting existing connections. While this method successfully reduced latency, it encountered limitations in scaling effectively under complex configurations with higher traffic loads.
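The idea behind sequence number translation can be sketched as follows. When an intermediary sends data to a client on a connection it does not own, the client's view of the byte stream runs ahead of the broker's, so sequence numbers travelling toward the client and acknowledgements travelling back have to be shifted by the number of injected bytes, modulo 2^32. The class below is an illustration under that assumption; its names are ours, and the PoC's actual translation logic may have differed.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class SeqTranslator:
    """Tracks the byte offset between the broker's and the client's stream views.

    Hypothetical helper: assumes the offload only ever adds bytes toward the client.
    """

    def __init__(self) -> None:
        self.offset = 0  # bytes the offload has injected toward the client

    def record_injected(self, payload_len: int) -> None:
        """Account for a payload the offload sent to the client on its own."""
        self.offset = (self.offset + payload_len) % SEQ_MOD

    def to_client_seq(self, broker_seq: int) -> int:
        """Rewrite a sequence number on a broker-to-client segment."""
        return (broker_seq + self.offset) % SEQ_MOD

    def to_broker_ack(self, client_ack: int) -> int:
        """Rewrite an acknowledgement number on a client-to-broker segment."""
        return (client_ack - self.offset) % SEQ_MOD


translator = SeqTranslator()
translator.record_injected(100)           # the offload delivered 100 bytes itself
print(translator.to_client_seq(1000))     # prints 1100
print(translator.to_broker_ack(1100))     # prints 1000
```

Keeping such per-connection state consistent for every client is one plausible reason this style of approach becomes harder to manage as traffic and connection counts grow.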
To overcome these challenges, the second iteration embedded a lightweight IP network stack within the protocol offload. This replaced the earlier TCP monitoring mechanism and resolved many of the issues it had encountered. With its own stack, the protocol offload acted as a separate endpoint rather than monitoring existing connections between the broker and its clients. As a result, it delivered significant scalability improvements, particularly in scenarios involving numerous publishers and subscribers.
In the final stage, the implementation adopted a socket interception approach to further optimize operations. This method utilized a shim library to intercept API calls from the broker, selectively redirecting relevant traffic to the protocol offload. By bypassing kernel-level TCP connections and relying instead on the lightweight IP stack, this iteration significantly reduced system overhead. The new design not only maintained high performance but also removed the requirement for code modifications in the broker, simplifying integration with existing systems.
Potential value for you
This PoC demonstrates the potential of hardware offloading to address the growing demands of IoT applications. By reducing latency and CPU load while ensuring scalability, it offers a compelling solution for organizations aiming to improve their network performance. The modular design also allows firms to adopt the technology with minimal disruption to their existing infrastructure.
Want to learn more?
Discover the full details of this Proof of Concept in our webinar on Hardware TCP offloading. Watch now to gain even more valuable insights into the process, results, and implications for your business. Prefer reading? Download our free ebook for a more in-depth look.
Offloading eBPF to FPGA in Cloud Environments
Overview
This Proof of Concept (PoC) explores offloading the extended Berkeley Packet Filter (eBPF) functionality to a field-programmable gate array (FPGA) in the cloud. eBPF, commonly used in Linux kernel environments for network packet filtering, allows for efficient handling of network traffic before it enters the network stack. This project demonstrates how FPGA devices can further improve eBPF’s performance by offloading its workload, enabling faster and more efficient data processing.
This PoC utilized Amazon Web Services (AWS) and its FPGA-based F1 instances to implement this offloading, showcasing the potential for integrating hardware acceleration into cloud services. This opens new opportunities for companies looking to explore FPGA programmability and performance for resource-intensive tasks without requiring on-premises hardware.
Key Benefits
Better Performance
By offloading eBPF to an FPGA, this PoC highlights improved packet filtering efficiency, as operations are handled directly by hardware rather than through the kernel. This reduction in processing load optimizes overall system performance.
Flexibility and Programmability
FPGAs offer programmable hardware capabilities, enabling specific use cases such as real-time network packet processing or even complex tasks like DNA sequencing. This flexibility allows organizations to customize hardware solutions to their needs.
Simpler with Cloud Integration
Using AWS’s FPGA instances simplifies the adoption process, as they include the necessary development tools and licenses. Companies can test hardware-accelerated solutions without significant upfront investment, reducing barriers to entry.
Scalable
The PoC demonstrates a foundation for scaling, with opportunities to run multiple instances of FPGA modules or directly interface with network cards to bypass CPU involvement, paving the way for even more efficient architectures.
Implementation
This PoC focused on using the hBPF project, a hardware implementation of eBPF. Using LiteX, a Python-based framework for building FPGA designs, our team generated hardware description language (HDL) files to design a custom FPGA configuration.
These files were deployed within the AWS F1 environment. The workflow involved several key steps, from generating HDL files and synthesizing them into bitstreams to deploying and testing the configuration on the FPGA.
Once deployed, the system integrated with test environments using tools like DPDK’s testpmd application to simulate real-world network packet filtering scenarios. Packets were processed by the FPGA’s custom logic and routed back to the host for verification, proving the concept’s viability in handling real network traffic.
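For readers less familiar with packet filtering, the sketch below shows the kind of per-packet decision such a filter makes: parse a few header fields and return a verdict. It is written in plain Python for readability and is not the eBPF program or the FPGA logic used in the PoC. The blocked port, the verdict names, and the helper that builds a test frame are all assumptions made for this example.

```python
import struct

ETH_HEADER_LEN = 14
MIN_IPV4_HEADER_LEN = 20
UDP_HEADER_LEN = 8
ETHERTYPE_IPV4 = 0x0800
PROTO_UDP = 17
PASS, DROP = "PASS", "DROP"


def filter_packet(frame: bytes, blocked_udp_port: int = 9999) -> str:
    """Drop IPv4/UDP frames sent to blocked_udp_port; pass other well-formed traffic."""
    if len(frame) < ETH_HEADER_LEN:
        return DROP  # not even a full Ethernet header
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return PASS  # this example only inspects IPv4
    if len(frame) < ETH_HEADER_LEN + MIN_IPV4_HEADER_LEN:
        return DROP  # truncated IPv4 header
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < MIN_IPV4_HEADER_LEN:
        return DROP  # malformed header length
    if frame[ETH_HEADER_LEN + 9] != PROTO_UDP:
        return PASS  # not UDP
    udp_offset = ETH_HEADER_LEN + ihl
    if len(frame) < udp_offset + UDP_HEADER_LEN:
        return DROP  # truncated UDP header
    (dst_port,) = struct.unpack_from("!H", frame, udp_offset + 2)
    return DROP if dst_port == blocked_udp_port else PASS


def build_udp_frame(dst_port: int) -> bytes:
    """Build a minimal Ethernet/IPv4/UDP frame for testing (checksums left at 0)."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, PROTO_UDP, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    udp = struct.pack("!HHHH", 12345, dst_port, UDP_HEADER_LEN, 0)
    return eth + ip + udp


print(filter_packet(build_udp_frame(9999)))  # prints DROP
print(filter_packet(build_udp_frame(53)))    # prints PASS
```

Each check is a bounded comparison on a fixed header offset, which is the style of logic that maps well onto hardware.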
Want to know more about the technical details?
Check out our webinar on offloading eBPF to FPGA in the cloud to learn more about the implementation, challenges, and possibilities for scaling this technology in your projects.
Generating Packets with SmartNICs and P4
Overview
This PoC explores the possibility of using SmartNICs programmed with the P4 language to generate packets for testing and analyzing network infrastructure. This novel solution offers a lightweight alternative to traditional traffic generators by focusing on packet creation, receipt, and comparison, providing new insight into your network performance.
Using P4’s event-based architecture along with stateful features like recirculation, cloning, and registers, our PoC demonstrates a new approach to packet generation that eliminates reliance on host resources. By simplifying infrastructure testing, this concept enables more efficient use of hardware resources in data centers.
Key Benefits
Optimized Host Resource Usage
By shifting the workload to SmartNICs, this solution reduces the strain on the host system, leading to better performance without compromising traffic generation.
Simplified and Scalable Packet Generation
Using P4, this system creates a continuous packet stream through recirculation and cloning. These methods allow the system to repeatedly process and modify packets without relying on resource-heavy traditional looping constructs, making it more efficient and scalable.
Hardware Offloading and Efficiency
This concept demonstrates the possibility of offloading tasks like packet hash calculation to a SmartNIC, taking advantage of hardware speed and efficiency.
Portability Across SmartNICs
Programming in P4 provides a layer of abstraction, making the solution portable across various SmartNIC models and reducing the need for vendor-specific low-level programming. This adaptability simplifies deployments across diverse environments.
Implementation
This PoC utilized P4 to program a SmartNIC for packet generation, with the P4 code running on BMv2 (the reference software switch for P4 development). This setup was controlled by a host application written in Rust. The Rust application sends BMv2 control packets that carry directives such as the destination port, the number of packets to send, and the expected hash values. P4 handles the core packet generation logic, making use of features like cloning and recirculation to simulate a continuous packet stream.
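As an illustration of what such a control packet might carry, the sketch below packs the three directives named above into a fixed binary layout. The field widths, their order, and the class name are assumptions made for this example, and it is written in Python for brevity even though the PoC's host application was written in Rust.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout, network byte order:
# destination port (16 bits), packet count (32 bits), expected hash (32 bits).
CONTROL_FORMAT = "!HII"
CONTROL_SIZE = struct.calcsize(CONTROL_FORMAT)  # 10 bytes


@dataclass(frozen=True)
class ControlDirective:
    dst_port: int       # where the generated packets should be sent
    packet_count: int   # how many packets to generate
    expected_hash: int  # hash value the receiving side should compare against

    def encode(self) -> bytes:
        """Serialize the directive into the control packet payload."""
        return struct.pack(CONTROL_FORMAT, self.dst_port,
                           self.packet_count, self.expected_hash)

    @classmethod
    def decode(cls, payload: bytes) -> "ControlDirective":
        """Parse a directive from the first CONTROL_SIZE bytes of a payload."""
        return cls(*struct.unpack(CONTROL_FORMAT, payload[:CONTROL_SIZE]))


directive = ControlDirective(dst_port=2, packet_count=1000, expected_hash=0xDEADBEEF)
payload = directive.encode()
print(len(payload))                                    # prints 10
print(ControlDirective.decode(payload) == directive)   # prints True
```

A fixed layout like this is easy for a P4 parser to extract as a header, which keeps the control path between the host and the data plane simple.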
Registers and counters are used to manage state and track metrics such as the number of packets sent, received, and hashed. While the PoC focused on core functionalities, the groundwork has been laid for extending this solution to handle more complex traffic patterns or higher-speed requirements in the future.
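The interplay of cloning, recirculation, and registers can be modeled in a few lines. In this toy model, a single seed packet loops through the pipeline; on each pass a clone is emitted, and the original is recirculated until a register holding the remaining count reaches zero. The receive side counts packets and compares their hashes with the expected value. This is a simplification written in Python rather than P4, the register names are hypothetical, and CRC32 stands in for whichever hash the PoC used.

```python
import zlib
from collections import deque


def generate(template: bytes, packet_count: int) -> tuple[list[bytes], dict[str, int]]:
    """Emit packet_count copies of template using a clone-and-recirculate loop."""
    registers = {"remaining": packet_count, "sent": 0}
    emitted: list[bytes] = []
    pipeline = deque([template])  # the single seed packet enters the pipeline
    while pipeline:
        packet = pipeline.popleft()
        if registers["remaining"] == 0:
            continue  # drop the looping packet, which ends the stream
        emitted.append(packet)      # "clone": one copy leaves through the egress port
        registers["sent"] += 1
        registers["remaining"] -= 1
        pipeline.append(packet)     # "recirculate": the original re-enters the pipeline
    return emitted, registers


def verify(packets: list[bytes], expected_hash: int) -> dict[str, int]:
    """Count received packets and how many of them match the expected hash."""
    counters = {"received": 0, "hash_ok": 0}
    for packet in packets:
        counters["received"] += 1
        if zlib.crc32(packet) == expected_hash:
            counters["hash_ok"] += 1
    return counters


packets, registers = generate(b"probe", 3)
print(registers["sent"])                       # prints 3
print(verify(packets, zlib.crc32(b"probe")))   # prints {'received': 3, 'hash_ok': 3}
```

Note that the loop state lives entirely in the registers and in the recirculating packet, which is what lets the data plane keep generating traffic without the host driving each packet.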
Interested in the full technical details?
Watch our webinar on generating packets with SmartNICs and P4 for an in-depth look at the implementation and its potential applications in networking.
P4-Programmable SmartNIC Controlled by ONOS
Overview
This Proof of Concept (PoC) focuses on controlling a P4-programmable SmartNIC using the ONOS Software-Defined Networking (SDN) controller. By integrating a SmartNIC with ONOS, this solution highlights the potential for centralizing control over network devices, such as programmable switches and SmartNICs, within a unified SDN environment.
The goal was to showcase the feasibility of deploying network functions like firewalls or load balancers directly on the SmartNIC. This approach minimizes reliance on traditional network equipment, optimizes resource use, and improves the management of diverse data plane components.
Key Benefits
Greater Network Efficiency
Offloading network functions to the SmartNIC reduces the processing burden on other network devices and servers.
Centralized Management
With ONOS controlling both traditional network elements and the SmartNIC, the PoC demonstrates a unified management system.
Scalable and Flexible Design
The SmartNIC’s programmability, coupled with ONOS’s modular nature, allows for quick deployment of various network functions.
Improved Interoperability
This PoC bridges gaps between different protocols and devices using a custom proxy, enabling smooth communication between ONOS and the SmartNIC despite compatibility issues with existing standards.
Implementation
The architecture of this PoC includes a P4-programmable SmartNIC integrated with a leaf-spine fabric built using Open vSwitch (OVS). The SmartNIC, functioning as the primary data plane component, was programmed with a P4-based firewall application. ONOS acted as the control plane, managing the SmartNIC and OVS fabric using different protocols. A custom-built proxy facilitated the translation between ONOS and the SmartNIC, ensuring problem-free operation despite protocol mismatches.
The setup involved deploying all components across two bare-metal servers. Docker containers hosted emulated hosts, network devices, and ONOS applications, while Ansible playbooks automated the setup and configuration process. This approach demonstrated how to achieve complex configurations with minimal manual intervention.
The PoC tested various scenarios, such as applying firewall rules based on Layer 2 and Layer 3 parameters, blocking specific traffic patterns, and integrating the SmartNIC with broader network setups. These tests underscored the versatility of P4 programming in enabling precise control over network behavior.
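Conceptually, firewall rules like these form a match-action table: each entry matches on some header fields and names an action. The sketch below models that with one Layer 2 field (source MAC address) and one Layer 3 field (destination IP prefix). The rule fields, the first-match semantics, and the default action are assumptions made for this example and do not reflect the actual P4 tables or ONOS rules used in the PoC.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                          # "allow" or "deny"
    src_mac: Optional[str] = None        # Layer 2 match; None acts as a wildcard
    dst_ip_prefix: Optional[str] = None  # Layer 3 match; None acts as a wildcard

    def matches(self, src_mac: str, dst_ip: str) -> bool:
        """Return True if every non-wildcard field of the rule matches."""
        if self.src_mac is not None and self.src_mac.lower() != src_mac.lower():
            return False
        if (self.dst_ip_prefix is not None
                and ip_address(dst_ip) not in ip_network(self.dst_ip_prefix)):
            return False
        return True


def decide(rules: list[Rule], src_mac: str, dst_ip: str, default: str = "allow") -> str:
    """Return the action of the first matching rule, or the default action."""
    for rule in rules:
        if rule.matches(src_mac, dst_ip):
            return rule.action
    return default


rules = [
    Rule("deny", src_mac="aa:bb:cc:dd:ee:01"),                                  # L2 only
    Rule("deny", dst_ip_prefix="10.0.2.0/24"),                                  # L3 only
    Rule("allow", src_mac="aa:bb:cc:dd:ee:02", dst_ip_prefix="10.0.3.0/24"),    # L2 + L3
]
print(decide(rules, "aa:bb:cc:dd:ee:01", "10.0.9.9"))   # prints deny
print(decide(rules, "aa:bb:cc:dd:ee:02", "10.0.2.7"))   # prints deny
print(decide(rules, "aa:bb:cc:dd:ee:02", "10.0.3.7"))   # prints allow
```

In an SDN setup, a controller would install and remove entries like these at runtime, while the data plane only performs the lookup.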
Want to see it in action?
Come and see the full technical details and demonstrations in our webinar on controlling P4-programmable SmartNICs with ONOS. Learn how this approach can transform network infrastructure in your organization.
These four PoCs demonstrate the potential of hardware acceleration and programmable devices in addressing the most pressing challenges in modern networking.
By making full use of emerging technologies like SmartNICs, FPGAs, and P4, we’re helping organizations achieve new levels of performance, scalability, and efficiency. CodiLime is committed to pushing boundaries in networking technologies and exploring new ideas to stay ahead of the curve.