FPGA programming has been gaining momentum lately, as it offers considerable benefits. It allows you to offload resource-hungry tasks to hardware and thus increase performance. FPGAs can be programmed and reprogrammed according to current needs, which is very cost-effective in the long run. In this article, I explain what an FPGA is, how it can be programmed and where it can be used.
The acronym FPGA stands for Field-Programmable Gate Array. It is an integrated circuit that a user can program for a specific purpose after it has been manufactured. Contemporary FPGAs contain logic blocks, such as adaptive logic modules (ALMs) or logic elements (LEs), connected via programmable interconnects. These blocks form a physical array of logic that can be customized to perform specific computing tasks. This makes FPGAs very different from microcontrollers or Central Processing Units (CPUs), whose hardware configuration is fixed by the manufacturer and cannot be modified.
Fig. 1 An overview of FPGA (source)
The first programmable circuits were very simple and contained only logic gates. This was enough to perform many logical functions where zeros and ones were the inputs and outputs. Over time, programmable circuits became more and more powerful. In programmable circuits, you program logic cells that can work as registers, adders, multiplexers or lookup tables. Both the behavior of the cells and their structure can be changed while the circuit is working. A circuit can be reprogrammed to perform different functions, for example that of an ARM-architecture processor, a network interface card or a video encoder, to name three.
Fig. 2 Adaptive Logic Module of an Altera/Intel FPGA (source)
FPGAs consist of logic modules connected by routing channels. Each module is built around a programmable lookup table, which both implements the module’s logic functions and controls the other elements of the cell. In addition to the lookup table, each cell contains cascaded adders that perform addition. Subtraction can be done with the same adders by changing the logical states of the inputs. Beyond these, there are also registers (logic elements that perform the simplest memory functions) and multiplexers (switching elements).
FPGAs can also include static and dynamic on-chip memories, depending on the manufacturer and model. In addition, FPGAs often include ready-made components such as CPU cores, memory controllers, USB controllers or network cards. These components are so widely used that there is no need to implement them in the programmable logic. Instead, you can use the component already built into the chip.
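To make the lookup table and the adder chain more tangible, here is a minimal Python model. It is only a sketch of the idea, not a description of any vendor's actual logic cell: the names `make_lut`, `ripple_add` and `ripple_sub` are made up for this illustration, and the default 4-bit width is an arbitrary assumption.

```python
def make_lut(truth_table):
    """Return a function that looks up one output bit from its input bits.

    truth_table[i] is the output for the inputs whose bits, read as a
    binary number (first input = most significant bit), equal i.
    """
    def lut(*bits):
        index = 0
        for bit in bits:
            index = (index << 1) | (bit & 1)
        return truth_table[index]
    return lut


# "Programming" the same 2-input LUT with different tables yields different gates.
and_gate = make_lut([0, 0, 0, 1])
xor_gate = make_lut([0, 1, 1, 0])

# A full adder described as two 3-input LUTs (inputs: a, b, carry_in).
sum_lut = make_lut([0, 1, 1, 0, 1, 0, 0, 1])
carry_lut = make_lut([0, 0, 0, 1, 0, 1, 1, 1])


def ripple_add(a, b, width=4, carry_in=0, invert_b=False):
    """Add two unsigned integers bit by bit through a chain of full adders."""
    result, carry = 0, carry_in
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        if invert_b:
            b_bit ^= 1  # flip the logical state of this input
        result |= sum_lut(a_bit, b_bit, carry) << i
        carry = carry_lut(a_bit, b_bit, carry)
    return result, carry


def ripple_sub(a, b, width=4):
    """Subtract with the same adder: a - b = a + (~b) + 1 (two's complement)."""
    result, _ = ripple_add(a, b, width, carry_in=1, invert_b=True)
    return result
```

Calling `ripple_sub(7, 3)` reuses the adder chain unchanged: only the second operand's bits are inverted and the carry input is set to 1, which is one way to read "changing the logical states of the input".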
FPGAs are mainly used to design application-specific integrated circuits (ASICs). First, you design the architecture of such a circuit. Then, you use an FPGA to build and check its prototype. Any errors found at this stage can be corrected by reprogramming the FPGA. Once the prototype works as expected, an ASIC project is created and manufactured based on the FPGA design. This saves time, as manufacturing an integrated circuit can be a very complex and time-consuming process. It also saves money, as one FPGA can be used to prepare many iterations of the same project. In this context, it is worth mentioning that modern Tensor Processing Units (TPUs) and cryptocurrency miners were first designed on FPGAs and manufactured only after the designs had matured.
FPGAs are also used in real-time systems where response time plays a crucial role. With standard CPUs, response time is not fixed: you do not know precisely when you will receive a response after the initial signal appears. To minimize it or keep it within a given range, real-time operating systems are used. Still, in scenarios that require a very fast response (under a millisecond), this falls short. To solve this problem, the required algorithm needs to be implemented in an FPGA using combinational or sequential logic, which ensures a response time that is always the same and stays under a millisecond. Such a real-time system implemented in an FPGA can be modified and moved into manufacturing once it is ready. An integrated circuit created in this way will be much faster and more energy-efficient.
Apart from that, FPGAs are used in projects where the hardware configuration is subject to change and a circuit that can be adjusted to these changes is called for. If you change your hardware supplier and the new hardware does not have the required interface, an FPGA becomes a natural choice.
The notion of “FPGA programming” may be a little misleading. After all, there is no real program to run sequentially, as there is with CPUs and GPUs. FPGA programming consists of designing a hardware architecture that will execute the required algorithm and describing it in a hardware description language (HDL). Consequently, the building blocks of this algorithm will not be memory registers and a set of operations to be performed, as with standard programs executed by CPUs or GPUs. Instead, an “FPGA program” consists of low-level elements including logic gates, adders, registers and multiplexers.
This offers you a great deal of flexibility. For example, if you have a 20-bit data type, you can build operations that are exactly 20 bits wide. In the CPU world, you have only manufacturer-set registers and instructions, which can’t be changed. In FPGAs, on the other hand, you can adjust the hardware to the data type because you are designing the hardware architecture yourself.
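As a rough illustration of the 20-bit example, the Python sketch below models what a 20-bit adder does. On a CPU, such a value would typically sit in a wider register and be masked in software, whereas on an FPGA the adder itself could be built 20 bits wide. The helper names `add20` and `to_signed20` are hypothetical and exist only for this example.

```python
WIDTH = 20
MASK = (1 << WIDTH) - 1  # 0xFFFFF, twenty 1-bits


def add20(a, b):
    """Add two values the way a 20-bit adder would: the result wraps at 2**20."""
    return (a + b) & MASK


def to_signed20(value):
    """Interpret a 20-bit pattern as a two's-complement signed number."""
    value &= MASK
    if value & (1 << (WIDTH - 1)):  # sign bit set
        return value - (1 << WIDTH)
    return value
```

For instance, `add20(0xFFFFF, 1)` wraps around to `0`, exactly as a 20-bit hardware adder without a carry output would.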
You can also implement operations which are either very complex or time-consuming for general-purpose CPUs. Block ciphers and cryptographic functions, for example, are performed by CPUs in many cycles, requiring much more time than FPGAs do.
As I mentioned above, with FPGAs, hardware architecture is designed to perform specific tasks. In general-purpose CPUs, the architecture, memory and instructions are set and sealed by the manufacturer. You need to adjust to what you already have, as these elements cannot be modified.
To program FPGAs, you use hardware description languages such as VHDL or Verilog. VHDL’s syntax resembles Pascal more than C, which makes working with it different from programming in typical high-level languages. Verilog’s syntax, however, is similar to C, which should make it more intuitive and easier to pick up for people who have no prior experience with low-level programming.
VHDL is a somewhat archaic language with certain pitfalls, one being that checking whether the architecture works as expected is very hard. In some projects, to make life easier, Python is used to generate parts of the VHDL code. Of course, everything could be written in VHDL by hand, but generating it with Python is easier.
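What such generation might look like depends entirely on the project, and the article does not say which parts are generated. As one hypothetical case, the sketch below renders a table of constants as a VHDL package, something that is tedious to type by hand. The function name `vhdl_rom_package` and the identifiers `rom_t` and `ROM` are assumptions made for this illustration.

```python
def vhdl_rom_package(name, values, width):
    """Render a VHDL package that holds `values` as a constant array."""
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")

    # One named association per entry, each value as a binary string literal.
    rows = ",\n".join(
        f'    {i} => "{value:0{width}b}"' for i, value in enumerate(values)
    )
    last = len(values) - 1
    return (
        "library ieee;\n"
        "use ieee.std_logic_1164.all;\n"
        "\n"
        f"package {name} is\n"
        f"  type rom_t is array (0 to {last}) of "
        f"std_logic_vector({width - 1} downto 0);\n"
        "  constant ROM : rom_t := (\n"
        f"{rows}\n"
        "  );\n"
        f"end package {name};\n"
    )


if __name__ == "__main__":
    # Example: a tiny 4-bit table; a real project might compute these values.
    print(vhdl_rom_package("demo_rom", [0, 3, 5, 15], width=4))
```

The benefit is that the values can come from ordinary Python code (a formula, a file, a test vector), while the VHDL side only ever sees a plain constant.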
The HDL simulator is the key tool for building hardware architecture. It lets you simulate how the architecture behaves when given sample input data, which in turn enables you to see how the data flows. The HDL simulator is also crucial because compiling a hardware description for an FPGA board and programming the board itself can be very time-consuming, even for a simple design. The simulator allows you to thoroughly verify the algorithm before you implement it on an FPGA board.
Consider the cumbersome task of debugging hardware architecture. When designing such an architecture, you need to make it debuggable. It may happen that the architecture fits on an FPGA board in its release configuration but not in its debug configuration. A debuggable architecture will also have worse timing characteristics. Therefore, before implementing any algorithm on an FPGA board, run simulations to check whether it works as expected.
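The idea of feeding sample data into a design and watching how values evolve can be mimicked in a few lines of Python. This is not how a real HDL simulator works internally (those operate on the HDL description itself and are typically event-driven); it is only a cycle-by-cycle toy model. The `Accumulator` class and the `simulate` function are invented for this sketch.

```python
class Accumulator:
    """Toy model of an 8-bit register that adds its input on every clock edge."""

    WIDTH = 8

    def __init__(self):
        self.total = 0  # register contents after reset

    def clock(self, data_in, enable=1, reset=0):
        """Advance one clock cycle and return the new register value."""
        if reset:
            self.total = 0
        elif enable:
            # The sum wraps around, as an 8-bit register would.
            self.total = (self.total + data_in) & ((1 << self.WIDTH) - 1)
        return self.total


def simulate(samples):
    """Feed sample inputs cycle by cycle and record (cycle, input, output)."""
    dut = Accumulator()  # "device under test"
    trace = []
    for cycle, data_in in enumerate(samples):
        trace.append((cycle, data_in, dut.clock(data_in)))
    return trace


if __name__ == "__main__":
    for cycle, data_in, total in simulate([100, 100, 100]):
        print(f"cycle {cycle}: in={data_in} total={total}")
```

The third cycle of this run shows the register wrapping from 200 to 44, the kind of overflow that is far cheaper to spot in a trace than on a programmed board.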
Hardware acceleration is a major FPGA use case. In a nutshell, repetitive and compute-intensive tasks that would usually fall to the CPU of a computer or server are offloaded to dedicated hardware such as FPGAs.
Graphics Processing Units (GPUs), originally designed to render display graphics, are the most popular and widely used hardware for this type of operation. Of course, they can also be used to perform computations, but only of a specific type.
Acceleration with FPGAs works in the same way as acceleration with other hardware. The only difference lies in the implementation. From the server’s point of view, both types of acceleration look the same. The main advantage of FPGA technology is its flexibility. It is easy to change the accelerated function even on hardware that is already in use. You can also release updates or keep several implementations that run on the same board.
GPIO is a good example of integrating hardware with software. In this case, you connect the appropriate pins of a CPU to those of the FPGA board. You can then program the hardware to interpret the signals sent via these pins in a given way.
You can also implement communication protocols such as Quad SPI. For servers with Intel-based architecture, a PCI controller will be the best communication method. In such a case, you implement PCI support in the FPGA and the CPU can write into the appropriate registers over PCI. Additionally, you can give the FPGA board specific host memory addresses and it will write to the host’s memory via DMA. Simple communication protocols like UART usually come already implemented on an FPGA board. Furthermore, some FPGAs contain PCI controllers, which makes them even easier to use.
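To give a feel for what "a simple communication protocol" means at the signal level, here is a small Python sketch of UART framing. It assumes the common 8N1 format (one start bit, eight data bits sent least significant bit first, no parity, one stop bit); the article does not specify a format, and the function names `uart_frame` and `uart_decode` are made up for this example.

```python
def uart_frame(byte):
    """Return the 10 line levels of an 8N1 frame for one byte.

    Order on the wire: start bit (0), data bits LSB first, stop bit (1).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("a UART data word here is a single byte")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def uart_decode(bits):
    """Recover the byte from a 10-bit 8N1 frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


if __name__ == "__main__":
    frame = uart_frame(ord("A"))
    print(frame, "->", chr(uart_decode(frame)))
```

In hardware, the same logic boils down to a shift register and a small counter, which is part of why UART is such a common ready-made block.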
A SmartNIC is a specific type of FPGA application. In a nutshell, a SmartNIC is an intelligent network interface card that allows you to perform advanced operations on packets, such as tunnel termination, sophisticated flow classification and filtering, metering and shaping. These functions are usually performed by CPUs, but a SmartNIC allows you to offload them, thereby freeing up the server’s resources to focus on its primary tasks. Additionally, SmartNICs offer flexibility, as their behavior can be reprogrammed while they are running. SmartNICs are a very good example of hardware acceleration.
Software-Defined Networking (SDN) is a typical SmartNIC use case. The basic concept behind SDN is the separation of the control plane—the layer where the network behavior is defined and managed—from the data plane, the layer where the packets are processed. This approach allows you to control the entire network from a single control point and have a good view of the network topology. This all means better decision-making (e.g. about more efficient load balancing or better traffic distribution), not only minimizing the risk of mistakes being made but also saving time, as you don’t have to manually configure hundreds of devices.
When using a SmartNIC in an SDN solution, you can offload some virtualized network functions (VNFs), such as firewalls, to hardware. The control plane defines policies, while the data plane, i.e. the SmartNICs, applies these policies to packets and decides which of them should be blocked or passed. IPsec (packet encryption) and VPN are other use cases for SmartNICs. But you’ll gain the most impressive range of options by offloading virtual switches and routers. This covers all of the above functions, including firewalls, tunneling and encryption, as well as more complex tasks like routing and NAT.
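The split between defining policies and applying them can be sketched in plain Python. The snippet is only a conceptual model of the firewall case, not how a SmartNIC is actually programmed: the `Rule` fields, the first-match semantics, the default action and the `classify` function are all assumptions chosen for this illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One policy entry, as a control plane might define it."""
    src: str                  # source network in CIDR notation
    dst_port: Optional[int]   # None matches any destination port
    action: str               # "pass" or "block"


def classify(rules, src_ip, dst_port, default="block"):
    """Data-plane step: return the action of the first rule that matches."""
    for rule in rules:
        in_network = ip_address(src_ip) in ip_network(rule.src)
        port_matches = rule.dst_port in (None, dst_port)
        if in_network and port_matches:
            return rule.action
    return default


# Policies defined centrally (control plane) ...
POLICY = [
    Rule(src="10.0.0.0/8", dst_port=22, action="block"),
    Rule(src="10.0.0.0/8", dst_port=None, action="pass"),
]

if __name__ == "__main__":
    # ... and applied to each packet (data plane).
    print(classify(POLICY, "10.1.2.3", 22))     # SSH from the internal network
    print(classify(POLICY, "10.1.2.3", 443))    # other traffic from 10/8
    print(classify(POLICY, "192.0.2.7", 443))   # unknown source
```

On a SmartNIC, the matching step would run in hardware for every packet, while the rule list could still be updated from the control plane.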
Going forward, the FPGA market is set to expand. Major manufacturers of standard CPUs are expanding their product portfolio by acquiring companies specializing in FPGAs. In 2015, Intel bought Altera, a US-based manufacturer of programmable logic devices (PLDs), while last year AMD acquired Xilinx, the company that invented FPGA architecture.
FPGAs will also be more widely used in networking. Apart from programmable logic cells, they will contain highly specialized silicon elements, such as network interface controllers. You can also expect network-specific circuits to be developed.
From the developer’s point of view, FPGA circuits will contain more logic gates, allowing you to implement more complex functionality. You will be able to put more network functions into a single circuit and a single piece of hardware. Of course, this will make the entire implementation more complex too.