
Packet dropping method: Iptables vs. alternatives in Linux

Have you ever wondered about alternatives to iptables for dropping packets in Linux? Well, there are a few methods to do this, and in this blog post we want to share them with you. These methods are not restricted to firewall rules only and can be divided into the categories below:

  • ebtables - similar to iptables, but focused mostly on layer 2 (the comparison between the ISO/OSI and TCP/IP models is presented in our infographic)

  • nftables - successor of iptables+ebtables

  • ip rule - a tool designed to build advanced routing policies

  • IP routing - transferring packets according to the routing table

    • BGP Flow Spec (how to deploy iptables’ rules using BGP protocol)
  • QOS - using the tc filter command, which is designed for QOS filtering

  • DPDK - using OVS working outside of kernel with the help of DPDK

  • HW filter - dropping a packet in hardware, before it reaches the CPU

  • eBPF with a little help from XDP

  • filtration on OSI layer 7 using a user space application

Before we begin: just a quick reminder about packet flow in the Linux kernel:

Fig.1: Packet flow in Netfilter and general networking in the Linux kernel. Source: Wikipedia

If you are experiencing network issues in your Linux environment, remember that our guide on Linux network troubleshooting can help you diagnose and resolve the problems with your network configuration.

Packet dropping with iptables

iptables is the most popular method when it comes to processing packets in Linux. Filtering rules can be divided into two types which differ considerably: stateful and stateless. Stateful filtering allows a packet to be analyzed in the context of the session status, e.g. whether the connection has already been established or not (the packet initiates a new connection). However, this convenient state tracking has its price - performance. Stateful filtering is considerably slower than the stateless approach, but it allows more actions to be performed. The rule below is an example of dropping packets based on their state (here: a new connection):

# iptables -A INPUT -m state --state NEW -j DROP

By default, every packet in iptables is processed as stateful. To make an exception, it is necessary to perform a dedicated action in a special table “raw”:

# iptables -t raw -I PREROUTING -j NOTRACK
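
Untracked packets can still be matched and dropped statelessly afterwards - a minimal sketch (UDP port 1234 is an arbitrary example):

# iptables -t raw -I PREROUTING -p udp --dport 1234 -j NOTRACK
# iptables -A INPUT -p udp --dport 1234 -m state --state UNTRACKED -j DROP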

It is always good to remember that iptables allows you to perform classification on multiple layers of the OSI model, starting from layer 3 (source and destination IP addresses) and going up to layer 7 (via the l7-filter project, which is, sadly, no longer developed).

When handling stateful packets, it is also vital to remember that the conntrack module for iptables uses only a 5-tuple which consists of:

  • source and target IP address
  • source and target port (for TCP/UDP/SCTP; for ICMP, other fields take over the role of the ports)
  • protocol

This module does not analyze the input/output interface. So, if a packet that has already been processed enters the IP stack once again (in another VRF), a new state will not be created. There is, however, a solution for this issue: the zones feature of the conntrack module, which allows packets arriving on interface $X to be assigned to zone $Y.

# iptables -t raw -A PREROUTING -i $X -j CT --zone $Y
# iptables -t raw -A OUTPUT -o $X -j CT --zone $Y

To sum up: due to its many features the iptables drop method is slow. It is possible to speed it up by switching off the tracking of session states, but the performance increase (in terms of PPS) will be small. In terms of new connections per second, the gain will be bigger. More on this subject can be found here.

In the long run, it is planned to migrate iptables to the Berkeley Packet Filter (BPF) infrastructure; only then can a major speed/performance increase be expected.


ebtables

In case we want to go lower than layer 3, we have to switch to ebtables as an alternative to iptables, which allows us to work from layer 2 up to layer 4. For example, if we want to drop packets where the MAC address for IP 172.16.1.4 is different than 00:11:22:33:44:55, we can use the rule below:

# ebtables -A FORWARD -p IPv4 --ip-src 172.16.1.4 -s ! 00:11:22:33:44:55 -j DROP

It is important to remember that packets passing through a Linux bridge are analyzed by FW rules. This is managed by the sysctl parameters below:

  • net.bridge.bridge-nf-call-arptables
  • net.bridge.bridge-nf-call-ip6tables
  • net.bridge.bridge-nf-call-iptables

More info can be found in this article. If we want to gain performance, it is advisable to disable those calls.
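
For example, to disable all three calls at runtime (the change is not persistent across reboots unless it is also written to /etc/sysctl.conf or a file under /etc/sysctl.d/):

# sysctl -w net.bridge.bridge-nf-call-iptables=0
# sysctl -w net.bridge.bridge-nf-call-ip6tables=0
# sysctl -w net.bridge.bridge-nf-call-arptables=0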

nftables for dropping packets

The aim of nftables (introduced in kernel 3.13) is to replace certain parts of netfilter (ip(6)tables/arptables/ebtables), while keeping and reusing most of it. The expected advantages of nftables are:

  • less code duplication
  • better performance
  • one tool to work on all layers

A performance test was done by Red Hat. The following diagram shows the performance drop in correlation with the number of blocked IP addresses:

Fig.2: Performance drop in correlation with the number of blocked IP addresses

nftables is configured via the nft utility, which runs in user space. To drop a TCP packet, it is necessary to run the following commands (the first two are required, as nftables does not come with default tables/chains):

# nft add table ip filter
# nft add chain ip filter in-chain { type filter hook input priority 0\; }
# nft add rule ip filter in-chain tcp dport 1234 drop
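
The resulting configuration can be verified by listing the whole ruleset:

# nft list ruleset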

Note: whenever both nftables and iptables are used on the same system, the following rules apply:

nft      | iptables | Result
---------+----------+------------
Empty    | Empty    | Pass
Accept   | Empty    | Pass
Accept   | Block    | Unreachable
Block    | Accept   | Unreachable
Blank    | Accept   | Pass

Sometimes nft can become slow, especially when dealing with larger numbers of rules, complicated chains, or when running on low-performance hardware such as ARM devices. For such scenarios, one can accelerate performance by caching the actions assigned to specific flows (ACCEPT, DROP, NAT, etc.) and bypassing further checking of nft rules if a packet can be assigned to an already existing flow. The following diagram shows this idea in detail:

Fig.3: Example performance acceleration of nft

The performance gain strongly depends on the complexity of the existing ruleset, but in the case of weak processors (such as ARM64) and complicated nft rules (such as in VyOS), one can expect a 2-3x throughput increase.

Enabling this feature is relatively easy - one only has to define a flowtable and add an extra rule:

table inet x {
        flowtable f {
                hook ingress priority 0; devices = { eth0, eth1 };
        }
        chain y {
                type filter hook forward priority 0; policy accept;
                ip protocol tcp flow offload @f
                counter packets 0 bytes 0
        }
}

Each packet hitting ingress will be checked against flowtable "f". Each TCP packet traversing chain "y" will be offloaded. For more information, one can check the kernel nftables flowtable documentation at the following link.
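
Assuming the snippet above is saved to a file (here called flowtable.nft purely for illustration), it can be loaded and the flowtable inspected with:

# nft -f flowtable.nft
# nft list flowtables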

ip rule

The ip rule tool is a lesser-known iptables alternative. In general, it is used to create advanced routing policies. As soon as a packet passes through the firewall rules, it is up to the routing logic to decide whether the packet should be forwarded (and where), dropped, or handled in some other way. There are multiple possible actions; stateless NAT is one of them (not a commonly known fact), but the one we are interested in here is "blackhole":

# ip rule add blackhole iif eth0 from 10.0.0.0/25 dport 400-500

ip rule is a fast, stateless filter frequently used to reject DDoS traffic. Unfortunately, it has a drawback: it allows us to match only on source/destination IP addresses (layer 3), TCP/UDP ports (layer 4), and the input interface.

Keep in mind that the loopback interface (lo) plays an important role in ip rulesets. When used as the input interface selector (iif lo), it determines whether a rule applies to traffic originated by the host on which the rule is configured or, when negated, to transit traffic. For example, if we want to drop transit packets destined for the address 8.8.8.8, the following rule can be used:

# ip rule add prohibit iif not lo to 8.8.8.8/32
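
The active ruleset, together with rule priorities, can be reviewed at any time with:

# ip rule show

A rule can later be removed with ip rule del followed by the same selectors.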

IP routing

Another alternative filtration method is to use routing policies. While it is true that this method works only at layer 3 and only for destination addresses, it can be applied simultaneously on many machines thanks to routing protocols like BGP. The simplest example of dropping traffic directed to the address 8.8.8.8 is the rule below:

# ip route add blackhole 8.8.8.8
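
Blackhole entries can be listed separately from regular routes with:

# ip route show type blackhole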

When talking about routing policies, it is worth mentioning the following kernel configuration parameter (which is often forgotten):

# sysctl net.ipv4.conf.all.rp_filter

If it is set to "1" (which many distributions set by default), the kernel checks the reverse path of every packet before passing it further up the routing stack.

So, if a packet from address A.B.C.D arrives on the eth0 interface while the routing table says that the path to A.B.C.D leads via the eth1 interface (a case of asymmetric routing), such a packet will be dropped if the rp_filter parameter (global or for that interface) is not equal to 0.
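
If asymmetric routing is expected only on a particular interface, the check can be relaxed to loose mode (value 2) or disabled (0) for that interface alone, for example:

# sysctl -w net.ipv4.conf.eth0.rp_filter=2

Keep in mind that the kernel uses the higher of the net.ipv4.conf.all and per-interface rp_filter values.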

BGP Flow Spec

When describing the BGP routing protocol, it is worth saying a few words about BGP Flow Spec. This extension is described in RFC 5575 and is used for packet filtering. Linux (with a little help from the FRR daemon) supports this feature by translating the received NLRI into rules saved in ipset/iptables. Currently, the following filtering features are supported:

  • Network source/destination (can be one or the other, or both)
  • Layer 4 information for UDP/TCP: source port, destination port, or any port
  • Layer 4 information for ICMP type and ICMP code
  • Layer 4 information for TCP Flags
  • Layer 3 information: DSCP value, Protocol type, packet length, fragmentation
  • Misc layer 4 TCP flags

Thanks to this approach, it is possible to dynamically configure FW rules on many servers at the same time. More info can be found in the FRR documentation here.
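
As an illustration, a minimal FRR configuration sketch that activates the flowspec address family for a BGP peer might look like this (the AS number and neighbor address are placeholders):

router bgp 65000
 neighbor 192.0.2.1 remote-as 65000
 !
 address-family ipv4 flowspec
  neighbor 192.0.2.1 activate
 exit-address-family

Depending on the FRR version, installing the received rules may additionally have to be enabled per interface; see the FRR documentation linked above.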

Dropping packets with QOS

It is not commonly known that it is also possible to perform packet filtration at the level of QOS filters. The tc filter command, responsible for classifying traffic, is used in this case. It allows us to filter traffic on L3 and L4 statelessly. For example, to drop GRE traffic (IP protocol 47) coming in on the eth0 interface, the following commands can be used:

# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 parent ffff: protocol ip prio 1 u32 match ip protocol 47 0xff action drop

Netem

When talking about packet dropping and QOS, one must mention Netem, a queuing discipline that forms a separate category of QOS. With Netem it is possible to simulate network problems by defining impairments like:

  • loss
  • jitter
  • reorder

For example, if we want to emulate 3% losses on the output interface eth0, we can use the following command:

# tc qdisc add dev eth0 root netem loss 3%

By default, netem works on outgoing traffic. However, you can also apply it to incoming traffic with a little help from IFB (the Intermediate Functional Block pseudo-device). Please check this web page for a detailed example: How can I use netem on incoming traffic.
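
A commonly used recipe (sketched here; interface names are examples) redirects ingress traffic from eth0 to an IFB device and applies netem there:

# modprobe ifb numifbs=1
# ip link set dev ifb0 up
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0
# tc qdisc add dev ifb0 root netem loss 3%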

eBPF

In short, eBPF is a specific virtual machine that runs user-created programs attached to specific hooks in the kernel. Such programs are created and compiled in user space and then injected into the kernel. They can classify network packets and perform actions on them.

Currently, there are four popular ways in which the user can attach eBPF bytecode into the running system in order to parse network traffic:

  • Using a QOS action: tc filter add dev eth0 ingress bpf object-file compiled_ebpf.o section simple direct-action
  • Using the firewall module: iptables with -m bpf --bytecode (which is not of any interest to us, as BPF would be used only for packet classification)
  • Using eXpress Data Path (XDP) - with a little help of the ip link command
  • Using the CLI tool named bpfilter (still at an early stage of development)

The tc filter approach is explained well in Jans Erik's blog. We will concentrate on the XDP approach, as it has the best performance. To drop a UDP packet destined for port 1234, we first need to compile the following BPF code:

#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
 
#define SEC(NAME) __attribute__((section(NAME), used))
 
#define htons(x) ((__be16)___constant_swab16((x)))
 
SEC("udp drop")
int udp_drop(struct xdp_md *ctx) {
int eth_off = 0;
 
void *data = (void *)(long)ctx->data;
void *data_end = (void *)(long)ctx->data_end;
 
struct ethhdr *eth = data;
 
eth_off = sizeof(*eth);
 
struct iphdr *ip = data + eth_off;
struct udphdr *udph = data + eth_off + sizeof(struct iphdr);
 
eth_off += sizeof(struct iphdr);
if (data + eth_off > data_end) {
return XDP_PASS;
}
 
if (ip->protocol == IPPROTO_UDP && udph->dest == htons(1234)) {
return XDP_DROP;
}
 
return XDP_PASS;
}
 
char _license[] SEC("license") = "GPL";

We compile it with the following command:

$ clang -I/usr/include/x86_64-linux-gnu -O2 -target bpf -c udp_drop.c -o udp_drop.o

Then we can attach the bytecode to kernel using this CLI:

# ip link set dev eth0 xdp obj udp_drop.o sec xdp
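
You can quickly confirm that the program is attached by inspecting the interface; the output should contain an xdp entry with the ID of the loaded program:

# ip link show dev eth0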

At the end we can stop the XDP program just by unloading it:

# ip link set dev eth0 xdp off

Note: the code in this section was based on the Lorenzo Fontana udp-bpf example located here.

DPDK 

DPDK is an alternative to eBPF - the idea is to process the packet by bypassing the kernel network stack entirely or partially. In DPDK packets are processed fully in userspace. While this might sound slower than processing in the kernel, it isn’t. Processing a packet in the kernel means passing all the chains and hooks described earlier. We don't always need that. If we just need to do the routing or routing and filtering, we skip QOS, RPF, l2 filter, stateful FW, etc. For example, we can use OVS in DPDK mode and configure only the things that we need - nothing else. In OVS (as soon as we set everything up) we can drop a packet with the following rule:

# ovs-ofctl add-flow br0 "table=0, in_port=eth0,tcp,nw_src=10.100.0.1,tp_dst=80, actions=drop"

From now on, any packet incoming on port eth0 that matches source IP 10.100.0.1, the TCP protocol and the HTTP port (80) will be dropped. Setting up OVS might be tricky, but the performance increase is huge, making the effort worthwhile. To gain even more speed, some NICs (especially smartNICs) can offload many OVS tasks and, as a result, drop packets before they reach the CPU.
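
The installed flow, along with its packet and byte counters, can be checked with:

# ovs-ofctl dump-flows br0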

An alternative to OVS is VPP - in some scenarios it is even faster, so it might be worth taking a look - link here.

Hardware filter

Dropping packets consumes resources. First, the packet must be received and memory allocated, then the packet must be matched to one or more rules, and after all that, a decision can be made to drop the frame. If we have spare CPU cycles, then that is fine. However, when dealing with 100 Gbit/s network cards, even a powerful CPU can be overwhelmed by malicious traffic such as DDoS attacks (if such traffic isn’t adequately filtered before reaching our endpoint). And even if this is not the case, we can always use those CPU cores for more productive tasks such as service hosting. In such cases we can use hardware for the task.

Intel

For Intel NIC/SmartNICs, we can use a feature called "Intel Flow Director". The list of compatible Intel NIC/SmartNIC cards is shown below:

  • 500 series - Niantic
  • 700 series - Fortville 
  • 800 series - Columbiaville

To check if a card is supported, issue the following  commands:

# ethtool -K eth0 ntuple on
# ethtool -k eth0 | grep ntuple
ntuple-filters: on

If those commands succeed, you can start creating filters with a drop action:

# ethtool -U eth0 flow-type udp4 src-ip 10.0.0.0 m 0.0.0.255 dst-port 123 action -1 loc 42

Here we are creating a filter that will:

  • apply the rule on the eth0 interface
  • match only UDPv4 traffic
  • match traffic from 10.0.0.0/24 (note that an inverted mask is used here)
  • match traffic towards the NTP (123) port
  • apply action DROP (-1 means drop)
  • place the rule as the 42nd entry in the NIC card (entries are processed in numerical order just like in FW ACLs) 

We can confirm whether the above action was successful by using:

# ethtool -u eth0  
16 RX rings available
Total 1 rules
Filter: 42
       Rule Type: UDP over IPv4
       Src IP addr: 10.0.0.0 mask: 0.0.0.255
       Dest IP addr: 0.0.0.0 mask: 255.255.255.255
       TOS: 0x0 mask: 0xff
       Src port: 0 mask: 0xffff
       Dest port: 123 mask: 0x0
       Action: Drop

If it’s no longer needed, the rule can be removed with:

# ethtool -U eth0 delete 42

Nvidia

For Nvidia cards such as ConnectX-5/6/7 (Intel 800 series cards with the latest FW should be supported as well), we can use a different method called TC Flower, which can be offloaded to NIC hardware. The procedure here is similar to the one in the QOS section:

# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 ingress protocol ip prio 42 flower ip_proto udp src_ip 10.0.0.0/24 dst_port 123 action drop

If everything went fine, the following command should show us the status along with some statistics:

# tc -s filter show dev eth0 ingress
filter protocol ip pref 1 u32 chain 0  
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1  
filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw  
 match 00070000/00470000 at 8
       action order 1: gact action drop
        random type none pass val 0
        index 1 ref 1 bind 1 installed 281 sec used 281 sec
       Action statistics:
       Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)  
       backlog 0b 0p requeues 0

filter protocol ip pref 42 flower chain 0  
filter protocol ip pref 42 flower chain 0 handle 0x1  
 eth_type ipv4
 ip_proto udp
 src_ip 10.0.0.0/24
 dst_port 123
 not_in_hw
       action order 1: gact action drop
        random type none pass val 0
        index 2 ref 1 bind 1 installed 153 sec used 153 sec
       Action statistics:
       Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)  
       backlog 0b 0p requeues 0

Note: in our case, the rule was not offloaded due to the lack of proper hardware. On supported NICs, one should expect to see the following in the above output:

[...]
skip_sw
in_hw
[...]

Since we are dealing with hardware here, the number of rules is limited. However, this approach ensures no performance degradation and zero CPU usage related to network traffic filtering.

L7 Filtering

Kernel modules handle packets at layers 2-4 (with a few exceptions like BPF, dedicated kernel modules, or the NFQUEUE action in iptables). In the case of layer 7, it is necessary to use dedicated user space applications. The most popular are SNORT and SQUID. SNORT IPS analyzes traffic, searching for known exploits, and can work in two modes: TAP and INLINE.

TAP mode:

Fig.4: TAP mode

In TAP mode (Snort only listens to the traffic), a packet can be "dropped" only by sending a TCP RST packet (for a TCP session) or an "ICMP Admin Prohibited" message for a UDP one. This approach will not prevent the forbidden packet from reaching its destination, but it should prevent further transmission.

INLINE Mode:

Fig.5: INLINE mode

For INLINE mode (Snort itself forwards the traffic in question) we have more flexibility and the following actions are possible (a minimal example rule is shown after the list):

  • drop - block and log the packet
  • reject - block the packet, log it, and then send a TCP reset if the protocol is TCP or an ICMP port unreachable message if the protocol is UDP
  • sdrop - block the packet but do not log it
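
For illustration, a minimal Snort rule using the drop action might look like this (the port, message and SID are arbitrary examples):

drop tcp any any -> any 23 (msg:"Telnet traffic blocked"; sid:1000001; rev:1;)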

You can find more about filtering rules in SNORT in its technical documentation here.

SQUID (designed to forward/cache HTTP traffic only), on the other hand, allows web pages to be filtered, e.g. based on the host contained in the HTTP headers or on the SNI in the case of the HTTPS protocol. For example, if an ACL containing domain names like *.yahoo.com and *.google.com is defined:

acl access_to_search_engines dstdomain .yahoo.com .google.com
acl access_to_search_engines_ssl ssl::server_name .yahoo.com .google.com

and action DENY is assigned to them:

http_access deny access_to_search_engines
ssl_bump terminate access_to_search_engines_ssl

...then every client using that HTTP proxy service will encounter an error when trying to open the google.com or yahoo.com webpage.

Of course, the list of applications that can filter traffic based on layer 7 is much longer and includes, among others, the following (a minimal HAProxy example is shown after the list):

  • mod_rewrite in Apache configuration files
  • query_string in an NGINX vhost config
  • ACLs in HAProxy
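
As a small illustration of the last item, an HAProxy frontend could reject requests to a given URL path with an ACL like this (the section names, path and backend are arbitrary examples):

frontend web_in
    bind *:80
    acl blocked_path path_beg /admin
    http-request deny if blocked_path
    default_backend web_servers

backend web_servers
    server app1 127.0.0.1:8080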

GPU

Sometimes you might want to filter packets using a very long ACL (e.g. an ACL built from 10k rules) while at the same time dealing with traffic containing a huge number of flows (>100k simultaneous sessions). In such cases, OVS or the nft flowtable might not be enough, as neither of them is suited for such a scale. In the case of OVS/NFT you will overflow the hashing table, and when you disable the hashing table/stateful FW, the length of the ACL will kill performance. You might consider offloading this task to a smartNIC or DPU, but such devices are rare or expensive.

At CodiLime, we tried to take a bit of a different approach to such problems and decided to use a common Nvidia GPU. While this was a purely R&D project (not yet ready for a production environment), you still might find it helpful. More details, along with the presentation, can be found here.

Summary

As you can see, there are quite a few alternatives to iptables that allow you to intentionally drop packets in Linux. All of them are built for specific purposes, and I hope you can put them to use in practice. What is missing here is a performance comparison of these methods, but this is a topic for another blog post.


Original post date 08/08/2019, update date 08/07/24.


Adam Kułagowski

Principal Network Engineer

Adam is a seasoned Principal Network Engineer with nearly two decades of experience in the realm of networking. Passionate about the intricacies of data transmission, he constantly strives to optimize network performance, pushing the boundaries of speed and efficiency.
