Packet capture is useful from a general debugging standpoint, and is useful in particular in debugging BPF programs that do packet processing. For general debugging, being able to initiate arbitrary packet capture from kprobes and tracepoints is highly valuable (e.g. what do the packets that reach kfree_skb() - representing error codepaths - look like?). Arbitrary packet capture is distinct...
Multipath TCP (MPTCP) is an increasingly popular protocol that members of the kernel community are actively working to upstream. A Linux kernel fork implementing the protocol has been developed and maintained since March 2009. While there are some large MPTCP deployments using this custom kernel, an upstream implementation will make the protocol available on Linux devices of all...
At Netconf 2019 we have presented a BPF-based alternative to steering
packets into sockets with iptables and TPROXY extension. A mechanism
which is of interest to us because it allows (1) services to share a
port number when their IP address ranges don't overlap, and (2) reverse
proxies to listen on all available port numbers.
The solution adds a new BPF program type BPF_INET_LOOKUP, which...
It is well known that batching can often improve software performance. This is
mainly because it utilizes the instruction cache in a more efficient way.
From the networking perspective, the size of driver's packet processing
pipeline is larger than the sizes of instruction caches. Even though NAPI
batches packets over the full stack and driver execution, they are processed
one by one by many...
Link Aggregation (LAG) is traditionally served by bonding driver. Linux bonding driver supports all LAG modes on almost any LAN drivers - in the software. However modern hardware features like SR-IOV-based virtualization and state full offloads such as RDMA are currently not well supported by this model. One of possible options to solve that is to implement LAG functionality entirely in NIC's...
With the advent of the the flow rule and flow block API, ethtool_rx, netfilter and tc can share the same infrastructure to represent hardware offloads.
This presentation discusses the reuse of the existing infrastructure originally implemented by tc, such as the netdev_ops->ndo_setup_tc() interface and the TC_SETUP_CLSFLOWER classifier.
The Linux kernel VxLan driver supports two ways of handling flooded traffic to multiple remote VxLan termination end points (VTEPS):
(a) Head end replication: where the VxLan driver sends a copy of the packet to each participating remote VTEPs
(b) Use of multicast routing to forward to participating remote VTEPs
(b) is generally preferred for both hardware and software VTEP deployments...
Linux has a nice SW bridge implementation which provides most of the classic
Ethernet switching features. DSA and SwitchDev frameworks allow us to
represent HW switch devices in Linux and potentially offload the SW forwarding
But the offloading facilities are not perfect, and there seem to be room for
- Limiting the flooding of L2-Multicast traffic. IGMP snooping...
IPv4's success story was in carrying unicast packets
Service sites still need IPv4 addresses for everything,
since the majority of Internet client nodes don't yet
have IPv6 addresses. IPv4 addresses now cost 15 to 20
dollars apiece (times the size of your network!) and
the price is rising.
The IPv4 address space includes hundreds of millions of
addresses reserved for obscure (the...
In this talk, we will present a scalable re-implementation of the Kubernetes service abstraction with the help of eBPF. We will discuss recent changes in the kernel which made the implementation possible, and some changes in the future which would simplify the implementation.
Kubernetes is an open-source container orchestration multi-component distributed system. It provides mechanisms for...
XDP (the eXpress Data Path) is a new method in Linux to process
packets at L2 and L3 with really high performance. XDP has already
been deployed for use cases involving ingress packet filtering, or
transmission back through the ingress interface, are already well
supported today. However, as we expand the use cases that involve the
XDP_REDIRECT action, e.g., to send packets to other devices,...
Providing encryption in dynamic environments where nodes are added and removed on-the-fly and services spin-up and are then torn-down frequently, such as Kubernetes, has numerous challenges. Cilium, an open source software package for providing and transparently securing network connectivity, leverages BPF and the Linux encryption capabilities to provide L3/L7 encryption and authentication at...
Cilium is an open source project which implements the Container Network
Interface (CNI) to provide networking and security functions in modern
application environments. The primary focus of the Cilium community recently
has been on scaling these functions to support thousands of nodes and hundreds
of thousands of containers. Such environments impose a high rate of churn as
containers and nodes...
Application workloads are becoming increasingly diverse in terms of their network resource requirements and performance characteristics. As opposed to long running monoliths deployed in virtual machines, containerized workloads can be as short lived as few seconds. Today, container orchestrators that schedule these workloads primarily consider their CPU and memory resource requirements since...
It goes without saying that XDP is wanted more and more by everyone. Of course, the Linux distributions want to bring to users what they want and need. Even better if it can be delivered in a polished package with as few surprises as possible: receiving bug reports stemming from users' misunderstanding and from their wrong expectations does not make good experience neither for the users nor...
Host Bandwidth Manager (HBM) is a BPF based framework for managing per-cgroupv2 egress and ingress bandwidths in order to provide a better experience to workloads/services coexisting within a host. In particular, HBM allows us to divide a host's egress and ingress bandwidth among workloads residing in different v2 cgroups. Note that although sample BPF programs are included in the BPF patches,...
Route entries in a FIB tend to be very redundant with respect to nexthop configuration with many routes using the same gateway, device and potentially encapsulations such as MPLS. The legacy API for inserting routes into the kernel requires the nexthop data to be included with each route specification leading to duplicate processing verifying the nexthop data, an effect that is magnified as...
Working for a networking hardware vendor can be an extremely rewarding experience for a kernel developer. The rate at which new features are accepted in the kernel also provides lots of motivation to develop new features that showcase hardware capabilities. This could be done by adding new support for dataplane offloads via cls flower, netfilter, or switchdev (if we still think it exists!). ...
Many Ethernet PHYs contain hardware to perform diagnostics of the
Ethernet cable. Breaks in the cable and shorts within a twisted pair
or to other pairs can be detected, and an estimate to the length along
the cable to the fault can be made. The talk will explain, at a high
level, how such diagnostics work, sending pulses down the cables and
looking for reflections. There is no standardization...