One of the early design decisions made in OVN was to support only tunnel encapsulation protocols that can carry additional metadata beyond what fits in the VNI field of a VXLAN header. OVN mostly uses the Geneve protocol. It uses VXLAN only for integration with ToR switches that support the hardware_vtep OVSDB schema and act as L2 gateways between logical and physical networks.
When people first learn of this design decision, many ask, “Why not VXLAN?” In particular, what about performance? Some hardware has VXLAN offload capabilities. Are we going to suffer a performance hit when using Geneve?
These are very good questions, so I set off to come up with a good answer.
One of the key implementation details of OVN is Logical Flows. Instead of programming new features using OpenFlow, we primarily use Logical Flows. This makes feature development easier because we don’t have to worry about the physical location of resources on the network when writing flows. We are able to write flows as if the entire deployment were one giant switch instead of tens, hundreds, or thousands of switches.
To make this work, a tunnel has to carry more than a network ID: we also pass IDs for the logical source and destination ports. With Geneve, OVN identifies the network using the VNI field and uses an additional 32-bit TLV to carry both the source and destination logical ports.
Of course, by using an extensible protocol, we also have the capability to add more metadata for advanced features in the future.
More detail about OVN’s use of Geneve TLVs can be found in the “Tunnel Encapsulations” sub-section of “Design Decisions” in the OVN Architecture document.
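To make the port metadata concrete, here is a minimal Python sketch of packing two logical port IDs into a single 32-bit value. The bit layout (1 reserved bit, a 15-bit ingress port, a 16-bit egress port, most significant bit first) is my reading of the OVN Architecture document and should be treated as an assumption. The function names are hypothetical and not part of OVN.

```python
import struct
from typing import Tuple

# Assumed layout of the 32-bit TLV value, MSB first:
# 1 reserved bit | 15-bit logical ingress port | 16-bit logical egress port
INGRESS_BITS = 15
EGRESS_BITS = 16


def pack_port_metadata(ingress_port: int, egress_port: int) -> bytes:
    """Pack both logical port IDs into 4 bytes in network byte order."""
    if not 0 <= ingress_port < (1 << INGRESS_BITS):
        raise ValueError("ingress port does not fit in 15 bits")
    if not 0 <= egress_port < (1 << EGRESS_BITS):
        raise ValueError("egress port does not fit in 16 bits")
    value = (ingress_port << EGRESS_BITS) | egress_port
    return struct.pack("!I", value)


def unpack_port_metadata(data: bytes) -> Tuple[int, int]:
    """Recover (ingress_port, egress_port) from the 4-byte value."""
    (value,) = struct.unpack("!I", data)
    ingress_port = (value >> EGRESS_BITS) & ((1 << INGRESS_BITS) - 1)
    egress_port = value & ((1 << EGRESS_BITS) - 1)
    return ingress_port, egress_port


if __name__ == "__main__":
    encoded = pack_port_metadata(5, 7)
    print(encoded.hex())                  # 00050007
    print(unpack_port_metadata(encoded))  # (5, 7)
```

The 24-bit VNI of a VXLAN header has no room for this second value, which is the gap the extra TLV fills.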
Imagine a single UDP packet being sent between two VMs. The headers might look something like:
- Ethernet header
- IP header
- UDP header
- Application payload
When we encapsulate this packet in a tunnel, what gets sent over the physical network ends up looking like this:
- Outer Ethernet header
- Outer IP header
- Outer UDP header
- Geneve or VXLAN Header
- Application payload: (Inner packet from VM 1 to VM 2)
  - Inner Ethernet header
  - Inner IP header
  - Inner UDP header
  - Application payload
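The cost of the extra headers can be worked out with simple arithmetic. The sketch below assumes an untagged outer Ethernet header and an outer IPv4 header without options; the Geneve option is assumed to be a single TLV carrying 4 bytes of data, matching the 32-bit port metadata described earlier. The function name is made up for illustration.

```python
# Header sizes in bytes. Assumes no VLAN tag and IPv4 without options.
ETHERNET = 14
IPV4 = 20
UDP = 8
VXLAN_HEADER = 8
GENEVE_BASE_HEADER = 8
GENEVE_OPTION_HEADER = 4  # option class (2) + type (1) + flags/length (1)


def tunnel_overhead(protocol, option_data_lengths=()):
    """Return the bytes added in front of the inner packet."""
    outer = ETHERNET + IPV4 + UDP
    if protocol == "vxlan":
        if option_data_lengths:
            raise ValueError("VXLAN has no options")
        return outer + VXLAN_HEADER
    if protocol == "geneve":
        for length in option_data_lengths:
            if length % 4:
                raise ValueError("Geneve option data is a multiple of 4 bytes")
        options = sum(GENEVE_OPTION_HEADER + n for n in option_data_lengths)
        return outer + GENEVE_BASE_HEADER + options
    raise ValueError("unknown protocol: %r" % protocol)


if __name__ == "__main__":
    print(tunnel_overhead("vxlan"))        # 50
    print(tunnel_overhead("geneve", (4,)))  # 58
```

Under these assumptions, Geneve with one 32-bit TLV costs 8 bytes more per packet than VXLAN.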
There are many more NIC capabilities than what’s discussed here, but I’ll focus on some key features related to tunnel performance.
Some offload capabilities are not actually VXLAN specific. For example, the commonly cited “tx-udp_tnl-segmentation” offload applies to both VXLAN and Geneve. With it, the kernel can hand a large amount of data to the NIC at once, and the NIC breaks it up into TCP segments and adds both the inner and outer headers to each one. The performance boost comes from not having to do the same work in software, and it helps significantly with TCP throughput over a tunnel.
You can check to see if a NIC has support for “tx-udp_tnl-segmentation” with ethtool. For example, on a host that doesn’t support it:
    $ ethtool -k eth0 | grep tnl-segmentation
    tx-udp_tnl-segmentation: off [fixed]
or on a host that does support it and has it enabled:
    $ ethtool -k eth0 | grep tnl-segmentation
    tx-udp_tnl-segmentation: on
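If you want to run this check across many hosts, the same output can be parsed in a script. This is a small sketch that assumes the `name: on|off [fixed]` line format shown above; the helper names are hypothetical.

```python
import subprocess
from typing import Optional, Tuple


def parse_feature(ethtool_output: str, feature: str) -> Optional[Tuple[bool, bool]]:
    """Return (enabled, fixed) for a feature, or None if it is not listed."""
    for line in ethtool_output.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or name.strip() != feature:
            continue
        tokens = rest.split()
        if not tokens:
            continue
        return tokens[0] == "on", "[fixed]" in tokens
    return None


def query_feature(device: str, feature: str = "tx-udp_tnl-segmentation"):
    """Run `ethtool -k <device>` and look up one feature."""
    result = subprocess.run(
        ["ethtool", "-k", device], capture_output=True, text=True, check=True
    )
    return parse_feature(result.stdout, feature)


if __name__ == "__main__":
    sample = "tx-udp_tnl-segmentation: off [fixed]"
    print(parse_feature(sample, "tx-udp_tnl-segmentation"))  # (False, True)
```

A result of `(False, True)` means the feature is off and cannot be changed on that NIC.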
There is one type of offload that is VXLAN specific on some NICs: RSS (Receive Side Scaling) for tunneled traffic. This is when the NIC is able to look inside a tunnel to identify the inner flows and efficiently distribute them among multiple receive queues (to be processed across multiple CPUs). Without this capability, a tunnel looks like a single stream and all of its packets go into a single receive queue.
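The effect of inner-flow RSS can be illustrated with a toy model. This sketch is not how any NIC implements it: real hardware typically uses a Toeplitz hash, and CRC32 is swapped in here only to keep the example short. It also assumes a NIC that, without tunnel awareness, hashes only on the outer IP address pair. All addresses, ports, and names are made up.

```python
import socket
import struct
import zlib

NUM_QUEUES = 4
UDP_PROTO = 17


def flow_key(src_ip, dst_ip, proto, src_port, dst_port) -> bytes:
    """Serialize a 5-tuple into bytes suitable for hashing."""
    return struct.pack(
        "!4s4sBHH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        proto,
        src_port,
        dst_port,
    )


def pick_queue(key: bytes, num_queues: int = NUM_QUEUES) -> int:
    """Toy stand-in for the NIC hash: CRC32 modulo the queue count."""
    return zlib.crc32(key) % num_queues


# Sixteen inner flows between two VMs, one per destination port.
inner_flows = [
    flow_key("10.0.0.1", "10.0.0.2", UDP_PROTO, 40000, 20000 + i)
    for i in range(16)
]

# Without tunnel awareness (assumed: hash over outer IP addresses only),
# every packet of the tunnel produces the same key.
outer_key = socket.inet_aton("192.168.0.1") + socket.inet_aton("192.168.0.2")
outer_queues = {pick_queue(outer_key) for _ in inner_flows}

# With tunnel awareness, the hash is taken over each inner flow.
inner_queues = {pick_queue(key) for key in inner_flows}

if __name__ == "__main__":
    print("queues used, outer hash:", sorted(outer_queues))
    print("queues used, inner hash:", sorted(inner_queues))
```

Hashing the outer addresses always selects one queue, while hashing the inner flows spreads the same traffic over several.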
You may wonder, “does my NIC support VXLAN or Geneve RSS?” Unfortunately, there does not appear to be an easy way to check this with a command. The best method I’ve seen is to read the driver source code or dig through vendor documentation.
Since the VXLAN specific offload capability is on the receive side, it’s important to look at what other techniques can be used to improve receive side performance. One such option is RPS (Receive Packet Steering). RPS is the same concept as RSS, but done in software. Packets are distributed among CPUs in software before fully processing them.
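On Linux, RPS is configured per receive queue by writing a hexadecimal CPU bitmask to a sysfs file. The sketch below computes that mask and the file path. The device name, queue number, and function names are placeholders, and the mask helper is deliberately limited to CPUs 0 through 31 to keep the format simple.

```python
def rps_cpu_mask(cpus) -> str:
    """Return the hex bitmask selecting the given CPU numbers (0-31 only)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0-31")
        mask |= 1 << cpu
    return format(mask, "x")


def rps_cpus_path(device: str, rx_queue: int) -> str:
    """Return the sysfs file that holds the RPS mask for one receive queue."""
    return "/sys/class/net/%s/queues/rx-%d/rps_cpus" % (device, rx_queue)


def enable_rps(device: str, rx_queue: int, cpus) -> None:
    """Write the mask to sysfs. Requires root; not exercised in this sketch."""
    with open(rps_cpus_path(device, rx_queue), "w") as handle:
        handle.write(rps_cpu_mask(cpus))


if __name__ == "__main__":
    # A 4-core machine: steer packets from rx queue 0 to all four CPUs.
    print(rps_cpus_path("eth0", 0))    # /sys/class/net/eth0/queues/rx-0/rps_cpus
    print(rps_cpu_mask([0, 1, 2, 3]))  # f
```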
Another optimization is that OVN enables UDP checksums on Geneve tunnels by default. Adding this checksum actually improves performance on the receive side. This is because of some more recent optimizations implemented in the kernel. When a Geneve packet is received, this outer UDP checksum will be verified by the NIC. This checksum verification will be reported to the kernel. Since the outer UDP checksum has been verified, the kernel uses this fact to skip having to calculate and verify any checksums of the inner packet. Without enabling the outer UDP checksum and letting the NIC verify it, the kernel is doing more checksum calculation in software. It’s expected that this regains significant performance on the receive side.
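For readers who have not looked at what the NIC is verifying, here is a sketch of the UDP checksum over IPv4 (the standard Internet checksum applied to a pseudo-header plus the UDP datagram). It illustrates only the check itself, not the kernel optimization that builds on it. The addresses and helper names are made up, and the verifier does not handle the special case of a checksum field of zero, which in UDP over IPv4 means "no checksum".

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def ipv4_pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """Pseudo-header covered by the UDP checksum (protocol 17 is UDP)."""
    return struct.pack(
        "!4s4sBBH", socket.inet_aton(src_ip), socket.inet_aton(dst_ip), 0, 17, udp_length
    )


def build_udp(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Build a UDP datagram with its checksum field filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    pseudo = ipv4_pseudo_header(src_ip, dst_ip, length)
    checksum = internet_checksum(pseudo + header + payload) or 0xFFFF
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def udp_checksum_ok(src_ip, dst_ip, datagram: bytes) -> bool:
    """A datagram with a correct checksum sums to zero on verification."""
    pseudo = ipv4_pseudo_header(src_ip, dst_ip, len(datagram))
    return internet_checksum(pseudo + datagram) == 0


if __name__ == "__main__":
    # 6081 is the UDP port assigned to Geneve.
    datagram = build_udp("192.168.0.1", "192.168.0.2", 12345, 6081, b"geneve")
    print(udp_checksum_ok("192.168.0.1", "192.168.0.2", datagram))  # True
```

Because the outer checksum covers the whole outer UDP payload, including the encapsulated inner packet, a successful hardware check gives the kernel grounds to trust the inner packet's contents.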
In the last section, we identified that there is an offload capability (RSS) that is VXLAN specific. Some NICs support RSS for VXLAN and Geneve, some for VXLAN only, and others don’t support it at all.
This raises an important question: On systems with NICs that do RSS for VXLAN only, can we match performance with Geneve?
On the surface, we expect Geneve performance to be worse. However, because of other optimizations, we really need to check to see how much RSS helps.
After some investigation of driver source code (Thanks, Lance Richardson!), we found that the following drivers had RSS support for VXLAN, but not Geneve.
- mlx4_en (Mellanox)
- mlx5_core (Mellanox)
- qlcnic (QLogic)
- be2net (HPE Emulex)
To help answer our question above, we did some testing on machines with one of these NICs.
The testing was done between two servers. Both had a Mellanox NIC using the mlx4_en driver. The NICs were connected back-to-back.
The servers had the following specs:
- HP Z220
- Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz (1 socket, 4 cores)
- Memory: 4096 MB
- Operating System: RHEL 7.3
- Kernel: 4.10.2-1.el7.elrepo.x86_64
  - Used a newer kernel to ensure we had the latest optimizations available.
  - Installed via a package from http://elrepo.org/tiki/kernel-ml
- OVS: openvswitch-2.6.1-4.1.git20161206.el7.x86_64
  - From RDO
- tuned profile: throughput-performance
- Create two tunnels between the hosts: one VXLAN and one Geneve.
  - With Geneve, add 1 TLV field to match the amount of additional metadata sent across the tunnel with OVN.
- Use pbench-uperf to run tests
  - UDP (with different packet sizes, 64 and 1024 byte)
  - Multiple concurrent streams (8 and 64)
- All tests are run 3 times. Results must be within 5% stddev or the 3 runs will be discarded and will run again. This ensures reasonably consistent and reliable results.
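The repeat-until-consistent rule in the last item can be sketched as follows. I am assuming that "within 5% stddev" means the sample standard deviation divided by the mean is at most 0.05; the benchmarking tool may compute it differently. The function names and the retry limit are hypothetical.

```python
import statistics


def is_stable(samples, max_rel_stddev=0.05) -> bool:
    """True if stddev / mean of the samples is within the threshold."""
    mean = statistics.mean(samples)
    if mean == 0:
        return False
    return statistics.stdev(samples) / abs(mean) <= max_rel_stddev


def run_until_stable(run_once, runs=3, max_rel_stddev=0.05, max_attempts=10):
    """Run a test `runs` times; discard the whole set and retry if it is noisy."""
    for _ in range(max_attempts):
        samples = [run_once() for _ in range(runs)]
        if is_stable(samples, max_rel_stddev):
            return samples
    raise RuntimeError("results never stabilized")


if __name__ == "__main__":
    # Stand-in for a real benchmark: the first set of 3 is too noisy.
    fake_results = iter([100, 150, 50, 100, 101, 99])
    print(run_until_stable(lambda: next(fake_results)))  # [100, 101, 99]
```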
Summary of Results
- We reach line rate with both VXLAN and Geneve. Differences are observed in CPU consumption where we see Geneve consistently using less CPU.
Average CPU Utilization Across Both Hosts

| Scenario | VXLAN – Average CPU Utilization (Percent) | Geneve w/ UDP checksums and 1 TLV Field – Average CPU Utilization (Percent) | Average CPU Utilization Increase (Percent) |
| --- | --- | --- | --- |
| Average CPU Utilization Increase (Percent) Across All Scenarios | | | -3.96 |
TCP and UDP Request/Response Rate (RR)
- We see higher CPU usage in these scenarios with Geneve, but the relative gain in requests per second is larger still, which leads us to conclude that Geneve performs better overall in this case as well.
Request / Response Performance

| Scenario | VXLAN – Requests per Second | Geneve w/ UDP checksums and 1 TLV Field – Requests Per Second | Percent Increase with Geneve |
| --- | --- | --- | --- |
| Average Percentage Increase with Geneve | | | 12.82% |
Average CPU Utilization Across Both Hosts

| Scenario | VXLAN Average CPU Utilization (Percent) | Geneve w/ UDP checksums Average CPU Utilization (Percent) | Average CPU Utilization Increase (Percent) |
| --- | --- | --- | --- |
| Average CPU Utilization Increase (Percent) Across All Scenarios | | | 2.28 |
Using optimizations available in newer versions of the Linux kernel, we are seeing better performance with Geneve than VXLAN, despite this hardware having some VXLAN specific offload capabilities.
Based on these results, I feel that OVN’s reliance on Geneve as its standard tunneling protocol is acceptable. It provides additional capabilities while maintaining good performance, even on hardware that has VXLAN specific RSS support.
Adding general VXLAN support to OVN would not be trivial and would introduce a significant ongoing maintenance burden. Testing done so far does not justify that cost.