NVIDIA BlueField in ‘New World Record’ for DPU Performance

In the company’s recent testing, its BlueField-2 data processing units reached 41.5 million input/output operations per second (IOPS) – more than four times the IOPS of any other DPU.

NVIDIA has taken advantage of the quiet news days to reveal that its BlueField-2 data processing unit (DPU) has set a “new world record”.

In the company’s recent testing, BlueField-2 reached 41.5 million input/output operations per second (IOPS) – more than four times the IOPS of any other DPU.

The BlueField-2 DPU used standard networking protocols and open-source software. It reached more than five million 4KB IOPS, and from seven million to over 20 million 512B IOPS, for NVMe over Fabrics (NVMe-oF), a common method of accessing networked storage media, running over TCP, one of the primary internet protocols.
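To put those figures into context, here is a quick back-of-the-envelope conversion from IOPS and I/O size to raw throughput. The arithmetic is ours rather than NVIDIA’s, and it assumes the headline 41.5 million figure refers to 512B I/Os.

```python
# Back-of-the-envelope check (our arithmetic, not NVIDIA's published figures):
# converting an IOPS rate at a given block size into throughput.
def iops_to_gbps(iops: float, block_bytes: int) -> float:
    """Return throughput in gigabits per second for a given IOPS rate."""
    return iops * block_bytes * 8 / 1e9

print(iops_to_gbps(5_000_000, 4096))   # ~164 Gb/s for 5M 4KB IOPS
print(iops_to_gbps(20_000_000, 512))   # ~82 Gb/s for 20M 512B IOPS
print(iops_to_gbps(41_500_000, 512))   # ~170 Gb/s if the 41.5M peak is 512B I/Os
```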

With the storage landscape in mind, NVIDIA explains that the vast majority of cloud and enterprise deployments require fast, distributed and networked flash storage, accessed over Ethernet or InfiniBand. Faster servers, GPUs, networks and storage media all make it harder for server CPUs to keep up, so the firm reckons the best way to relieve that load is to deploy storage-capable DPUs.

The implication of this testing, NVIDIA says, is that the BlueField-2 DPU enables “higher performance” across the data centre for both application servers and storage appliances.

BlueField also supports hardware-accelerated encryption and decryption of both Ethernet storage traffic and the storage media itself, “helping protect against data theft or exfiltration”.

NVIDIA offers a lot more info on the testing methodology.

For example, BlueField was benchmarked as both an initiator and a target, using different types of storage software libraries and different workloads to simulate real-world storage configurations.

The 41.5 million IOPS reached by BlueField was achieved by connecting two Hewlett Packard Enterprise ProLiant DL380 Gen10 Plus servers, one as the application server (storage initiator) and one as the storage system (storage target).

Each server had two Intel “Ice Lake” Xeon Platinum 8380 CPUs clocked at 2.3GHz, giving 160 hyperthreaded cores per server, along with 512GB of DRAM, 120MB of L3 cache (60MB per socket) and a PCIe Gen4 bus.

To “accelerate” networking and NVMe-oF, each server was configured with two NVIDIA BlueField-2 P-series DPU cards, each with two 100Gb Ethernet network ports, giving four network ports and 400Gb/s of wire bandwidth between initiator and target. The servers were connected back-to-back using NVIDIA LinkX 100GbE Direct-Attach Copper (DAC) passive cables, and both ran Red Hat Enterprise Linux (RHEL) version 8.3.

For the storage system software, both SPDK and the standard upstream Linux kernel target were tested using both the default kernel 4.18 and one of the newest kernels, 5.15. Three different storage initiators were benchmarked: SPDK, the standard kernel storage initiator, and the FIO plugin for SPDK. Workload generation and measurements were run with FIO and SPDK. I/O sizes were tested using 4KB and 512B, which are common medium and small storage I/O sizes, respectively.
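For readers curious what such a benchmark run looks like in practice, the sketch below drives a 4KB random-read test with FIO against an NVMe-oF block device from Python. The device path, queue depth and job count are illustrative assumptions of ours, not the parameters NVIDIA published.

```python
# Illustrative sketch only: an fio invocation of the kind described in the
# testing, run against an NVMe-oF namespace already connected on the initiator.
# Device path, iodepth and numjobs are assumptions, not NVIDIA's settings.
import shlex
import subprocess

fio_cmd = (
    "fio --name=nvmeof-4k-randread "
    "--filename=/dev/nvme0n1 "        # hypothetical NVMe-oF namespace
    "--ioengine=libaio --direct=1 "
    "--rw=randread --bs=4k "
    "--iodepth=32 --numjobs=16 "
    "--runtime=60 --time_based --group_reporting"
)

subprocess.run(shlex.split(fio_cmd), check=True)
```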

The NVMe-oF storage protocol was tested with both TCP and RoCE at the network transport layer. Each configuration was tested with 100% read, 100% write and 50/50 read/write workloads with full bidirectional network utilisation.
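Taken together, that amounts to a sizeable test matrix. The short sketch below simply enumerates the combinations the article mentions; exactly which subsets NVIDIA ran for each result is not spelled out, so treat it as illustrative only.

```python
# Enumerate the test matrix implied by the article (illustrative only).
from itertools import product

transports = ["TCP", "RoCE"]
workloads = ["100% read", "100% write", "50/50 read/write"]
block_sizes = ["4KB", "512B"]
initiators = ["SPDK", "kernel", "fio SPDK plugin"]

for combo in product(transports, workloads, block_sizes, initiators):
    print(" / ".join(combo))
```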

The company adds that customers and security software vendors are using BlueField’s recently updated NVIDIA DOCA framework to run cybersecurity applications – such as a distributed firewall or security groups with micro-segmentation – on the DPU to “improve” application and network security for compute servers.

Image courtesy of NVIDIA.

Antony Peyton
Antony Peyton is the Editor of eWeek UK. He has 18 years' journalism and writing experience. His career has taken him to China, Japan and the UK - covering tech, fintech and business. Follow on Twitter @TonyFintech.