A High-Throughput, Resource-Efficient Implementation of the RoCEv2 Remote DMA Protocol and its Application

Published: 2022-12-22 Volume: 16 Issue: 1 Pages: 1-23
ISSN: 1936-7406
Container-title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short-container-title: ACM Trans. Reconfigurable Technol. Syst.

Authors:

Niklas Schelten¹, Fritjof Steinert², Justin Knapheide³, Anton Schulte⁴, Benno Stabernack²

Affiliations:

1. Fraunhofer Institute for Telecommunications - Heinrich Hertz Institute (HHI), Einsteinufer, Berlin, Germany

2. Fraunhofer Institute for Telecommunications - HHI, Germany and University of Potsdam, Potsdam, Germany

3. University of Potsdam, Germany and Fraunhofer Institute for Telecommunications - HHI, Einsteinufer, Berlin, Germany

4. Fraunhofer Institute for Telecommunications - HHI, Einsteinufer, Berlin, Germany

Abstract

Application-specific accelerators have been state of the art in data centers for at least a decade, beginning with the availability of general-purpose GPUs that deliver higher performance either overall or per watt. In most cases, these accelerators are coupled to their hosts via PCIe interfaces, which brings disadvantages in interoperability, scalability, and power consumption. As a viable alternative to PCIe-attached FPGA accelerators, this paper proposes standalone FPGAs as Network-attached Accelerators (NAAs). To enable reliable communication for decoupled FPGAs, we present an RDMA over Converged Ethernet v2 (RoCEv2) communication stack for high-speed, low-latency data transfer, integrated into a hardware framework. For NAAs to replace PCIe-coupled FPGAs, the framework must provide comparable throughput and latency while using few resources. We show that our RoCEv2 stack achieves 100 Gb/s throughput with latencies below 4 μs while using about 10% of the available resources on a mid-range FPGA. To evaluate the energy efficiency of our NAA architecture, we built a demonstrator with 8 NAAs for machine-learning-based image classification. Based on our measurements, network-attached FPGAs are a strong alternative to the more energy-demanding PCIe-attached FPGA accelerators.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3543176

Cited by 2 articles

1. Enabling Communication with FPGA-based Network-attached Accelerators for HPC Workloads. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 2023-11-12.

2. FPGA-Based Network-Attached Accelerators: An Environmental Life Cycle Perspective. Architecture of Computing Systems, 2023.