A High-Throughput, Resource-Efficient Implementation of the RoCEv2 Remote DMA Protocol and its Application

Author:

Schelten Niklas (1), Steinert Fritjof (2), Knapheide Justin (3), Schulte Anton (4), Stabernack Benno (2)

Affiliation:

1. Fraunhofer Institute for Telecommunications - Heinrich Hertz Institute (HHI), Einsteinufer, Berlin, Germany

2. Fraunhofer Institute for Telecommunications - HHI, Germany and University of Potsdam, Potsdam, Germany

3. University of Potsdam, Germany and Fraunhofer Institute for Telecommunications - HHI, Einsteinufer, Berlin, Germany

4. Fraunhofer Institute for Telecommunications - HHI, Einsteinufer, Berlin, Germany

Abstract

The use of application-specific accelerators in data centers has been the state of the art for at least a decade, starting with the availability of general-purpose GPUs that achieve higher performance either overall or per watt. In most cases, these accelerators are coupled to their hosts via PCIe interfaces, which leads to disadvantages in interoperability, scalability and power consumption. As a viable alternative to PCIe-attached FPGA accelerators, this paper proposes standalone FPGAs as Network-attached Accelerators (NAAs). To enable reliable communication for decoupled FPGAs, we present an RDMA over Converged Ethernet v2 (RoCEv2) communication stack for high-speed, low-latency data transfer, integrated into a hardware framework. For NAAs to be used instead of PCIe-coupled FPGAs, the framework must provide similar throughput and latency with low resource usage. We show that our RoCEv2 stack is capable of achieving 100 Gb/s throughput with latencies of less than 4 μs while using about 10% of the available resources on a mid-range FPGA. To evaluate the energy efficiency of our NAA architecture, we built a demonstrator with 8 NAAs for machine-learning-based image classification. Based on our measurements, network-attached FPGAs are a great alternative to the more energy-demanding PCIe-attached FPGA accelerators.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 2 articles.

1. Enabling Communication with FPGA-based Network-attached Accelerators for HPC Workloads;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12

2. FPGA-Based Network-Attached Accelerators – An Environmental Life Cycle Perspective;Architecture of Computing Systems;2023
