Low-latency remote-offloading system for accelerator

Published: 2023-11-03
ISSN:0003-4347
Container-title:Annals of Telecommunications
language:en
Short-container-title:Ann. Telecommun.

Author:

Saito Shogo, Fujimoto Kei, Shiraga Akinori

Abstract

Specific workloads are increasingly offloaded to accelerators such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) for real-time processing and computing efficiency. Because accelerators are expensive and power-hungry, it is desirable to raise their utilization by sharing them among multiple servers over a network. However, remote offloading adds latency because of network processing overhead. This paper proposes a low-latency system for accelerator offloading over a network. To reduce the overhead of remote offloading, the system combines (1) fast reassembly of chunked data using a simple protocol that reduces the number of memory copies, (2) polling-based checks for received packets, which avoid the interrupt overhead of interacting with a network interface card, and (3) a run-to-completion model for network processing and accelerator offloading, which reduces context-switching overhead. We show that the system improves performance by 66.40% compared with a simple implementation that uses the kernel protocol stack, and we confirm the improvement with a virtual radio access network use case as a low-latency application. Furthermore, we show that this performance can also be achieved in practical usage in data center networks.
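
The three mechanisms named in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the chunk header layout (`HEADER`), the `Reassembler` class, and the `poll` and `offload` callables are all hypothetical names chosen for illustration. It only shows the general idea of writing each chunk once into a preallocated buffer, checking for packets by polling instead of waiting on an interrupt, and handing a completed message to the accelerator from the same loop.

```python
import struct

# Hypothetical chunk header (an assumption, not the paper's wire format):
# message id, byte offset of this chunk, total message length.
HEADER = struct.Struct("!III")


def make_chunks(msg_id, payload, chunk_size):
    """Sender side: split payload into header-prefixed chunks."""
    return [
        HEADER.pack(msg_id, off, len(payload)) + payload[off:off + chunk_size]
        for off in range(0, len(payload), chunk_size)
    ]


class Reassembler:
    """Copies each chunk exactly once into a single preallocated buffer."""

    def __init__(self):
        self.buffer = None
        self.received = 0

    def feed(self, packet):
        """Consume one packet; return the buffer once complete, else None.

        Duplicate or lost chunks are not handled in this sketch.
        """
        _msg_id, offset, total = HEADER.unpack_from(packet)
        body = memoryview(packet)[HEADER.size:]  # view, no intermediate copy
        if self.buffer is None:
            self.buffer = bytearray(total)
        self.buffer[offset:offset + len(body)] = body
        self.received += len(body)
        return self.buffer if self.received == total else None


def run_to_completion(poll, offload):
    """Busy-poll for packets and offload from the same loop.

    `poll` returns a packet or None when nothing is ready; `offload`
    stands in for the accelerator call. Both are hypothetical.
    """
    reassembler = Reassembler()
    while True:
        packet = poll()
        if packet is None:
            continue  # nothing received yet: keep polling, never block
        message = reassembler.feed(packet)
        if message is not None:
            return offload(message)  # no hand-off to another thread
```

In a real deployment the `poll` callable would read from a network interface card receive queue and `offload` would launch work on a GPU or FPGA; here they are plain Python callables so the control flow is visible.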

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering

Link

https://link.springer.com/content/pdf/10.1007/s12243-023-00994-3.pdf

References: 37 articles.

1. Theis TN, Wong H-SP (2017) The end of Moore’s law: a new beginning for information technology. Comput Sci Eng 19(2):41–50

2. Dally WJ, Turakhia Y, Han S (2020) Domain-specific hardware accelerators. Commun ACM 63(7):48–57

3. 3GPP TS (2016) 36.302: Evolved Universal Terrestrial Radio Access (E-UTRA); Services provided by the physical layer

4. Parvez I, Rahmati A, Guvenc I, Sarwat AI, Dai H (2018) A survey on low latency towards 5G: RAN, core network and caching solutions. IEEE Commun Surv Tutor 20(4):3098–3130

5. Foukas X, Radunovic B (2021) Concordia: teaching the 5G vRAN to share compute. In: Proceedings of the ACM SIGCOMM Conference, pp 580–596

Cited by 1 article.

1. Power Saving for Hardware Accelerated Applications With Dynamical Processor Switching; IEEE Access; 2024