SplitRPC: A {Control + Data} Path Splitting RPC Stack for ML Inference Serving

Published:2023-06-26 Issue:1 Volume:51 Page:13-14
ISSN:0163-5999
Container-title:ACM SIGMETRICS Performance Evaluation Review
language:en
Short-container-title:SIGMETRICS Perform. Eval. Rev.

Author:

Kumar Adithya¹, Sivasubramaniam Anand², Zhu Timothy¹

Affiliation:

1. The Pennsylvania State University, University Park, PA, USA

2. Penn State University, University Park, PA, USA

Abstract

The growing adoption of hardware accelerators, driven by their intelligent compiler and runtime system counterparts, has democratized ML services and precipitously reduced their execution times. This motivates us to shift our attention to characterizing the overheads imposed by the RPC mechanism (the "RPC tax") when serving them on accelerators. Conventional RPC implementations implicitly assume that the host CPU services the requests, and we focus on extending such work to accelerator-based services. While SmartNIC-based solutions work well for simple applications, serving complex ML models requires a more nuanced view to optimize both the data path and the control/orchestration of these accelerators. We program commodity network interface cards (NICs) to split the control and data paths for effective transfer of control while efficiently transferring the payload to the accelerator. As opposed to unified approaches that bundle these paths together, limiting the flexibility in each of these paths, we design and implement SplitRPC, a {control + data} path optimizing RPC mechanism for ML inference serving. SplitRPC allows us to optimize the data path to the accelerator while simultaneously allowing the CPU to maintain full orchestration capabilities. We implement SplitRPC on both commodity NICs and SmartNICs and demonstrate that SplitRPC is effective in minimizing the RPC tax while providing significant gains in throughput and latency.

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3606376.3593571

Reference: 6 articles.

1. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Boston, MA, 578-594.

2. Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael J Franklin, Joseph E Gonzalez, and Ion Stoica. 2017. Clipper: A low-latency online prediction serving system. In Proceedings of the Conference on Networked Systems Design and Implementation (NSDI). USENIX Association, Boston, MA, USA, 613-627.

3. Google. 2018. GRPC Framework. https://grpc.io/. [Online; accessed 17-Apr-2022].

4. Anuj Kalia, Michael Kaminsky, and David Andersen. 2019. Datacenter RPCs can be General and Fast. In Proceedings of the Conference on Networked Systems Design and Implementation (NSDI). USENIX Association, Boston, MA, USA, 1-16.

5. NVIDIA. 2022. CUDA GPUDirect RDMA. https://docs.nvidia.com/cuda/gpudirect-rdma/index.html. [Online; accessed 17-Apr-2022].