Prophet

Authors:

Quan Chen (1), Hailong Yang (2), Minyi Guo (1), Ram Srivatsa Kannan (3), Jason Mars (3), Lingjia Tang (3)

Affiliations:

1. Shanghai Jiao Tong University, Shanghai, China

2. Beihang University, Beijing, China

3. University of Michigan - Ann Arbor, Ann Arbor, USA

Abstract

Guaranteeing Quality-of-Service (QoS) of latency-sensitive applications while improving server utilization through application co-location is important yet challenging in modern datacenters. The key challenge is that when applications are co-located on a server, performance interference due to resource contention can be detrimental to application QoS. Although prior work has proposed techniques to identify "safe" co-locations where application QoS is satisfied by predicting the performance interference on multicores, no such prediction technique exists for accelerators such as GPUs. In this work, we present Prophet, an approach to precisely predict the performance degradation of latency-sensitive applications on accelerators due to application co-location. We analyzed the performance interference on accelerators through a real-system investigation and found that, unlike on multicores where the key contentious resources are shared caches and main memory bandwidth, the key contentious resources on accelerators are instead processing elements, accelerator memory bandwidth, and PCIe bandwidth. Based on this observation, we designed interference models that enable precise prediction of processing element, accelerator memory bandwidth, and PCIe bandwidth contention on real hardware. Using a novel technique that forecasts the solo-run execution traces of the co-located applications with these interference models, Prophet can accurately predict the performance degradation of latency-sensitive applications on non-preemptive accelerators. Using Prophet, we can identify "safe" co-locations on accelerators to improve utilization without violating the QoS target. Our evaluation shows that Prophet can predict the performance degradation with an average prediction error of 5.47% on real systems. Meanwhile, based on the prediction, Prophet achieves accelerator utilization improvements of 49.9% on average while maintaining the QoS target of latency-sensitive applications.
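To illustrate the co-location decision the abstract describes, the sketch below shows how a predicted slowdown of a latency-sensitive application could be compared against its QoS slack to admit or reject a candidate co-location. This is a minimal, hypothetical sketch, not the paper's implementation; the names (Candidate, predicted_degradation, qos_slack) and the numbers are illustrative assumptions.

```python
# Hypothetical sketch of a QoS-aware co-location admission check.
# All names and values are illustrative, not taken from the paper.

from dataclasses import dataclass

@dataclass
class Candidate:
    batch_app: str                 # best-effort application to co-locate
    predicted_degradation: float   # predicted slowdown of the latency-sensitive
                                   # app, e.g. 0.08 = 8% longer than its solo run

def is_safe(candidate: Candidate, qos_slack: float) -> bool:
    """A co-location is 'safe' if the predicted degradation stays within
    the latency slack allowed by the QoS target."""
    return candidate.predicted_degradation <= qos_slack

def pick_colocations(candidates, qos_slack=0.10):
    """Admit every candidate whose predicted interference keeps the
    latency-sensitive application within its QoS target."""
    return [c for c in candidates if is_safe(c, qos_slack)]

if __name__ == "__main__":
    candidates = [
        Candidate("matrix-multiply", 0.06),   # predicted 6% slowdown -> admitted
        Candidate("graph-traversal", 0.23),   # predicted 23% slowdown -> rejected
    ]
    for c in pick_colocations(candidates):
        print(f"co-locate {c.batch_app}: predicted degradation "
              f"{c.predicted_degradation:.0%} within QoS slack")
```

In such a scheme, the accuracy of the admission decision rests entirely on the degradation prediction, which is why the paper focuses on precise interference models for processing elements, accelerator memory bandwidth, and PCIe bandwidth.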

Funder

National Natural Science Foundation of China

National Science Foundation

National Basic Research 973 Program of China

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software


Cited by 2 articles.

1. ODIN: Overcoming Dynamic Interference in iNference Pipelines. Euro-Par 2023: Parallel Processing, 2023.

2. WSMeter. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018-03-19.
