Affiliation:
1. Shanghai Jiao Tong University, Shanghai, China
2. Beihang University, Beijing, China
3. University of Michigan - Ann Arbor, Ann Arbor, USA
Abstract
Guaranteeing Quality-of-Service (QoS) of latency-sensitive applications while improving server utilization through application co-location is important yet challenging in modern datacenters. The key challenge is that when applications are co-located on a server, performance interference due to resource contention can be detrimental to the application QoS. Although prior work has proposed techniques to identify "safe" co-locations where application QoS is satisfied by predicting the performance interference on multicores, no such prediction technique on accelerators such as GPUs.
In this work, we present Prophet, an approach to precisely predict the performance degradation of latency-sensitive applications on accelerators due to application co-location. We analyzed the performance interference on accelerators through a real system investigation and found that unlike on multicores where the key contentious resources are shared caches and main memory bandwidth, the key contentious resources on accelerators are instead processing elements, accelerator memory bandwidth and PCIe bandwidth. Based on this observation, we designed interference models that enable the precise prediction for processing element, accelerator memory bandwidth and PCIe bandwidth contention on real hardware. By using a novel technique to forecast solo-run execution traces of the co-located applications using interference models, Prophet can accurately predict the performance degradation of latency-sensitive applications on non-preemptive accelerators. Using Prophet, we can identify "safe" co-locations on accelerators to improve utilization without violating the QoS target. Our evaluation shows that Prophet can predict the performance degradation with an average prediction error 5.47% on real systems. Meanwhile, based on the prediction, Prophet achieves accelerator utilization improvements of 49.9% on average while maintaining the QoS target of latency-sensitive applications.
Funder
National Natural Science Foundation of China
National Science Foundation
National Basic Research 973 Program of China
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design,Software
Reference51 articles.
1. Nvidia Multi-Process Service. https://docs.nvidia.com/deploy/pdf/CUDA\_Multi\_Process\_Service\_Overview.pdf. Nvidia Multi-Process Service. https://docs.nvidia.com/deploy/pdf/CUDA\_Multi\_Process\_Service\_Overview.pdf.
2. Profiler User's Guide. http://docs.nvidia.com/cuda/profiler-users-guide. Profiler User's Guide. http://docs.nvidia.com/cuda/profiler-users-guide.
3. The case for GPGPU spatial multitasking
4. QoS-aware dynamic resource allocation for spatial-multitasking GPUs
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. ODIN: Overcoming Dynamic Interference in iNference Pipelines;Euro-Par 2023: Parallel Processing;2023
2. WSMeter;Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems;2018-03-19