
Work stealing for interactive services to meet target latency

Published: 2016-11-09 Issue: 8 Volume: 51 Pages: 1-13
ISSN: 0362-1340
Journal: ACM SIGPLAN Notices
Language: English
Journal abbreviation: SIGPLAN Not.

Authors:

Li Jing¹, Agrawal Kunal¹, Elnikety Sameh², He Yuxiong², Lee I-Ting Angelina¹, Lu Chenyang¹, McKinley Kathryn S.²

Affiliations:

1. Washington University in St. Louis

2. Microsoft Research

Abstract

Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas work on optimizing parallel programs and distributed server systems has historically focused on average latency and throughput, the primary metric for interactive applications is instead consistent responsiveness, i.e., minimizing the number of requests that miss a target latency. This paper is the first to show how to generalize work stealing, which is traditionally used to minimize the makespan of a single parallel job, to optimize for a target latency in interactive services with multiple parallel requests. We design a new adaptive work-stealing policy, called tail-control, that reduces the number of requests that miss a target latency. It uses instantaneous request progress, system load, and a target latency to choose when to parallelize requests with stealing, when to admit new requests, and when to limit the parallelism of large requests. We implement this approach in the Intel Threading Building Blocks (TBB) library and evaluate it on real-world and synthetic workloads. The tail-control policy substantially reduces the number of requests exceeding the target latency and delivers up to 58% relative improvement over various baseline policies. This generalization of work stealing to multiple requests effectively optimizes the number of requests that complete within a target latency, a key metric for interactive services.
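The abstract names the inputs of tail-control (request progress, system load, target latency) and the three decisions it makes (steal to parallelize, admit a new request, limit parallelism of large requests), but not the rule that combines them. The Python sketch below is a hypothetical illustration of that shape only, not the paper's algorithm: `Request`, `count_misses`, `large_threshold_ms`, and `idle_worker_action` are invented names; the threshold formula (target latency divided by requests per worker) and the steal-first ordering are assumptions chosen for readability. It models a single idle worker's choice and does not simulate deques or threads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical per-request state visible to the scheduler."""
    rid: int
    work_done_ms: float        # processing time consumed so far (the request's progress)
    has_stealable_work: bool   # True if its deque holds tasks a thief could take


def count_misses(latencies_ms: List[float], target_ms: float) -> int:
    """Count requests whose latency exceeds the target: the metric to minimize."""
    return sum(1 for t in latencies_ms if t > target_ms)


def large_threshold_ms(load: int, num_workers: int, target_ms: float) -> float:
    """HYPOTHETICAL heuristic, not the paper's formula.

    The more requests per worker in the system, the lower the threshold,
    so more long-running requests are classified as large.
    Assumes num_workers >= 1.
    """
    requests_per_worker = max(1.0, load / num_workers)
    return target_ms / requests_per_worker


def idle_worker_action(active: List[Request], queued: int,
                       num_workers: int, target_ms: float) -> str:
    """Decide what an idle worker does next (steal-first order is an assumption)."""
    load = len(active) + queued
    threshold = large_threshold_ms(load, num_workers, target_ms)
    # 1. Parallelize small requests: steal only from requests at or below the threshold.
    #    A request above it is never stolen from, so its owner runs it serially.
    for req in active:
        if req.has_stealable_work and req.work_done_ms <= threshold:
            return f"steal:{req.rid}"
    # 2. No small request needs help: admit a waiting request, if any.
    if queued > 0:
        return "admit"
    # 3. Nothing to do.
    return "idle"


if __name__ == "__main__":
    print(count_misses([10, 50, 120, 200], target_ms=100))  # 2
    # Load 16 on 8 workers halves the threshold to 50 ms, so request 1 (80 ms) is "large".
    busy = [Request(1, 80.0, True), Request(2, 10.0, True)]
    print(idle_worker_action(busy, queued=14, num_workers=8, target_ms=100.0))  # steal:2
    print(idle_worker_action([Request(1, 80.0, True)], queued=0,
                             num_workers=8, target_ms=100.0))  # steal:1
```

In this sketch, a lightly loaded system keeps every request eligible for stealing, while under heavy load long-running requests stop receiving help, which leaves idle workers free to speed up short requests or admit queued ones. Whether and how tail-control itself sets such a threshold is described in the full paper, not in this abstract.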

Funder

Office of Naval Research

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3016078.2851151

References: 52 articles

1. Adaptive scheduling with parallelism feedback

2. K. Agrawal, C. E. Leiserson, Y. He, and W. J. Hsu. Adaptive work-stealing with parallelism feedback. ACM Transactions on Computer Systems (TOCS), 26(3):7, 2008. doi:10.1145/1394441.1394443

3. Apache Lucene. http://lucene.apache.org/, 2014.

4. Thread Scheduling for Multiprogrammed Multiprocessors

5. The habanero multicore software research project

Cited by 7 articles.

1. Performance Prediction Based Workload Scheduling in Co-Located Cluster. Computer Modeling in Engineering & Sciences, 2024.

2. WoS-CoMS: Work Stealing-Based Congestion Management Scheme for SDN Programmable Networks. Journal of Network and Systems Management, 2024-01.

3. Kairos: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023-08-07.

4. WoS-CoMS: Work Stealing-based Congestion Management Scheme for SDN programmable networks. 2023-08-02.

5. Adaptive scheduling of multiprogrammed dynamic-multithreading applications. Journal of Parallel and Distributed Computing, 2022-04.