Treadmill

Published:2016-10-12 Issue:3 Volume:44 Page:456-468
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Zhang Yunqi¹, Meisner David², Mars Jason¹, Tang Lingjia¹

Affiliation:

1. University of Michigan

2. Facebook Inc.

Abstract

Managing tail latency of requests has become one of the primary challenges for large-scale Internet services. Data centers are quickly evolving and service operators frequently desire to make changes to the deployed software and production hardware configurations. Such changes demand a confident understanding of the impact on one's service, in particular its effect on tail latency (e.g., 95th- or 99th-percentile response latency of the service). Evaluating the impact on the tail is challenging because of its inherent variability. Existing tools and methodologies for measuring these effects suffer from a number of deficiencies including poor load tester design, statistically inaccurate aggregation, and improper attribution of effects. As shown in the paper, these pitfalls can often result in misleading conclusions. In this paper, we develop a methodology for statistically rigorous performance evaluation and performance factor attribution for server workloads. First, we find that careful design of the server load tester can ensure high quality performance evaluation, and empirically demonstrate the inaccuracy of load testers in previous work. Learning from the design flaws in prior work, we design and develop a modular load tester platform, Treadmill, that overcomes pitfalls of existing tools. Next, utilizing Treadmill, we construct measurement and analysis procedures that can properly attribute performance factors. We rely on statistically-sound performance evaluation and quantile regression, extending it to accommodate the idiosyncrasies of server systems. Finally, we use our augmented methodology to evaluate the impact of common server hardware features with Facebook production workloads on production hardware. We decompose the effects of these features on request tail latency and demonstrate that our evaluation methodology provides superior results, particularly in capturing complicated and counter-intuitive performance behaviors. By tuning the hardware features as suggested by the attribution, we reduce the 99th-percentile latency by 43% and its variance by 93%.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3007787.3001186

References: 55 articles.

1. Power management of online data-intensive services

2. The tail at scale

3. Tales of the Tail

4. Benchmarking cloud serving systems with YCSB

5. Power provisioning for a warehouse-sized computer

Cited by 14 articles.

1. InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference With Spatio-Temporal Sharing;IEEE Transactions on Parallel and Distributed Systems;2024-10

2. DeInfer: A GPU resource allocation algorithm with spatial sharing for near-deterministic inferring tasks;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12

3. Fast, Light-weight, and Accurate Performance Evaluation using Representative Datacenter Behaviors;Proceedings of the 24th International Middleware Conference;2023-11-27

4. Turbo: SmartNIC-enabled Dynamic Load Balancing of µs-scale RPCs;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02

5. Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02