Agile C-states: A Core C-state Architecture for Latency Critical Applications Optimizing both Transition and Cold-Start Latency

Author:

Antoniou Georgia (1), Bartolini Davide (2), Volos Haris (3), Kleanthous Marios (3), Wang Zhe (2), Kalaitzidis Kleovoulos (2), Rollet Tom (2), Li Ziwei (4), Mutlu Onur (5), Sazeides Yiannakis (3), Haj Yahya Jawad (2)

Affiliation:

1. Computer Science, University of Cyprus, Nicosia, Cyprus

2. Computing Systems Lab, Zurich Research Center, Huawei Technologies Switzerland AG, Zurich, Switzerland

3. Computer Science, University of Cyprus, Nicosia, Cyprus

4. Huawei Technologies Co Ltd, Shenzhen, China

5. Information Technology and Electrical Engineering, ETH Zurich, Zurich, Switzerland

Abstract

Latency-critical applications running in modern datacenters exhibit irregular request arrival patterns and are implemented as multiple services with strict latency requirements (30 μs to 250 μs). These characteristics render existing energy-saving idle CPU sleep states ineffective because of the performance overhead caused by each state's transition latency. Besides transition latency, another important contributor to the performance overhead of sleep states is cold-start latency, that is, the time required to warm up microarchitectural state (e.g., cache contents, branch predictor metadata) that is flushed or discarded when transitioning to a lower-power state. Both transition latency and cold-start latency can be particularly detrimental to the performance of latency-critical applications with short execution times. While prior work focuses on mitigating the effects of transition and cold-start latency by optimizing request scheduling, in this work we propose a redesign of the Core C-state architecture for latency-critical applications. In particular, we introduce C6Awarm, a new Agile Core C-state that drastically reduces the performance overhead caused by idle sleep state transition latency and cold-start latency while maintaining significant energy savings. C6Awarm achieves its goals by 1) implementing medium-grained power gating, 2) preserving the microarchitectural state of the core, and 3) keeping the clock generator and PLL active and locked. Our analysis of a set of microservices, based on an Intel Skylake server, shows that C6Awarm reduces energy consumption by up to 70% with limited performance degradation (at most 2%).

Publisher

Association for Computing Machinery (ACM)

