A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness

Authors:

Henry Cook1, Miquel Moreto2, Sarah Bird1, Khanh Dao1, David A. Patterson1, Krste Asanovic1

Affiliations:

1. University of California, Berkeley

2. University of California, Berkeley and Universitat Politecnica de Catalunya, Barcelona, Spain

Abstract

Computing workloads often contain a mix of interactive, latency-sensitive foreground applications and recurring background computations. To guarantee responsiveness, interactive and batch applications are often run on disjoint sets of resources, but this incurs additional energy, power, and capital costs. In this paper, we evaluate the potential of hardware cache partitioning mechanisms and policies to improve efficiency by allowing background applications to run simultaneously with interactive foreground applications, while avoiding degradation in interactive responsiveness. We evaluate these tradeoffs using commercial x86 multicore hardware that supports cache partitioning, and find that real hardware measurements with full applications provide different observations than past simulation-based evaluations. Co-scheduling applications without LLC partitioning leads to a 10% energy improvement and average throughput improvement of 54% compared to running tasks separately, but can result in foreground performance degradation of up to 34% with an average of 6%. With optimal static LLC partitioning, the average energy improvement increases to 12% and the average throughput improvement to 60%, while the worst case slowdown is reduced noticeably to 7% with an average slowdown of only 2%. We also evaluate a practical low-overhead dynamic algorithm to control partition sizes, and are able to realize the potential performance guarantees of the optimal static approach, while increasing background throughput by an additional 19%.
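To make the dynamic partitioning policy mentioned above concrete, the sketch below shows one way such a controller could be structured on a current Linux/x86 system. It is not the paper's implementation: the partitioning interface (the Linux resctrl filesystem for Intel CAT), the 16-way last-level cache, the 5% slowdown budget, and the caller-supplied read_foreground_slowdown probe are all assumptions made for illustration; the paper's prototype hardware and its performance telemetry differ.

# Illustrative sketch only, not the paper's implementation: the prototype
# hardware in the paper exposed its own way-partitioning mechanism, while
# this example expresses the same idea with the Linux resctrl filesystem
# (Intel RDT/CAT). The foreground slowdown probe is left as a caller-supplied
# function, standing in for application-level performance telemetry.
import os
import time

RESCTRL = "/sys/fs/resctrl"   # mounted resctrl filesystem (requires CAT support)
TOTAL_WAYS = 16               # assumed LLC associativity; read info/L3/cbm_mask on real hardware
SLOWDOWN_BUDGET = 0.05        # tolerate at most 5% foreground degradation

def set_ways(group, mask):
    # Write an L3 capacity bitmask for one resource group (single L3 domain assumed).
    with open(os.path.join(RESCTRL, group, "schemata"), "w") as f:
        f.write(f"L3:0={mask:x}\n")

def assign_tasks(group, pids):
    # Create the resource group if needed and move the given PIDs into it.
    path = os.path.join(RESCTRL, group)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "tasks"), "w") as f:
        for pid in pids:
            f.write(f"{pid}\n")

def next_bg_ways(slowdown, bg_ways):
    # Core policy: shrink the background partition when the foreground exceeds
    # its slowdown budget, grow it cautiously when there is ample headroom.
    if slowdown > SLOWDOWN_BUDGET and bg_ways > 1:
        return bg_ways - 1
    if slowdown < SLOWDOWN_BUDGET / 2 and bg_ways < TOTAL_WAYS - 1:
        return bg_ways + 1
    return bg_ways

def control_loop(read_foreground_slowdown, bg_ways=2, period_s=1.0):
    # Periodically re-split the LLC: low ways go to the background group,
    # the remaining (contiguous) high ways to the foreground group.
    while True:
        bg_ways = next_bg_ways(read_foreground_slowdown(), bg_ways)
        bg_mask = (1 << bg_ways) - 1
        set_ways("background", bg_mask)
        set_ways("foreground", ((1 << TOTAL_WAYS) - 1) ^ bg_mask)
        time.sleep(period_s)

The key design point this sketch shares with the abstract's policy is asymmetry: the background partition shrinks immediately when the foreground's slowdown budget is violated, but grows only when there is clear headroom, so responsiveness is protected while idle cache capacity is gradually reclaimed for batch throughput.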

Funders

Nvidia

Intel Corporation

Agència de Gestió d'Ajuts Universitaris i de Recerca

University of California

Samsung

Nokia

Microsoft

Oracle

Ministerio de Economía y Competitividad

MEC/Fulbright Fellowship

Publisher

Association for Computing Machinery (ACM)

Cited by 16 articles.

1. TraceUpscaler: Upscaling Traces to Evaluate Systems at High Load; Proceedings of the Nineteenth European Conference on Computer Systems; 2024-04-22

2. Running Serverless Function on Resource Fragments in Data Center; Lecture Notes in Computer Science; 2024

3. Component-distinguishable Co-location and Resource Reclamation for High-throughput Computing; ACM Transactions on Computer Systems; 2023-11-18

4. RAPID: Enabling fast online policy learning in dynamic public cloud environments; Neurocomputing; 2023-11

5. An Evaluation of Time-triggered Scheduling in the Linux Kernel; The 31st International Conference on Real-Time Networks and Systems; 2023-06-07
