Affiliation:
1. Helsinki Institute of Physics and Aalto University, Finland
2. Helsinki Institute of Physics and University of Lausanne, Switzerland
3. Helsinki Institute of Physics, Aalto University and VTT Technical Research Centre of Finland
4. Beijing University of Posts and Telecommunications, China
Abstract
To improve energy efficiency and comply with power budgets, it is important to be able to measure the power consumption of cloud computing servers. Intel’s Running Average Power Limit (RAPL) interface is a powerful tool for this purpose. RAPL provides power limiting features and accurate energy readings for CPUs and DRAM, which are easily accessible through different interfaces on large distributed computing systems. Since its introduction, RAPL has been used extensively in power measurement and modeling. However, its advantages and disadvantages have not yet been well investigated. To fill this gap, we conduct a series of experiments that expose the underlying strengths and weaknesses of the RAPL interface, using both customized microbenchmarks and three well-known application-level benchmarks: Stream, Stress-ng, and ParFullCMS. Moreover, to make the analysis as realistic as possible, we leverage two production-level power measurement datasets from Taito, a supercomputing cluster of the Finnish Center of Scientific Computing, and also replicate our experiments on Amazon EC2. Our results illustrate different aspects of RAPL, and we document the findings through comprehensive analysis. Our observations reveal that RAPL readings are highly correlated with plug power, are promisingly accurate, and have negligible performance overhead. Experimental results suggest that RAPL can be a very useful tool for measuring and monitoring the energy consumption of servers without deploying any complex power meters. We also show that there are still some open issues, such as driver support, non-atomicity of register updates, and unpredictable timings, that might weaken the usability of RAPL in certain scenarios. For such scenarios, we pinpoint solutions and workarounds.
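As a rough illustration of what taking an energy reading through one of these interfaces can look like, the sketch below turns two RAPL energy counter samples into an average power figure. It is not taken from the paper. It assumes the Linux powercap sysfs interface, with the package-0 domain at `/sys/class/powercap/intel-rapl:0` exposing `energy_uj` and `max_energy_range_uj`. The function names are hypothetical, and only a single counter wraparound between samples is handled.

```python
from pathlib import Path

# Assumption: package-0 RAPL domain as exposed by the Linux powercap driver.
# The path may differ or be absent on a given machine.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(start_uj: int, end_uj: int, max_range_uj: int) -> int:
    """Microjoules consumed between two readings, allowing one wraparound."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    # The counter passed max_range_uj and restarted from zero.
    return (max_range_uj - start_uj) + end_uj


def average_power_watts(start_uj: int, end_uj: int,
                        elapsed_s: float, max_range_uj: int) -> float:
    """Average power in watts over the sampling interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    joules = energy_delta_uj(start_uj, end_uj, max_range_uj) / 1_000_000
    return joules / elapsed_s


def read_counter_uj(name: str = "energy_uj",
                    domain: Path = RAPL_DOMAIN) -> int:
    """Read one integer counter file from the (assumed) powercap domain."""
    return int((domain / name).read_text().strip())


if __name__ == "__main__":
    import time

    if (RAPL_DOMAIN / "energy_uj").exists():
        max_range = read_counter_uj("max_energy_range_uj")
        t0, e0 = time.monotonic(), read_counter_uj()
        time.sleep(1.0)
        t1, e1 = time.monotonic(), read_counter_uj()
        print(f"{average_power_watts(e0, e1, t1 - t0, max_range):.2f} W")
    else:
        print("powercap RAPL domain not found on this machine")
```

Reading `energy_uj` may require elevated privileges on recent kernels. The sampling interval must also be short enough that the counter wraps at most once between samples.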
Funder
Central Universities and National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Media Technology; Information Systems; Software; Computer Science (miscellaneous)
Cited by
95 articles.
1. Dynamic-HDC: A Two-Stage Dynamic Inference Framework for Brain-Inspired Hyperdimensional Computing;IEEE Journal on Emerging and Selected Topics in Circuits and Systems;2023-12
2. DPS: Adaptive Power Management for Overprovisioned Systems;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2023-11-11
3. Structural Coding: A Low-Cost Scheme to Protect CNNs from Large-Granularity Memory Faults;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2023-11-11
4. Deep-Learning Model Extraction Through Software-Based Power Side-Channel;2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD);2023-10-28
5. Analyzing the Time x Energy Relation in C++ Solutions Mined from a Programming Contest Site;Proceedings of the XXVII Brazilian Symposium on Programming Languages;2023-09-25