Affiliation:
1. Facebook, Inc.
2. University of Michigan
Abstract
Data center power is a scarce resource that often goes underutilized due to conservative planning. This is because the penalty for overloading the data center power delivery hierarchy and tripping a circuit breaker is very high, potentially causing long service outages. Recently, dynamic server power capping, which limits the amount of power consumed by a server, has been proposed and studied as a way to reduce this penalty, enabling more aggressive utilization of provisioned data center power. However, no real at-scale solution for data center-wide power monitoring and control has been presented in the literature.
In this paper, we describe Dynamo, a data center-wide power management system that monitors the entire power hierarchy and makes coordinated control decisions to safely and efficiently use provisioned data center power. Dynamo has been developed and deployed across all of Facebook's data centers for the past three years. Our key insight is that in real-world data centers, different power and performance constraints at different levels in the power hierarchy necessitate coordinated data center-wide power management.
We make three main contributions. First, to understand the design space of Dynamo, we provide a characterization of power variation in data centers running a diverse set of modern workloads. This characterization uses fine-grained power samples collected from tens of thousands of servers over a period of more than six months. Second, we present the detailed design of Dynamo. Our design addresses several key issues not addressed by previous simulation-based studies. Third, the proposed techniques and design have been deployed and evaluated in large-scale data centers serving billions of users. We present production results showing that Dynamo has prevented 18 potential power outages caused by unexpected power surges in the past six months; that Dynamo enables optimizations leading to a 13% performance boost for a production Hadoop cluster and a nearly 40% performance increase for a search cluster; and that Dynamo has already enabled an 8% increase in the power capacity utilization of one of our data centers, with more aggressive power subscription measures underway.
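The idea of coordinated, hierarchy-wide power capping described above can be illustrated with a minimal Python sketch. It assumes a hypothetical tree of power devices (`Node`), each with a limit in watts, where only leaf servers can be throttled, and never below a per-server floor. It uses a simple proportional-reduction policy chosen purely for illustration; this is not Dynamo's actual controller, and all names and numbers (`rack-a`, `rpp-1`, the wattages) are made up. Limits are enforced bottom-up, so a cap applied at a rack is never undone when a parent device later caps the same servers further.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A device in a hypothetical power hierarchy (switchboard, RPP, rack, server)."""
    name: str
    limit_w: float                 # breaker or budget limit for this node, in watts
    power_w: float = 0.0           # server draw (leaves only); modeled as the capped draw
    floor_w: float = 0.0           # lowest level a server may be throttled to (leaves only)
    children: list[Node] = field(default_factory=list)

    def servers(self) -> list[Node]:
        # Leaves are servers; internal nodes only aggregate their subtree.
        if not self.children:
            return [self]
        return [s for c in self.children for s in c.servers()]

    def total_w(self) -> float:
        return sum(s.power_w for s in self.servers())


def cap_subtree(node: Node) -> None:
    """Enforce limits bottom-up: children first, then this node.

    Capping only ever lowers server power, so a limit already satisfied
    deeper in the tree stays satisfied when an ancestor caps further.
    """
    for child in node.children:
        cap_subtree(child)
    excess = node.total_w() - node.limit_w
    if excess <= 0:
        return
    servers = node.servers()
    reducible = sum(s.power_w - s.floor_w for s in servers)
    if reducible <= 0:
        return  # nothing left to shed in this sketch; node stays over its limit
    # Shed the excess in proportion to each server's headroom above its floor.
    # If excess > reducible, every server drops to its floor (share capped at 1).
    share = min(1.0, excess / reducible)
    for s in servers:
        s.power_w -= share * (s.power_w - s.floor_w)


# Usage: a hypothetical RPP feeding two racks, with both levels near their limits.
rack_a = Node("rack-a", limit_w=700, children=[
    Node("s1", limit_w=500, power_w=400, floor_w=200),
    Node("s2", limit_w=500, power_w=400, floor_w=200),
])
rack_b = Node("rack-b", limit_w=1000, children=[
    Node("s3", limit_w=500, power_w=300, floor_w=200),
])
rpp = Node("rpp-1", limit_w=900, children=[rack_a, rack_b])

cap_subtree(rpp)
print({s.name: s.power_w for s in rpp.servers()})
# {'s1': 312.5, 's2': 312.5, 's3': 275.0}
```

In this example, rack-a first sheds 100 W to meet its own 700 W limit; the RPP then sheds another 100 W across all three servers. A rack-level cap alone would not have protected the RPP breaker, which is the kind of cross-level constraint that motivates coordinated, data center-wide control.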
Publisher: Association for Computing Machinery (ACM)
Cited by 22 articles.
1. Can Storage Devices be Power Adaptive? Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 2024-07-08.
2. Impact of power consumption in containerized clouds: A comprehensive analysis of open-source power measurement tools. Computer Networks, 2024-05.
3. Expanding Datacenter Capacity with DVFS Boosting: A safe and scalable deployment experience. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2024-04-17.
4. An End-to-End HPC Framework for Dynamic Power Objectives. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023-11-12.
5. DCMigrationALG: A Power-Aware Data Center Container Migration Algorithm. Journal of Physics: Conference Series, 2023-08-01.