Affiliation:
1. SKLCA, ICT, CAS, China; Graduate School, CAS, Beijing, China
2. Microsoft Research, Beijing, China
3. Ghent University, Belgium
4. INRIA, Saclay, France
5. SKLCA, ICT, CAS, Beijing, China
Abstract
Iterative optimization is a simple but powerful approach that searches for the best possible combination of compiler optimizations for a given workload. However, iterative optimization is plagued by several practical issues that prevent it from being widely used: a large number of runs is required to find the best combination, the optimum combination is dataset dependent, and the exploration process incurs significant overhead that must be offset by performance benefits. Therefore, although iterative optimization has been shown to have significant performance potential, it is seldom used in production compilers.
In this article, we propose iterative optimization for the data center (IODC): we show that the data center offers a context in which all of the preceding hurdles can be overcome. The basic idea is to spawn different combinations across workers and collect performance statistics at the master, which then evolves toward the optimum combination of compiler optimizations. IODC carefully manages costs and benefits, and it is transparent to the end user. To bring IODC to practice, we evaluate it in the presence of co-runners to better reflect real-life data center operation, in which multiple applications co-run on each server. We enhance IODC with the capability to find compatible co-runners, along with a mechanism that dynamically adjusts its level of aggressiveness to improve robustness in the presence of co-running applications.
We evaluate IODC using both MapReduce and compute-intensive throughput server applications. To reflect the large number of users interacting with the system, we gather a very large collection of datasets (up to hundreds of millions of unique datasets per program), for a total storage of 16.4 TB and 850 days of CPU time. We report an average performance improvement of 1.48× and up to 2.08× for five MapReduce applications, and 1.12× and up to 1.39× for nine server applications. Furthermore, our experiments demonstrate that IODC is effective in the presence of co-runners, improving performance by more than 13% compared to the worst possible co-runner schedule.
Funder
China 1000-talents program
National High Technology Research and Development Program of China
International Collaboration Key Program of CAS
China 10000-talents program
Google Faculty Research Award
European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC
National Natural Science Foundation of China (NSFC)
National High Technology Research and Development Program of China
Strategic Priority Research Program of CAS
NSFC
National Basic Research Program of China
Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI)
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
34 articles.