APMT: an automatic hardware counter-based performance modeling tool for HPC applications
Published: 2020-06
Issue: 2
Volume: 2
Pages: 135-148
ISSN: 2524-4922
Container title: CCF Transactions on High Performance Computing
Short container title: CCF Trans. HPC
Language: en
Authors: Ding Nan, Lee Victor W., Xue Wei, Zheng Weimin
Abstract
The ever-growing complexity of HPC applications and computer architectures makes it more costly than ever to understand application behaviors. In this paper, we propose APMT, an Automatic Performance Modeling Tool, to understand and predict performance efficiently in the regimes of interest to developers and performance analysts, while outperforming many traditional techniques. In APMT, we use hardware counter-assisted profiling to identify the key kernels and non-scalable kernels, and we build a model for each kernel according to our performance modeling framework. We also provide an optional refinement modeling framework to further explain the key performance metric, cycles-per-instruction (CPI). Our evaluations show that, by performing only a few small-scale profiling runs, APMT keeps the average error rate around 15% with an average performance overhead of 3% across different scenarios, including the NAS Parallel Benchmarks, the dynamical core of the atmosphere model of the Community Earth System Model (CESM), and the ice component of CESM on commodity clusters. Compared with a well-known analytical model and an empirical model, APMT improves prediction accuracy by 25-52% in strong-scaling tests.
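As context for the hardware counter-assisted profiling described above, the sketch below shows one minimal way to measure the CPI of a kernel region from hardware counters. It is an illustration only, not part of APMT: it assumes the PAPI library is installed and uses PAPI's low-level API (the PAPI_TOT_CYC and PAPI_TOT_INS events) around a stand-in loop kernel.

/* Minimal CPI-measurement sketch using PAPI hardware counters.
 * Illustrative only; this is not APMT's instrumentation.
 * Build (assuming PAPI is installed): gcc cpi.c -lpapi -o cpi
 */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void)
{
    int event_set = PAPI_NULL;
    long long counts[2];  /* counts[0] = total cycles, counts[1] = instructions */

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialization failed\n");
        return EXIT_FAILURE;
    }
    if (PAPI_create_eventset(&event_set) != PAPI_OK ||
        PAPI_add_event(event_set, PAPI_TOT_CYC) != PAPI_OK ||
        PAPI_add_event(event_set, PAPI_TOT_INS) != PAPI_OK) {
        fprintf(stderr, "could not set up counter events\n");
        return EXIT_FAILURE;
    }

    PAPI_start(event_set);

    /* Stand-in for a kernel of interest identified by profiling. */
    double sum = 0.0;
    for (long i = 1; i <= 50000000L; ++i)
        sum += 1.0 / (double)i;

    PAPI_stop(event_set, counts);

    printf("sum = %.6f\n", sum);
    printf("cycles = %lld, instructions = %lld, CPI = %.3f\n",
           counts[0], counts[1], (double)counts[0] / (double)counts[1]);
    return EXIT_SUCCESS;
}

In a tool like APMT, such per-kernel counter readings would feed the kernel models described in the paper; here the output is simply the raw cycle and instruction counts and their ratio.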
Funder
National Key R&D Program of China; Center for High Performance Computing and System Simulation of Pilot National Laboratory for Marine Science and Technology (Qingdao).
Publisher
Springer Science and Business Media LLC
Subject
Community and Home Care
Cited by
3 articles.