Affiliation:
1. Northeastern University, Boston, MA, USA
Abstract
Predicting the performance of an application running on a parallel computing platform is becoming increasingly important because of its influence on development time and resource management. However, predicting performance with respect to the number of parallel processes is complex for iterative and multi-stage applications. This research proposes FiM, a performance approximation approach that predicts the calculation time (FiM-Cal) and the communication time (FiM-Com) of an application running on a distributed framework. FiM-Cal consists of two coupled components: (1) a stochastic Markov model that captures non-deterministic runtime, which often depends on parallel resources such as the number of processes, and (2) a machine-learning model that extrapolates the parameters used to calibrate the Markov model when application parameters, such as the dataset, change. In addition to parallel calculation time, parallel computing platforms spend time transferring data among nodes; FiM-Com uses a queuing simulation model to estimate this communication time quickly. Our modeling approach considers design choices along multiple dimensions, namely (i) process-level parallelism, (ii) distribution of cores on a multi-processor platform, (iii) application-related parameters, and (iv) dataset characteristics. The major contribution of our prediction approach is that FiM can accurately predict parallel processing time for datasets much larger than the training datasets. We evaluate our approach with the NAS Parallel Benchmarks and real iterative data processing applications, comparing the predicted results (e.g., end-to-end execution time) with actual measurements on a real distributed platform. We also compare our work with an existing machine-learning-based prediction technique. We rank the numbers of processes according to the actual results and according to FiM's predictions, and calculate the correlation between the two rankings. Our results show that FiM achieves a high correlation, in the range of 0.80 to 0.99, which indicates that the technique is considerably accurate.
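The abstract names FiM-Cal's stochastic Markov model and FiM-Com's queuing simulation but does not specify them. As a rough illustration of the two ideas only, the Python sketch below assumes a discrete-time absorbing Markov chain whose transient states are application stages with a fixed cost per visit, and models communication as one shared link that serves messages first come, first served. The state layout, costs, message times, and the helper names `expected_calc_time` and `fifo_link_makespan` are hypothetical, not FiM's published formulation, and the machine-learning calibration step is omitted.

```python
# Hypothetical sketch: end-to-end time = expected calculation time from an
# absorbing Markov chain + communication time from a FIFO link simulation.
# The states, costs, and single-link model are illustrative assumptions,
# not FiM's actual formulation.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def expected_calc_time(q, cost):
    """Expected time until absorption (job finished) from each transient state.

    q[i][j] is the probability of moving from transient state i to transient
    state j; the missing row mass is the probability of finishing. Each visit
    to state i costs cost[i] seconds, so t = cost + q t, i.e. (I - q) t = cost.
    """
    n = len(cost)
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)]
                 for i in range(n)]
    return solve_linear(i_minus_q, cost)


def fifo_link_makespan(arrivals, transfer_times):
    """Trace-driven single-server FIFO queue: time the last message finishes."""
    finish = 0.0
    for arrive, service in sorted(zip(arrivals, transfer_times)):
        start = max(arrive, finish)  # wait while the link is still busy
        finish = start + service
    return finish


# Two-stage iterative job: state 0 = compute, state 1 = synchronize.
# After synchronizing, another iteration starts with probability 0.9.
q = [[0.0, 1.0],
     [0.9, 0.0]]
cost = [2.0, 0.5]  # seconds per visit (hypothetical)
calc = expected_calc_time(q, cost)[0]  # about 25.0 s = 2.5 s / (1 - 0.9)

# Four processes each send one 0.2 s message at t = 0 over the shared link.
comm = fifo_link_makespan([0.0] * 4, [0.2] * 4)  # about 0.8 s

print(f"calc={calc:.2f}s comm={comm:.2f}s total={calc + comm:.2f}s")
# prints calc=25.00s comm=0.80s total=25.80s
```

Under these assumptions, the role the abstract gives to the machine-learning model would correspond to producing new `q` and `cost` values for a larger dataset or a different process count, instead of the hand-picked values used here.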
Such predictions give data analysts useful insight into the optimal configuration of parallel resources (e.g., number of processes and number of cores) and help system designers investigate how changes in application parameters affect system performance.
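The evaluation ranks process counts by actual and by predicted time and correlates the two rankings, but the abstract does not name the coefficient. The minimal sketch below assumes Spearman's rank correlation (Pearson correlation of average ranks, so ties are handled) and also shows how a predicted ranking could be used to pick a process count. The timings are made up for illustration and are not the paper's measurements; `average_ranks` and `spearman` are hypothetical helper names.

```python
import math


def average_ranks(values):
    """Rank values in ascending order (1 = fastest); ties get the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(actual, predicted):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    ra, rp = average_ranks(actual), average_ranks(predicted)
    n = len(ra)
    ma, mp = sum(ra) / n, sum(rp) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(ra, rp))
    var_a = sum((a - ma) ** 2 for a in ra)
    var_p = sum((p - mp) ** 2 for p in rp)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical end-to-end times (seconds) for each process count.
procs = [2, 4, 8, 16, 32]
actual = [120.0, 65.0, 40.0, 38.0, 45.0]
predicted = [110.0, 70.0, 46.0, 39.0, 44.0]

print(spearman(actual, predicted))             # 0.9
print(procs[predicted.index(min(predicted))])  # 16 (predicted best)
print(procs[actual.index(min(actual))])        # 16 (measured best)
```

Here the predicted ranking swaps 8 and 32 processes, which lowers the correlation to 0.9, yet it still selects the same best configuration as the measurements. This is why a rank correlation, rather than absolute error alone, is a natural way to judge whether predictions are good enough for choosing parallel resources.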
Funder
AFOSR
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, Modeling and Simulation
Cited by
13 articles.