Author:
von Laszewski Gregor,Fleischer J. P.,Knuuti Robert,Fox Geoffrey C.,Kolessar Jake,Butler Thomas S.,Fox Judy
Abstract
MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. It consists of members from start-ups, leading companies, academics, and non-profits from around the world. The goal is to make machine learning better for everyone. In order to increase participation by others, educational institutions provide valuable opportunities for engagement. In this article, we identify numerous insights obtained from different viewpoints as part of efforts to utilize high-performance computing (HPC) big data systems in existing education while developing and conducting science benchmarks for earthquake prediction. As this activity was conducted across multiple educational efforts, we project if and how it is possible to make such efforts available on a wider scale. This includes the integration of sophisticated benchmarks into courses and research activities at universities, exposing the students and researchers to topics that are otherwise typically not sufficiently covered in current course curricula as we witnessed from our practical experience across multiple organizations. As such, we have outlined the many lessons we learned throughout these efforts, culminating in the need for benchmark carpentry for scientists using advanced computational resources. The article also presents the analysis of an earthquake prediction code benchmark while focusing on the accuracy of the results and not only on the runtime; notedly, this benchmark was created as a result of our lessons learned. Energy traces were produced throughout these benchmarks, which are vital to analyzing the power expenditure within HPC environments. Additionally, one of the insights is that in the short time of the project with limited student availability, the activity was only possible by utilizing a benchmark runtime pipeline while developing and using software to generate jobs from the permutation of hyperparameters automatically. It integrates a templated job management framework for executing tasks and experiments based on hyperparameters while leveraging hybrid compute resources available at different institutions. The software is part of a collection called cloudmesh with its newly developed components, cloudmesh-ee (experiment executor) and cloudmesh-cc (compute coordinator).
Reference41 articles.
1. “Hyperparameter search in machine learning,”
ClaesenM.
De MoorB.
MIC 2015: The XI Metaheuristics International Conference in Agadir2015
2. TOP 500 supercomputer sites;Dongarra;Supercomputer,1997
3. “The Green500: a ranking of the most energy-efficient supercomputers,”;Feng,2007
4. “Systems software and technology,”;Fincher,2019
5. Time series analysis of cryptocurrency prices using long short-term memory;Fleischer;Algorithms,2022