Enabling the execution of HPC applications on public clouds with HPC@Cloud toolkit

Authors:

Munhoz Vanderlei (1), Castro Márcio (1)

Affiliation:

1. Department of Informatics and Statistics (INE), Distributed Systems Research Lab (LaPeSD), Federal University of Santa Catarina (UFSC), Florianópolis, Brazil

Abstract

The advent of cloud computing has made computing infrastructure accessible to millions of users who face resource constraints. In the context of high performance computing (HPC), public cloud resources have emerged as a cost-effective alternative to expensive on-premises clusters. However, adopting this approach involves several challenges and limitations. This paper proposes HPC@Cloud, a provider-agnostic open-source software toolkit that facilitates the migration, testing, and execution of HPC applications in public clouds. The toolkit takes advantage of various fault tolerance technologies to enable the use of inexpensive transient cloud infrastructure, commonly known as "spot" instances. It also integrates with Singularity containers, allowing users to run complex applications on virtual HPC clusters in a portable and reproducible way. Finally, it provides a data-based empirical approach to estimating cloud infrastructure costs for HPC workloads. The results obtained on two public cloud providers (AWS and Vultr) show that: (i) HPC@Cloud can efficiently build virtual HPC clusters on the cloud; (ii) the new adaptive fault tolerance strategy outperforms other existing strategies based on blocking restoration; (iii) the integration of Singularity containers into HPC@Cloud improves the portability of HPC applications to public clouds with negligible performance penalty to the applications; (iv) the proposed cost prediction approach can estimate the cost of running the applications on AWS and Vultr with up to 93% accuracy on average.

Funder

Amazon Web Services

Conselho Nacional de Desenvolvimento Científico e Tecnológico

Publisher

Wiley

Subject

Computational Theory and Mathematics, Computer Networks and Communications, Computer Science Applications, Theoretical Computer Science, Software
