Accelerating science: The usage of commercial clouds in ATLAS Distributed Computing

Authors:

Barreiro Megino Fernando, Borodin Mikhail, De Kaushik, Elmsheuser Johannes, Di Girolamo Alessandro, Hartmann Nikolai, Heinrich Lukas, Klimentov Alexei, Lassnig Mario, Lin FaHui, Maeno Tadashi, Marshall Zachary, Merino Gonzalo, Nilsson Paul, Sandesara Jay, Serfon Cedric, South David, Singh Harinder

Abstract

The ATLAS experiment at CERN is one of the largest scientific machines built to date, and its computing needs will keep growing as the Large Hadron Collider collects ever larger volumes of data over the next 20 years. ATLAS is conducting R&D projects on Amazon Web Services and Google Cloud as complementary resources for distributed computing, focusing on some of the key features of commercial clouds: lightweight operation, elasticity and the availability of multiple chip architectures. The proof-of-concept phases have concluded with a cloud-native, vendor-agnostic integration with the experiment's data and workload management frameworks. Google Cloud has been used to evaluate elastic batch computing, ramping up ephemeral clusters of up to O(100k) cores to process tasks requiring quick turnaround. Amazon Web Services has been exploited for the successful physics validation of the Athena simulation software on ARM processors. We have also set up an interactive facility for physics analysis that allows end users to spin up private, on-demand clusters for parallel computing with up to 4 000 cores, or to run GPU-enabled notebooks and jobs for machine learning applications. The success of the proof-of-concept phases has led to the extension of the Google Cloud project, in which ATLAS will study the total cost of ownership of a production cloud site over 15 months with 10k cores on average, fully integrated with the distributed grid computing resources, and will continue the R&D projects.
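
To give a rough sense of scale for the planned total-cost-of-ownership study, the sketch converts "10k cores on average over 15 months" into core-hours. It is a back-of-envelope estimate, not a figure reported by the authors: it assumes a mean month length of 365.25/12 days and that the 10k-core average holds for the whole period. The names AVG_CORES, DURATION_MONTHS, HOURS_PER_MONTH and core_hours are illustrative only.

```python
# Back-of-envelope scale of the planned Google Cloud TCO study.
# Assumptions (not from the paper): a mean month of 365.25 / 12 days,
# and the stated 10k-core average sustained for the full 15 months.

AVG_CORES = 10_000                   # "10k cores on average" (abstract)
DURATION_MONTHS = 15                 # "over 15 months" (abstract)
HOURS_PER_MONTH = 365.25 / 12 * 24   # assumed mean month length in hours


def core_hours(avg_cores: float, months: float) -> float:
    """Total core-hours delivered by avg_cores sustained for the given months."""
    return avg_cores * months * HOURS_PER_MONTH


if __name__ == "__main__":
    total = core_hours(AVG_CORES, DURATION_MONTHS)
    print(f"Approximate size of the TCO study: {total:,.0f} core-hours")
```

Under these assumptions the study corresponds to roughly 1.1 × 10^8 core-hours (10 000 cores × 15 months × 730.5 h per month). Turning this into a cost estimate would additionally require per-core-hour prices and storage and egress charges, none of which are given in the abstract.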

Publisher

EDP Sciences
