Big data and extreme-scale computing

Authors:

Asch M1, Moore T1, Badia R1, Beck M1, Beckman P1, Bidot T1, Bodin F1, Cappello F1, Choudhary A1, de Supinski B1, Deelman E1, Dongarra J1, Dubey A1, Fox G1, Fu H1, Girona S1, Gropp W1, Heroux M1, Ishikawa Y1, Keahey K1, Keyes D1, Kramer W1, Lavignon J-F1, Lu Y1, Matsuoka S1, Mohr B1, Reed D1, Requena S1, Saltz J1, Schulthess T1, Stevens R1, Swany M1, Szalay A1, Tang W1, Varoquaux G1, Vilotte J-P1, Wisniewski R1, Xu Z1, Zacharov I1

Affiliation:

1. Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA

Abstract

Over the past four years, the Big Data and Exascale Computing (BDEC) project organized a series of five international workshops that aimed to explore the ways in which the new forms of data-centric discovery introduced by the ongoing revolution in high-end data analysis (HDA) might be integrated with the established, simulation-centric paradigm of the high-performance computing (HPC) community. Based on those meetings, we argue that the rapid proliferation of digital data generators, the unprecedented growth in the volume and diversity of the data they generate, and the intense evolution of the methods for analyzing and using that data are radically reshaping the landscape of scientific computing. The most critical problems involve the logistics of wide-area, multistage workflows that will move back and forth across the computing continuum, between the multitude of distributed sensors, instruments and other devices at the network's edge, and the centralized resources of commercial clouds and HPC centers. We suggest that the prospects for the future integration of technological infrastructures and research ecosystems need to be considered at three different levels. First, we discuss the convergence of research applications and workflows that establish a research paradigm that combines both HPC and HDA, where ongoing progress is already motivating efforts at the other two levels. Second, we offer an account of some of the problems involved with creating a converged infrastructure for peripheral environments, that is, a shared infrastructure that can be deployed throughout the network in a scalable manner to meet the highly diverse requirements for processing, communication, and buffering/storage of massive data workflows of many different scientific domains. Third, we focus on some opportunities for software ecosystem convergence in big, logically centralized facilities that execute large-scale simulations and models and/or perform large-scale data analytics. We close by offering some conclusions and recommendations for future investment and policy review.

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Cited by 94 articles.

1. Practicable live container migrations in high performance computing clouds: Diskless, iterative, and connection-persistent; Journal of Systems Architecture; 2024-07

2. A survey of compute nodes with 100 TFLOPS and beyond for supercomputers; CCF Transactions on High Performance Computing; 2024-05-23

3. Fox Prey Optimisation: A Novel Multi-Objective Approach for Congestion Control in Wired Computer Networks; 2024 Conference on Information Communications Technology and Society (ICTAS); 2024-03-07

4. Shaping the Future of Data Ecosystem Research—What Is Still Missing?; IEEE Access; 2024

5. End-to-End Workflows for Climate Science: Integrating HPC Simulations, Big Data Processing, and Machine Learning; Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis; 2023-11-12
