Data Handling Optimization in Russian Data Lake Prototype
-
Published:2023-02-01
Issue:1
Volume:2438
Page:012021
-
ISSN:1742-6588
-
Container-title:Journal of Physics: Conference Series
-
Short-container-title:J. Phys.: Conf. Ser.
Author:
Alekseev A,Kiryanov A,Klimentov A,Korchuganova T,Mitsyn V,Oleynik D,Petrosyan A,Smirnov S,Zarochentsev A
Abstract
CERN experiments are preparing for the HL-LHC era, which will bring an unprecedented volume of scientific data. These data will need to be stored and processed by thousands of physicists, but the expected growth in resources falls far short of the requirements extrapolated from existing models, in both storage volume and compute power. Opportunistic CPU resources such as HPCs and university clusters can provide extra CPU cycles, but there is no opportunistic storage. In this article, we present the main architectural ideas, deployment details, and test results of a prototype distributed data processing and storage system. The prototype focuses on improving resource efficiency by reducing the overhead of accessing data. It was built using geographically distributed WLCG sites and university clusters in Russia.
Subject
Computer Science Applications,History,Education