Authors:
Fu Shiyuan, Xu Qi, Cheng Yaodong, Chen Gang
Abstract
High energy physics (HEP) experiments, such as LHAASO, produce large amounts of data, which are usually stored and processed across distributed sites. Distributed data management systems currently face several challenges, including providing a global file namespace and achieving efficient data access and storage. To address these problems, this paper proposes a cross-domain data access file system (CDFS) that uses data deduplication and compression as its storage-optimization engine. CDFS aims to dynamically build an aggregate view of multiple distributed storage systems and to access data quickly and efficiently. Tests on raw data from the LHAASO experiment show that CDFS presents a single unified repository across the distributed LHAASO sites, and that the storage-optimization engine reduces the storage consumption of the raw data by more than 50%.
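
The general idea of combining deduplication with compression can be illustrated with a minimal sketch. This is not the CDFS implementation: the class name `DedupStore`, the fixed-size chunking, the SHA-256 fingerprints, and the use of zlib are all assumptions chosen for illustration, and the abstract does not state which chunking or compression methods CDFS uses.

```python
import hashlib
import zlib


class DedupStore:
    """Toy in-memory store: fixed-size chunk deduplication plus compression.

    Hypothetical illustration only; not the CDFS storage engine.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # assumed fixed chunk size in bytes
        self.chunks = {}              # fingerprint -> compressed chunk
        self.files = {}               # file name -> ordered fingerprints

    def put(self, name, data):
        """Split data into chunks and store each unique chunk once."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Deduplication: skip chunks whose fingerprint is already known.
            if fingerprint not in self.chunks:
                # Compression: only new chunks are compressed and stored.
                self.chunks[fingerprint] = zlib.compress(chunk)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def get(self, name):
        """Rebuild a file by decompressing its chunks in order."""
        return b"".join(
            zlib.decompress(self.chunks[fp]) for fp in self.files[name]
        )

    def stored_bytes(self):
        """Total size of the compressed unique chunks."""
        return sum(len(c) for c in self.chunks.values())
```

In this sketch, storing two files that share chunks keeps only one copy of each shared chunk, and `stored_bytes()` can be compared with the total input size to estimate the space saved. Real savings depend entirely on the data and on the chunking and compression methods chosen.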