Abstract
In recent years, the volume of data has grown sharply. As a consequence, data has become more expensive to store than to generate. Storage needs for astronomical data follow the same trend. Storage systems in astronomy contain redundant copies of data, either as identical files or as identical regions within files. We propose the use of the Hadoop Distributed and Deduplicated File System (HD2FS) in astronomy. HD2FS is a deduplication storage system created to improve storage capacity and efficiency in distributed file systems without compromising input/output performance. HD2FS can be developed by modifying existing storage system environments such as the Hadoop Distributed File System. By taking advantage of deduplication technology, we can better manage the underlying redundancy of astronomical data and reduce the space needed to store these files, thus allowing for more capacity per volume.
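To make the idea of file-level and sub-file deduplication concrete, the following is a minimal sketch in Python. It is not the HD2FS implementation: the `DedupStore` class, its method names, the fixed-size chunking strategy, and the chunk size are all assumptions chosen for illustration. Each file is split into fixed-size chunks, each chunk is identified by its SHA-256 digest, and only one copy of each unique chunk is kept.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size, not a value taken from HD2FS


class DedupStore:
    """Toy in-memory store that keeps a single copy of each unique chunk."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        """Split data into fixed-size chunks and store only unseen chunks."""
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault leaves an already stored chunk untouched
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Rebuild a file from its ordered chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_size(self):
        """Total bytes across all files as seen by the user."""
        return sum(len(self.chunks[d])
                   for digests in self.files.values() for d in digests)

    def physical_size(self):
        """Bytes actually held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

In this sketch, storing two identical files adds no new chunks for the second one, and a file that shares only some regions with an earlier file adds only the chunks that differ. Comparing `logical_size()` with `physical_size()` gives a rough measure of the space saved. Fixed-size chunking is the simplest option; it misses duplicates that are shifted by an insertion, which content-defined chunking would handle at higher cost.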
Publisher
Cambridge University Press (CUP)
Subject
Astronomy and Astrophysics, Space and Planetary Science