Abstract
Lossy compression has become an important technique for reducing data size in many domains, and it is especially valuable for large-scale scientific data, whose size can reach several petabytes. Although autoencoder-based models have been successfully applied to compress images and videos, such neural networks have received little attention in the scientific data domain. Our work presents a neural network that significantly compresses large-scale scientific data while maintaining high reconstruction quality. The proposed model is evaluated on publicly available scientific benchmark data and applied to a large-scale, high-resolution climate modeling data set. It achieves a compression ratio of 140 on several benchmark data sets without compromising reconstruction quality. 2D simulation data spanning 500 years from the High-Resolution Community Earth System Model (CESM) Version 1.3 are also compressed at a ratio of 200, with a reconstruction error that is negligible for scientific analysis.
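The abstract does not describe the model's architecture, so the following is only a rough illustration, not the authors' method: a minimal linear-bottleneck sketch (all names, dimensions, and the random orthonormal encoder are assumptions) showing how an autoencoder's latent size determines the compression ratio.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): a linear "autoencoder"
# illustrating how a latent bottleneck yields a compression ratio.
rng = np.random.default_rng(0)

# Toy "scientific field": a 64x64 grid flattened to a 4096-vector.
x = rng.standard_normal(4096).astype(np.float32)

latent_dim = 32  # bottleneck size (assumed for illustration)

# Random orthonormal encoder; the decoder is its transpose.
q, _ = np.linalg.qr(rng.standard_normal((4096, latent_dim)))

z = q.T @ x        # compressed latent representation
x_hat = q @ z      # lossy reconstruction

ratio = x.size / z.size  # elements stored before vs. after encoding
print(ratio)             # 4096 / 32 = 128.0
```

A real model of this kind would replace the random linear map with trained (typically convolutional) encoder and decoder networks, and the achievable ratio would then be governed by the latent size together with any entropy coding applied to the latents.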