Z-checker: A framework for assessing lossy compression of scientific data

Author:

Tao Dingwen (1), Di Sheng (2), Guo Hanqi (2), Chen Zizhong (1, 3), Cappello Franck (2, 4)

Affiliation:

1. Department of Computer Science and Engineering, University of California, Riverside, CA, USA

2. Division of Computer Science and Mathematics, Argonne National Laboratory, Lemont, IL, USA

3. Beijing University of Technology, Beijing, China

4. Parallel Computing Institute, University of Illinois Urbana–Champaign, Champaign, IL, USA

Abstract

Because of the vast volume of data produced by today’s scientific simulations and experiments, lossy data compressors that allow user-controlled loss of accuracy during compression are a relevant solution for significantly reducing data size. However, lossy compressor developers and users lack a tool to explore the features of scientific data sets and to understand the data alteration after compression in a systematic and reliable way. To address this gap, we have designed and implemented a generic framework called Z-checker. On the one hand, Z-checker combines a battery of data analysis components for data compression. On the other hand, Z-checker is implemented as an open-source community tool to which users and developers can contribute and add new analysis components based on their additional analysis demands. In this article, we present a survey of existing lossy compressors. Then, we describe the design framework of Z-checker, in which we integrated evaluation metrics proposed in prior work as well as other analysis tools. Specifically, for lossy compressor developers, Z-checker can be used to characterize critical properties (such as entropy, distribution, power spectrum, principal component analysis, and autocorrelation) of any data set to improve compression strategies. For lossy compression users, Z-checker can detect the compression quality (compression ratio and bit rate) and provide various global distortion analyses comparing the original data with the decompressed data (peak signal-to-noise ratio, normalized mean squared error, rate–distortion, rate-compression error, spectral, distribution, and derivatives) as well as statistical analyses of the compression error (maximum, minimum, and average error; autocorrelation; and distribution of errors). Z-checker can perform the analysis with either coarse granularity (over the whole data set) or fine granularity (by user-defined blocks), so that users and developers can select the best-fit, adaptive compressors for different parts of the data set. Z-checker features a visualization interface that displays all analysis results in addition to some basic views of the data sets, such as time series. To the best of our knowledge, Z-checker is the first tool designed to assess lossy compression comprehensively for scientific data sets.
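
To make some of the metrics named in the abstract concrete, the following is a minimal sketch in plain Python. It is not Z-checker's code or API: the function names (`compression_ratio`, `bit_rate`, `distortion_metrics`, `blockwise_metrics`) are hypothetical, the data is treated as a flat one-dimensional list, and the normalization of PSNR and NMSE by the value range of the original data is an assumed convention, since the abstract does not give the formulas.

```python
import math


def compression_ratio(original_bytes, compressed_bytes):
    """Original size divided by compressed size (higher means smaller output)."""
    return original_bytes / compressed_bytes


def bit_rate(compressed_bytes, num_values):
    """Average number of compressed bits stored per data value."""
    return 8 * compressed_bytes / num_values


def distortion_metrics(original, decompressed):
    """Return error statistics for two equal-length, non-empty lists of floats."""
    if not original or len(original) != len(decompressed):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [d - o for o, d in zip(original, decompressed)]
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)
    value_range = max(original) - min(original)
    if rmse == 0.0:
        # Lossless reconstruction: no distortion at all.
        psnr, nmse = math.inf, 0.0
    elif value_range == 0.0:
        # Constant original data with nonzero error: normalization is undefined.
        psnr, nmse = -math.inf, math.inf
    else:
        # Assumed convention: normalize by the value range of the original data.
        psnr = 20.0 * math.log10(value_range / rmse)
        nmse = mse / value_range ** 2
    return {
        "max_error": max(errors),
        "min_error": min(errors),
        "avg_error": sum(errors) / len(errors),
        "mse": mse,
        "nmse": nmse,
        "psnr": psnr,
    }


def blockwise_metrics(original, decompressed, block_size):
    """Fine-granularity analysis: one metrics dict per consecutive block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        distortion_metrics(original[i:i + block_size],
                           decompressed[i:i + block_size])
        for i in range(0, len(original), block_size)
    ]
```

Calling `distortion_metrics` on the whole data set corresponds to the coarse-granularity view, while `blockwise_metrics` gives one result per block, so regions where a compressor performs poorly can be identified separately. Multidimensional blocks, spectral analysis, and autocorrelation are left out of this sketch.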

Funder

Advanced Scientific Computing Research

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Cited by 41 articles.

1. Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing;Future Generation Computer Systems;2025-02

2. CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2;Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing;2024-06-03

3. GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data;Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures;2024-06-03

4. CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction;2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2024-05-27

5. High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Multi-component Interpolation;Proceedings of the ACM on Management of Data;2024-03-12
