Affiliation:
1. HP Labs, Palo Alto, CA
Abstract
Structured serial data is used in many scientific fields; such data sets consist of a series of records, and are typically written once, read many times, chronologically ordered, and read sequentially. In this paper we introduce DataSeries, an on-disk format, run-time library and set of tools for storing and analyzing structured serial data. We identify six key properties of a system to store and analyze this type of data, and describe how DataSeries was designed to provide these properties. We quantify the benefits of DataSeries through several experiments. In particular, we demonstrate that DataSeries exceeds the performance of common trace formats by at least a factor of two.
Publisher
Association for Computing Machinery (ACM)
References
26 articles.
1. Cluster I/O with River
2. bzip2 compression library http://www.bzip.org/ accessed September 2007.
3. Empirical evaluation of multi-level buffer cache collaboration for storage systems
4. The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets
5. http://tesla.hpl.hp.com/opensource/DataSeries-tr-snapshot.pdf.
Cited by
8 articles.
1. Re-Animator;Proceedings of the 13th ACM International Systems and Storage Conference;2020-05-30
2. Using a linked table-based structure to encode self-describing multiparameter spatiotemporal data;FACETS;2018-10-01
3. Using data transformations for low-latency time series analysis;Proceedings of the Sixth ACM Symposium on Cloud Computing;2015-08-27
4. Analysis of Workload Behavior in Scientific and Historical Long-Term Data Repositories;ACM Transactions on Storage;2012-05
5. LazyBase;ACM SIGOPS Operating Systems Review;2010-03-12