Affiliation:
1. Ashland, Inc., Wilmington, DE
Abstract
Data sequences generated by on-line sensors can become arbitrarily large and must, therefore, be pared down to fit into available memory. For situations where only the most recent data is of interest, this problem can be solved with optimal efficiency by a simple circular buffer: it fills each memory location with useful data, and requires just one write to memory per update. The algorithm presented here provides essentially the same efficiency, but while maintaining a continuously updated, fixed-size, compressed representation of the entire data sequence. Each value in these compressed sequences represents a statistic (an average, maximum, random sample, etc.) computed over a contiguous chunk of the original sequence. Compressing circular buffers gain their efficiency by using an alternative indexing sequence, based on well-known principles of elementary number theory, to ensure that each newly written value gets stored in the unoccupied location created when the two oldest sequential over-sampled values are compressed into one. The associated Java implementation supports a variety of aggregating statistics and is used to compare the algorithm's performance with a more obvious approach (doubling).
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics,Software