Affiliation:
1. Mentor Graphics Corp., Wilsonville, Oregon
2. Bell Labs Research, Bangalore, India
3. University of California, Santa Barbara, Santa Barbara, California
Abstract
We consider the following problem: given an on-line, possibly unbounded stream of two-dimensional (2D) points, how can we summarize its spatial distribution or shape using a small, bounded amount of memory? We propose a novel scheme, called ClusterHull, which represents the shape of the stream as a dynamic collection of convex hulls, with a total of at most m vertices, where m is the size of the memory. The algorithm dynamically adjusts both the number of hulls and the number of vertices in each hull to best represent the stream using its fixed-memory budget. This algorithm addresses a problem whose importance is increasingly recognized, namely, the problem of summarizing real-time data streams to enable on-line analytical processing. As a motivating example, consider habitat monitoring using wireless sensor networks. The sensors produce a steady stream of geographic data, namely, the locations of objects being tracked. In order to conserve their limited resources (power, bandwidth, and storage), the sensors can compute, store, and exchange ClusterHull summaries of their data, without losing important geometric information. We are not aware of other schemes specifically designed for capturing shape information in geometric data streams and so we compare ClusterHull with some of the best general-purpose clustering schemes, such as CURE, k-medians, and LSEARCH. We show through experiments that ClusterHull is able to represent the shape of two-dimensional data streams more faithfully and flexibly than the stream versions of these clustering algorithms.
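To make the idea of a vertex-budgeted collection of convex hulls concrete, here is a minimal Python sketch. It is not the ClusterHull algorithm itself: the abstract does not say how points are assigned to hulls or how the budget is enforced, so the class name `HullSummary`, the `radius` assignment threshold, the "drop the smallest-triangle vertex" rule, and the "merge the two closest hulls" rule are all assumptions chosen for illustration. Only the invariant (at most m vertices in total across all hulls) comes from the abstract.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def gap2(a: List[Point], b: List[Point]) -> float:
    """Squared distance between the closest pair of vertices of two hulls."""
    return min((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p in a for q in b)


class HullSummary:
    """Hypothetical bounded-memory summary: several hulls, at most m vertices."""

    def __init__(self, m: int, radius: float) -> None:
        if m < 3:
            raise ValueError("m must be at least 3")
        self.m = m
        self.radius = radius  # assumed threshold for joining an existing hull
        self.hulls: List[List[Point]] = []

    def total_vertices(self) -> int:
        return sum(len(h) for h in self.hulls)

    def insert(self, p: Point) -> None:
        # Assign p to the nearest hull within `radius`, else open a new hull.
        best, best_d = None, self.radius ** 2
        for i, h in enumerate(self.hulls):
            d = gap2(h, [p])
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            self.hulls.append([p])
        else:
            self.hulls[best] = convex_hull(self.hulls[best] + [p])
        while self.total_vertices() > self.m:
            self._shrink()

    def _shrink(self) -> None:
        # Prefer dropping the vertex whose removal loses the least area,
        # considering only hulls that have more than three vertices.
        best = None  # (lost area, hull index, vertex index)
        for i, h in enumerate(self.hulls):
            if len(h) > 3:
                for j in range(len(h)):
                    area = abs(cross(h[j - 1], h[j], h[(j + 1) % len(h)])) / 2
                    if best is None or area < best[0]:
                        best = (area, i, j)
        if best is not None:
            del self.hulls[best[1]][best[2]]
            return
        # Otherwise merge the two closest hulls into one.
        _, i, j = min(
            (gap2(a, b), i, j)
            for i, a in enumerate(self.hulls)
            for j, b in enumerate(self.hulls)
            if i < j
        )
        merged = convex_hull(self.hulls[i] + self.hulls[j])
        del self.hulls[j]  # j > i, so index i is still valid
        self.hulls[i] = merged
```

Feeding the summary a stream drawn from two well-separated groups, with `radius` smaller than the gap between them, keeps one hull per group while the total vertex count stays within `m`. A faithful implementation would replace both the assignment rule and the shrink rule with the ones defined in the paper.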
Funder
Division of Information and Intelligent Systems
Publisher
Association for Computing Machinery (ACM)
Subject
Theoretical Computer Science
Cited by
1 article.
1. Cluster Summarization with Dense Region Detection. Communications in Computer and Information Science, 2015.