What's hot and what's not: tracking most frequent items dynamically

Published: 2005-03   Volume: 30   Issue: 1   Pages: 249-278
ISSN: 0362-5915
Container-title: ACM Transactions on Database Systems
Language: en
Short-container-title: ACM Trans. Database Syst.

Authors:

Cormode Graham¹, Muthukrishnan S.²

Affiliation:

1. Rutgers University, Murray Hill, NJ

2. Rutgers University, Piscataway, NJ

Abstract

Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the “hot items” in the relation: those that appear many times (most frequently, or more than some threshold). For example, end-biased histograms keep the hot items as part of the histogram and are used in selectivity estimation. Hot items are used as simple outliers in data mining, and in anomaly detection in many applications.

We present new methods for dynamically determining the hot items at any time in a relation which is undergoing deletion operations as well as inserts. Our methods maintain small space data structures that monitor the transactions on the relation, and, when required, quickly output all hot items without rescanning the relation in the database. With user-specified probability, all hot items are correctly reported. Our methods rely on ideas from “group testing.” They are simple to implement, and have provable quality, space, and time guarantees. Previously known algorithms for this problem that make similar quality and performance guarantees cannot handle deletions, and those that handle deletions cannot make similar guarantees without rescanning the database. Our experiments with real and synthetic data show that our algorithms are accurate in dynamically tracking the hot items independent of the rate of insertions and deletions.
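The abstract states that the methods rely on group testing and support deletions, but it gives no details. The Python sketch below is a simplified illustration of that general idea, not the authors' algorithm, and it makes no claim to their space, time, or probability bounds. Everything in it is a hypothetical choice: the class name `GroupTestingHotItems`, the hash family, and the parameters `k`, `bits`, `groups`, `reps`, and `seed`. It assumes integer item identifiers and assumes deletions only remove items that were previously inserted. Items are hashed into groups, and each group keeps a total count plus one counter per bit of the identifier. Because a deletion simply decrements the same counters an insertion incremented, the structure stays exact under both operations. A group that holds exactly one item with count above n/(k+1) reveals that item bit by bit: for each bit, only the half containing the item exceeds the threshold.

```python
import random


class GroupTestingHotItems:
    """Illustrative sketch: report items whose count exceeds n / (k + 1)
    in a stream of inserts and deletes, using bit-counter 'group tests'.

    Assumptions (hypothetical, not from the paper): items are ints in
    [0, 2**bits), deletions only remove previously inserted items, and
    `groups` / `reps` are caller-chosen rather than derived from any bound.
    """

    def __init__(self, k, bits=32, groups=None, reps=5, seed=0):
        self.k = k
        self.bits = bits
        self.groups = groups or 2 * k
        self.reps = reps
        self.p = (1 << 61) - 1  # Mersenne prime for the hash family
        rng = random.Random(seed)
        self.hashes = [(rng.randrange(1, self.p), rng.randrange(self.p))
                       for _ in range(reps)]
        # counters[r][g][0] = total count of items hashed to group g;
        # counters[r][g][j + 1] = count of those items whose bit j is 1.
        self.counters = [[[0] * (bits + 1) for _ in range(self.groups)]
                         for _ in range(reps)]
        self.n = 0

    def _group(self, r, x):
        a, b = self.hashes[r]
        return ((a * x + b) % self.p) % self.groups

    def update(self, x, delta):
        """delta = +1 for an insert, -1 for a delete."""
        self.n += delta
        for r in range(self.reps):
            c = self.counters[r][self._group(r, x)]
            c[0] += delta
            for j in range(self.bits):
                if (x >> j) & 1:
                    c[j + 1] += delta

    def _decode(self, c, threshold):
        """Read one item off a group's bit counters, or None if the test fails."""
        x = 0
        for j in range(self.bits):
            ones, zeros = c[j + 1], c[0] - c[j + 1]
            if ones > threshold and zeros > threshold:
                return None  # both halves heavy: several hot items collided
            if ones > threshold:
                x |= 1 << j
            elif zeros <= threshold:
                return None  # neither half heavy: no hot item in this group
        return x

    def hot_items(self):
        threshold = self.n / (self.k + 1)
        candidates = set()
        for r in range(self.reps):
            for c in self.counters[r]:
                if c[0] > threshold:  # only a heavy group can hold a hot item
                    x = self._decode(c, threshold)
                    if x is not None:
                        candidates.add(x)
        # A true hot item makes its own group heavy in every repetition.
        return {x for x in candidates
                if all(self.counters[r][self._group(r, x)][0] > threshold
                       for r in range(self.reps))}


s = GroupTestingHotItems(k=4, bits=16, groups=32, reps=8)
for _ in range(100):
    s.update(7, +1)
for x in range(1000, 1200):
    s.update(x, +1)
print(s.hot_items())
```

Repeating the hashing `reps` times is what makes the answer probabilistic rather than exact. In any single repetition, two hot items, or a hot item plus enough light items, may land in the same group and spoil its bit test. Independent repetitions make it likely that each hot item is isolated at least once.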

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1061318.1061325

References (33 articles)

1. Aho A. V., Hopcroft J. E., and Ullman J. D. 1987. Data structures and algorithms. Addison-Wesley, Reading, MA.

2. Tracking join and self-join sizes in limited storage

3. The Space Complexity of Approximating the Frequency Moments

4. Distributed top-k monitoring

Cited by 154 articles.

1. Optimal Dorfman Group Testing for Symmetric Distributions; SIAM Journal on Mathematics of Data Science; 2024-08-09

2. Conditional heavy hitter monitoring and application of heterogeneous graph streams based on sketches; Information Processing & Management; 2024-07

3. Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory; Proceedings of the ACM on Management of Data; 2024-03-12

4. Improved Lower Bound for Estimating the Number of Defective Items; Combinatorial Optimization and Applications; 2023-12-09

5. Compact Frequency Estimators in Adversarial Environments; Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security; 2023-11-15