Analysis of False Negative Rates for Recycling Bloom Filters (Yes, They Happen!)

Published: 2024-05-21 Issue: 2 Volume: 8 Page: 1-34
ISSN: 2476-1249
Container-title: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Language: en
Short-container-title: Proc. ACM Meas. Anal. Comput. Syst.

Author:

Dozier Kahlil¹, Salamatian Loqman¹, Rubenstein Dan¹

Affiliation:

1. Columbia University, New York, USA

Abstract

Bloom Filters are a desirable data structure for distinguishing new values in sequences of data (i.e., messages), due to their space efficiency, their low false positive rates (incorrectly classifying a new value as a repeat), and never producing false negatives (classifying a repeat value as new). However, as the Bloom Filter's bits are filled, false positive rates creep upward. To keep false positive rates below a reasonable threshold, applications periodically "recycle" the Bloom Filter, clearing the memory and then resuming the tracking of data. After a recycle point, subsequent arrivals of recycled messages are likely to be misclassified as new; recycling induces false negatives. Despite numerous applications of recycling, the corresponding false negative rates have never been analyzed. In this paper, we derive approximations, upper bounds, and lower bounds of false negative rates for several variants of recycling Bloom Filters. These approximations and bounds are functions of the size of memory used to store the Bloom Filter and the distributions on new arrivals and repeat messages, and can be efficiently computed on conventional hardware. We show, via comparison to simulation, that our upper bounds and approximations are extremely tight, and can be efficiently computed for megabyte-sized Bloom Filters on conventional hardware.
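
To make the recycling effect in the abstract concrete, here is a minimal Python sketch of a Bloom Filter that clears its memory once too many bits are set. It is an illustration only, not one of the variants the paper analyzes: the class name RecyclingBloomFilter, the parameters num_bits, num_hashes, and max_fill, the fill-fraction recycle rule, and the salted SHA-256 hashing are all assumptions made for this example.

```python
import hashlib


class RecyclingBloomFilter:
    """Toy Bloom Filter that clears itself once too many bits are set.

    All names and the recycle rule are hypothetical, chosen for illustration.
    """

    def __init__(self, num_bits=64, num_hashes=3, max_fill=0.5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.max_fill = max_fill  # assumed recycle threshold (fraction of set bits)
        self.bits = [False] * num_bits
        self.recycles = 0

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def seen_before(self, item):
        """Classify item as a repeat (True) or new (False), then record it."""
        positions = list(self._positions(item))
        is_repeat = all(self.bits[p] for p in positions)
        for p in positions:
            self.bits[p] = True
        # Recycle: clear the memory once the fill fraction passes the threshold.
        if sum(self.bits) / self.num_bits > self.max_fill:
            self.bits = [False] * self.num_bits
            self.recycles += 1
        return is_repeat


def demo():
    bf = RecyclingBloomFilter()
    first = bf.seen_before("msg-0")    # genuinely new message
    repeat = bf.seen_before("msg-0")   # repeat, caught before any recycle
    i = 1
    while bf.recycles == 0:            # feed new messages until a recycle occurs
        bf.seen_before(f"msg-{i}")
        i += 1
    after = bf.seen_before("msg-0")    # repeat, but the memory was just cleared
    return first, repeat, after
```

In this sketch, demo() returns three classifications of "msg-0": the first arrival is reported as new, the immediate repeat is reported as a repeat, and the repeat that arrives right after the recycle is reported as new again. That last result is the false negative the abstract describes.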

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3656005
