Affiliation:
1. Technical University of Munich
2. Centrum Wiskunde & Informatica
Abstract
We define the concept of performance-optimal filtering to indicate the Bloom or Cuckoo filter configuration that best accelerates a particular task. While the space-precision tradeoff of these filters has been well studied, we show how to pick a filter that maximizes the performance for a given workload. This choice might be "suboptimal" relative to traditional space-precision metrics, but it will lead to better performance in practice. In this paper, we focus on high-throughput filter use cases, aimed at avoiding CPU work, e.g., a cache miss, a network message, or a local disk I/O (events that can happen at rates of millions to hundreds per second). Besides the false-positive rate and memory footprint of the filter, performance optimality has to take into account the absolute cost of the filter lookup as well as the work per lookup that filtering avoids, while the actual rate of negative lookups in the workload determines whether using a filter improves overall performance at all. In the course of the paper, we introduce new filter variants, namely the register-blocked and cache-sectorized Bloom filters. We present new implementation techniques and perform an extensive evaluation on modern hardware platforms, including the wide-SIMD Skylake-X and Knights Landing. This experimentation shows that in high-throughput situations, the lower lookup cost of blocked Bloom filters allows them to overtake Cuckoo filters.
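The tradeoff described in the abstract can be illustrated with a small cost sketch. This is not the paper's model, only a simplified assumption: every lookup pays the filter's lookup cost, and the guarded work is still paid by all positive lookups plus the false positives among the negative ones. The function names, candidate filters, and cycle counts below are hypothetical and chosen purely for illustration.

```python
def expected_cost_per_lookup(lookup_cost, work_cost, fpr, negative_rate):
    """Expected cost of one lookup when a filter guards the work.

    Simplified, assumed model (not taken from the paper):
    - lookup_cost: cost of probing the filter (e.g., in CPU cycles)
    - work_cost: cost of the work the filter tries to avoid
    - fpr: false-positive rate of the filter
    - negative_rate: fraction of lookups whose key is truly absent
    """
    # Lookups that still pay for the work: true positives and false positives.
    pass_through = (1.0 - negative_rate) + negative_rate * fpr
    return lookup_cost + pass_through * work_cost


def pick_filter(candidates, work_cost, negative_rate):
    """Return the name of the cheapest candidate, or None if no filter helps.

    candidates: list of (name, lookup_cost, fpr) tuples (hypothetical values).
    Without a filter, every lookup simply pays work_cost.
    """
    best = min(
        candidates,
        key=lambda c: expected_cost_per_lookup(c[1], work_cost, c[2], negative_rate),
    )
    best_cost = expected_cost_per_lookup(best[1], work_cost, best[2], negative_rate)
    return best[0] if best_cost < work_cost else None


# Hypothetical candidates: a cheap but less precise filter and a
# slower but more precise one. The numbers are made up.
CANDIDATES = [
    ("blocked_bloom", 5.0, 0.02),
    ("cuckoo", 20.0, 0.001),
]

cheap_work = pick_filter(CANDIDATES, work_cost=200.0, negative_rate=0.9)
costly_work = pick_filter(CANDIDATES, work_cost=1e7, negative_rate=0.9)
no_negatives = pick_filter(CANDIDATES, work_cost=200.0, negative_rate=0.0)
```

Under these assumed numbers, the cheaper filter is preferred when the avoided work is small, the more precise filter is preferred when the avoided work is very expensive, and no filter is chosen when the workload contains no negative lookups.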
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
36 articles.
1. Beyond Bloom: A Tutorial on Future Feature-Rich Filters;Companion of the 2024 International Conference on Management of Data;2024-06-09
2. Simple, Efficient, and Robust Hash Tables for Join Processing;Proceedings of the 20th International Workshop on Data Management on New Hardware;2024-06-09
3. GRF: A Global Range Filter for LSM-Trees with Shape Encoding;Proceedings of the ACM on Management of Data;2024-05-29
4. Wormhole Filters: Caching Your Hash on Persistent Memory;Proceedings of the Nineteenth European Conference on Computer Systems;2024-04-22
5. Sieve: A Learned Data-Skipping Index for Data Analytics;Proceedings of the VLDB Endowment;2023-07