Multiple Set Matching with Bloom Matrix and Bloom Vector

Published:2020-04-30 Issue:2 Volume:14 Page:1-21
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Concas Francesco¹, Xu Pengfei¹, Hoque Mohammad A.¹, Lu Jiaheng¹, Tarkoma Sasu¹

Affiliation:

1. University of Helsinki, Finland

Abstract

A Bloom Filter is a space-efficient probabilistic data structure for checking the membership of elements in a set. Given multiple sets, a standard Bloom Filter is not sufficient when looking for the sets to which an element or a set of input elements belongs. An example case is searching for documents with keywords in a large text corpus, which is essentially a multiple set matching problem where the input is a single keyword or multiple keywords, and the result is a set of possible candidate documents. This article solves the multiple set matching problem by proposing two efficient Bloom Multifilters called Bloom Matrix and Bloom Vector, which generalize the standard Bloom Filter. Both structures are space-efficient and answer queries with a set of identifiers for multiple set matching problems. The space efficiency can be optimized according to the distribution of labels among multiple sets: Uniform or Zipf. Bloom Vector efficiently exploits the Zipf distribution of data for further space reduction. Indeed, both structures are much more space-efficient than the state-of-the-art, Bloofi. The results also highlight that a Lookup operation on Bloom Matrix is significantly faster than on Bloom Vector and Bloofi.
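The multiple set matching problem described in the abstract can be illustrated with a naive baseline: one standard Bloom Filter per set, with every filter checked on each query. The sketch below is only that baseline and is not the Bloom Matrix or Bloom Vector structure proposed in the article. The names `BloomFilter`, `build_filters`, and `lookup`, the default sizes, and the SHA-256-based hashing are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal standard Bloom Filter (illustrative parameters, not tuned)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_filters(documents):
    """Build one filter per document from a {doc_id: keywords} mapping."""
    filters = {}
    for doc_id, keywords in documents.items():
        bf = BloomFilter()
        for keyword in keywords:
            bf.add(keyword)
        filters[doc_id] = bf
    return filters


def lookup(filters, keywords):
    """Return ids of documents whose filter reports every keyword as present."""
    return {
        doc_id
        for doc_id, bf in filters.items()
        if all(keyword in bf for keyword in keywords)
    }


if __name__ == "__main__":
    corpus = {
        "doc1": ["bloom", "filter"],
        "doc2": ["bloom", "matrix"],
        "doc3": ["zipf"],
    }
    filters = build_filters(corpus)
    # Candidates always include the true matches; false positives are possible.
    print(sorted(lookup(filters, ["bloom"])))
```

In this baseline, the result of `lookup` is a set of candidate identifiers: it never misses a document that contains all the keywords, but it may include extra ones. Its query cost grows linearly with the number of sets, which is the kind of overhead that dedicated multi-set structures aim to reduce.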

Funder

Business Finland 5G-FORCE research project

Academy of Finland

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3372409

References: 28 articles.

1. Space/time trade-offs in hash coding with allowable errors

2. Beyond bloom filters

3. On the false-positive rate of Bloom filters

4. Reprint of: The anatomy of a large-scale hypertextual web search engine

Cited by 3 articles.

1. A Framework for Scalable Object Storage and Retrieval Considering Privacy Concerns: A Case Study on the Signature Detection;2023 9th International Conference on Web Research (ICWR);2023-05-03

2. A Stateful Bloom Filter for Per-Flow State Monitoring;IEEE Transactions on Network Science and Engineering;2021-04-01

3. A Survey on Bloom Filter for Multiple Sets;Modeling, Simulation and Optimization;2021