Affiliation:
1. University of Helsinki, Finland
Abstract
Bloom Filter is a space-efficient probabilistic data structure for checking the membership of elements in a set. Given multiple sets, a standard Bloom Filter is not sufficient when looking for the items to which an element or a set of input elements belong. An example case is searching for documents with keywords in a large text corpus, which is essentially a multiple set matching problem where the input is single or multiple keywords, and the result is a set of possible candidate documents. This article solves the multiple set matching problem by proposing two efficient Bloom Multifilters called Bloom Matrix and Bloom Vector, which generalize the standard Bloom Filter. Both structures are space-efficient and answer queries with a set of identifiers for multiple set matching problems. The space efficiency can be optimized according to the distribution of labels among multiple sets: Uniform and Zipf. Bloom Vector efficiently exploits the Zipf distribution of data for further space reduction. Indeed, both structures are much more space-efficient compared with the state-of-the-art, Bloofi. The results also highlight that a L
ookup
operation on Bloom Matrix is significantly faster than on Bloom Vector and Bloofi.
Funder
Business Finland 5G-FORCE research project
Academy of Finland
Publisher
Association for Computing Machinery (ACM)
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献