Affiliation:
1. University of Salzburg, Salzburg, Austria
2. Aarhus University, Aarhus, Denmark
Abstract
Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct extensive experiments on seven state-of-the-art algorithms for set similarity joins, all of which follow a filter-verification approach. Our analysis shows that verification has not received enough attention in previous work. In practice, efficient verification inspects only a small, constant number of set elements and is faster than some of the more sophisticated filter techniques. Although we can identify three winners, we find that most algorithms show very similar performance. The key technique is the prefix filter, and AllPairs, the first algorithm to adopt this technique, is still a relevant competitor. We repeat experiments from previous work and discuss diverging results. All our claims are supported by a detailed analysis of the factors that determine the overall runtime.
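The filter-verification pipeline described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper. It assumes Jaccard similarity, non-empty sets of mutually comparable tokens (e.g. strings), and a simple join between two collections R and S that indexes the full prefix of every set in S. Published algorithms such as AllPairs add further optimizations, such as a size filter, which are omitted here. The function names `jaccard_join` and `verify` are hypothetical. Tokens are ordered by ascending global frequency, so prefixes hold rare tokens and the inverted lists stay short. Verification merges two sorted token lists and stops as soon as the required overlap is either reached or no longer reachable. This is one common way to keep the number of inspected elements small.

```python
import math
from collections import Counter, defaultdict
from fractions import Fraction


def verify(r, s, required):
    """Merge-based overlap check on two sorted token-rank lists.

    Stops as soon as `required` common tokens are found, or as soon as the
    tokens left in either list can no longer reach `required`.
    """
    i = j = overlap = 0
    while i < len(r) and j < len(s):
        if overlap + min(len(r) - i, len(s) - j) < required:
            return False  # threshold unreachable: stop early
        if r[i] == s[j]:
            overlap += 1
            if overlap >= required:
                return True  # threshold reached: stop early
            i += 1
            j += 1
        elif r[i] < s[j]:
            i += 1
        else:
            j += 1
    return overlap >= required


def jaccard_join(R, S, t):
    """Return sorted (i, j) pairs with Jaccard(R[i], S[j]) >= t, where 0 < t <= 1."""
    t = Fraction(str(t))  # exact arithmetic, so ceil() never rounds the wrong way

    # Global token order: ascending frequency over both collections.
    freq = Counter(tok for coll in (R, S) for rec in coll for tok in set(rec))
    order = sorted(freq, key=lambda tok: (freq[tok], tok))
    rank = {tok: k for k, tok in enumerate(order)}
    Rs = [sorted(rank[tok] for tok in set(r)) for r in R]
    Ss = [sorted(rank[tok] for tok in set(s)) for s in S]

    def prefix_len(n):
        # Prefix filter for Jaccard: if J(r, s) >= t, the prefix of r of this
        # length and the prefix of s of its length share at least one token.
        return n - math.ceil(t * n) + 1

    def min_overlap(a, b):
        # J(r, s) >= t  <=>  |r & s| >= t / (1 + t) * (|r| + |s|)
        return math.ceil(t / (1 + t) * (a + b))

    # Filter step 1: inverted index over the prefixes of the S records.
    index = defaultdict(list)
    for j, s in enumerate(Ss):
        for tok in s[:prefix_len(len(s))]:
            index[tok].append(j)

    result = []
    for i, r in enumerate(Rs):
        # Filter step 2: candidates share at least one prefix token with r.
        candidates = set()
        for tok in r[:prefix_len(len(r))]:
            candidates.update(index.get(tok, ()))
        # Verification: exact overlap test with early termination.
        for j in candidates:
            if verify(r, Ss[j], min_overlap(len(r), len(Ss[j]))):
                result.append((i, j))
    return sorted(result)


R = [{"a", "b", "c", "d"}, {"x", "y"}]
S = [{"a", "b", "c"}, {"a", "b", "c", "d", "e"}, {"x", "z"}]
print(jaccard_join(R, S, 0.7))  # [(0, 0), (0, 1)]
```

In this example, only the two pairs with Jaccard similarity 0.75 and 0.8 survive. The pair ({"x", "y"}, {"x", "z"}), with similarity 1/3, is never even verified, because the two sets share no prefix token.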
Subject
General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development
Cited by 44 articles.
1. Proper Material Tracking for a Continuous Aluminum Production Process;Key Engineering Materials;2023-12-06
2. A Two-Level Signature Scheme for Stable Set Similarity Joins;Proceedings of the VLDB Endowment;2023-07
3. Feedforward-Aided Course Designs for Similarity Search;Proceedings of the 2nd International Workshop on Data Systems Education: Bridging education practice with education research;2023-06-23
4. FINEX: A Fast Index for Exact & Flexible Density-Based Clustering;Proceedings of the ACM on Management of Data;2023-05-26
5. Benchmarking Filtering Techniques for Entity Resolution;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04