Data structure set-trie for storing and querying sets: Theoretical and empirical analysis

Published: 2021-02-10
Journal: PLOS ONE (short title: PLoS ONE), Volume 16, Issue 2, Article e0245122
ISSN: 1932-6203
Language: English

Authors:

Savnik Iztok, Akulich Mikita, Krnc Matjaž, Škrekovski Riste

Abstract

Set containment operations are an important tool in fields such as information retrieval, AI systems, object-relational databases, and Internet applications. In this paper, we consider the set-trie data structure for storing sets, together with efficient algorithms for the corresponding set containment operations. We present a mathematical and an empirical study of set-trie. In the mathematical study, we establish upper bounds on its expected performance under a natural probabilistic model. In the empirical study, we examine how different distributions of the input data affect the efficiency of set-trie; by choosing appropriate parameters for the randomly generated datasets, we expose the key sources of its input sensitivity. Finally, we compare set-trie empirically with the inverted index on real-world datasets containing sets of low cardinality. The comparison shows that set-trie consistently outperforms the inverted index in running time, by orders of magnitude.
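The abstract describes set-trie only at a high level. As a rough illustration, here is a minimal Python sketch under stated assumptions: elements are integers (any totally ordered type would do), each set is stored as its sorted, de-duplicated element sequence so that sets sharing a prefix share trie nodes, and the class and method names (`SetTrie`, `exists_subset`, `exists_superset`, `all_subsets`) are hypothetical. Subset and superset queries are written as pruned depth-first searches that exploit the sorted order along each path. This is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence


@dataclass
class Node:
    # Children keyed by element; keys strictly increase along any root-to-node path.
    children: Dict[int, "Node"] = field(default_factory=dict)
    # True if the path from the root to this node spells a stored set.
    is_end: bool = False


class SetTrie:
    """Hypothetical minimal set-trie (insert-only), not the paper's code."""

    def __init__(self) -> None:
        self.root = Node()

    @staticmethod
    def _key(s: Iterable[int]) -> List[int]:
        # Canonical form of a set: sorted, without duplicates.
        return sorted(set(s))

    def insert(self, s: Iterable[int]) -> None:
        node = self.root
        for x in self._key(s):
            node = node.children.setdefault(x, Node())
        node.is_end = True

    def contains(self, s: Iterable[int]) -> bool:
        node = self.root
        for x in self._key(s):
            node = node.children.get(x)
            if node is None:
                return False
        return node.is_end

    def exists_subset(self, s: Iterable[int]) -> bool:
        """Is some stored set a subset of s?"""
        return self._exists_subset(self.root, self._key(s), 0)

    def _exists_subset(self, node: Node, q: Sequence[int], i: int) -> bool:
        if node.is_end:
            return True  # this path is a stored set built only from elements of q
        if i == len(q):
            return False
        child = node.children.get(q[i])
        # Either use q[i] as the next element of a stored set, or skip q[i].
        if child is not None and self._exists_subset(child, q, i + 1):
            return True
        return self._exists_subset(node, q, i + 1)

    def exists_superset(self, s: Iterable[int]) -> bool:
        """Is some stored set a superset of s?"""
        return self._exists_superset(self.root, self._key(s), 0)

    def _exists_superset(self, node: Node, q: Sequence[int], i: int) -> bool:
        if i == len(q):
            # All of q matched. Without deletions, every node with children
            # has a stored set somewhere below it.
            return node.is_end or bool(node.children)
        for x, child in node.children.items():
            if x == q[i] and self._exists_superset(child, q, i + 1):
                return True
            if x < q[i] and self._exists_superset(child, q, i):
                return True  # x is an extra element; keep looking for q[i] deeper
            # If x > q[i], q[i] cannot appear deeper on this path, so prune.
        return False

    def all_subsets(self, s: Iterable[int]) -> List[List[int]]:
        """All stored sets that are subsets of s, as sorted lists."""
        q = self._key(s)
        out: List[List[int]] = []

        def walk(node: Node, i: int, path: List[int]) -> None:
            if node.is_end:
                out.append(list(path))
            for j in range(i, len(q)):  # only follow elements of q, in increasing order
                child = node.children.get(q[j])
                if child is not None:
                    path.append(q[j])
                    walk(child, j + 1, path)
                    path.pop()

        walk(self.root, 0, [])
        return out


if __name__ == "__main__":
    t = SetTrie()
    for s in [{1, 3}, {1, 2, 5}, {2, 4}, {3}]:
        t.insert(s)
    print(t.exists_subset({1, 3, 7}))         # True: {1, 3} is stored
    print(t.exists_superset({4, 5}))          # False: no stored set has both 4 and 5
    print(t.all_subsets({1, 2, 3, 5}))        # [[1, 2, 5], [1, 3], [3]]
```

The pruning in `_exists_superset` relies on the sorted-path invariant: once a child label exceeds the next query element, that element cannot occur further down the path. An inverted index, by contrast, would answer a superset query by intersecting the posting lists of the query elements.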

Funder

Javna Agencija za Raziskovalno Dejavnost RS

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References: 52 articles

1. Introduction to Information Retrieval.

2. J. Zobel. Inverted files for text search engines. ACM Computing Surveys (CSUR), 2006.

3. U. Deppisch. S-tree: A Dynamic Balanced Signature Index for Office Retrieval. In: Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '86). New York, NY, USA: ACM, 1986, pp. 77–87.

4. J. Zobel. Inverted files versus signature files for text indexing. ACM Transactions on Database Systems (TODS), 1998.

5. S. Helmer. A performance study of four index structures for set-valued attributes of low cardinality. The VLDB Journal, 2003.

Cited by: 9 articles

1. Trie and LOUDS hybrid model for efficient e-commerce processing in cloud environment. Simulation Modelling Practice and Theory, 2024-07.

2. Connected Components for Scaling Partial-order Blocking to Billion Entities. Journal of Data and Information Quality, 2024-03-19.

3. Encapsulation structure and dynamics in hypergraphs. Journal of Physics: Complexity, 2023-11-22.

4. Fast Maximal Quasi-clique Enumeration: A Pruning and Branching Co-Design Approach. Proceedings of the ACM on Management of Data, 2023-11-13.

5. Multiset-Trie Data Structure. Algorithms, 2023-03-20.