Abstract
The trie is one of the most common data structures for string storage and retrieval. As a fast and efficient trie implementation, the double array (DA) can effectively compress strings to reduce storage space. However, DA index construction is inefficient. To address this problem, we design a two-level partition (TLP) framework. We first divide the dataset into smaller lower-level partitions, and then merge these partitions into larger upper-level partitions using a min-heap-based greedy merging algorithm (MH-GMerge). TLP balances load well and can be easily parallelized. We implemented two efficient parallel partitioned DAs based on TLP. Extensive experiments showed that the proposed methods significantly improve the construction efficiency of DA and achieve a better trade-off between construction and retrieval performance than existing state-of-the-art methods.
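The abstract does not spell out MH-GMerge, so the following is only a minimal sketch of how a min-heap-based greedy merge could balance load, not the authors' algorithm. It assumes each lower-level partition is described by a size (for example, a string count) and that the largest remaining partition is always assigned to the currently lightest upper-level partition. The names `greedy_merge`, `sizes`, and `k` are hypothetical.

```python
import heapq


def greedy_merge(sizes, k):
    """Group lower-level partitions into k upper-level partitions.

    Assumption (not from the paper): partitions are visited from largest
    to smallest, and each one goes to the upper-level partition with the
    smallest accumulated load, tracked with a min-heap.
    """
    # Heap entries are (current load, upper-level partition index).
    heap = [(0, i) for i in range(k)]
    heapq.heapify(heap)
    groups = [[] for _ in range(k)]

    # Visit lower-level partition indices in order of decreasing size.
    order = sorted(range(len(sizes)), key=lambda p: sizes[p], reverse=True)
    for part in order:
        load, i = heapq.heappop(heap)      # lightest upper-level partition
        groups[i].append(part)
        heapq.heappush(heap, (load + sizes[part], i))
    return groups


if __name__ == "__main__":
    sizes = [7, 5, 4, 3, 1]
    groups = greedy_merge(sizes, 2)
    print(groups)
    print([sum(sizes[p] for p in g) for g in groups])
```

Under these assumptions, each upper-level partition could then be built into its own DA by a separate worker, which is where a balanced load would matter for parallel construction time.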
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
2 articles.