An Efficient Two-Level-Partitioning-Based Double Array and Its Parallelization

Published:2020-07-30 Issue:15 Volume:10 Page:5266
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Jia Lianyin, Zhang Chongde, Li Mengjuan, Chen Yinong, Liu Yong, Ding Jiaman

Abstract

The trie is one of the most common data structures for string storage and retrieval. As a fast and efficient implementation of the trie, the double array (DA) can effectively compress strings to reduce storage space. However, this method suffers from low index construction efficiency. To address this problem, we design a two-level partition (TLP) framework in this paper. We first divide the dataset into smaller lower-level partitions, and then we merge these partitions into bigger upper-level partitions using a min-heap based greedy merging algorithm (MH-GMerge). TLP balances load well and can be easily parallelized. We implemented two efficient parallel partitioned DAs based on TLP. Extensive experiments were carried out, and the results showed that the proposed methods can significantly improve the construction efficiency of DA and can achieve a better trade-off between construction and retrieval performance than the existing state-of-the-art methods.
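The abstract names a min-heap based greedy merge but does not spell out its steps, so the following is only a minimal sketch of one common way such a merge can be done, not the authors' MH-GMerge. It assumes each lower-level partition is identified by a key (for example, a leading character) with a known string count, and that the largest partitions are placed first into whichever upper-level partition currently has the smallest load. The names `greedy_merge`, `lower_sizes`, and `k` are hypothetical.

```python
import heapq


def greedy_merge(lower_sizes, k):
    """Group lower-level partitions into k upper-level partitions.

    lower_sizes: dict mapping a lower-level partition key to its string count
                 (an assumed representation, not taken from the paper).
    Returns a list of (total_load, [keys]) pairs, one per upper-level partition.
    """
    # Heap entries are (current load, upper partition index); the index breaks ties.
    heap = [(0, i) for i in range(k)]
    heapq.heapify(heap)
    groups = [[] for _ in range(k)]

    # Assumption: place the largest lower-level partitions first.
    ordered = sorted(lower_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, size in ordered:
        load, i = heapq.heappop(heap)      # least-loaded upper partition
        groups[i].append(key)
        heapq.heappush(heap, (load + size, i))

    loads = [0] * k
    for load, i in heap:
        loads[i] = load
    return list(zip(loads, groups))


if __name__ == "__main__":
    sizes = {"a": 50, "b": 30, "c": 20, "d": 20, "e": 10}
    for load, keys in greedy_merge(sizes, 2):
        print(load, keys)
```

With the sample sizes above and `k = 2`, the two upper-level partitions end up with loads 70 and 60. Under these assumptions, each upper-level partition could then be handed to a separate worker to build its own double array, which is where the balanced loads would matter.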

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/10/15/5266/pdf

Cited by 2 articles.

1. Simulation Research on Fast Matching of Big Data Based on Spark;IEEE Access;2023

2. Ext-LOUDS: A Space Efficient Extended LOUDS Index for Superset Query;Applied Sciences;2020-11-28