Fast Algorithms for Top-k Approximate String Matching

Published: 2010-07-05  Volume: 24  Issue: 1  Pages: 1467-1473
ISSN: 2374-3468
Container title: Proceedings of the AAAI Conference on Artificial Intelligence
Short container title: AAAI

Authors:

Yang Zhenglu, Yu Jianjun, Kitsuregawa Masaru

Abstract

Top-k approximate querying on string collections is an important data analysis tool for many applications, and it has been studied extensively. However, the scale of the problem has increased dramatically with the prevalence of the Web. In this paper, we study the problem of efficient top-k similar string matching. We introduce several efficient strategies, such as length-aware and adaptive q-gram selection. We present a general q-gram-based framework and propose two efficient algorithms built on these strategies. We evaluate our techniques experimentally on three real data sets, where they show superior performance.
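The abstract names a q-gram-based framework without detailing it. For orientation only, here is a minimal sketch of a generic q-gram filter-and-verify loop for top-k search, assuming Levenshtein edit distance as the similarity measure and q = 2. It relies on two textbook lower bounds: the length difference, and the q-gram count filter (two strings within edit distance τ share at least max(|s|, |t|) - q + 1 - qτ q-grams). It is not the authors' length-aware or adaptive q-gram selection algorithm, and the names `qgrams`, `build_index`, and `top_k` are hypothetical.

```python
import heapq
from collections import Counter, defaultdict

# Generic q-gram filter-and-verify sketch (hypothetical names);
# NOT the algorithms proposed in the paper.


def qgrams(s: str, q: int) -> Counter:
    """Multiset of the overlapping q-grams of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution / match
        prev = cur
    return prev[-1]


def build_index(strings, q):
    """Inverted index: q-gram -> {string id: occurrence count}."""
    index = defaultdict(dict)
    for sid, s in enumerate(strings):
        for g, c in qgrams(s, q).items():
            index[g][sid] = c
    return index


def top_k(query, strings, index, k, q=2):
    """Return the k (distance, string) pairs closest to query.

    `index` must have been built with the same q.
    """
    if k <= 0:
        return []
    # 1. Count shared q-grams (multiset intersection) via the inverted index.
    common = defaultdict(int)
    for g, cq in qgrams(query, q).items():
        for sid, cs in index.get(g, {}).items():
            common[sid] += min(cq, cs)

    # 2. Lower-bound every string's edit distance with the length filter
    #    and the q-gram count filter, then sort by that bound.
    bounds = []
    for sid, s in enumerate(strings):
        longer = max(len(s), len(query))
        missing = max(0, longer - q + 1 - common[sid])
        count_lb = (missing + q - 1) // q          # ceil(missing / q)
        bounds.append((max(abs(len(s) - len(query)), count_lb), sid))
    bounds.sort()

    # 3. Verify in bound order; stop once no unverified string can
    #    beat the current k-th best distance.
    heap = []  # max-heap on distance, stored as (-distance, sid)
    for lb, sid in bounds:
        if len(heap) == k and lb >= -heap[0][0]:
            break
        d = edit_distance(query, strings[sid])
        if len(heap) < k:
            heapq.heappush(heap, (-d, sid))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, sid))
    return sorted((-nd, strings[sid]) for nd, sid in heap)
```

Sorting candidates by their lower bound is what allows early termination: once k strings have been verified and the next bound is at least the current k-th best distance, no unverified string can enter the result.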

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 9 articles.

1. Detection of advanced persistent threats using hashing and graph-based learning on streaming data. Applied Intelligence, 2024-04.

2. Scalable Similarity Joins for Fast and Accurate Record Deduplication in Big Data. Lecture Notes in Networks and Systems, 2024.

3. BipartiteJoin: Optimal Similarity Join for Fuzzy Bipartite Matching. Lecture Notes in Networks and Systems, 2024.

4. Bidirectional String Anchors for Improved Text Indexing and Top-K Similarity Search. IEEE Transactions on Knowledge and Data Engineering, 2023-11-01.

5. DeepPRS: A Deep Learning Integrated Pattern Recognition Methodology for Secure Data in Cloud Environment. Innovations in Bio-Inspired Computing and Applications, 2023.