Finding Similar Documents Using Frequent Pattern Mining Methods

Published: 2019-02 Issue: 01 Volume: 27 Pages: 73-96
ISSN: 0218-4885
Container-title: International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Language: en
Short-container-title: Int. J. Unc. Fuzz. Knowl. Based Syst.

Author:

Sohrabi Mohammad Karim¹, Azgomi Hossein¹

Affiliation:

1. Department of Computer Engineering, Semnan Branch, Islamic Azad University, Semnan, Iran

Abstract

Mining massive datasets raises a variety of problems, one of which is finding similar documents. The shingling method converts this problem into a set-based problem. Some existing methods use min-hashing to compress the sets derived by shingling and then apply locality-sensitive hashing (LSH) to select, from all pairs of documents, candidate pairs for the similarity search. In this paper, an apriori-based method is proposed for finding similar documents using a frequent itemset mining approach. To this end, the apriori algorithm is modified and customized for the similarity search problem. The most important advantages of the proposed method are modeling the similarity search problem as a frequent pattern mining problem, using a modified version of apriori, and selecting the minimum support threshold dynamically; together these lead to a reasonable execution time and high-quality results. The proposed method finds similar documents in less time than the combined method and the MCVM method because it generates fewer candidate pairs. Furthermore, experimental results show the high quality of the proposed method's answers.
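The shingling, min-hashing and LSH pipeline that the abstract attributes to existing methods can be sketched as follows. This is a minimal illustration under our own assumptions, not code from the paper: character 5-shingles, 100 seeded SHA-1 hashes standing in for random permutations, and 20 bands of 5 rows. All function names and parameter values are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[str]:
    """Set of character k-shingles (length-k substrings) after normalizing case and whitespace."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _hash(seed: int, value: str) -> int:
    """Deterministic 64-bit hash; each seed plays the role of a different hash function."""
    digest = hashlib.sha1(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(shingle_set: set[str], num_hashes: int = 100) -> list[int]:
    """For each hash function, keep the minimum hash value over the whole set."""
    if not shingle_set:
        raise ValueError("cannot compute a MinHash signature of an empty set")
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def lsh_candidates(signatures: dict[str, list[int]], bands: int, rows: int) -> set[tuple[str, str]]:
    """Banding LSH: two documents become a candidate pair if they agree on every row of some band."""
    if any(len(sig) != bands * rows for sig in signatures.values()):
        raise ValueError("each signature must have exactly bands * rows entries")
    candidates: set[tuple[str, str]] = set()
    for b in range(bands):
        # The band's rows themselves serve as the bucket key (exact-match bucketing).
        buckets: dict[tuple[int, ...], list[str]] = defaultdict(list)
        for doc_id, sig in signatures.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(doc_id)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates


docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the quick brown fox jumped over the lazy dog",
    "d3": "frequent itemset mining with the apriori algorithm",
}
doc_shingles = {d: shingles(t) for d, t in docs.items()}
sigs = {d: minhash_signature(s) for d, s in doc_shingles.items()}
# Only candidate pairs are compared exactly; every other pair is skipped.
for a, b in sorted(lsh_candidates(sigs, bands=20, rows=5)):
    print(a, b, round(jaccard(doc_shingles[a], doc_shingles[b]), 2))
```

With b bands of r rows, a pair whose Jaccard similarity is s becomes a candidate with probability 1 - (1 - s^r)^b. For b = 20 and r = 5 this is about 0.47 at s = 0.5 and above 0.99 at s = 0.75, so the band/row split controls the trade-off between missed pairs and wasted comparisons.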
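One way to view similarity search as frequent pattern mining is a standard market-basket mapping, shown below as our assumption about the general idea rather than the authors' modified apriori: each shingle is a basket, the documents containing it are its items, and a document pair is frequent when it co-occurs in at least `min_support` baskets, i.e. when the two documents share at least `min_support` shingles. The sketch runs the first two passes of classic A-Priori and then verifies candidates with exact Jaccard similarity. Function names and the fixed `min_support` parameter are hypothetical; the paper's dynamic threshold selection is not reproduced.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[str]:
    """Set of character k-shingles after normalizing case and whitespace."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def build_baskets(doc_shingles: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert document -> shingles into basket (shingle) -> documents containing it."""
    baskets: dict[str, set[str]] = defaultdict(set)
    for doc_id, sh in doc_shingles.items():
        for s in sh:
            baskets[s].add(doc_id)
    return baskets


def apriori_candidate_pairs(baskets: dict[str, set[str]], min_support: int) -> dict[tuple[str, str], int]:
    """Frequent 2-itemsets of documents via the first two passes of A-Priori."""
    # Pass 1: a document's support is the number of baskets (shingles) it occurs in.
    item_support: Counter = Counter()
    for docs_in_basket in baskets.values():
        item_support.update(docs_in_basket)
    frequent = {d for d, c in item_support.items() if c >= min_support}

    # Pass 2: by monotonicity, a frequent pair can only contain frequent documents.
    pair_support: Counter = Counter()
    for docs_in_basket in baskets.values():
        kept = sorted(d for d in docs_in_basket if d in frequent)
        pair_support.update(combinations(kept, 2))
    return {p: c for p, c in pair_support.items() if c >= min_support}


def similar_pairs(doc_shingles: dict[str, set[str]], min_support: int,
                  threshold: float) -> dict[tuple[str, str], float]:
    """Keep candidate pairs whose exact Jaccard similarity reaches the threshold."""
    result = {}
    for a, b in apriori_candidate_pairs(build_baskets(doc_shingles), min_support):
        sa, sb = doc_shingles[a], doc_shingles[b]
        sim = len(sa & sb) / len(sa | sb)
        if sim >= threshold:
            result[(a, b)] = sim
    return result


docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the quick brown fox jumped over the lazy dog",
    "d3": "frequent itemset mining with the apriori algorithm",
}
doc_shingles = {d: shingles(t) for d, t in docs.items()}
t = 0.5
# Hypothetical rule: a pair with Jaccard >= t shares at least ceil(t * smallest set size) shingles.
min_support = math.ceil(t * min(len(s) for s in doc_shingles.values()))
print(similar_pairs(doc_shingles, min_support, t))
```

Because a pair with Jaccard similarity at least t shares at least t · |A ∪ B| ≥ t · max(|A|, |B|) shingles, setting `min_support` to ceil(t × smallest shingle-set size), as the demo does, cannot discard a truly similar pair; larger values prune more candidates at the risk of misses. This rule is our own simple choice, not the paper's selection method.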

Publisher

World Scientific Pub Co Pte Ltd

Subject

Artificial Intelligence, Information Systems, Control and Systems Engineering, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218488519500041

References: 45 articles.

1. Automated Fuzzy System Based on Feature Extraction and Selection for Opinion Classification Across Different Domains

2. Rough Set Theory and Fuzzy Logic Based Warehousing of Heterogeneous Clinical Databases

3. http://www.jcomputers.us/vol11/jcp1102-06.pdf

4. TSGV: a table-like structure-based greedy method for materialized view selection in data warehouses

5. Enhanced visual data mining process for dynamic decision-making

Cited by: 5 articles.

1. Fuel assemblies loading pattern optimization of pressurized water reactors using the trees social relations algorithm; Annals of Nuclear Energy; 2023-11

2. FPGA/GPU-based Acceleration for Frequent Itemsets Mining: A Comprehensive Review; ACM Computing Surveys; 2022-12-31

3. Detection of counterfeit banknotes by security components based on image processing and GoogLeNet deep learning network; Signal, Image and Video Processing; 2022-01-10

4. Frequent Route Pattern Mining Technique for Route Prediction in Transportation Network; Communications in Computer and Information Science; 2021

5. A novel coral reefs optimization algorithm for materialized view selection in data warehouse environments; Applied Intelligence; 2019-05-19