Boolean logic algebra driven similarity measure for text based applications

Published: 2021-07-29
Journal: PeerJ Computer Science
Volume: 7, Article: e641
ISSN: 2376-5992
Language: English

Authors:

Abdalla, Hassan I.¹; Amer, Ali A.²

Affiliations:

1. College of Technological Innovation, Zayed University, Abu Dhabi, Abu Dhabi, United Arab Emirates

2. Computer Science Department, Taiz University, Taiz, Yemen

Abstract

In Information Retrieval (IR), Data Mining (DM), and Machine Learning (ML), similarity measures are widely used for text clustering and classification. The similarity measure is the cornerstone on which the performance of most DM and ML algorithms depends. Yet the search in the literature for a measure that is both effective and efficient remains immature: some recently proposed measures are effective but have complex designs and are inefficient. This work therefore develops an effective and efficient similarity measure with a simple design for text-based applications. The proposed measure, BLAB-SM, is driven by Boolean logic algebra basics and aims to reach the desired accuracy with the shortest run time compared with recently developed state-of-the-art measures. A comprehensive evaluation is presented using the term frequency–inverse document frequency (TF-IDF) weighting scheme, the K-nearest neighbor (KNN) classifier, and the K-means clustering algorithm. BLAB-SM was evaluated experimentally against seven similarity measures on two of the most popular datasets, Reuters-21 and Web-KB. The experimental results show that BLAB-SM is not only more efficient but also significantly more effective than state-of-the-art similarity measures on both classification and clustering tasks.
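
The abstract does not give BLAB-SM's formula, so the following is only a minimal Python sketch of the kind of evaluation pipeline it describes: TF-IDF weighting followed by KNN classification with a pluggable similarity function. It assumes raw-count term frequency times log(N/df) as the IDF variant, whitespace tokenization, and cosine similarity as a stand-in measure. The function names (`fit_idf`, `vectorize`, `cosine`, `knn_predict`) and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Dict[str, float]


# Assumption: raw term counts times log(N / df); the paper's exact TF-IDF variant is not stated here.
def fit_idf(docs: Sequence[List[str]]) -> Dict[str, float]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(doc: List[str], idf: Dict[str, float]) -> Vector:
    # Sparse TF-IDF vector; terms unseen when fitting the IDF are dropped.
    tf = Counter(doc)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


# Stand-in measure (cosine similarity), NOT BLAB-SM, whose formula the abstract does not give.
def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query: Vector, train: List[Tuple[Vector, str]], k: int,
                sim: Callable[[Vector, Vector], float] = cosine) -> str:
    # Rank training documents by similarity to the query, then majority-vote over the top k.
    ranked = sorted(train, key=lambda item: sim(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
corpus = [
    ("stock market shares rise", "finance"),
    ("shares fall on market news", "finance"),
    ("team wins football match", "sport"),
    ("football player scores goal", "sport"),
]
tokens = [text.lower().split() for text, _ in corpus]
idf = fit_idf(tokens)
train = [(vectorize(toks, idf), label) for toks, (_, label) in zip(tokens, corpus)]

for text in ["market shares news", "football goal"]:
    print(text, "->", knn_predict(vectorize(text.split(), idf), train, k=3))
```

Running it prints `market shares news -> finance` and `football goal -> sport`. A measure such as BLAB-SM would be swapped in through the `sim` argument; keeping the similarity function separate from the classifier is what makes a like-for-like comparison of several measures on the same TF-IDF vectors straightforward.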

Funder

Zayed University, UAE

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-641.pdf

References (41 articles)

1. Afzali. Comparative analysis of various similarity measures for finding similarity of two documents. International Journal of Database Theory and Application, 2017.

2. Amer. On K-means clustering-based approach for DDBSs design. Journal of Big Data, 2020.

3. Amer. A set theory based similarity measure for text clustering and classification. Journal of Big Data, 2020.

4. Amer. Enhancing recommendation systems performance using highly-effective similarity measures. Knowledge-Based Systems, 2021.

5. Amigó. On the foundations of similarity in information access. Information Retrieval Journal, 2020.

Cited by (10 articles)

1. Neighboring-Aware Hierarchical Clustering. International Journal on Semantic Web and Information Systems, 2024-05-23.

2. On the Impact of Jaccard Fusion with Numerical Measures for Collaborative Filtering Enhancement. 2023-08-29.

3. An experimental study on the performance of collaborative filtering based on user reviews for large-scale datasets. PeerJ Computer Science, 2023-08-25.

4. Numerical Similarity Measures Versus Jaccard for Collaborative Filtering. Proceedings of the 9th International Conference on Advanced Intelligent Systems and Informatics 2023, 2023.

5. The Impact of Data Normalization on KNN Rendering. Proceedings of the 9th International Conference on Advanced Intelligent Systems and Informatics 2023, 2023.