A new similarity measure for vector space models in text classification and information retrieval

Published:2020-10-27 Page:016555152096805
ISSN:0165-5515
Container-title:Journal of Information Science
language:en
Short-container-title:Journal of Information Science

Author:

Eminagaoglu Mete¹

Affiliation:

1. Dokuz Eylul Universitesi, Fen Fakultesi, Izmir, Turkey

Abstract

Various models, methodologies and algorithms are available today for document classification, information retrieval and other text mining applications and systems. Among them are vector space–based models, in which distance metrics or similarity measures lie at the core. The vector space–based model is a fast and simple alternative for processing textual data; however, its accuracy, precision and reliability still need significant improvement. In this study, a new similarity measure is proposed that can be used effectively in vector space models and related algorithms such as k-nearest neighbours (k-NN) and Rocchio, as well as in some clustering algorithms such as K-means. The proposed similarity measure is tested with some universal benchmark data sets in Turkish and English, and the results are compared with those of other standard metrics such as Euclidean distance, Manhattan distance, Chebyshev distance, Canberra distance, Bray–Curtis dissimilarity, Pearson correlation coefficient and Cosine similarity. The results are promising and indicate that the proposed similarity measure could be used as an alternative in suitable algorithms and models for information retrieval, document clustering and text classification.
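The baseline measures named in the abstract all have standard textbook definitions. As a rough illustration only, they can be sketched in plain Python as below. This is not the proposed similarity measure, which this record does not define. The function names and the two sample term-weight vectors are hypothetical, and the handling of zero denominators (returning 0) is an assumption made for the sketch.

```python
import math

# Standard baseline measures named in the abstract, for two equal-length
# term-weight vectors x and y. NOT the paper's proposed measure.

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    # Assumption: terms where both components are zero contribute 0.
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total

def bray_curtis(x, y):
    denom = sum(abs(a + b) for a, b in zip(x, y))
    # Assumption: two all-zero vectors are treated as dissimilarity 0.
    return sum(abs(a - b) for a, b in zip(x, y)) / denom if denom else 0.0

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumption: similarity with a zero vector is defined as 0.
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def pearson(x, y):
    # Pearson correlation equals cosine similarity of the mean-centred vectors.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical term-frequency vectors for two short documents.
doc_a = [1, 0, 2, 3]
doc_b = [0, 1, 2, 1]

for measure in (euclidean, manhattan, chebyshev, canberra,
                bray_curtis, cosine_similarity, pearson):
    print(f"{measure.__name__}: {measure(doc_a, doc_b):.4f}")
```

Note that the first five functions are distances (smaller means more alike), while the last two are similarities (larger means more alike), so a k-NN or Rocchio classifier has to rank neighbours in the matching direction.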

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551520968055

Reference 37 articles.

1. Machine learning in automated text categorization

2. Learning to crawl

3. An evaluation of statistical spam filtering techniques

Cited by 31 articles.

1. Residual cosine similar attention and bidirectional convolution in dual-branch network for skin lesion image classification;Engineering Applications of Artificial Intelligence;2024-07

2. A hybrid machine learning model for sentiment analysis and satisfaction assessment with Turkish universities using Twitter data;Decision Analytics Journal;2024-06

3. Semantic deep learning and adaptive clustering for handling multimodal multimedia information retrieval;Multimedia Tools and Applications;2024-05-25

4. CTSP: Chinese Text Semantic Similarity Prediction;2024 5th International Conference on Computer Engineering and Application (ICCEA);2024-04-12

5. A Language Framework for Measuring Semantic and Syntactic Similarity for Arabic Texts;SN Computer Science;2024-03-27