Affiliation:
1. King Saud University College of Computer and Information Sciences
Abstract
Instance-based learning algorithms, such as the k-Nearest Neighbor (kNN) classifier, are simple yet effective machine learning methods for text classification. However, they can incur long classification times and large memory requirements, which has motivated the development of instance reduction techniques that discard irrelevant and noisy instances. This usually comes at the expense of classification accuracy. This work proposes a Selective Learning Vector Quantization (SLVQ) algorithm and uses it to fine-tune the reduced datasets so that they better represent the full dataset. Unlike classical Learning Vector Quantization (LVQ) algorithms, SLVQ can deal with nominal attributes, using the instances in the reduced datasets as the initial codebook vectors and the original dataset to fine-tune them. Handling nominal values is crucial, since many real-world datasets contain nominal attributes and require an appropriate distance measure, such as the Value Difference Metric (VDM). Rather than modifying the weight vectors themselves, SLVQ modifies the VDM distances between nominal values. Our experimental results using four instance reduction algorithms and 17 text classification datasets demonstrate our approach's effectiveness in improving the classification accuracy of the reduced sets.
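To make the abstract's central idea concrete, the sketch below shows a minimal VDM computation and an SLVQ-style pass that adjusts the stored VDM distances between nominal values instead of moving prototype vectors. The exact update rule of the paper's SLVQ is not given here; the update shown (shrink the value distance when the nearest codebook vector classifies correctly, grow it otherwise, in the spirit of LVQ1) is a hypothetical illustration, and all function names are ours.

```python
from collections import defaultdict
import itertools

def vdm_table(data, labels, attr_index, q=1):
    """Value Difference Metric for one nominal attribute:
    d(x, y) = sum over classes c of |P(c|x) - P(c|y)|**q."""
    counts = defaultdict(lambda: defaultdict(int))  # value -> class -> count
    totals = defaultdict(int)                       # value -> count
    for row, c in zip(data, labels):
        v = row[attr_index]
        counts[v][c] += 1
        totals[v] += 1
    classes = set(labels)
    table = {}
    for x, y in itertools.combinations(counts, 2):
        table[frozenset((x, y))] = sum(
            abs(counts[x][c] / totals[x] - counts[y][c] / totals[y]) ** q
            for c in classes)
    return table

def distance(a, b, tables):
    """Instance distance: sum of per-attribute VDM value distances."""
    return sum(0.0 if a[i] == b[i]
               else tables[i].get(frozenset((a[i], b[i])), 1.0)
               for i in range(len(a)))

def slvq_epoch(codebook, cb_labels, data, labels, tables, alpha=0.1):
    """One hypothetical SLVQ-style pass over the full dataset: instead of
    moving the winning codebook vector, shrink (correct class) or grow
    (wrong class) the VDM distances between its attribute values and the
    training instance's values."""
    for row, c in zip(data, labels):
        j = min(range(len(codebook)),
                key=lambda k: distance(row, codebook[k], tables))
        sign = -1.0 if cb_labels[j] == c else +1.0
        for i, t in enumerate(tables):
            key = frozenset((row[i], codebook[j][i]))
            if len(key) == 2 and key in t:  # skip identical values
                t[key] = max(0.0, t[key] * (1.0 + sign * alpha))
```

The codebook would be initialized from the instances kept by an instance reduction algorithm; only the per-attribute distance tables change during training, which is what lets the scheme work without a vector space over nominal values.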
Publisher
Research Square Platform LLC