Affiliation:
1. King Saud University College of Computer and Information Sciences
Abstract
Instance-based learning algorithms, such as the k-Nearest Neighbor (kNN) classifier, are simple yet effective machine learning methods for text classification. However, they can incur long classification times and large memory requirements, which has motivated the development of instance reduction techniques that discard irrelevant and noisy instances. This usually comes at the expense of classification accuracy. This work proposes a Selective Learning Vector Quantization (SLVQ) algorithm and uses it to fine-tune the reduced datasets so that they better represent the full dataset. Unlike classical Learning Vector Quantization (LVQ) algorithms, SLVQ can deal with nominal attributes, using the instances in the reduced datasets as the initial codebook vectors and the original dataset to fine-tune them. Handling nominal values is crucial, since many real-world datasets contain nominal attributes and require an appropriate distance measure, such as the Value Difference Metric (VDM). Rather than modifying the weight vectors themselves, SLVQ modifies the VDM distances between nominal values. Our experimental results using four instance reduction algorithms and 17 text classification datasets demonstrate our approach's effectiveness in improving the classification accuracy of the reduced sets.
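To make the abstract's central idea concrete, the sketch below shows a minimal VDM computation and an SLVQ-style pass that adjusts the stored VDM distances between nominal values instead of moving prototype vectors. The exact update rule of the paper's SLVQ is not given here; the update shown (shrink the value distance when the nearest codebook vector classifies correctly, grow it otherwise, in the spirit of LVQ1) is a hypothetical illustration, and all function names are ours.

```python
from collections import defaultdict
import itertools

def vdm_table(data, labels, attr_index, q=1):
    """Value Difference Metric for one nominal attribute:
    d(x, y) = sum over classes c of |P(c|x) - P(c|y)|**q."""
    counts = defaultdict(lambda: defaultdict(int))  # value -> class -> count
    totals = defaultdict(int)                       # value -> count
    for row, c in zip(data, labels):
        v = row[attr_index]
        counts[v][c] += 1
        totals[v] += 1
    classes = set(labels)
    table = {}
    for x, y in itertools.combinations(counts, 2):
        table[frozenset((x, y))] = sum(
            abs(counts[x][c] / totals[x] - counts[y][c] / totals[y]) ** q
            for c in classes)
    return table

def distance(a, b, tables):
    """Instance distance: sum of per-attribute VDM value distances."""
    return sum(0.0 if a[i] == b[i]
               else tables[i].get(frozenset((a[i], b[i])), 1.0)
               for i in range(len(a)))

def slvq_epoch(codebook, cb_labels, data, labels, tables, alpha=0.1):
    """One hypothetical SLVQ-style pass over the full dataset: instead of
    moving the winning codebook vector, shrink (correct class) or grow
    (wrong class) the VDM distances between its attribute values and the
    training instance's values."""
    for row, c in zip(data, labels):
        j = min(range(len(codebook)),
                key=lambda k: distance(row, codebook[k], tables))
        sign = -1.0 if cb_labels[j] == c else +1.0
        for i, t in enumerate(tables):
            key = frozenset((row[i], codebook[j][i]))
            if len(key) == 2 and key in t:  # skip identical values
                t[key] = max(0.0, t[key] * (1.0 + sign * alpha))
```

The codebook would be initialized from the instances kept by an instance reduction algorithm; only the per-attribute distance tables change during training, which is what lets the scheme work without a vector space over nominal values.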
Publisher
Research Square Platform LLC