Compressed kNN: K-Nearest Neighbors with Data Compression

Published: 2019-02-28 Issue: 3 Volume: 21 Page: 234
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title:Entropy

Author:

Salvador-Meneses Jaime, Ruiz-Chavez Zoila, Garcia-Rodriguez Jose

Abstract

The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods. However, its memory consumption grows with the size of the dataset, which makes it impractical to apply to large volumes of data. Variations of this method have been proposed: condensed kNN divides the training dataset into clusters to be classified, while other variations reduce the input dataset before applying the algorithm. This paper presents a variation of the kNN algorithm, of the structure-less NN type, that works with categorical data. Categorical data, by their nature, can be compressed to decrease the memory requirements when executing the classification. The method adds a preliminary compression phase and then applies the algorithm to the compressed data. This allows the whole dataset to be kept in memory while considerably reducing the amount of memory required. Experiments and tests carried out on known datasets show a reduction in the volume of information stored in memory while the accuracy of the classification is maintained. They also show a slight decrease in processing time because the information is decompressed in real time (on-the-fly) while the algorithm is running.
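
The idea described in the abstract (compress categorical attributes, keep the compressed dataset in memory, and decompress on the fly while classifying) can be illustrated with a minimal sketch. This is not the authors' implementation: the bit-packing scheme, the Hamming distance, and every function name below (`bits_needed`, `pack_row`, `unpack_row`, `knn_predict`) are assumptions chosen for illustration only.

```python
from collections import Counter


def bits_needed(n_categories):
    """Bits required to encode n_categories distinct integer codes (at least 1)."""
    return max(1, (n_categories - 1).bit_length())


def pack_row(row, widths):
    """Pack a row of integer category codes into one int (assumed compression scheme)."""
    packed = 0
    for code, width in zip(row, widths):
        packed = (packed << width) | code
    return packed


def unpack_row(packed, widths):
    """Recover the category codes from a packed int (on-the-fly decompression)."""
    codes = []
    for width in reversed(widths):
        codes.append(packed & ((1 << width) - 1))
        packed >>= width
    return codes[::-1]


def hamming(a, b):
    """Number of attributes on which two categorical rows differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(packed_train, labels, widths, query, k=3):
    """Majority vote among the k nearest rows; each row is unpacked only when compared."""
    dists = []
    for packed, label in zip(packed_train, labels):
        row = unpack_row(packed, widths)
        dists.append((hamming(row, query), label))
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three categorical attributes with 3, 2 and 4 categories.
cardinalities = [3, 2, 4]
widths = [bits_needed(c) for c in cardinalities]
train = [[0, 1, 3], [0, 1, 2], [2, 0, 0], [2, 0, 1]]
labels = ["a", "a", "b", "b"]
packed_train = [pack_row(row, widths) for row in train]
prediction = knn_predict(packed_train, labels, widths, [0, 1, 3], k=3)
```

In this toy setup each row occupies 5 bits instead of three separate integers, and the classifier never holds the uncompressed training set in memory. How much memory is saved in practice depends on the attribute cardinalities and on the storage layout, neither of which this record specifies.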

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/21/3/234/pdf

References: 30 articles.

1. Compression, Clustering and Pattern Discovery in Very High Dimensional Discrete-Attribute Datasets;Grama;Techniques,2005

2. A label compression method for online multi-label classification

3. A Survey of Clustering Techniques

4. Discrete models for data imputation

5. K-Dependence Bayesian Classifier Ensemble

Cited by 41 articles.

1. Low-Cost Microcontroller-Based System for Condition Monitoring of Permanent-Magnet Synchronous Motor Stator Windings;Electronics;2024-07-28

2. Applications of AI Techniques in Healthcare and Wellbeing;Advances in Medical Technologies and Clinical Practice;2024-07-19

3. Comparative Performance Analysis of Filling Missing Values Algorithms in PdM Systems of UAV;BRAIN. Broad Research in Artificial Intelligence and Neuroscience;2024-07-05

4. Finding fault types of BLDC motors within UAVs using machine learning techniques;Heliyon;2024-05

5. Q8KNN: A Novel 8-Bit KNN Quantization Method for Edge Computing in Smart Lighting Systems with NodeMCU;Lecture Notes in Networks and Systems;2024