A Large-Scale $k$-Nearest Neighbor Classification Algorithm Based on Neighbor Relationship Preservation

Published:2022-01-07 Issue: Volume:2022 Page:1-11
ISSN:1530-8677
Container-title:Wireless Communications and Mobile Computing
language:en
Short-container-title:Wireless Communications and Mobile Computing

Author:

Song Yunsheng¹, Kong Xiaohan¹, Zhang Chao¹

Affiliation:

1. College of Information Science and Engineering, Shandong Agricultural University, Tai’an, 271018, China

Abstract

Owing to its strong generalization ability and the absence of assumptions about the underlying data distribution, the $k$-nearest neighbor (kNN) classification algorithm is widely used in face recognition, text classification, emotion analysis, and other fields. However, kNN must compute the similarity between the unlabeled instance and every training instance during prediction, which makes it difficult to apply to large-scale data. To overcome this difficulty, an increasing number of acceleration algorithms based on data partition have been proposed, but they lack theoretical analysis of how data partition affects classification performance. This paper analyzes that effect theoretically using empirical risk minimization and proposes a large-scale $k$-nearest neighbor classification algorithm based on neighbor relationship preservation. The search for the nearest neighbors is converted to a constrained optimization problem, and the paper then estimates the difference in the objective function value under the optimal solution with and without data partition. According to this estimate, minimizing the similarity between instances in different subsets can largely reduce the effect of data partition. The minibatch $k$-means clustering algorithm is chosen to perform the data partition for its effectiveness and efficiency. Finally, the nearest neighbors of a test instance are searched repeatedly in the set generated by successively merging candidate subsets until they no longer change, where the candidate subsets are selected based on the similarity between the test instance and the cluster centers. Experimental results on public datasets show that the proposed algorithm largely retains the same nearest neighbors as the original kNN classification algorithm, shows no significant difference from it in classification accuracy, and achieves better results than two state-of-the-art algorithms.
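
The search procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Lloyd's k-means stands in for minibatch k-means, Euclidean distance stands in for the similarity measure, and the names `kmeans`, `partitioned_knn`, and `predict`, as well as the toy data, are hypothetical. The sketch partitions the training set, visits clusters in order of center distance to the query, and stops merging once the k nearest neighbors no longer change.

```python
import math
from collections import Counter


def kmeans(points, init_centers, n_iter=10):
    """Plain Lloyd's k-means (an assumed stand-in for minibatch k-means)."""
    centers = [tuple(c) for c in init_centers]
    dim = len(points[0])
    groups = []
    for _ in range(n_iter):
        # Assign every training point to its closest center.
        groups = [[] for _ in centers]
        for idx, p in enumerate(points):
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(idx)
        # Move each non-empty cluster's center to the mean of its members.
        for c, members in enumerate(groups):
            if members:
                centers[c] = tuple(
                    sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)
                )
    return centers, groups


def partitioned_knn(query, points, centers, groups, k):
    """Merge clusters nearest-center first until the k neighbors are stable."""
    order = sorted(range(len(centers)),
                   key=lambda c: math.dist(query, centers[c]))
    candidates, previous = [], None
    for c in order:
        if not groups[c]:
            continue
        candidates.extend(groups[c])
        current = sorted(candidates,
                         key=lambda i: math.dist(query, points[i]))[:k]
        # Stop once merging another subset leaves the neighbors unchanged.
        if len(current) == k and current == previous:
            break
        previous = current
    return previous


def predict(neighbors, labels):
    """Majority vote over the labels of the retrieved neighbors."""
    return Counter(labels[i] for i in neighbors).most_common(1)[0][0]


# Toy data (hypothetical): three well-separated 5x5 grids, one per class.
blob_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points, labels = [], []
for label, (cx, cy) in enumerate(blob_centers):
    for dx in range(-2, 3):
        for dy in range(-2, 3):
            points.append((cx + 0.3 * dx, cy + 0.3 * dy))
            labels.append(label)

centers, groups = kmeans(points, init_centers=blob_centers)
query = (10.07, 10.11)
neighbors = partitioned_knn(query, points, centers, groups, k=3)
brute_force = sorted(range(len(points)),
                     key=lambda i: math.dist(query, points[i]))[:3]
print(neighbors == brute_force, predict(neighbors, labels))
```

The stopping rule here is a heuristic: an unchanged neighbor set after one more merge does not by itself guarantee that a later subset holds no closer instance. This is why the abstract stresses partitions in which instances from different subsets are dissimilar.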

Funder

Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province

Publisher

Hindawi Limited

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Information Systems

Link

http://downloads.hindawi.com/journals/wcmc/2022/7409171.pdf

Cited by 7 articles.

1. Multi-scale patch fuzzy decision for face recognition with category information;International Journal of Machine Learning and Cybernetics;2024-04-27

2. Ventilation diagnosis of minigrinders using thermal images;Expert Systems with Applications;2024-03

3. Monkeypox diagnosis based on Dynamic Recursive Gray wolf (DRGW) optimization;Biomedical Signal Processing and Control;2024-01

4. Predicting compressive strength of cement-stabilized earth blocks using machine learning models incorporating cement content, ultrasonic pulse velocity, and electrical resistivity;Nondestructive Testing and Evaluation;2023-07-24

5. Disease Diagnosis Based on Improved Gray Wolf Optimization (IGWO) and Ensemble Classification;Annals of Biomedical Engineering;2023-07-14