Improving k-Nearest Neighbors Algorithm for Imbalanced Data Classification

Author:

Shi Zhan

Abstract

The k-Nearest Neighbors (k-NN) algorithm is a classic non-parametric method widely used in data classification and prediction. Like many other machine learning schemes, k-NN classifiers are significantly affected by imbalanced class distributions: instances of the majority class tend to dominate the prediction for a test instance. In this paper, we examine data pre-processing techniques that can rebalance the training data and improve the performance of k-NN classifiers on imbalanced data sets. We conduct extensive experiments on 14 real-world data sets collected from different application domains. We also perform statistical tests to verify the significance of the different data pre-processing techniques in boosting k-NN classification precision.
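To make the majority-domination effect concrete, here is a minimal Python sketch on a toy data set. It is only an illustration: the abstract does not name the specific pre-processing techniques that were evaluated, so random oversampling is used here as one common rebalancing approach, not as the paper's method. The function names `knn_predict` and `random_oversample`, the toy points, and the choice of k = 5 are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def random_oversample(train, seed=0):
    """Duplicate randomly chosen instances of the smaller classes until
    every class has as many instances as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for point, label in train:
        by_class.setdefault(label, []).append((point, label))
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced


# Hypothetical toy data: 8 majority points on a grid, 2 minority points nearby.
train = [((x, y), "majority") for x, y in
         [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (0, 2), (1, 2)]]
train += [((3.0, 3.0), "minority"), ((3.2, 3.0), "minority")]

query = (2.9, 2.9)  # lies right next to the minority points

before = knn_predict(train, query, k=5)
balanced = random_oversample(train)
after = knn_predict(balanced, query, k=5)
print(before, after)
```

With k = 5 and only two minority instances in the training set, the five nearest neighbors of the query contain both minority points and three majority points, so the vote goes to the majority class even though the query sits next to the minority cluster. After oversampling, duplicated minority instances fill the neighborhood and the vote flips. Duplicating instances is the simplest form of rebalancing; it adds no new information and can encourage overfitting, which is one reason to compare alternative pre-processing techniques.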

Publisher

IOP Publishing

Subject

General Medicine

Cited by 7 articles.

1. Handling the Class Imbalance Problem With an Improved Sine Cosine Algorithm for Optimal Instance Selection;IEEE Access;2024

2. A Comprehensive Study of the Performances of Imbalanced Data Learning Methods with Different Optimization Techniques;Communications in Computer and Information Science;2024

3. Data Balance Optimization of Fraud Classification for E-Commerce Transaction;2022 Seventh International Conference on Informatics and Computing (ICIC);2022-12-08

4. Deep Recurrent Encoder Network and Spark Model for Angiographic Disease Risk Classification;International Journal of Pattern Recognition and Artificial Intelligence;2022-03-30

5. Classification of limb movements using different predictive analysis algorithms;International Journal of System Assurance Engineering and Management;2021-11-09
