Affiliation:
1. Department of Computer Science and Technology, Shandong Agricultural University, China
2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China
Abstract
The k-nearest neighbor (kNN) algorithm is a simple and widely used classifier; it can achieve performance comparable to that of more complex classifiers, including decision trees and artificial neural networks. For this reason, kNN has been listed as one of the top 10 algorithms in machine learning and data mining. However, in many classification problems, such as medical diagnosis and intrusion detection, the collected training sets are usually class imbalanced. In class-imbalanced data, although positive examples are heavily outnumbered by negative ones, they usually carry more meaningful information and are more important than negative examples. Like other classical classifiers, kNN was proposed under the assumption that the training set has an approximately balanced class distribution, which leads to unsatisfactory performance on imbalanced data. In addition, in a class-imbalanced scenario, the global resampling strategies that suit decision trees and artificial neural networks often do not work well for kNN, which is a local information-oriented classifier. To solve this problem, researchers have carried out a great deal of work on kNN over the past decade. This paper presents a comprehensive survey of these works, organized by their different perspectives, and analyzes and compares their characteristics. Finally, several future directions are pointed out.
Funder
Natural Science Foundation of Shandong Province
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Information Systems
Cited by
21 articles.