Abstract
Classification plays an indispensable role in many applications because of its power in detection and diagnosis. In real disease data especially, detecting the important factors and diagnosing the outcome can be of great benefit to patients. At the same time, complications common in real data, such as class imbalance and missing values, make the task more difficult. Ignoring missing data undermines study efficiency and can introduce substantial bias. With imbalanced data, classifiers tend to be overwhelmed by the majority classes and to ignore the minority ones. This paper develops new support vector machine classifiers that use k-nearest neighbors' information (KNN-SVM): missing data are imputed from statistical characteristic values computed over the k nearest neighbors, and new samples are interpolated between k-nearest minority class examples. For comparison, the paper uses different kernel functions in the KNN-SVM classifiers and reports their differing performance in disease diagnosis accuracy.
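The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `knn_impute` and `interpolate_minority`, the use of the neighbor mean as the "statistical characteristic value", the Euclidean distance over jointly observed features, and the default `k=3` are all assumptions made for illustration. The SVM training step and the kernel comparison are omitted.

```python
import math
import random


def _distance(a, b):
    """Euclidean distance over features observed (not None) in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common features: treat as infinitely far apart
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=3):
    """Replace each None with the mean of that feature over the k nearest
    rows that observe it (assumed statistic; the paper may use another)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row with feature j observed.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def interpolate_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples, each on the line segment
    between a random minority point and one of its k nearest minority
    neighbors. Assumes complete rows and at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: _distance(base, p),
        )[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic
```

In this sketch, imputation would run first so that the interpolation step sees complete rows; the completed and rebalanced data would then be passed to an SVM with the kernel under comparison.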