Utilizing Nearest-Neighbor Clustering for Addressing Imbalanced Datasets in Bioengineering
Published: 2024-03-31
Issue: 4
Volume: 11
Page: 345
ISSN: 2306-5354
Container-title: Bioengineering
Language: en
Short-container-title: Bioengineering
Author:
Huang Chih-Ming (1), Lin Chun-Hung (1), Hung Chuan-Sheng (1), Zeng Wun-Hui (1), Zheng You-Cheng (1, 2), Tsai Chih-Min (1, 3)
Affiliation:
1. Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 833, Taiwan
2. Division of Cardiology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung 833, Taiwan
3. Department of Pediatrics, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung 833, Taiwan
Abstract
Imbalanced classification is common in scenarios such as fault diagnosis, intrusion detection, and medical diagnosis, where abnormal data are difficult to obtain. This article treats the task as a one-class problem and implements and refines the One-Class Nearest-Neighbor (OCNN) algorithm. The original inter-quartile range mechanism is replaced with the K-means with outlier removal (KMOR) algorithm to identify outliers in the target class efficiently, and parameters are optimized by treating these outliers as non-target-class samples. A new algorithm, the Location-based Nearest-Neighbor (LBNN) algorithm, clusters the one-class training data using KMOR and calculates the farthest distance and percentile for each test data point to determine whether it belongs to the target class. Experiments cover parameter studies, validation on eight standard imbalanced datasets from KEEL, and three applications on real medical imbalanced datasets. The results show superior precision, recall, and G-means compared with traditional classification models, indicating that the approach is effective for handling imbalanced data.
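To make the idea of clustering one-class training data and thresholding test points by distance more concrete, the following is a minimal sketch and not the authors' LBNN or KMOR implementation. It rests on several assumptions: plain k-means (without outlier removal) stands in for KMOR, each cluster's acceptance radius is taken as a percentile `q` of its members' distances to the centroid, and G-mean is computed under its common definition as the geometric mean of sensitivity and specificity. The names `kmeans`, `fit`, `is_target`, and `g_mean`, as well as the toy data, are hypothetical.

```python
import math

def assign(points, centers):
    """Group points by the index of their nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[j].append(p)
    return groups

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation.
    Assumption: a stand-in for KMOR; it does NOT remove outliers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = assign(points, centers)
        updated = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def fit(points, k, q):
    """Learn cluster centers and, per cluster, the q-th percentile (0 < q <= 1)
    of member-to-center distances as an acceptance radius (assumed rule)."""
    centers, groups = kmeans(points, k)
    radii = []
    for c, g in zip(centers, groups):
        d = sorted(math.dist(p, c) for p in g)
        radii.append(d[math.ceil(q * len(d)) - 1] if d else 0.0)
    return centers, radii

def is_target(x, centers, radii):
    """Accept x as target class if it lies within the radius of its nearest cluster."""
    j = min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
    return math.dist(x, centers[j]) <= radii[j]

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (TPR) and specificity (TNR); True = target class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    pos = sum(y_true)
    neg = len(y_true) - pos
    return math.sqrt((tp / pos) * (tn / neg))

# Toy one-class training set: two tight grids of target-class points.
grid = [i * 0.1 for i in range(-2, 3)]
train = [(x, y) for x in grid for y in grid] + [(10 + x, 10 + y) for x in grid for y in grid]
centers, radii = fit(train, k=2, q=0.9)

# Two points near the clusters (target) and two far from both (non-target).
tests = [(0.05, 0.05), (10.1, 9.9), (5.0, 5.0), (0.5, 0.5)]
labels = [True, True, False, False]
preds = [is_target(x, centers, radii) for x in tests]
print(preds, g_mean(labels, preds))
```

Lowering `q` shrinks each acceptance radius, which in this sketch trades recall on the target class for fewer false acceptances; the paper's own parameter study should be consulted for how the actual method is tuned.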