Authors:
Wang Shujuan, Dai Yuntao, Shen Jihong, Xuan Jingxue
Abstract
With the development of artificial intelligence, big data classification technology has become a valuable aid to auxiliary medical diagnosis research. However, because samples are collected under different conditions, medical big data are often imbalanced. The class-imbalance problem has been reported as a serious obstacle to the classification performance of many standard learning algorithms. The SMOTE algorithm can generate synthetic sample points randomly to reduce the imbalance ratio, but its application is hampered by the marginalization of the generated samples and by the blindness of parameter selection. To address this problem, this paper proposes an improved SMOTE algorithm based on the Normal distribution, so that new sample points are, with higher probability, distributed closer to the center of the minority samples, which avoids marginalization of the expanded data. Experiments show that when the proposed algorithm is used to expand the imbalanced Pima, WDBC, WPBC, Ionosphere and Breast-cancer-wisconsin datasets, the classification performance is better than with the original SMOTE algorithm. In addition, the parameter selection of the proposed algorithm is analyzed: in the designed experiments, classification performance was best when the parameters were chosen so that the distribution characteristics of the original data were best preserved.
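The abstract does not give the authors' formula, so the sketch below is only an illustration of the idea under stated assumptions. `smote` implements classic SMOTE interpolation (a random minority sample, one of its k nearest minority neighbours, and a Uniform(0, 1) gap). `normal_smote` is a hypothetical Normal-distribution variant: each SMOTE point p is pulled towards the minority centroid c as c + s·(p − c), with the shrink factor s = min(|N(0, σ)|, 1). The function names, the centroid-shrink rule and the parameter `sigma` are assumptions for illustration, not the method from the paper.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Classic SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours using a Uniform(0, 1) gap.
    Requires at least two minority samples."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Brute-force pairwise Euclidean distances among minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours
    seeds = rng.integers(0, len(X_min), size=n_new)
    picks = nn[seeds, rng.integers(0, k, size=n_new)]
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return X_min[seeds] + gap * (X_min[picks] - X_min[seeds])


def normal_smote(X_min, n_new, k=5, sigma=0.5, rng=None):
    """HYPOTHETICAL Normal-distribution variant (illustration only).
    Each SMOTE point p is moved towards the minority centroid c as
    c + s * (p - c), where s = min(|N(0, sigma)|, 1). A small sigma packs
    new points near c; a large sigma makes s == 1 likely, so the output
    approaches plain SMOTE."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    centre = X_min.mean(axis=0)
    p = smote(X_min, n_new, k=k, rng=rng)
    s = np.minimum(np.abs(rng.normal(0.0, sigma, size=(n_new, 1))), 1.0)
    return centre + s * (p - centre)
```

For example, with `X_min = np.random.default_rng(0).normal(size=(20, 2))`, calling `smote(X_min, 200, rng=1)` and `normal_smote(X_min, 200, sigma=0.3, rng=1)` draws the same SMOTE points in both calls, and every point from the variant lies no farther from the centroid than its plain SMOTE counterpart. In this sketch `sigma` plays the role of the tunable parameter the abstract refers to: it trades off spreading the synthetic data out against concentrating it near the centre of the minority class.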
Publisher
Springer Science and Business Media LLC
Cited by
75 articles.