Affiliation:
1. Key Laboratory of OptoElectronic Science and Technology for Medicine, Ministry of Education, Fujian Provincial Key Laboratory for Photonics Technology, Fujian Normal University, Fuzhou, Fujian 350117, China
2. Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, Fujian Province 350001, China
3. Department of Breast Surgery, Department of General Surgery, Fujian Medical University Union Hospital, Breast Cancer Institute, Fujian Medical University, Fuzhou, Fujian Province 350001, China
Abstract
Surface‐enhanced Raman spectroscopy (SERS) has shown great promise for cancer screening. However, previous “proof‐of‐concept” studies ignored the natural imbalance of cancer types in the population, which biases a model toward learning the features of the majority classes at the expense of the minority classes. Herein, a power‐law‐based synthetic minority oversampling technique (PL‐SMOTE) is proposed to guide the resampling of multiclass serum SERS data by analyzing the long‐tailed (power‐law) distribution of cancer prevalence in the population. By introducing a modulating factor, PL‐SMOTE balances the number of minority samples to synthesize against the degree of overlap between classes. Models trained on datasets resampled by PL‐SMOTE verify the effectiveness of the method. After further fine‐tuning of the parameters of the deep neural network model and of PL‐SMOTE, an optimal cancer screening model is obtained, with a macroaveraged Recall of 97.24% and a macroaveraged F2‐score of 97.38%. This work provides a new method for multiclass imbalanced resampling that significantly improves model performance in SERS cancer screening. The method may also inform other multiclass imbalanced scenarios, such as biomedicine, anomaly detection, and disaster prediction.
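The abstract names three computational pieces (power-law-guided target class sizes, SMOTE interpolation, and macroaveraged Recall/F2) without giving formulas. The sketch below illustrates them under stated assumptions. The SMOTE step is the standard interpolation rule and the metrics are the standard per-class recall and F-beta (beta = 2) averaged over classes. `power_law_targets` and its exponent `alpha` are hypothetical stand-ins for the paper's modulating factor, not its published formula; `resample` and `macro_recall_f2` are likewise illustrative names.

```python
import random


def power_law_targets(counts, alpha):
    """HYPOTHETICAL target sizes: n_c' = n_max * (n_c / n_max) ** alpha.

    alpha = 0 raises every class to n_max (full balance); alpha = 1 keeps the
    long-tailed counts unchanged. An illustrative stand-in for a modulating
    factor, NOT the PL-SMOTE formula from the paper.
    """
    n_max = max(counts.values())
    return {c: max(n, round(n_max * (n / n_max) ** alpha)) for c, n in counts.items()}


def smote(minority, n_new, k=3, seed=0):
    """Standard SMOTE: x_new = x_i + u * (x_nn - x_i), u ~ U(0, 1), where x_nn
    is drawn from the k nearest minority-class neighbours of x_i."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        nn = rng.choice(others[:k])
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


def resample(data, alpha, k=3, seed=0):
    """Oversample each class (label -> list of feature vectors) up to its
    hypothetical power-law target size."""
    targets = power_law_targets({c: len(xs) for c, xs in data.items()}, alpha)
    return {
        c: xs + (smote(xs, targets[c] - len(xs), k, seed) if targets[c] > len(xs) else [])
        for c, xs in data.items()
    }


def macro_recall_f2(y_true, y_pred, beta=2.0):
    """Unweighted mean over classes of recall and of
    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    classes = sorted(set(y_true) | set(y_pred))
    b2 = beta ** 2
    recalls, fbetas = [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = b2 * precision + recall
        recalls.append(recall)
        fbetas.append((1 + b2) * precision * recall / denom if denom else 0.0)
    return sum(recalls) / len(classes), sum(fbetas) / len(classes)
```

For example, with class sizes 100, 25, and 4 and `alpha=0.5`, `resample` grows the classes to 100, 50, and 20. With beta = 2, recall is weighted more heavily than precision, which suits screening, where a missed cancer is costlier than a false alarm.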
Funder
National Natural Science Foundation of China
Cited by
4 articles.