Affiliation:
1. School of Science, Hubei University of Technology, Wuhan 430068, China
2. School of Computer Science and Technology, Wuhan University of Bioengineering, Wuhan 430415, China
Abstract
To improve the classification performance of the logistic regression (LR) model for breast cancer prediction, a new hybrid feature selection method is proposed: the Pearson correlation test is combined with an iterative random forest algorithm based on out-of-bag estimation (RF-OOB) to select the 17 optimal features used as model inputs. Next, the LR model is optimized with the batch gradient descent algorithm (BGD-LR), which minimizes the model's loss function during training. To protect the privacy of breast cancer patients, differential privacy is then added to the BGD-LR model, yielding an LR optimization model based on differential privacy with batch gradient descent (BDP-LR). Experiments are carried out on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, with accuracy, precision, recall, and F1-score as the four main evaluation indicators. The hyperparameters of each model are determined by grid search with cross-validation. The experimental results show that, after hybrid feature selection, the best values of the four indicators for the BGD-LR model are 0.9912, 1, 0.9886, and 0.9943, with accuracy, recall, and F1-score increased by 2.63%, 3.41%, and 1.76%, respectively. For the BDP-LR model, a privacy budget of ε = 0.8 gives an effective balance between classification performance and privacy protection. At this setting, the four indicators are 0.9721, 0.9975, 0.9664, and 0.9816, improvements of 1.58%, 0.26%, 1.81%, and 1.07%, respectively. Comparative analysis shows that the BGD-LR and BDP-LR models constructed in this paper perform better than other classification models.
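The abstract names the training procedure but gives no implementation details, so the following is only a minimal sketch of logistic regression trained by batch gradient descent, with an optional differentially private variant. Everything specific in it is an assumption for illustration, not the authors' method: the function names (`train_lr_bgd`, `predict`, `make_toy_data`), the use of per-sample L1 gradient clipping with Laplace noise, the even split of the privacy budget across iterations, the hyperparameter values, and the synthetic data that stands in for WDBC.

```python
import numpy as np


def sigmoid(z):
    # Clip the logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def train_lr_bgd(X, y, lr=0.1, epochs=200, epsilon=None, clip=1.0, seed=0):
    """Batch gradient descent for logistic regression.

    If epsilon is given, Laplace noise is added to the clipped average
    gradient at every step (an assumed mechanism, chosen for illustration).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append a bias column
    w = np.zeros(d + 1)
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        # Gradient of the log-loss for each individual sample.
        per_sample = (p - y)[:, None] * Xb
        if epsilon is not None:
            # Bound each sample's gradient to L1 norm <= clip.
            norms = np.abs(per_sample).sum(axis=1, keepdims=True)
            per_sample = per_sample / np.maximum(1.0, norms / clip)
        grad = per_sample.mean(axis=0)  # full-batch average
        if epsilon is not None:
            # Replacing one record changes the mean by at most 2*clip/n in L1.
            # The budget is split evenly over the steps (basic composition).
            scale = 2.0 * clip / (n * (epsilon / epochs))
            grad = grad + rng.laplace(0.0, scale, size=grad.shape)
        w -= lr * grad
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)


def make_toy_data(n_per_class=200, d=5, seed=1):
    # Hypothetical stand-in for WDBC: two well-separated Gaussian blobs.
    rng = np.random.default_rng(seed)
    X0 = rng.normal(-1.5, 1.0, size=(n_per_class, d))
    X1 = rng.normal(1.5, 1.0, size=(n_per_class, d))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w_plain = train_lr_bgd(X, y)             # BGD-LR style training
    w_dp = train_lr_bgd(X, y, epsilon=0.8)   # noisy variant
    print("non-private accuracy:", (predict(w_plain, X) == y).mean())
    print("private accuracy:", (predict(w_dp, X) == y).mean())
```

Splitting the budget evenly over all iterations is the simplest accounting choice and injects a lot of noise when the number of iterations is large; it is not necessarily how the paper allocates ε.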
Funder
National Natural Science Foundation of China
Teaching and Research Project of Hubei Provincial Department of Education
Doctoral Startup Fund of Hubei University of Technology
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science