Affiliation:
1. Department of Radiation Oncology, Medical Faculty of Osmangazi University, Eskişehir, Turkey
2. Eskisehir Osmangazi University Center of Research and Application for Computer Aided Diagnosis and Treatment in Health, Eskisehir, Turkey
3. Department of Chest Diseases, Medical Faculty of Osmangazi University, Eskişehir, Turkey
4. Department of Mathematics-Computer, Eskisehir Osmangazi University, Eskişehir, Turkey
Abstract
Background: Radiation pneumonitis (RP) is a dose-limiting toxicity in lung cancer radiotherapy (RT). As risk factors in the development of RP, patient and tumor characteristics, dosimetric parameters, and treatment features are intertwined, and it is not always possible to associate RP with a single parameter. This study aimed to determine the algorithm that most accurately predicted RP development with machine learning. Methods: Of the 197 cases diagnosed with stage III lung cancer and underwent RT and chemotherapy between 2014 and 2020, 193 were evaluated. The CTCAE 5.0 grading system was used for the RP evaluation. Synthetic minority oversampling technique was used to create a balanced data set. Logistic regression, artificial neural networks, eXtreme Gradient Boosting (XGB), Support Vector Machines, Random Forest, Gaussian Naive Bayes and Light Gradient Boosting Machine algorithms were used. After the correlation analysis, a permutation-based method was utilized for as a variable selection. Results: RP was seen in 51 of the 193 cases. Parameters affecting RP were determined as, total(t)V5, ipsilateral lung Dmax, contralateral lung Dmax, total lung Dmax, gross tumor volume, number of chemotherapy cycles before RT, tumor size, lymph node localization and asbestos exposure. LGBM was found to be the algorithm that best predicted RP at 85% accuracy (confidence interval: 0.73-0.96), 97% sensitivity, and 50% specificity. Conclusion: When the clinical and dosimetric parameters were evaluated together, the LGBM algorithm had the highest accuracy in predicting RP. However, in order to use this algorithm in clinical practice, it is necessary to increase data diversity and the number of patients by sharing data between centers.