Abstract
Abstract
Objectives
To develop and validate machine learning models to distinguish between benign and malignant bone lesions and compare the performance to radiologists.
Methods
In 880 patients (age 33.1 ± 19.4 years, 395 women) diagnosed with malignant (n = 213, 24.2%) or benign (n = 667, 75.8%) primary bone tumors, preoperative radiographs were obtained, and the diagnosis was established using histopathology. Data was split 70%/15%/15% for training, validation, and internal testing. Additionally, 96 patients from another institution were obtained for external testing. Machine learning models were developed and validated using radiomic features and demographic information. The performance of each model was evaluated on the test sets for accuracy, area under the curve (AUC) from receiver operating characteristics, sensitivity, and specificity. For comparison, the external test set was evaluated by two radiology residents and two radiologists who specialized in musculoskeletal tumor imaging.
Results
The best machine learning model was based on an artificial neural network (ANN) combining both radiomic and demographic information achieving 80% and 75% accuracy at 75% and 90% sensitivity with 0.79 and 0.90 AUC on the internal and external test set, respectively. In comparison, the radiology residents achieved 71% and 65% accuracy at 61% and 35% sensitivity while the radiologists specialized in musculoskeletal tumor imaging achieved an 84% and 83% accuracy at 90% and 81% sensitivity, respectively.
Conclusions
An ANN combining radiomic features and demographic information showed the best performance in distinguishing between benign and malignant bone lesions. The model showed lower accuracy compared to specialized radiologists, while accuracy was higher or similar compared to residents.
Key Points
• The developed machine learning model could differentiate benign from malignant bone tumors using radiography with an AUC of 0.90 on the external test set.
• Machine learning models that used radiomic features or demographic information alone performed worse than those that used both radiomic features and demographic information as input, highlighting the importance of building comprehensive machine learning models.
• An artificial neural network that combined both radiomic and demographic information achieved the best performance and its performance was compared to radiology readers on an external test set.
Funder
Technische Universität München
Publisher
Springer Science and Business Media LLC
Subject
Radiology, Nuclear Medicine and imaging,General Medicine
Cited by
13 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献