Author:
Wulaningsih Wahyu, Akram Abdullah, Benemile Janella, Kathyrn Ruth, Croce Filippo, Watkins Johnathan
Abstract
Importance
There has been growing interest in the use of artificial intelligence (deep learning) to help achieve early diagnosis of prevalent diseases. Nowhere is this more evident than in lung cancer, where a combination of factors, including the high prevalence of nodules, the low prevalence of malignant nodules, and the indeterminacy of many nodules, makes it fertile ground for the deployment of accurate, high-throughput deep learning (DL)-based tools.
Objective
To survey the landscape of externally validated DL-based computer-aided diagnostic (CADx) models and to assess their diagnostic performance in predicting the risk of malignancy in computed tomography (CT)-detected pulmonary nodules.
Data sources
An electronic search was performed in the MEDLINE (PubMed), EMBASE, Science Citation Index, and Cochrane Library databases (from inception to 10 April 2023).
Study selection
Studies were deemed eligible if they were peer-reviewed experimental or observational articles that analysed the diagnostic performance of externally validated DL-based CADx models for predicting malignancy risk, with a direct comparison to models widely used in clinical practice.
Data extraction and synthesis
PRISMA guidelines were followed for the identification, screening, and selection process. A bivariate random-effects model was used for the meta-analysis of the included studies. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to assess risk of bias and applicability.
Main outcomes and measures
Main outcomes included sensitivity, specificity, and area under the curve (AUC).
Results
After screening, 20 studies were included, comprising 7,664 participants and 10,128 nodules, of which 2,126 were malignant. DL-based CADx models were 15.8% more sensitive than physician judgement alone and 35.4% more sensitive than clinical risk models alone.
They had a pooled specificity similar to that of physician judgement alone (0.77 [95% CI: 0.69–0.84] vs 0.80 [95% CI: 0.71–0.86], respectively) but were 5.5% more specific than clinical risk models alone. Accounting for threshold effects, DL-based CADx models had superior summary areas under the receiver operating characteristic curve (sAUROC), with relative sAUROCs of 1.06 (95% CI: 1.03–1.08) and 1.22 (95% CI: 1.19–1.24) compared with physician judgement and clinical risk models alone, respectively.
Conclusions and relevance
DL-based models show superior or comparable diagnostic performance when externally validated against widely used methods, such as the Brock and Mayo models. They have the potential to fulfil an unmet clinical-management need alongside experienced physician image readers. The included studies showed a high degree of heterogeneity, with threshold effects particularly prominent. Future research may consider more prospective studies and human-experimental studies.
Key points
Question
How effective are image-based computer-aided diagnostic models that use deep learning to predict the malignancy risk of pulmonary nodules, compared with other methods used in clinical practice?
Findings
This systematic review and meta-analysis identified 20 observational studies (7,664 participants; 10,128 pulmonary nodules), from which pooled analyses found deep learning-based models to have a sensitivity of 0.88, a specificity of 0.77, and a summary area under the curve of 0.90 in predicting malignancy in pulmonary nodules. This was superior or comparable to other methods routinely used in clinical practice.
Meaning
Deep learning-based models are already used for nodule management in clinical practice in certain settings. The results show that their diagnostic performance justifies wider and more routine deployment.
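The abstract reports sensitivity and specificity pooled with a bivariate random-effects model. As a rough illustration of random-effects pooling only, the Python sketch below uses a simpler, swapped-in technique: it pools logit-transformed sensitivity and specificity separately with the univariate DerSimonian-Laird estimator, so it ignores the between-study correlation of the two measures that a bivariate model captures. The counts in STUDIES and all function names are hypothetical; they are not data or code from the review.

```python
import math

# Hypothetical per-study 2x2 counts (TP, FN, FP, TN); illustrative only, NOT data from the review.
STUDIES = [
    (45, 5, 30, 120),
    (88, 14, 60, 190),
    (30, 6, 12, 70),
    (120, 15, 95, 260),
]


def logit_with_variance(events: int, non_events: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is applied to every cell here for simplicity
    (in practice it is usually applied only when a cell is zero).
    """
    a, b = events + 0.5, non_events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def dersimonian_laird(estimates, variances):
    """Pool estimates with the DerSimonian-Laird random-effects method.

    Returns (pooled estimate, standard error, between-study variance tau^2).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird moment estimator of tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pooled_proportion(pairs):
    """Pool (events, non_events) pairs on the logit scale; return (estimate, lower, upper) as proportions."""
    logits, variances = zip(*(logit_with_variance(e, n) for e, n in pairs))
    pooled, se, _ = dersimonian_laird(list(logits), list(variances))
    return inv_logit(pooled), inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)


# Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
sens = pooled_proportion([(tp, fn) for tp, fn, _, _ in STUDIES])
spec = pooled_proportion([(tn, fp) for _, _, fp, tn in STUDIES])
print(f"pooled sensitivity {sens[0]:.2f} (95% CI {sens[1]:.2f}-{sens[2]:.2f})")
print(f"pooled specificity {spec[0]:.2f} (95% CI {spec[1]:.2f}-{spec[2]:.2f})")
```

Fitting an actual bivariate model, which jointly estimates both logits and their correlation and underlies the summary ROC curve, generally calls for a dedicated meta-analysis package rather than this hand-rolled estimator.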
Publisher
Cold Spring Harbor Laboratory