Affiliation:
1. Hannover Medical School
2. University Hospital Schleswig-Holstein
3. Hospital Grosshansdorf
4. MR Application Predevelopment, Siemens Healthcare GmbH
Abstract
Purpose
To compare the performance of two AI-based software tools for detection, quantification and categorization of pulmonary nodules in a lung cancer screening (LCS) program in Northern Germany (HANSE-trial).
Method
A total of 946 low-dose baseline CT examinations were analyzed by two AI software tools for lung nodule detection, quantification and categorization, and the results were compared to the final radiologist read. The relationship between the nodule volumes derived by the two software tools was assessed by Pearson correlation (r), and the difference between them was tested for significance using the Wilcoxon signed-rank test. The consistency of Lung-RADS classifications was evaluated by Cohen’s kappa (κ) and percent agreement (PA).
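As an illustration of the two agreement measures named above, the following is a minimal sketch of how percent agreement and unweighted Cohen's kappa can be computed for paired categorical ratings. It is not the analysis code used in the study; the function names and the example Lung-RADS labels are hypothetical and chosen only for demonstration.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Fraction of cases in which both raters assigned the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings_a)
    p_o = percent_agreement(ratings_a, ratings_b)  # observed agreement
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    # Agreement expected by chance, from each rater's marginal frequencies.
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical Lung-RADS categories for six examinations (illustrative only).
s1_categories = ["2", "2", "3", "4A", "2", "3"]
s2_categories = ["2", "3", "3", "4A", "2", "2"]

pa = percent_agreement(s1_categories, s2_categories)
kappa = cohens_kappa(s1_categories, s2_categories)
print(f"PA = {pa:.2f}, kappa = {kappa:.2f}")
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it can be substantially lower than PA when a few categories dominate.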
Results
Of all lung nodules (n = 1174; solid, semi-solid and ground-glass; volume ≥ 34 mm³), 1032 (88%) and 782 (66%) were detected by Software tool 1 (S1) and Software tool 2 (S2), respectively. Although the derived volumes of true positive nodules were strongly correlated (r > 0.95), the volume derived by S2 was significantly higher than that derived by S1 (P < 0.0001, mean difference: 6 mm³). Agreement between S1 and S2 in the assignment of Lung-RADS classification was moderate (PA = 62%, κ = 0.45). The PA of Lung-RADS classification with the final read was 75% for S1 and 55% for S2.
Conclusion
Participant management depends on the assigned Lung Imaging Reporting and Data System (Lung-RADS) category, which is based on reliable detection and volumetry of pulmonary nodules. Significant nodule volume differences between AI software tools lead to different Lung-RADS scores in 38% of cases, which may result in altered participant management. Therefore, high performance and agreement of accredited AI software tools are necessary for a future national LCS program.
Publisher
Research Square Platform LLC