Abstract
Background
Temporal bone computed tomography (CT) helps diagnose chronic otitis media (COM). However, its interpretation requires training and expertise. Artificial intelligence (AI) can help clinicians evaluate COM through CT scans, but existing models lack transparency and may not fully leverage multidimensional diagnostic information.
Objective
We aimed to develop an explainable AI system based on 3D convolutional neural networks (CNNs) for automatic CT-based evaluation of COM.
Methods
Temporal bone CT scans were retrospectively obtained from patients operated for COM between December 2015 and July 2021 at 2 independent institutes. A region of interest encompassing the middle ear was automatically segmented, and 3D CNNs were subsequently trained to identify pathological ears and cholesteatoma. An ablation study was performed to refine model architecture. Benchmark tests were conducted against a baseline 2D model and 7 clinical experts. Model performance was measured through cross-validation and external validation. Heat maps, generated using Gradient-Weighted Class Activation Mapping, were used to highlight critical decision-making regions. Finally, the AI system was assessed with a prospective cohort to aid clinicians in preoperative COM assessment.
Results
Internal and external data sets contained 1661 and 108 patients (3153 and 211 eligible ears), respectively. The 3D model exhibited decent performance with mean areas under the receiver operating characteristic curves of 0.96 (SD 0.01) and 0.93 (SD 0.01), and mean accuracies of 0.878 (SD 0.017) and 0.843 (SD 0.015), respectively, for detecting pathological ears on the 2 data sets. Similar outcomes were observed for cholesteatoma identification (mean area under the receiver operating characteristic curve 0.85, SD 0.03 and 0.83, SD 0.05; mean accuracies 0.783, SD 0.04 and 0.813, SD 0.033, respectively). The proposed 3D model achieved a commendable balance between performance and network size relative to alternative models. It significantly outperformed the 2D approach in detecting COM (P≤.05) and exhibited a substantial gain in identifying cholesteatoma (P<.001). The model also demonstrated superior diagnostic capabilities over resident fellows and the attending otologist (P<.05), rivaling all senior clinicians in both tasks. The generated heat maps properly highlighted the middle ear and mastoid regions, aligning with human knowledge in interpreting temporal bone CT. The resulting AI system achieved an accuracy of 81.8% in generating preoperative diagnoses for 121 patients and contributed to clinical decision-making in 90.1% cases.
Conclusions
We present a 3D CNN model trained to detect pathological changes and identify cholesteatoma via temporal bone CT scans. In both tasks, this model significantly outperforms the baseline 2D approach, achieving levels comparable with or surpassing those of human experts. The model also exhibits decent generalizability and enhanced comprehensibility. This AI system facilitates automatic COM assessment and shows promising viability in real-world clinical settings. These findings underscore AI’s potential as a valuable aid for clinicians in COM evaluation.
Trial Registration
Chinese Clinical Trial Registry ChiCTR2000036300; https://www.chictr.org.cn/showprojEN.html?proj=58685