Authors:
Athalye Chinmayee, van Nisselrooij Amber, Rizvi Sara, Haak Monique, Moon-Grady Anita J., Arnaout Rima
Abstract
Objective: Congenital heart defects (CHD) are still missed despite nearly universal prenatal ultrasound screening programs, and a missed diagnosis may result in severe morbidity or even death. Deep machine learning (DL) can automate image recognition from ultrasound. The aim of this study was to apply a previously developed DL model, trained on images from a tertiary center, to fetal ultrasound images obtained during the second-trimester standard anomaly scan in a low-risk population.

Methods: All pregnancies with isolated severe CHD in the northwestern region of the Netherlands between 2015 and 2016 with available stored images were evaluated, as well as a sample of examinations of normal fetuses from the same region. We compared initial clinical diagnostic accuracy (diagnoses made in real time), model accuracy, and the performance of blinded human experts who, like the model, had access only to the stored images. We analyzed performance by study characteristics such as duration, quality (scored independently by study investigators), number of stored images, and availability of screening views.

Results: A total of 42 normal fetuses and 66 cases of isolated CHD at birth were analyzed. Of the abnormal cases, 35 were detected and 31 were missed at the time of the clinical anatomy scan (sensitivity 53 percent). Model sensitivity and specificity were 91 and 93 percent, respectively. Blinded human experts (n=3) achieved a sensitivity of 55±10 percent (range 47-67 percent) and a specificity of 71±13 percent (range 57-83 percent). Model correctness differed significantly by expert-grader quality score (p=0.04). Abnormal cases included 19 lesions the model had not encountered in its training; its performance on these (15/19 correct) was not statistically significantly different from its performance on previously encountered lesions (p=0.07).

Conclusions: A previously trained DL algorithm outperformed human experts in detecting CHD in a cohort in which nearly half of CHD cases (31 of 66) were initially missed clinically. Notably, the DL algorithm performed well on community-acquired images in a low-risk population, including lesions it had not previously been exposed to. Furthermore, when both the model and the blinded human experts had access to stored images alone, the model outperformed the experts. Together, these findings support the proposition that DL models can improve prenatal detection of CHD.
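The accuracy figures above rest on the standard definitions of sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP). The short Python sketch below illustrates them using only counts stated in the abstract: 35 detected and 31 missed CHD cases at the clinical scan, and 15 of 19 never-before-seen lesions correctly flagged by the model. The helper names are illustrative and do not come from the study. Because the abstract does not report the raw confusion counts for the model or the blinded experts, their percentages are not recomputed here.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of affected fetuses that were flagged: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of unaffected fetuses correctly cleared: TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)


# Clinical anatomy scan (counts from the abstract): 35 CHD cases detected, 31 missed.
clinical_sens = sensitivity(true_positives=35, false_negatives=31)
print(f"clinical sensitivity: {clinical_sens:.0%}")  # 35/66, rounds to 53%
print(f"missed clinically: {1 - clinical_sens:.0%}")  # 31/66, rounds to 47%

# Model on lesions absent from its training set: 15 of 19 abnormal cases correct.
unseen_sens = sensitivity(true_positives=15, false_negatives=4)
print(f"model sensitivity on unseen lesions: {unseen_sens:.0%}")  # 15/19, rounds to 79%
```

Since all 19 unseen lesions are abnormal cases, "correct" on that subset is a sensitivity, so the 15/19 figure can be read with the same function.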
Publisher
Cold Spring Harbor Laboratory