Development and external validation of automated detection, classification, and localization of ankle fractures: inside the black box of a convolutional neural network (CNN)

Author:

Prijs Jasper,Liao Zhibin,To Minh-Son,Verjans Johan,Jutte Paul C.,Stirler Vincent,Olczak Jakub,Gordon Max,Guss Daniel,DiGiovanni Christopher W.,Jaarsma Ruurd L.,IJpma Frank F. A.,Doornberg Job N.,Aksakal Kaan,Barvelink Britt,Beuker Benn,Bultra Anne Eva,Oliviera Luisa e Carmo,Colaris Joost,de Klerk Huub,Duckworth Andrew,ten Duis Kaj,Fennema Eelco,Harbers Jorrit,Hendrickx Ran,Heng Merilyn,Hoeksema Sanne,Hogervorst Mike,Jadav Bhavin,Jiang Julie,Karhade Aditya,Kerkhoffs Gino,Kuipers Joost,Laane Charlotte,Langerhuizen David,Lubberts Bart,Mallee Wouter,Mhmud Haras,El Moumni Mostafa,Nieboer Patrick,Nijhuis Koen Oude,van Ooijen Peter,Oosterhoff Jacobien,Rawat Jai,Ring David,Schilstra Sanne,Schwab Joseph,Sprague Sheila,Stufkens Sjoerd,Tijdens Elvira,van der Bekerom Michel,van der Vet Puck,de Vries Jean-Paul,Wendt Klaus,Wijffels Matthieu,Worsley David

Abstract

Purpose Convolutional neural networks (CNNs) are increasingly being developed for automated fracture detection in orthopaedic trauma surgery. Studies to date, however, are limited to classifying the entire image and produce only heatmaps for approximate fracture localization, rather than delineating exact fracture morphology. Therefore, we aimed to answer (1) what is the performance of a CNN that detects, classifies, localizes, and segments an ankle fracture, and (2) would this be externally valid?
Methods The training set included 326 isolated fibula fractures and 423 non-fracture radiographs. The Detectron2 implementation of the Mask R-CNN was trained with labelled and annotated radiographs. The internal validation (or ‘test set’) and external validation sets consisted of 300 and 334 radiographs, respectively. Consensus agreement between three experienced fellowship-trained trauma surgeons was defined as the ground truth label. Diagnostic accuracy and area under the receiver operating characteristic curve (AUC) were used to assess classification performance. The Intersection over Union (IoU) was used to quantify the accuracy of the segmentation predictions by the CNN, where a value of 0.5 is generally considered an adequate segmentation.
Results The final CNN was able to classify fibula fractures according to four classes (Danis-Weber A, B, C and No Fracture) with AUC values ranging from 0.93 to 0.99. Diagnostic accuracy was 89% on the test set, with an average sensitivity of 89% and specificity of 96%. On external validation, the CNN was 89–90% accurate on a set of radiographs from a different hospital. Accuracy/AUC pairs observed were 100%/0.99 for the ‘No Fracture’ class, 92%/0.99 for ‘Weber B’, 88%/0.93 for ‘Weber C’, and 76%/0.97 for ‘Weber A’. For the fracture bounding box prediction by the CNN, a mean IoU of 0.65 (SD ± 0.16) was observed. The fracture segmentation predictions by the CNN resulted in a mean IoU of 0.47 (SD ± 0.17).
Conclusions This study presents a look into the ‘black box’ of CNNs and represents the first automated delineation (segmentation) of fracture lines on (ankle) radiographs. The AUC values presented in this paper indicate good discriminatory capability of the CNN and substantiate further study of CNNs in detecting and classifying ankle fractures.
Level of evidence II, Diagnostic imaging study.
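
The abstract reports Intersection over Union (IoU) for both bounding boxes and segmentation masks, with 0.5 taken as the threshold for an adequate segmentation. As an illustration only, the metric can be sketched as below. This is not the authors' evaluation code: the function names `box_iou` and `mask_iou`, the `(x_min, y_min, x_max, y_max)` box convention, and the representation of masks as nested lists of 0/1 are all assumptions made for this example.

```python
from typing import Sequence, Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned bounding boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """IoU of two binary masks given as equally sized 2-D sequences of 0/1."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1  # pixel marked in both masks
            if p or t:
                union += 1  # pixel marked in at least one mask
    return inter / union if union else 0.0


def is_adequate(iou: float, threshold: float = 0.5) -> bool:
    """Apply the 0.5 adequacy threshold mentioned in the abstract."""
    return iou >= threshold
```

Under this definition, the reported mean bounding-box IoU of 0.65 lies above the 0.5 threshold, while the reported mean segmentation IoU of 0.47 lies just below it.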

Publisher

Springer Science and Business Media LLC

Subject

Critical Care and Intensive Care Medicine,Orthopedics and Sports Medicine,Emergency Medicine,Surgery

Cited by 10 articles.
