Development and external validation of automated detection, classification, and localization of ankle fractures: inside the black box of a convolutional neural network (CNN)
Published:2022-11-14
ISSN:1863-9933
Container-title:European Journal of Trauma and Emergency Surgery
language:en
Short-container-title:Eur J Trauma Emerg Surg
Author:
Prijs Jasper, Liao Zhibin, To Minh-Son, Verjans Johan, Jutte Paul C., Stirler Vincent, Olczak Jakub, Gordon Max, Guss Daniel, DiGiovanni Christopher W., Jaarsma Ruurd L., IJpma Frank F. A., Doornberg Job N., Aksakal Kaan, Barvelink Britt, Beuker Benn, Bultra Anne Eva, Oliviera Luisa e Carmo, Colaris Joost, de Klerk Huub, Duckworth Andrew, ten Duis Kaj, Fennema Eelco, Harbers Jorrit, Hendrickx Ran, Heng Merilyn, Hoeksema Sanne, Hogervorst Mike, Jadav Bhavin, Jiang Julie, Karhade Aditya, Kerkhoffs Gino, Kuipers Joost, Laane Charlotte, Langerhuizen David, Lubberts Bart, Mallee Wouter, Mhmud Haras, El Moumni Mostafa, Nieboer Patrick, Nijhuis Koen Oude, van Ooijen Peter, Oosterhoff Jacobien, Rawat Jai, Ring David, Schilstra Sanne, Schwab Joseph, Sprague Sheila, Stufkens Sjoerd, Tijdens Elvira, van der Bekerom Michel, van der Vet Puck, de Vries Jean-Paul, Wendt Klaus, Wijffels Matthieu, Worsley David
Abstract
Purpose
Convolutional neural networks (CNNs) are increasingly being developed for automated fracture detection in orthopaedic trauma surgery. Studies to date, however, are limited to classifying the entire image, and produce only heatmaps for approximate fracture localization instead of delineating exact fracture morphology. Therefore, we aimed to answer (1) what is the performance of a CNN that detects, classifies, localizes, and segments an ankle fracture, and (2) is this performance externally valid?
Methods
The training set included 326 isolated fibula fractures and 423 non-fracture radiographs. The Detectron2 implementation of the Mask R-CNN was trained with labelled and annotated radiographs. The internal validation (or ‘test set’) and external validation sets consisted of 300 and 334 radiographs, respectively. Consensus agreement between three experienced fellowship-trained trauma surgeons was defined as the ground truth label. Diagnostic accuracy and area under the receiver operating characteristic curve (AUC) were used to assess classification performance. The Intersection over Union (IoU) was used to quantify the accuracy of the segmentation predictions by the CNN, where a value of 0.5 is generally considered to indicate adequate segmentation.
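For readers unfamiliar with the IoU metric named above, the following is a minimal sketch of how it can be computed for bounding boxes and for binary segmentation masks. It is not the study's evaluation code; the function names `box_iou` and `mask_iou` and the toy inputs are illustrative assumptions.

```python
# Illustrative sketch only: not the authors' evaluation pipeline.
# Function names and example inputs are hypothetical.

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred, truth):
    """IoU of two binary masks given as equally sized 2D lists of 0/1."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1  # pixel marked in both masks
            if p or t:
                union += 1  # pixel marked in at least one mask
    return inter / union if union else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes overlapping on half their width.
    print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
    # Toy 2x3 masks sharing two of four marked pixels.
    print(mask_iou([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))
```

Under the 0.5 convention mentioned above, a prediction would count as adequate when the returned value reaches that threshold. The second toy example sits exactly on that boundary.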
Results
The final CNN was able to classify fibula fractures according to four classes (Danis-Weber A, B, C and No Fracture) with AUC values ranging from 0.93 to 0.99. Diagnostic accuracy was 89% on the test set, with an average sensitivity of 89% and specificity of 96%. On external validation, accuracy was 89–90% on a set of radiographs from a different hospital. Observed accuracy (%)/AUC values were 100/0.99 for the ‘No Fracture’ class, 92/0.99 for ‘Weber B’, 88/0.93 for ‘Weber C’, and 76/0.97 for ‘Weber A’. For the fracture bounding box prediction by the CNN, a mean IoU of 0.65 (SD ± 0.16) was observed. The fracture segmentation predictions by the CNN resulted in a mean IoU of 0.47 (SD ± 0.17).
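As a rough illustration of how accuracy and per-class sensitivity and specificity can be derived for a four-class problem such as this one, the sketch below uses a one-vs-rest reading of a confusion matrix. The counts are invented toy numbers, not study data, and the abstract does not state how its averages were computed, so the macro-average here is an assumption.

```python
# Illustrative sketch only. The confusion matrix below is made-up toy data,
# NOT results from the study; the helper names are hypothetical.

def overall_accuracy(cm):
    """Fraction of cases on the diagonal (rows = truth, columns = prediction)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def one_vs_rest_metrics(cm, labels):
    """Per-class sensitivity and specificity, treating each class vs. the rest."""
    total = sum(sum(row) for row in cm)
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                    # true class i, predicted other
        fp = sum(row[i] for row in cm) - tp     # other class, predicted i
        tn = total - tp - fn - fp
        metrics[label] = {
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        }
    return metrics


if __name__ == "__main__":
    labels = ["Weber A", "Weber B", "Weber C", "No Fracture"]
    toy_cm = [            # hypothetical counts for illustration
        [8, 2, 0, 0],
        [1, 9, 0, 0],
        [0, 1, 9, 0],
        [0, 0, 0, 10],
    ]
    per_class = one_vs_rest_metrics(toy_cm, labels)
    # Macro-average: unweighted mean over classes (an assumed convention).
    macro_sens = sum(m["sensitivity"] for m in per_class.values()) / len(labels)
    print(overall_accuracy(toy_cm), macro_sens, per_class)
```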
Conclusions
This study presents a look into the ‘black box’ of CNNs and represents the first automated delineation (segmentation) of fracture lines on (ankle) radiographs. The AUC values presented in this paper indicate good discriminatory capability of the CNN and substantiate further study of CNNs in detecting and classifying ankle fractures.
Level of evidence
II, Diagnostic imaging study.
Publisher
Springer Science and Business Media LLC
Subject
Critical Care and Intensive Care Medicine,Orthopedics and Sports Medicine,Emergency Medicine,Surgery
References: 42 articles.
Cited by
10 articles.