Authors:
Ndiaye Amadou Diaw, Gasqui Marie Agnès, Millioz Fabien, Perard Matthieu, Leye Benoist Fatou, Grosgogeat Brigitte
Abstract
Introduction: A growing number of diagnostic imaging studies report superior efficiency and accuracy of computer-aided diagnostic systems compared with certified dentists. This methodological systematic review aimed to evaluate the methodological approaches used by machine learning and deep learning studies that have used radiographic databases to classify, detect, and segment dental caries.

Methods: The protocol was registered in PROSPERO before data collection (CRD42022348097). A literature search was performed in MEDLINE, Embase, IEEE Xplore, and Web of Science up to December 2022, without language restrictions. Studies and surveys using a dental radiographic database for the classification, detection, or segmentation of carious lesions were sought. Records deemed eligible were retrieved and further assessed for inclusion by two reviewers, who resolved discrepancies through consensus; a third reviewer was consulted when disagreements persisted. After data extraction, the same reviewers assessed methodological quality using the CLAIM and QUADAS-AI checklists.

Results: After screening 325 articles, 35 studies were eligible and included. The bitewing was the most commonly used radiograph type (n = 17), and detection (n = 15) was the most explored computer vision task. Sample sizes ranged from 95 to 38,437, while augmented training sets ranged from 300 to 315,786. The convolutional neural network was the most commonly used model. The mean completeness of CLAIM items was 49% (SD 34%). Applying the CLAIM checklist revealed several methodological weaknesses in the selected studies: most were monocentric, and only 9% used an external test set when evaluating model performance. The QUADAS-AI tool showed that only 43% of the included studies were at low risk of bias in the reference standard domain.

Conclusion: This review demonstrates that the overall scientific quality of the studies used to feed artificial intelligence algorithms is low. The design and validation of such studies could be improved by developing a standardized guideline, thereby supporting the reproducibility and generalizability of results and, ultimately, their clinical application.