Inter- and Intra-Observer Agreement When Using a Diagnostic Labeling Scheme for Annotating Findings on Chest X-rays—An Early Step in the Development of a Deep Learning-Based Decision Support System

Published:2022-12-09 Issue:12 Volume:12 Page:3112
ISSN:2075-4418
Container-title:Diagnostics
language:en
Short-container-title:Diagnostics

Author:

Li Dana, Pehrson Lea Marie, Tøttrup Lea, Fraccaro Marco, Bonnevie Rasmus, Thrane Jakob, Sørensen Peter Jagd, Rykkje Alexander, Andersen Tobias Thostrup, Steglich-Arnholm Henrik, Stærk Dorte Marianne Rohde, Borgwardt Lotte, Hansen Kristoffer Lindskov, Darkner Sune, Carlsen Jonathan Frederik, Nielsen Michael Bachmann

Abstract

Consistent annotation of data is a prerequisite for the successful training and testing of artificial intelligence-based decision support systems in radiology. Such consistency can be achieved by standardizing terminology when annotating diagnostic images. The purpose of this study was to evaluate the annotation consistency among radiologists when using a novel diagnostic labeling scheme for chest X-rays. Six radiologists, with experience ranging from one to sixteen years, annotated a set of 100 fully anonymized chest X-rays. The blinded radiologists annotated on two separate occasions. Statistical analyses were done using Randolph’s kappa and PABAK (prevalence-adjusted bias-adjusted kappa), and the proportions of specific agreement were calculated. Fair-to-excellent agreement was found for all labels among the annotators (Randolph’s kappa, 0.40–0.99). The PABAK ranged from 0.12 to 1 for the two-reader inter-rater agreement and from 0.26 to 1 for the intra-rater agreement. Descriptive and broad labels achieved the highest proportion of positive agreement in both the inter- and intra-reader analyses. Annotating findings with specific, interpretive labels was found to be difficult for less experienced radiologists. Annotating images with descriptive labels may increase agreement between radiologists with different experience levels compared to annotation with interpretive labels.
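
For readers unfamiliar with the agreement statistics named in the abstract, the following is a minimal sketch of their standard textbook definitions for binary (label present/absent) annotations. It is not the authors' analysis code: the function names, the 0/1 encoding, and the toy ratings are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def observed_agreement(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Fraction of cases on which two raters gave the same label."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def pabak(r1: Sequence[int], r2: Sequence[int]) -> float:
    """PABAK for two raters and a binary label: 2 * P_o - 1."""
    return 2.0 * observed_agreement(r1, r2) - 1.0


def randolph_kappa(ratings: Sequence[Sequence[int]], n_categories: int = 2) -> float:
    """Randolph's free-marginal multirater kappa.

    ratings[i][r] is the category (0 .. n_categories - 1) that rater r
    assigned to case i. Chance agreement is fixed at 1 / n_categories.
    """
    n_raters = len(ratings[0])
    pairs = n_raters * (n_raters - 1)
    total = 0.0
    for case in ratings:
        counts = [list(case).count(c) for c in range(n_categories)]
        # Proportion of agreeing rater pairs for this case.
        total += sum(c * (c - 1) for c in counts) / pairs
    p_obs = total / len(ratings)
    p_exp = 1.0 / n_categories
    return (p_obs - p_exp) / (1.0 - p_exp)


def specific_agreement(r1: Sequence[int], r2: Sequence[int]) -> Tuple[float, float]:
    """Proportions of positive and negative agreement for two raters."""
    both_pos = sum(a == 1 and b == 1 for a, b in zip(r1, r2))
    both_neg = sum(a == 0 and b == 0 for a, b in zip(r1, r2))
    disagree = sum(a != b for a, b in zip(r1, r2))
    pos_den = 2 * both_pos + disagree
    neg_den = 2 * both_neg + disagree
    pos = 2 * both_pos / pos_den if pos_den else float("nan")
    neg = 2 * both_neg / neg_den if neg_den else float("nan")
    return pos, neg


# Toy data (hypothetical): two raters labeling ten images for one finding.
rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print(pabak(rater_a, rater_b))
print(randolph_kappa(list(zip(rater_a, rater_b))))
print(specific_agreement(rater_a, rater_b))
```

With two raters and two categories, Randolph's kappa and PABAK reduce to the same value, because both fix chance agreement at 0.5. The proportions of specific agreement are reported separately because a label that is rarely used can show high overall agreement while positive agreement remains low.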

Funder

Innovation Fund Denmark

Publisher

MDPI AG

Subject

Clinical Biochemistry

Link

https://www.mdpi.com/2075-4418/12/12/3112/pdf

References (40 articles)

1. Performance Analysis Team, NHS England (2020/2021). Diagnostic Imaging Dataset Statistical Release, NHS. Available online: https://www.england.nhs.uk/statistics/statistical-work-areas/diagnostic-imaging-dataset/diagnostic-imaging-dataset-2021-22-data/.

2. Does this patient have community-acquired pneumonia? Diagnosing pneumonia by history and physical examination;Metlay;JAMA,1997

3. Kent, C. (2021). Can Tech Solve the UK Radiology Staffing Shortage?, Medical Device Network.

4. Sánchez-Marrè, M. (2022). Intelligent Decision Support Systems, Springer Nature Switzerland AG.

5. Li, D., Mikela Vilmun, B., Frederik Carlsen, J., Albrecht-Beste, E., Ammitzbol Lauridsen, C., Bachmann Nielsen, M., and Lindskov Hansen, K. (2019). The Performance of Deep Learning Algorithms on Automatic Pulmonary Nodule Detection and Classification Tested on Different Datasets That Are Not Derived from LIDC-IDRI: A Systematic Review. Diagnostics, 9.

Cited by 2 articles.

1. Ground Truth Or Dare: Factors Affecting The Creation Of Medical Datasets For Training AI;Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society;2023-08-08

2. Performance and Agreement When Annotating Chest X-ray Text Reports—A Preliminary Step in the Development of a Deep Learning-Based Prioritization and Detection System;Diagnostics;2023-03-11