Improving reference standards for validation of AI-based radiography

Published: 2021-07-01 Issue: 1123 Volume: 94 Page: 20210435
ISSN: 0007-1285
Container-title: The British Journal of Radiology
Language: en
Short-container-title: BJR

Author:

Duggan Gavin E¹, Reicher Joshua J¹, Liu Yun¹, Tse Daniel¹, Shetty Shravya¹

Affiliation:

1. Google Health (G.E.D., Y.L., D.T., S.S.), Stanford Health Care and Palo Alto Veterans Affairs (J.J.R.), California, USA

Abstract

Objective: To demonstrate the importance of combining multiple readers' opinions, in a context-aware manner, when establishing the reference standard for validating artificial intelligence (AI) applications for, e.g., chest radiographs. By comparing individual readers, majority vote of a panel, and panel-based discussion, we identify methods that maximize interobserver agreement and label reproducibility.

Methods: A total of 1100 frontal chest radiographs were evaluated for six findings: airspace opacity, cardiomegaly, pulmonary edema, fracture, nodules, and pneumothorax. Each image was reviewed by six radiologists, first individually and then via asynchronous adjudication (web-based discussion) in two panels of three readers each, to resolve disagreements within each panel. We quantified the reproducibility of each method by measuring interreader agreement.

Results: Panel-based majority vote improved agreement relative to individual readers for all findings. Most disagreements were resolved with two rounds of adjudication, which further improved reproducibility for some findings, in particular by reducing misses. Improvements varied across finding categories, with adjudication improving agreement for cardiomegaly, fractures, and pneumothorax.

Conclusion: The likelihood of interreader agreement, even within panels of US board-certified radiologists, must be considered before reads can be used as a reference standard for validating proposed AI tools. Agreement, and by extension reproducibility, can be improved by applying majority vote, maximum sensitivity, or asynchronous adjudication, depending on the finding; this supports the development of higher-quality clinical research.

Advances in knowledge: A panel of three experts is a common technique for establishing reference standards for AI validation when ground truth is not available. The manner in which differing opinions are resolved is shown to be important and has not previously been explored.
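The abstract names rules for turning a panel's individual reads into one reference label (majority vote, maximum sensitivity, adjudication) and compares them by interreader agreement. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes binary per-finding reads (1 = finding present), uses Cohen's kappa between the two panels' labels as the agreement measure (the abstract does not say which statistic was used), and does not model adjudication, which requires discussion among readers. All function names and the toy reads are hypothetical.

```python
from typing import Sequence


def majority_vote(reads: Sequence[int]) -> int:
    """Panel label is positive if more than half of the readers marked it positive."""
    return int(sum(reads) * 2 > len(reads))


def max_sensitivity(reads: Sequence[int]) -> int:
    """Panel label is positive if any reader marked it positive."""
    return int(any(reads))


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two equal-length binary label vectors."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label vectors must be non-empty and of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance if both panels labelled independently.
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Both panels gave the same constant label; kappa is undefined, treat as perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical reads: rows = images, columns = the three readers of one panel.
    panel_a = [[1, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 1]]
    panel_b = [[1, 0, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0]]
    for name, rule in [("majority vote", majority_vote),
                       ("maximum sensitivity", max_sensitivity)]:
        labels_a = [rule(r) for r in panel_a]
        labels_b = [rule(r) for r in panel_b]
        print(f"{name}: kappa = {cohen_kappa(labels_a, labels_b):.3f}")
```

With these made-up reads, the two panels agree perfectly under majority vote but not under maximum sensitivity, a toy illustration of how the resolution rule alone can change measured agreement. For real analyses, scikit-learn's `cohen_kappa_score` computes the same two-rater statistic.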

Publisher

British Institute of Radiology

Subject

Radiology, Nuclear Medicine and Imaging, General Medicine

Link

https://www.birpublications.org/doi/pdf/10.1259/bjr.20210435

Cited by 7 articles.

1. Transparency in Medicine: How eXplainable AI is Revolutionizing Patient Care; 2023 International Conference on Network, Multimedia and Information Technology (NMITCON); 2023-09-01

2. Critical Appraisal of Artificial Intelligence–Enabled Imaging Tools Using the Levels of Evidence System; American Journal of Neuroradiology; 2023-04-20

3. Hurdles to Artificial Intelligence Deployment: Noise in Schemas and “Gold” Labels; Radiology: Artificial Intelligence; 2023-03-01

4. Multicentre external validation of a commercial artificial intelligence software to analyse chest radiographs in health screening environments with low disease prevalence; European Radiology; 2023-01-10

5. Simplified Transfer Learning for Chest Radiography Models Using Less Data; Radiology; 2022-11