The assessment of fundus image quality labeling reliability among graders with different backgrounds

Authors:

Laurik-Feuerstein Kornélia Lenke, Sapahia Rishav, Cabrera DeBuc Delia, Somfai Gábor Márk

Abstract

Purpose: For the training of machine learning (ML) algorithms, correctly labeled ground truth data are indispensable. In this pilot study, we assessed the performance of graders with different backgrounds in labeling the quality of retinal fundus images.

Methods: Color fundus photographs were labeled with a Python-based tool using four image categories: excellent (E), good (G), adequate (A) and insufficient for grading (I). We enrolled 8 subjects (4 with and 4 without a medical background, groups M and NM, respectively), who were given a tutorial on image quality requirements. We randomly selected 200 images from a pool of 18,145 expert-labeled images (50 each of E, G, A and I). The grading was timed and agreement was assessed. An additional grading round with 14 labels was performed for a more objective analysis.

Results: The median (interquartile range) time for the labeling task with 4 categories was 987.8 s (418.6) for all graders, and 872.9 s (621.0) vs. 1019.8 s (479.5) in the M and NM groups, respectively. Cohen's weighted kappa showed moderate agreement (0.564) with four categories, which increased to substantial agreement (0.637) when the E and G categories were merged, leaving three. With the 14 labels, the weighted kappa values were 0.594 and 0.667 when the images were assigned to four or three categories, respectively.

Conclusion: Image grading with a Python-based tool appears to be a simple and potentially efficient way to label fundus images by quality, and it does not necessarily require a medical background. Such grading can be subject to variability, but it could still serve to robustly identify images of insufficient quality. This highlights the opportunity to democratize ML applications among people with and without a medical background. However, a simple grading system is key to successful categorization.
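The agreement figures above compare weighted kappa on the four-category scale with the three-category scale obtained by merging E and G. A minimal sketch of that computation follows, assuming two raters labeling the same images (for example, one grader against the expert reference label) and linear disagreement weights. The abstract states neither the weighting scheme nor how raters were paired, so both are assumptions here; the merged label "EG", the helper names and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter

# Ordinal quality scale from the abstract, best to worst:
# excellent (E), good (G), adequate (A), insufficient for grading (I).
FOUR_CATEGORIES = ["E", "G", "A", "I"]
# Three-category scheme after merging E and G; the label "EG" is hypothetical.
THREE_CATEGORIES = ["EG", "A", "I"]


def merge_e_and_g(labels):
    """Map four-category labels onto the three-category scheme."""
    return ["EG" if label in ("E", "G") else label for label in labels]


def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two raters labeling the same images.

    Category rank i is the position in `categories`. The disagreement weight
    is |i - j| / (k - 1) for "linear" and its square for "quadratic".
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty image set")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    rank = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    # Observed disagreement: mean weight over the actual label pairs.
    observed = sum(weight(rank[a], rank[b]) for a, b in zip(rater_a, rater_b)) / n

    # Expected disagreement if both raters labeled independently,
    # each with their own marginal category frequencies.
    freq_a = Counter(rank[a] for a in rater_a)
    freq_b = Counter(rank[b] for b in rater_b)
    expected = sum(
        weight(i, j) * (freq_a[i] / n) * (freq_b[j] / n)
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - observed / expected


# Toy labels for illustration only (not the study's data).
expert = ["E", "E", "G", "G", "A", "A", "I", "I"]
grader = ["G", "E", "E", "G", "A", "G", "I", "I"]

print(round(weighted_kappa(expert, grader, FOUR_CATEGORIES), 3))  # 0.7
print(round(weighted_kappa(merge_e_and_g(expert), merge_e_and_g(grader), THREE_CATEGORIES), 3))  # 0.857
```

In this toy case, merging E and G removes the E/G confusions, so kappa rises from 0.7 to about 0.857. This matches the direction of the change reported above but says nothing about the study's own values. Passing `weighting="quadratic"` penalizes distant disagreements (for example E vs. I) more heavily than adjacent ones.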

Funder

NIH Clinical Center

Research to Prevent Blindness

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

