Estimating the Irreducible Uncertainty in Visual Diagnosis: Statistical Modeling of Skill Using Response Models

Author:

Pusic Martin V.1, Rapkiewicz Amy2, Raykov Tenko3, Melamed Jonathan2

Affiliation:

1. Department of Pediatrics and Emergency Medicine, Harvard Medical School, Boston, MA, USA

2. Department of Pathology, NYU Long Island School of Medicine, New York, NY, USA

3. College of Education, Michigan State University, East Lansing, MI, USA

Abstract

Background: For the representative problem of prostate cancer grading, we sought to model simultaneously the continuous nature of the case spectrum and the decision thresholds of individual pathologists, allowing quantitative comparison of how they handle cases at the borderline between diagnostic categories.

Methods: Experts and pathology residents each rated a standardized set of prostate cancer histopathological images on the International Society of Urological Pathology (ISUP) scale used in clinical practice. They diagnosed 50 histologic cases spanning a range of malignancy, including intermediate cases in which clear distinction was difficult. We report a statistical model showing the degree to which each individual participant can separate the cases along the latent decision spectrum.

Results: The slides were rated by 36 physicians in total: 23 ISUP pathologists and 13 residents. As anticipated, the cases showed a full continuous range of diagnostic severity. Cases ranged along a logit scale consistent with the consensus rating (mean for consensus ISUP 1: −0.93 logits [95% confidence interval (CI) −1.10 to −0.78]; ISUP 2: −0.19 logits [−0.27 to −0.12]; ISUP 3: 0.56 logits [0.06 to 1.06]; ISUP 4: 1.24 logits [1.10 to 1.38]; ISUP 5: 1.92 logits [1.80 to 2.04]). The best raters were able to discriminate meaningfully between all 5 ISUP categories, with intercategory thresholds that were quantifiably precise.

Conclusions: We present a method that allows simultaneous quantification of both the confusability of a particular case and the skill with which raters can distinguish the cases.

Implications: The technique generalizes beyond the current example to other clinical situations in which a diagnostician must impose an ordinal rating on a biological spectrum.

Highlights

Question: How can we quantify skill in visual diagnosis for cases that sit at the border between 2 ordinal categories, which are inherently difficult to diagnose?

Findings: In this analysis of pathologists and residents rating prostate biopsy specimens, decision-aligned response models are calculated that show how a pathologist would be likely to classify any given case on the diagnostic spectrum. Decision thresholds are shown to vary in their location and precision.

Significance: Improving on traditional measures such as kappa and receiver-operating characteristic curves, this specialization of item response models allows better individual feedback to both trainees and pathologists, including better quantification of acceptable decision variation.
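To illustrate how rater thresholds on a logit scale turn a case location into a probability for each ISUP category, here is a minimal sketch. It assumes a cumulative-logit (graded response) form, which the abstract does not confirm as the authors' exact model. The case locations are the consensus means reported above; the thresholds (midpoints between adjacent means) and the two discrimination values are hypothetical and chosen only for illustration.

```python
import math

# Consensus-category mean case locations in logits, as reported in the abstract.
CONSENSUS_MEANS = {1: -0.93, 2: -0.19, 3: 0.56, 4: 1.24, 5: 1.92}

# HYPOTHETICAL rater thresholds: midpoints between adjacent consensus means.
_grades = sorted(CONSENSUS_MEANS)
THRESHOLDS = [
    (CONSENSUS_MEANS[a] + CONSENSUS_MEANS[b]) / 2
    for a, b in zip(_grades, _grades[1:])
]


def category_probabilities(case_location, thresholds, discrimination=1.0):
    """Return P(rating = k) for k = 1..len(thresholds)+1.

    Assumed model (not taken from the paper): cumulative logit, where
    P(rating > k) = sigmoid(discrimination * (case_location - threshold_k)).
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    exceed = [
        1.0 / (1.0 + math.exp(-discrimination * (case_location - t)))
        for t in thresholds
    ]
    upper = [1.0] + exceed   # P(rating > k - 1)
    lower = exceed + [0.0]   # P(rating > k)
    return [u - l for u, l in zip(upper, lower)]


def most_likely_grade(case_location, thresholds, discrimination=1.0):
    """Return the 1-based category with the highest probability."""
    probs = category_probabilities(case_location, thresholds, discrimination)
    return probs.index(max(probs)) + 1


if __name__ == "__main__":
    # HYPOTHETICAL discrimination values: a precise rater and a less precise one.
    for label, slope in (("precise", 8.0), ("imprecise", 2.0)):
        probs = category_probabilities(CONSENSUS_MEANS[3], THRESHOLDS, slope)
        print(label, [round(p, 3) for p in probs])
```

Under these assumptions, a higher discrimination value concentrates probability on a single category for a typical case, while a case located exactly at a threshold stays close to an even split between the two neighbouring categories however precise the rater is.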

Publisher

SAGE Publications

Subject

Health Policy
