iQC: machine-learning-driven prediction of surgical procedure uncovers systematic confounds of cancer whole slide images in specific medical centers

Author:

Schaumberg Andrew J., Lewis Michael S., Nazarian Ramin, Wadhwa Ananta, Kane Nathanael, Turner Graham, Karnam Purushotham, Devineni Poornima, Wolfe Nicholas, Kintner Randall, Rettig Matthew B., Knudsen Beatrice S., Garraway Isla P., Pyarajan Saiju

Abstract

Problem: The past decades have yielded an explosion of research using artificial intelligence for cancer detection and diagnosis in the field of computational pathology. Yet an often unspoken assumption of this research is that a glass microscopy slide faithfully represents the underlying disease. Here we show that systematic failure modes may dominate the slides digitized from a given medical center, such that neither the whole slide images nor the glass slides are suitable for rendering a diagnosis.

Methods: We quantitatively define high quality data as a set of whole slide images where the type of surgery the patient received may be accurately predicted by an automated system such as ours, called “iQC”. We find iQC accurately distinguished biopsies from nonbiopsies, e.g. prostatectomies or transurethral resections (TURPs, a.k.a. prostate chips), only when the data qualitatively appeared to be high quality, e.g. vibrant histopathology stains and minimal artifacts. Crucially, prostate needle biopsies appear as thin strands of tissue, whereas prostatectomies and TURPs appear as larger rectangular blocks of tissue. Therefore, when the data are of high quality, iQC (i) accurately classifies pixels as tissue, (ii) accurately generates statistics that describe the distribution of tissue in a slide, and (iii) accurately predicts surgical procedure from said statistics. We additionally compare iQC to “HistoQC”, both in terms of how many slides are excluded and how much tissue is identified in the slides.

Results: While we do not control any medical center’s protocols for making or storing slides, we developed the iQC tool to hold all medical centers and datasets to the same objective standard of quality. We validate this standard across five Veterans Affairs Medical Centers (VAMCs) and the Automated Gleason Grading Challenge (AGGC) 2022 public dataset.
For our surgical procedure prediction task, we report an Area Under Receiver Operating Characteristic (AUROC) of 0.9966-1.000 at the VAMCs that consistently produce high quality data and an AUROC of 0.9824 for the AGGC dataset. In contrast, we report an AUROC of 0.7115 at the VAMC that consistently produced poor quality data. An attending pathologist determined poor data quality was likely driven by faded histopathology stains and protocol differences among VAMCs. Corroborating this, iQC’s novel stain strength statistic finds this institution has significantly weaker stains (p < 2.2×10⁻¹⁶, two-tailed Wilcoxon rank-sum test) than the VAMC that contributed the most slides, and this stain strength difference is a large effect (Cohen’s d = 1.208). In addition to accurately detecting the distribution of tissue in slides, we find iQC recommends only 2 of 3736 VAMC slides (0.05%) be reviewed for inadequate tissue. With its default configuration file, HistoQC excluded 89.9% of VAMC slides because tissue was not detected in these slides. With our customized configuration file for HistoQC, we reduced this to 16.7% of VAMC slides. Strikingly, the default configuration of HistoQC included 94.0% of the 1172 prostate cancer slides from The Cancer Genome Atlas (TCGA), which may suggest HistoQC defaults were calibrated against TCGA data, but this calibration did not generalize well to non-TCGA datasets. For VAMC and TCGA, we find a negligible to small degree of agreement between iQC and HistoQC in the include/exclude status of slides, which may suggest the two tools are not equivalent.

Conclusion: Our surgical procedure prediction AUROC may be a quantitative indicator positively associated with high data quality at a medical center or for a specific dataset. We find iQC accurately identifies tissue in slides and excludes few slides, unless the data are poor quality. To produce high quality data, we recommend producing slides using robotics or other forms of automation whenever possible.
We recommend scanning slides digitally before the glass slide has time to develop signs of age, e.g. faded stains and acrylamide bubbles. We recommend using high-quality reagents to stain and mount slides, which may slow aging. We recommend protecting stored slides from ultraviolet light, from humidity, and from changes in temperature. To our knowledge, iQC is the first automated system in computational pathology that validates data quality against objective evidence, e.g. surgical procedure data available in the EHR or LIMS, which requires no effort or annotations from anatomic pathologists. Please see https://github.com/schaumba/iqc and https://doi.org/10.17605/OSF.IO/AVD3Z for instructions and updates.
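
For readers less familiar with the two summary statistics reported in the Results, the following is a minimal Python sketch of how an AUROC and a Cohen's d can be computed. It is an illustration only and is not taken from the iQC code base: the function names `auroc` and `cohens_d` and all numbers are hypothetical. The AUROC is computed as the fraction of (positive, negative) pairs that are ranked correctly, which is the same pairwise count that underlies the Wilcoxon rank-sum (Mann-Whitney U) statistic. Cohen's d is computed here with the pooled sample standard deviation, which is one common convention and is assumed rather than confirmed for this work.

```python
from math import sqrt
from statistics import mean, variance


def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a correct pair. This equals the Mann-Whitney U
    statistic divided by len(pos_scores) * len(neg_scores).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical classifier scores: higher means "more likely a biopsy".
biopsy_scores = [0.9, 0.8, 0.7, 0.6]
nonbiopsy_scores = [0.65, 0.3, 0.2, 0.1]
print(auroc(biopsy_scores, nonbiopsy_scores))  # 0.9375 (15 of 16 pairs ranked correctly)

# Hypothetical stain-strength values at two sites.
site_a = [2.0, 4.0, 6.0]
site_b = [1.0, 3.0, 5.0]
print(cohens_d(site_a, site_b))  # 0.5
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a value such as 0.7115 stands out against values near 1.0. By the usual rule of thumb, a Cohen's d above roughly 0.8 is described as a large effect.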

Publisher

Cold Spring Harbor Laboratory
