Quantifying Interpretation Reproducibility in Vision Transformer Models with TAVAC

Author:

Zhao Yue, Agyemang Dylan, Liu Yang, Mahoney Matt, Li Sheng

Abstract

The use of deep learning algorithms to extract meaningful diagnostic features from biomedical images holds the promise of improving patient care, given the expansion of digital pathology. Among these deep learning models, Vision Transformer (ViT) models have been shown to capture long-range spatial relationships, with more robust prediction power for image classification tasks than regular convolutional neural network (CNN) models and better model interpretability. Model interpretation is important for understanding and elucidating how a deep learning model makes predictions, especially when developing transparent models for digital pathology. However, like other deep learning algorithms, ViT models trained on limited annotated biomedical imaging datasets are prone to poor performance due to overfitting, which can lead to false predictions driven by random noise. Overfitting also affects model interpretation, because the regions a model attends to are then fitted to noise rather than to signal. To address this issue, we introduce a novel metric, Training Attention and Validation Attention Consistency (TAVAC), for evaluating the degree of overfitting of ViT models on imaging datasets and quantifying the reproducibility of interpretation. Specifically, the model interpretation is assessed by comparing the high-attention regions in the image between training and testing. We test the method on four publicly available image classification datasets and two independent breast cancer histological image datasets. All overfitted models exhibited significantly lower TAVAC scores than the good-fit models. The TAVAC score quantitatively measures the level of generalization of model interpretation at a fine-grained level, for small groups of cells in each H&E image, which traditional performance evaluation metrics such as prediction accuracy cannot provide.
Furthermore, the application of TAVAC extends beyond medical diagnostic AI models; it enhances the monitoring of model interpretative reproducibility at pixel-resolution in basic research, to reveal critical spatial patterns and cellular structures essential to understanding biological processes and disease mechanisms. TAVAC sets a new standard for evaluating the performance of deep learning model interpretation and provides a method for determining the significance of high-attention regions detected from the attention map of the biomedical images.
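The abstract does not give the formula for TAVAC, so the following is only a minimal sketch of the general idea it describes: comparing the attention a model assigns to the same image when that image is used for training versus when it is held out. The function names, the use of Pearson correlation, and the `top_frac` threshold for "high-attention regions" are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def pearson_consistency(attn_train, attn_val):
    """Pearson correlation between two attention maps of the same image.

    Hypothetical helper: attn_train is the map obtained when the image was
    in the training split, attn_val when it was in the validation split.
    """
    a = np.asarray(attn_train, dtype=float).ravel()
    b = np.asarray(attn_val, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("attention maps must have the same number of patches")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        # A constant map carries no spatial information; treat as no agreement.
        return 0.0
    return float((a * b).sum() / denom)


def top_region_overlap(attn_train, attn_val, top_frac=0.1):
    """Jaccard overlap of the highest-attention patches in the two maps.

    top_frac (an assumed parameter) is the fraction of patches counted as
    "high attention".
    """
    a = np.asarray(attn_train, dtype=float).ravel()
    b = np.asarray(attn_val, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("attention maps must have the same number of patches")
    k = max(1, int(round(top_frac * a.size)))
    top_a = set(np.argsort(a)[-k:].tolist())
    top_b = set(np.argsort(b)[-k:].tolist())
    return len(top_a & top_b) / len(top_a | top_b)
```

Under these assumptions, a well-fitted model would be expected to give values near 1 for most images, while an overfitted model would show low or unstable agreement between the two maps.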

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.