Abstract
The use of deep learning algorithms to extract meaningful diagnostic features from biomedical images holds the promise of improving patient care as digital pathology expands. Among these deep learning models, Vision Transformer (ViT) models have been shown to capture long-range spatial relationships, yielding more robust predictions for image classification tasks than conventional convolutional neural network (CNN) models, along with better model interpretability. Model interpretation is important for understanding and elucidating how a deep learning model makes its predictions, especially for developing transparent models in digital pathology. However, like other deep learning algorithms trained on limited annotated biomedical imaging datasets, ViT models are prone to overfitting, which degrades both performance and interpretation: predictions driven by random noise cannot be meaningfully interpreted. To address this issue, we introduce a novel metric, Training Attention and Validation Attention Consistency (TAVAC), for evaluating the degree of overfitting of ViT models on imaging datasets and quantifying the reproducibility of their interpretation. Specifically, the metric compares the high-attention regions of an image between training and testing. We test the method on four publicly available image classification datasets and two independent breast cancer histological image datasets. All overfitted models exhibited significantly lower TAVAC scores than the well-fitted models. The TAVAC score quantitatively measures the generalization of model interpretation at a fine-grained level, down to small groups of cells in each H&E image, which traditional performance metrics such as prediction accuracy cannot provide.
Furthermore, the application of TAVAC extends beyond medical diagnostic AI models: it enables monitoring of the reproducibility of model interpretation at pixel resolution in basic research, revealing critical spatial patterns and cellular structures essential to understanding biological processes and disease mechanisms. TAVAC sets a new standard for evaluating the performance of deep learning model interpretation and provides a method for determining the significance of high-attention regions detected in attention maps of biomedical images.
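The abstract states only that TAVAC compares high-attention regions between training and testing; the exact formula is not given here. As a purely illustrative sketch (the function name, the top-fraction threshold, and the choice of Jaccard overlap are all assumptions, not the authors' definition), one simple way to quantify such consistency between two attention maps of the same image is the overlap of their top-attention patches:

```python
import numpy as np

def tavac_like_score(train_attn, val_attn, top_frac=0.1):
    """Hypothetical sketch of a TAVAC-style consistency score.

    Compares the high-attention regions of the same image under
    training-time and validation-time attention maps. Here we assume
    (this is NOT the authors' published formula) that the score is the
    Jaccard overlap of the top `top_frac` fraction of patches ranked
    by attention weight: 1.0 means identical high-attention regions,
    0.0 means disjoint ones.
    """
    train = np.asarray(train_attn, dtype=float).ravel()
    val = np.asarray(val_attn, dtype=float).ravel()
    if train.shape != val.shape:
        raise ValueError("attention maps must have the same shape")
    # Number of patches counted as "high attention"
    k = max(1, int(round(top_frac * train.size)))
    top_train = set(np.argsort(train)[-k:].tolist())
    top_val = set(np.argsort(val)[-k:].tolist())
    # Jaccard index of the two high-attention patch sets
    return len(top_train & top_val) / len(top_train | top_val)
```

Under this toy definition, a well-fitted model that attends to the same cellular regions during training and validation would score near 1, while an overfitted model attending to noise would score near 0, matching the trend the abstract reports.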
Publisher
Cold Spring Harbor Laboratory