Author:
Shen Dawei, Zhang Yaozhong, Imoto Seiya
Abstract
Whole Slide Images (WSIs) are high-resolution digital scans of entire microscope slides, used extensively in pathology for detailed examination of tissue samples. WSI tumor classification is a classic application of Multiple Instance Learning (MIL). In this process, a WSI is first divided into image tiles, and each tile is encoded into an embedding vector by a pretrained vision encoder. A lightweight MIL model then aggregates all the embeddings of a WSI for classification. A key factor in classification performance is the quality of the embedding vectors. However, the embeddings produced by the pretrained vision encoder are continuous and not task-specific, so they contain significant noise and distinguish poorly between tumor tiles and normal tiles, which weakens the MIL model. In this work, inspired by VQ-VAE, we propose VQ-MIL, in which each continuous embedding vector is mapped to a discrete, task-specific space using weakly supervised vector quantization. This approach effectively separates tumor instances from normal instances and reduces the noise associated with each instance. Our experiments demonstrate that our method achieves state-of-the-art classification results on two benchmark datasets. The source code is available at https://github.com/aCoalBall/VQMIL.
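The quantization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a fixed, hand-written codebook (in VQ-MIL the codebook would be learned under weak supervision, which is not shown here), uses nearest-neighbour lookup by squared Euclidean distance as in VQ-VAE, and substitutes plain mean pooling for the MIL aggregator. The names `quantize`, `mean_pool`, `codebook`, and `tiles` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embeddings: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[List[float]], List[int]]:
    """Replace each continuous embedding with its nearest codebook entry.

    Returns the quantized vectors and the index of the chosen code per tile.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(e, codebook[k]))
        for e in embeddings
    ]
    quantized = [list(codebook[k]) for k in indices]
    return quantized, indices


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Stand-in for the MIL aggregator: average all instance vectors in a bag."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy bag: three tile embeddings and a two-entry codebook (illustrative values).
codebook = [[0.0, 0.0], [1.0, 1.0]]
tiles = [[0.1, -0.2], [0.9, 1.1], [0.2, 0.1]]

quantized, codes = quantize(tiles, codebook)
bag_embedding = mean_pool(quantized)
```

In this toy bag, the small perturbations around each code are discarded after quantization, which is the sense in which a discrete space can suppress per-instance noise before aggregation.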
Publisher
Cold Spring Harbor Laboratory