Author:
Maleki Danial,Rahnamayan Shahryar,Tizhoosh H. R.
Abstract
AbstractThe exponential growth of data across various medical domains has generated a substantial demand for techniques to analyze multimodal big data. This demand is particularly pronounced in fields such as computational pathology due to the diverse nature of the tissue. Cross-modal retrieval aims to identify a common latent space where different modalities, such as image-text pairs, exhibit close alignment. The primary challenge, however, often lies in the representation of tissue features. While language models can be trained relatively easily, visual models frequently struggle due to the scarcity of labeled data. To address this issue, the innovative concept of harmonization has been introduced, extending the learning scheme distillation without supervision, known as DINO. The harmonization of scale refines the DINO paradigm through a novel patching approach, overcoming the complexities posed by gigapixel whole slide images in digital pathology. Experiments conducted on diverse datasets have demonstrated that the proposed approach significantly enhances cross-modal retrieval in tissue imaging. Moreover, it exhibits vast potential for other fields that rely on gigapixel imaging.
Publisher
Springer Science and Business Media LLC
Reference50 articles.
1. Ramesh, A. et al. Zero-shot text-to-image generation. arXiv:2102.12092 (2021).
2. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv:2204.06125 (2022).
3. Kalra, S. et al. Yottixel-an image search engine for large archives of histopathology whole slide images. Med. Image Anal. 65, 101757 (2020).
4. Kalra, S. et al. Pan-cancer diagnostic consensus through searching archival histopathology images using artificial intelligence. NPJ Digit. Med. 3, 31 (2020).
5. Baltrušaitis, T., Ahuja, C. & Morency, L.-P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2018).