Abstract
SUMMARYHaematoxilin and eosin (H&E) stained slides are commonly used as the gold standard for disease diagnosis. Remarkable progress in the deep learning field in recent years has enabled the detection of complex molecular patterns within such histopathology slides, suggesting automated approaches could help inform pathologists’ decisions. In this context, Multiple Instance Learning (MIL) algorithms have been shown to outperform Transfer Learning (TL) based methods for a variety of tasks. However, there is still a considerable complexity to implementing and using such methods for computational biology research and clinical practice. We introduce HistoMIL, a Python package designed to simplify the implementation, training, and inference process of MIL-based algorithms for computational pathologists and biomedical researchers. In HistoMIL, we have integrated a self-supervised learning-based module to train the feature encoder, a full pipeline encompassing TL as well as three MIL algorithms, namely ABMIL (1), DSMIL (2), and TransMIL (3). By utilising the PyTorch Lightning framework (4), HistoMIL enables effortless customization of training intricacies and implementation of novel algorithms. We illustrate the capabilities of HistoMIL by building predictive models for 2,487 cancer hallmark genes on breast cancer histology slides from The Cancer Genome Atlas, on which we demonstrate AUROC performances of up to 85%. Cell proliferation processes were most easily detected, shedding light on the opportunities but also limitations of applying deep learning for gene expression detection. The HistoMIL package is proposed as a tool to simplify the implementation and usage of deep learning tasks for researchers.
Publisher
Cold Spring Harbor Laboratory
Reference53 articles.
1. Leiby JS , Hao J , Kang GH , Park JW , Kim D , editors. Attention-based multiple instance learning with self-supervision to predict microsatellite instability in colorectal cancer from histology whole-slide images. 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC); 2022: IEEE.
2. Li B , Li Y , Eliceiri KW , editors. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2021.
3. Transmil: Transformer based correlated multiple instance learning for whole slide image classification;Advances in neural information processing systems,2021
4. the PyTorch Lightning team;Pytorch lightning,2019
5. Deep learning