Implementing vision transformer for classifying 2D biomedical images

Published:2024-05-31 Issue:1 Volume:14 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Halder Arindam, Gharami Sanghita, Sadhu Priyangshu, Singh Pawan Kumar, Woźniak Marcin, Ijaz Muhammad Fazal

Abstract

In recent years, the rapid growth of medical imaging data has led to the development of various machine learning algorithms for healthcare applications. The MedMNISTv2 dataset, a comprehensive benchmark for 2D biomedical image classification, encompasses diverse medical imaging modalities such as fundus camera, breast ultrasound, colon pathology and blood cell microscope images. Highly accurate classification on these datasets is crucial for identifying various diseases and determining the course of treatment. This paper presents a comprehensive analysis of four subsets of the MedMNISTv2 dataset: BloodMNIST, BreastMNIST, PathMNIST and RetinaMNIST. These subsets differ in data modality and sample size, and were selected to analyze the efficiency of the model across diverse data modalities. The study assesses the ability of the Vision Transformer (ViT) model to capture the intricate patterns and features crucial for these medical image classification tasks and thereby to surpass the benchmark metrics substantially. The methodology consists of pre-processing the input images and then training the ViT-base-patch16-224 model on the selected datasets. The performance of the model is assessed using key metrics and by comparing the classification accuracies achieved with the benchmark accuracies. With ViT, the new benchmarks achieved for BloodMNIST, BreastMNIST, PathMNIST and RetinaMNIST are 97.90%, 90.38%, 94.62% and 57%, respectively. The study highlights the promise of Vision Transformer models in medical image analysis, paving the way for their adoption and further exploration in healthcare applications, with the aim of enhancing diagnostic accuracy and assisting medical professionals in clinical decision-making.
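
As a rough illustration of what the model name ViT-base-patch16-224 implies, the sketch below assumes a square 224x224 input cut into non-overlapping 16x16 patches plus one class token, as in the original ViT design (reference 3). It is not the authors' code; the function names `vit_token_count` and `split_into_patches` are hypothetical and use only the Python standard library.

```python
def vit_token_count(image_size=224, patch_size=16, class_token=True):
    """Return the sequence length a ViT encoder sees for a square image.

    Assumes non-overlapping square patches; the defaults mirror the
    "patch16-224" part of the model name.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + (1 if class_token else 0)


def split_into_patches(image, patch_size):
    """Split a square 2D image (list of rows) into flattened patches.

    Patches are returned in row-major order, each flattened row by row.
    """
    size = len(image)
    if any(len(row) != size for row in image):
        raise ValueError("image must be square")
    if size % patch_size != 0:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, size, patch_size):
        for left in range(0, size, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches


# Toy 4x4 "image" split into 2x2 patches, to show the ordering.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_patches = split_into_patches(toy, 2)
print(len(toy_patches), toy_patches[0])   # 4 [0, 1, 4, 5]
print(vit_token_count())                  # 197
```

Under these assumptions a 224x224 image yields 14 x 14 = 196 patches, so the encoder processes 197 tokens per image once the class token is included. Input images therefore have to be resized to 224x224 during pre-processing before they can be patched this way.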

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-63094-9.pdf

References: 33 articles.

1. Yang, J. et al. MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification. Sci. Data 10, 41. https://doi.org/10.1038/s41597-022-01721-8 (2023).

2. Ghalati, M. K., Nunes, A., Ferreira, H., Serranho, P. & Bernardes, R. Texture analysis and its applications in biomedical imaging: A survey. IEEE Rev. Biomed. Eng. 15, 222–246. https://doi.org/10.1109/RBME.2021.3115703 (2022).

3. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J. & Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. https://arxiv.org/abs/2010.11929 (2020).

4. Sultana, F., Sufian, A. & Dutta, P. Advancements in image classification using convolutional neural network. In 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India 122–129. https://doi.org/10.1109/ICRCICN.2018.8718718 (2018).

5. Khan, R. U., Zhang, X., Kumar, R. & Aboagye, E. O. Evaluating the performance of ResNet model based on image recognition. In Proceedings of the 2018 International Conference on Computing and Artificial Intelligence (ICCAI '18), Association for Computing Machinery, New York 86–90. https://doi.org/10.1145/3194452.3194461 (2018).