Author:
Liu Shiwei, Yue Wenwen, Guo Zhiqing, Wang Liejun
Abstract
Vision Transformers (ViTs) have made remarkable achievements in the field of medical image analysis. However, ViT-based methods perform poorly on some small-scale medical image classification datasets. Meanwhile, many ViT-based models sacrifice computational cost for superior performance, which is a great challenge in practical clinical applications. In this paper, we propose an efficient medical image classification network based on an alternating tandem mixture of CNN and Transformer blocks, called Eff-CTNet. Specifically, existing ViT-based methods still rely mainly on multi-head self-attention (MHSA), whose attention maps are highly similar across heads, leading to computational redundancy. Therefore, we propose a group cascade attention (GCA) module that splits the feature maps and provides the splits to different attention heads, further improving attention diversity and reducing computational cost. In addition, we propose an efficient CNN (EC) module to enhance the model's ability to extract local detail information in medical images. Finally, we connect these components to design an efficient hybrid medical image classification network, namely Eff-CTNet. Extensive experimental results show that Eff-CTNet achieves advanced classification performance with less computational cost on three public medical image classification datasets.
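The abstract describes the GCA module as splitting the feature maps across attention heads so each head sees a different channel group, improving attention diversity while cutting cost. Below is a minimal PyTorch sketch of that idea under stated assumptions: the module name, per-head projections, and the cascading of each head's output into the next head's input are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of a "group cascade attention" style block (not the
# paper's code): channels are split into per-head groups, each head attends
# over its own group, and its output is cascaded into the next head's input.
import torch
import torch.nn as nn


class GroupCascadeAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # One small qkv projection per head, acting only on its channel group.
        self.qkv = nn.ModuleList(
            [nn.Linear(self.head_dim, self.head_dim * 3) for _ in range(num_heads)]
        )
        self.proj = nn.Linear(dim, dim)
        self.scale = self.head_dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        groups = x.chunk(self.num_heads, dim=-1)  # split channels across heads
        outputs, carry = [], 0
        for i, g in enumerate(groups):
            g = g + carry  # cascade: previous head's output refines this head's input
            q, k, v = self.qkv[i](g).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.scale
            out = attn.softmax(dim=-1) @ v
            outputs.append(out)
            carry = out
        return self.proj(torch.cat(outputs, dim=-1))


if __name__ == "__main__":
    block = GroupCascadeAttentionSketch(dim=64, num_heads=4)
    tokens = torch.randn(2, 49, 64)  # e.g. a 7x7 feature map flattened to tokens
    print(block(tokens).shape)  # torch.Size([2, 49, 64])
```

Because each head's projections operate on dim/num_heads channels rather than the full width, the per-head cost shrinks, which matches the abstract's motivation of reducing MHSA redundancy.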
Funder
the 2023 Xinjiang Uygur Autonomous Region Postgraduate Research Innovation project
the National Science Foundation of China
the Tianshan Talent Training Program
Publisher
Springer Science and Business Media LLC