Affiliations:
1. Department of Intelligent Semiconductor Engineering, Chung-Ang University, Seoul, South Korea
2. School of Electrical and Electronics Engineering, Chung-Ang University, Seoul, South Korea
Abstract
Transformers have demonstrated significant promise for computer vision tasks. Particularly noteworthy is SwinUNETR, a vision-transformer-based model that has made remarkable advances in medical image segmentation. Nevertheless, the efficiency of SwinUNETR's training process is constrained by an extended training duration, a limitation primarily attributable to the attention mechanism within the architecture. In this article, to address this limitation, we introduce a novel framework called the MetaSwin model. Drawing inspiration from the MetaFormer concept, which admits alternative token-mixing operations, we propose a transformative modification: substituting the attention-based components of SwinUNETR with a straightforward yet effective spatial pooling operation. Additionally, we incorporate Squeeze-and-Excitation (SE) blocks after each MetaSwin block of the encoder and into the decoder, with the aim of improving segmentation performance. We evaluate the proposed MetaSwin model on two distinct medical datasets, BraTS 2023 and MICCAI 2015 BTCV, and conduct a comprehensive comparison against two baselines, the SwinUNETR and SwinUNETR+SE models. Our results demonstrate the effectiveness of MetaSwin, which achieves a competitive edge over the baselines using a simple pooling operation and efficient SE blocks. MetaSwin's consistently superior performance on the BTCV dataset relative to SwinUNETR is particularly significant: with a model size of 24, MetaSwin improves the Dice score from SwinUNETR's 76.58% to 79.12% while using fewer parameters (15,407,384 vs. 15,703,304) and substantially less training time (300 vs. 467 min). This research highlights the contribution of a simplified transformer framework built from basic elements such as pooling and SE blocks, emphasizing their potential to guide the development of medical segmentation models without relying on complex attention-based mechanisms.
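Below is a minimal PyTorch sketch, for illustration only, of the two building blocks the abstract describes: a MetaFormer/PoolFormer-style pooling token mixer that stands in for attention, and a Squeeze-and-Excitation block. The class names, layer sizes, reduction ratio, and the choice of 3D pooling for volumetric inputs are assumptions for this sketch, not the authors' exact implementation.

```python
# Hypothetical sketch of the blocks described in the abstract, assuming the
# MetaFormer/PoolFormer recipe: average pooling (minus the identity) replaces
# attention as the token mixer, and an SE block recalibrates channels.
import torch
import torch.nn as nn


class PoolingTokenMixer(nn.Module):
    """PoolFormer-style token mixer: local average pooling minus identity."""

    def __init__(self, pool_size: int = 3):
        super().__init__()
        # 3D pooling for volumetric (D, H, W) medical images; stride 1 and
        # symmetric padding keep the spatial resolution unchanged.
        self.pool = nn.AvgPool3d(
            pool_size, stride=1, padding=pool_size // 2, count_include_pad=False
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Subtracting x keeps only the "mixing" signal, as in PoolFormer,
        # so the surrounding residual connection reduces to pure pooling.
        return self.pool(x) - x


class SEBlock3D(nn.Module):
    """Squeeze-and-Excitation: global pooling + channel re-weighting."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool3d(1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c = x.shape[:2]
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * w  # channel-wise recalibration


if __name__ == "__main__":
    x = torch.randn(1, 24, 32, 32, 32)  # (batch, channels, D, H, W)
    mixed = PoolingTokenMixer()(x)
    out = SEBlock3D(24)(mixed + x)
    print(out.shape)  # torch.Size([1, 24, 32, 32, 32])
```

Unlike attention, the pooling mixer has no learnable parameters and costs only a local averaging pass, which is consistent with the parameter and training-time reductions reported above.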
Funder
The National Research Foundation of Korea (NRF) grant funded by the Korea government