When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism

Published: 2022-06-28 Issue: 2 Volume: 36 Page: 2423-2430
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Wang Guangting, Zhao Yucheng, Tang Chuanxin, Luo Chong, Zeng Wenjun

Abstract

The attention mechanism is widely believed to be the key to the success of vision transformers (ViTs), since it provides a flexible and powerful way to model spatial relationships. However, is the attention mechanism truly an indispensable part of ViT? Can it be replaced by some other alternative? To demystify the role of the attention mechanism, we simplify it into an extremely simple case: ZERO FLOP and ZERO parameter. Concretely, we revisit the shift operation. It does not contain any parameter or arithmetic calculation. The only operation is to exchange a small portion of the channels between neighboring features. Based on this simple operation, we construct a new backbone network, namely ShiftViT, where the attention layers in ViT are substituted by shift operations. Surprisingly, ShiftViT works quite well in several mainstream tasks, e.g., classification, detection, and segmentation. The performance is on par with or even better than the strong baseline Swin Transformer. These results suggest that the attention mechanism might not be the vital factor that makes ViT successful. It can even be replaced by a zero-parameter operation. We should pay more attention to the remaining parts of ViT in future work. Code is available at github.com/microsoft/SPACH.
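
The shift operation described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state how many channels are shifted or in which directions, so the details here are assumptions made for illustration: the function name `shift_channels`, the parameter `n_div`, the one-pixel shift in four directions, and the zero padding at the borders. It is not presented as the authors' implementation.

```python
import numpy as np


def shift_channels(x: np.ndarray, n_div: int = 12) -> np.ndarray:
    """Shift a small portion of the channels by one pixel in four directions.

    x has shape (C, H, W). `n_div` is a hypothetical parameter: C // n_div
    channels are moved in each direction and the rest are left untouched.
    Positions vacated at the border are filled with zeros (an assumption).
    """
    c = x.shape[0]
    g = c // n_div  # channels per direction
    out = np.zeros_like(x)

    # Group 0: each position takes the value of its right neighbor (content moves left).
    out[0 * g:1 * g, :, :-1] = x[0 * g:1 * g, :, 1:]
    # Group 1: each position takes the value of its left neighbor (content moves right).
    out[1 * g:2 * g, :, 1:] = x[1 * g:2 * g, :, :-1]
    # Group 2: each position takes the value of the row below (content moves up).
    out[2 * g:3 * g, :-1, :] = x[2 * g:3 * g, 1:, :]
    # Group 3: each position takes the value of the row above (content moves down).
    out[3 * g:4 * g, 1:, :] = x[3 * g:4 * g, :-1, :]
    # Remaining channels are copied unchanged.
    out[4 * g:] = x[4 * g:]
    return out


# Example: a 12-channel 3x3 feature map; with n_div=12, one channel moves per direction.
features = np.arange(12 * 3 * 3, dtype=np.float64).reshape(12, 3, 3)
shifted = shift_channels(features, n_div=12)
```

The sketch contains only slice copies: there are no learnable weights and no multiply or add operations, which is the sense in which the abstract calls the operation zero-parameter and zero-FLOP.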

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 30 articles.

1. Prior-guided attention fusion transformer for multi-lesion segmentation of diabetic retinopathy;Scientific Reports;2024-09-08

2. UPFormer: U-sharped Perception lightweight Transformer for segmentation of field grape leaf diseases;Expert Systems with Applications;2024-09

3. MBFormer-YOLO: Multibranch Adaptive Spatial Feature Detection Network for Small Infrared Object Detection;IEEE Sensors Journal;2024-06-15

4. A feature aggregation network for contour detection inspired by complex cells properties;The Visual Computer;2024-05-21

5. Multi-scale gated network for efficient image super-resolution;The Visual Computer;2024-05-03