Lightweight transformer image feature extraction network

Published: 2024-01-31 Volume: 10 Page: e1755
ISSN: 2376-5992
Container-title: PeerJ Computer Science
Language: en

Author:

Zheng Wenfeng¹,Lu Siyu¹,Yang Youshuai¹,Yin Zhengtong²,Yin Lirong³

Affiliation:

1. School of Automation, University of Electronic Science and Technology of China, Chengdu, Sichuan, China

2. College of Resource and Environment Engineering, Guizhou University, Guiyang, Guizhou, China

3. Department of Geography and Anthropology, Louisiana State University, Baton Rouge, LA, United States of America

Abstract

In recent years, Transformer-based image feature extraction has become a research hotspot. However, when a Transformer is used for image feature extraction, the model’s complexity grows quadratically with the number of input tokens. This quadratic complexity makes vision transformer backbone networks computationally expensive and prevents them from modelling high-resolution images. To address this issue, this study proposes two approaches to speed up Transformer models. First, the quadratic complexity of the self-attention mechanism is reduced to linear, which speeds up the model’s internal processing. Second, a parameter-free lightweight pruning method is introduced that adaptively samples the input image and filters out unimportant tokens, reducing irrelevant input. Finally, the two methods are combined to create an efficient attention mechanism. Experimental results demonstrate that the combined methods can reduce the computation of the original Transformer model by 30%–50%, while the efficient attention mechanism achieves a 60%–70% reduction in computation.
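
The two ideas named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's formulation, so everything below is an assumption for illustration only: the linear attention uses a generic kernel feature map (elu(x) + 1, a common choice in the linear-attention literature), and the pruning step scores tokens by their L2 norm, which is only one possible parameter-free criterion. The function names `feature_map`, `linear_attention`, and `prune_tokens` are hypothetical and do not come from the paper.

```python
import numpy as np


def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumed choice, not the paper's).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v):
    """Kernelised attention with cost linear in the number of tokens n.

    q, k: (n, d) arrays; v: (n, d_v) array.
    Computing phi(K)^T V first avoids ever forming the (n, n) score matrix.
    """
    qf, kf = feature_map(q), feature_map(k)
    kv = kf.T @ v                 # (d, d_v), summed over all tokens once
    z = qf @ kf.sum(axis=0)       # (n,), per-query normaliser
    return (qf @ kv) / z[:, None]


def prune_tokens(x, keep_ratio):
    """Parameter-free pruning: keep the tokens with the largest L2 norm.

    The norm-based score is an illustrative assumption. Returns the kept
    tokens in their original order together with their indices.
    """
    scores = np.linalg.norm(x, axis=-1)
    n_keep = max(1, int(round(keep_ratio * x.shape[0])))
    keep = np.sort(np.argsort(-scores)[:n_keep])
    return x[keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(16, 8))            # 16 tokens, 8 channels
    kept, idx = prune_tokens(tokens, keep_ratio=0.5)
    out = linear_attention(kept, kept, kept)     # attention on kept tokens
    print(idx, out.shape)
```

In this sketch the pruning step shrinks the token count before attention is applied, and the attention step itself scales with the number of remaining tokens rather than its square, which is the general shape of the combination the abstract describes.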

Funder

Sichuan Science and Technology Program

Publisher

PeerJ

Link

https://peerj.com/articles/cs-1755.pdf

Reference: 38 articles.

1. The quarks of attention: structure and capacity of neural attention building blocks;Baldi;Artificial Intelligence,2023

2. CrossViT: cross-attention multi-scale vision transformer for image classification;Chen,2021

3. ConViT: improving vision transformers with soft convolutional inductive biases;d’Ascoli,2021

4. An image is worth 16×16 words: transformers for image recognition at scale;Dosovitskiy,2021

5. Multiscale vision transformers;Fan,2021

Cited by 59 articles.

1. Advanced image segmentation for precision agriculture using CNN-GAT fusion and fuzzy C-means clustering;Computers and Electronics in Agriculture;2024-11

2. YOLO-MIF: Improved YOLOv8 with Multi-Information fusion for object detection in Gray-Scale images;Advanced Engineering Informatics;2024-10

3. Multi-threshold image segmentation based on an improved whale optimization algorithm: A case study of Lupus Nephritis;Biomedical Signal Processing and Control;2024-10

4. Target detection and classification via EfficientDet and CNN over unmanned aerial vehicles;Frontiers in Neurorobotics;2024-08-30

5. Vehicle recognition pipeline via DeepSort on aerial image datasets;Frontiers in Neurorobotics;2024-08-16