A transformer‐based lightweight method for multiple‐object tracking

Published:2024-04-16 Issue:9 Volume:18 Page:2329-2345
ISSN:1751-9659
Container-title:IET Image Processing
language:en

Author:

Wan Qin¹ ², Ge Zhu¹, Yang Yang³, Shen Xuejun¹, Zhong Hang², Zhang Hui², Wang Yaonan², Wu Di¹

Affiliation:

1. School of the College of Electric and Information Engineering, Hunan Institute of Engineering, Xiangtan, China

2. National Engineering Research Center for Robot Visual Perception and Control Technology, Hunan University, Changsha, China

3. Industrial 4.0 Innovation Center, Hunan Zhongnan Intelligent Equipment Co., Ltd, Changsha, China

Abstract

Transformer-based multi-object tracking methods generally rely on the transformer's self-attention mechanism and global modelling ability to improve tracking accuracy. However, most existing methods depend heavily on hardware resources, which makes it difficult to balance accuracy and speed in practical applications. To address this, a lightweight transformer algorithm with joint position awareness is proposed. First, a joint attention module is proposed to enhance the ShuffleNet V2 network. This module comprises a spatio-temporal pyramid module and a convolutional block attention module. The spatio-temporal pyramid module fuses multi-scale features to capture information at different spatial and temporal scales. The convolutional block attention module aggregates information along the channel and spatial dimensions to enhance the representation ability of the model. Then, a position encoding generator module and a dynamic template update strategy are proposed to handle occlusion. The position encoding generator module applies group convolution to the input sequence, with each convolution group responsible for handling the relative positional relationships of a specific range. To improve the reliability of the template, the dynamic template update strategy updates the template at appropriate times. The effectiveness of the approach is validated on the MOT16, MOT17, and MOT20 datasets.
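
The group-convolution idea behind the position encoding generator can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1D token sequence, a kernel size of 3, zero padding, randomly initialised weights, and that the convolution output is added back to the tokens as the position encoding. The function names `grouped_conv1d` and `add_position_encoding` are hypothetical.

```python
import numpy as np


def grouped_conv1d(x, weight, groups):
    """Grouped 1D convolution with zero padding (output length == input length).

    x:      (seq_len, channels) token sequence
    weight: (out_channels, channels // groups, kernel_size) filters
    """
    seq_len, channels = x.shape
    out_channels, in_per_group, k = weight.shape
    assert channels % groups == 0 and out_channels % groups == 0
    assert in_per_group == channels // groups
    assert k % 2 == 1, "an odd kernel keeps the sequence length unchanged"

    pad = k // 2
    x_padded = np.pad(x, ((pad, pad), (0, 0)))
    out_per_group = out_channels // groups
    out = np.zeros((seq_len, out_channels), dtype=x.dtype)

    for g in range(groups):
        # Each group only sees its own slice of the input channels.
        in_slice = slice(g * in_per_group, (g + 1) * in_per_group)
        out_slice = slice(g * out_per_group, (g + 1) * out_per_group)
        w = weight[out_slice]                      # (out_per_group, in_per_group, k)
        for t in range(seq_len):
            window = x_padded[t:t + k, in_slice]   # (k, in_per_group)
            # Sum over kernel taps and over the group's input channels.
            out[t, out_slice] = np.einsum("oik,ki->o", w, window)
    return out


def add_position_encoding(tokens, weight, groups):
    """Assumed usage: add the conv-generated encoding to the tokens."""
    return tokens + grouped_conv1d(tokens, weight, groups)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 8))               # 6 tokens, 8 channels
weight = rng.standard_normal((8, 2, 3)) * 0.1      # 4 groups of 2 channels each
encoded = add_position_encoding(tokens, weight, groups=4)
```

Because each group convolves only its own channel slice over a local window of neighbouring tokens, the encoding depends on relative rather than absolute token positions, and the parameter count is lower than that of a dense convolution of the same width.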

Funder

National Natural Science Foundation of China

Publisher

Institution of Engineering and Technology (IET)
