Motion estimation and multi-stage association for tracking-by-detection-Reference-Cited by-同舟云学术

Motion estimation and multi-stage association for tracking-by-detection

Published:2023-11-22 Issue: Volume: Page:
ISSN:2199-4536
Container-title:Complex & Intelligent Systems
language:en
Short-container-title:Complex Intell. Syst.

Author:

Li Ye,Wu Lei,Chen Yiping,Wang Xinzhong,Yin Guangqiang,Wang Zhiguo

Abstract

AbstractMulti-object tracking (MOT) aims to locate and identify objects in videos. As deep learning brings excellent performances to object detection, the tracking-by-detection (TBD) has gradually become a mainstream tracking framework. However, some drawbacks still exist in the current TBD framework: (1) inaccurate prediction of the bounding boxes would occur in the detection part, which is caused by overlooking the actual pedestrian ratio in the surveillance scene. (2) The width of the bounding boxes in the next frame might be indirectly predicted by the aspect ratio, which increases the error of width prediction in the motion prediction part. (3) Association is only performed for high-confidence detection boxes, and the low-confidence boxes caused by occlusion are discarded in the data association part, resulting in fragmentation of trajectories. To address the above issues, we propose a multi-target tracking model incorporating motion estimation and multi-stage association (MEMA). First, the aspect ratio of the ground-true bounding box is introduced to improve the fit of the detection and the ground-true bounding box, and we design the elliptical Gaussian kernel to improve the positioning accuracy of the object center point. Then, the prediction state vector of the Kalman filter is modified to predict the width and its corresponding velocity directly. It can reduce the width error of the prediction box and eliminate the velocity error of the motion estimation, which leads to a more pedestrian-friendly prediction bounding box. Finally, we propose a multi-stage association strategy to correlate different confidence boxes. Without using the appearance feature, the strategy can reduce the impact of occlusion and improve the tracking performance. On the MOT17 test set, the method proposed in this paper achieves a MOTA of 74.3% and an IDF1 of 72.4%, outperforming the current SOTA.

Funder

Natural Science Foundation of Xinjiang Province

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s40747-023-01273-3.pdf

Reference46 articles.

1. Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: 2016 IEEE International Conference on image processing (ICIP), pp 3464–3468

2. Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on image processing (ICIP), pp 3645–3649

3. Chen L, Ai H, Zhuang Z, Shang C (2018) Real-time multiple people tracking with deeply learned candidate selection and person re-identification. In: 2018 IEEE International Conference on multimedia and expo (ICME), pp 1–6

4. Yu F, Li W, Li Q, Liu Y, Shi X, Yan J (2016) Poi: multiple object tracking with high performance detection and appearance feature. In: European Conference on computer vision, pp 36–42

5. Nima M, Mohammad AS, Mohammad R (2019) Multi-target tracking using cnn-based features: Cnnmtt. Multimed Tools Appl 78:7077–7096