Abstract
In a single-view setting, our approach uses a unified Transformer framework with an enhanced multi-scale sparse attention mechanism to jointly perform three tasks: multi-pedestrian 3D pose estimation, tracking, and prediction. A video transformer first encodes spatio-temporal features from multiple input frames; a decoder then extracts salient pose features via multi-person pose queries, which are regressed in a single shot to recover multi-person pose trajectories and predict future movements. To mitigate occlusion and the complexity of pedestrian motion, a backbone network extracts detailed features from the video, while an improved multi-scale spatio-temporal attention mechanism aggregates information across frames at multiple scales and captures long-term interactions. Because the attention mechanism is compact in its parameters, it balances efficiency and accuracy; the integration of these components therefore improves prediction accuracy without an excessive increase in model size.
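To illustrate the idea of aggregating spatio-temporal information at multiple scales, the sketch below pools frame-level features at several temporal strides and attends from full-resolution queries to each coarsened key/value set. This is a minimal, hypothetical NumPy sketch: the function names, mean-pooling scheme, and scale factors are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (Tq, D) x (Tk, D) -> (Tq, D).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def multi_scale_attention(tokens, scales=(1, 2, 4)):
    """tokens: (T, D) per-frame features.
    For each scale s, mean-pool the sequence by factor s to form a
    coarser key/value set, attend from the full-resolution tokens,
    then average the per-scale outputs (assumed aggregation)."""
    T, D = tokens.shape
    outputs = []
    for s in scales:
        n = T // s
        pooled = tokens[: n * s].reshape(n, s, D).mean(axis=1)
        outputs.append(attention(tokens, pooled, pooled))
    return np.mean(outputs, axis=0)

feats = np.random.default_rng(0).normal(size=(8, 16))
out = multi_scale_attention(feats)
print(out.shape)  # (8, 16)
```

Coarser scales give each query a cheap summary of long temporal context, while the finest scale preserves per-frame detail, which matches the abstract's efficiency-versus-accuracy trade-off.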