Predicting Intentions of Pedestrians from 2D Skeletal Pose Sequences with a Representation-Focused Multi-Branch Deep Learning Network

Published:2020-12-10 Issue:12 Volume:13 Page:331
ISSN:1999-4893
Container-title:Algorithms
language:en
Short-container-title:Algorithms

Author:

Gesnouin Joseph, Pechberti Steve, Bresson Guillaume, Stanciulescu Bogdan, Moutarde Fabien

Abstract

Understanding the behaviors and intentions of humans is still one of the main challenges for vehicle autonomy. More specifically, inferring the intentions and actions of vulnerable actors, namely pedestrians, in complex situations such as urban traffic scenes remains a difficult task and a blocking point towards more automated vehicles. Answering the question “Is the pedestrian going to cross?” is a good starting point for advancing toward the fifth level of autonomous driving. In this paper, we address the problem of real-time discrete intention prediction of pedestrians in urban traffic environments by linking the dynamics of a pedestrian’s skeleton to an intention. To this end, we propose SPI-Net (Skeleton-based Pedestrian Intention network), a representation-focused multi-branch network that combines features from 2D pedestrian body poses to predict pedestrians’ discrete intentions. Experimental results show that SPI-Net achieves 94.4% accuracy in pedestrian crossing prediction on the JAAD data set while remaining efficient enough for real-time scenarios: it takes about 0.25 ms per inference on one GPU (RTX 2080 Ti) and about 0.67 ms on one CPU (Intel Core i7 8700K).
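
The latency figures quoted in the abstract can be read as throughput. The short sketch below only does that arithmetic; the 30 frames-per-second camera rate is an assumption made here for illustration and does not come from the abstract, and the abstract's timings are taken as SPI-Net inference time only.

```python
def inferences_per_second(latency_ms: float) -> float:
    """Convert a per-inference latency in milliseconds to inferences per second."""
    return 1000.0 / latency_ms


# Per-inference latencies quoted in the abstract, in milliseconds.
reported_latency_ms = {"GPU": 0.25, "CPU": 0.67}

# Assumption for illustration only: a 30 frames-per-second camera.
ASSUMED_CAMERA_FPS = 30.0
frame_budget_ms = 1000.0 / ASSUMED_CAMERA_FPS

for device, latency_ms in reported_latency_ms.items():
    throughput = inferences_per_second(latency_ms)
    share_of_budget = latency_ms / frame_budget_ms
    print(
        f"{device}: {throughput:.0f} inferences/s, "
        f"{share_of_budget:.1%} of a {frame_budget_ms:.1f} ms frame budget"
    )
```

Under that assumed frame rate, either device would spend only a small fraction of each frame interval on intention prediction. Any time needed to extract the 2D poses themselves is not covered by this calculation.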

Publisher

MDPI AG

Subject

Computational Mathematics,Computational Theory and Mathematics,Numerical Analysis,Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/13/12/331/pdf

References (98 articles)

1. Learning Spatiotemporal Features with 3D Convolutional Networks;Tran;arXiv,2014

2. Long-Term Temporal Convolutions for Action Recognition

3. Two-stream convolutional networks for action recognition in videos. Advances in Neural Information Processing Systems, 2014; pp. 568–576. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.749.5720&rep=rep1&type=pdf

Cited by 23 articles.

1. Dependent Hidden Markov Model for pedestrian intention prediction: considering Multivariate Interaction Force;Transportmetrica A: Transport Science;2024-07-08

2. Continuous Recognition of Teachers’ Hand Signals for Students with Attention Deficits;Algorithms;2024-07-07

3. PedAST-GCN: Fast Pedestrian Crossing Intention Prediction Using Spatial–Temporal Attention Graph Convolution Networks;IEEE Transactions on Intelligent Transportation Systems;2024

4. Machine Learning - Imaging Applications in Transport Systems: A Review;2023 International Conference on Electrical, Computer and Energy Technologies (ICECET);2023-11-16

5. STMA-GCN_PedCross: Skeleton Based Spatial-Temporal Graph Convolution Networks with Multiple Attentions for Fast Pedestrian Crossing Intention Prediction;2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC);2023-09-24