CAPformer: Pedestrian Crossing Action Prediction Using Transformer

Author:

Lorenzo Javier, Alonso Ignacio Parra, Izquierdo Rubén, Ballardini Augusto Luis, Saz Álvaro Hernández, Llorca David Fernández, Sotelo Miguel Ángel

Abstract

Anticipating pedestrian crossing behavior in urban scenarios is a challenging task for autonomous vehicles. Earlier this year, a benchmark comprising the JAAD and PIE datasets was released, in which several state-of-the-art methods are ranked. However, most of the ranked temporal models rely on recurrent architectures. We propose what is, to the best of our knowledge, the first self-attention alternative, based on the transformer architecture, which has had enormous success in natural language processing (NLP) and, more recently, in computer vision. Our architecture is composed of several branches that fuse video and kinematic data. The video branch is based on one of two architectures: RubiksNet or TimeSformer. The kinematic branch is based on different configurations of the transformer encoder. Several experiments have been performed, mainly focusing on the pre-processing of input data, and they highlight problems with two kinematic data sources: pose keypoints and ego-vehicle speed. The results of our proposed model are comparable to those of PCPA, the best-performing model in the benchmark, reaching an F1 score of nearly 0.78 against 0.77. Furthermore, when using only bounding box coordinates and image data, our model surpasses PCPA by a larger margin (F1 = 0.75 vs. F1 = 0.72). Our model has proven to be a valid alternative to recurrent architectures, providing advantages such as parallelization and whole-sequence processing, which allow it to learn relationships between samples that recurrent architectures cannot capture.
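The whole-sequence processing mentioned in the abstract can be illustrated with a minimal sketch of scaled dot-product self-attention, the core operation of a transformer encoder. This is not the authors' model: the projections are identity maps rather than learned weights, and the function names and the bounding-box values are hypothetical, chosen only to show how every time step is compared with every other time step in a single pass.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `sequence` is a list of equal-length feature vectors, one per time step.
    Returns (outputs, weights), where weights[i][j] is how strongly
    time step i attends to time step j.
    """
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in sequence:
        # Compare this step with every step in the sequence, not just the past.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in sequence
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all input vectors.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, sequence))
             for d in range(dim)]
        )
    return outputs, weights


# Hypothetical normalized bounding boxes (x1, y1, x2, y2) for four frames.
boxes = [
    [0.10, 0.40, 0.20, 0.80],
    [0.12, 0.40, 0.22, 0.80],
    [0.15, 0.41, 0.25, 0.81],
    [0.19, 0.41, 0.29, 0.82],
]

outputs, weights = self_attention(boxes)
```

Each row of `weights` sums to 1, and no row depends on the result of another, which is why the rows can be computed in parallel. A recurrent model, by contrast, has to consume the frames one after another.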

Funder

Ministerio de Ciencia e Innovación

Comunidad de Madrid

Universidad de Alcalá

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry

Cited by 21 articles.

1. Pedestrian Crossing Intention Prediction Based on Cross-Modal Transformer and Uncertainty-Aware Multi-Task Learning for Autonomous Driving;IEEE Transactions on Intelligent Transportation Systems;2024-09

2. Multi-modal transformer with language modality distillation for early pedestrian action anticipation;Computer Vision and Image Understanding;2024-09

3. Prediction of Vehicular Yielding Intention While Approaching a Pedestrian Crosswalk;Transportation Research Record: Journal of the Transportation Research Board;2024-06-19

4. Knowledge-based explainable pedestrian behavior predictor;2024 IEEE Intelligent Vehicles Symposium (IV);2024-06-02

5. Contrasting Disentangled Partial Observations for Pedestrian Action Prediction;2024 IEEE Intelligent Vehicles Symposium (IV);2024-06-02
