Human-like Attention-Driven Saliency Object Estimation in Dynamic Driving Scenes
Authors:
Jin Lisheng, Ji Bingdong, Guo Baicang
Abstract
Identifying notable objects in front of a vehicle and predicting their importance are crucial for the risk assessment and decision making of automated driving systems. However, current research has rarely exploited the driver’s attentional characteristics. In this study, we propose an attention-driven saliency object estimation (SOE) method that uses the driver’s attention intensity as the criterion for an object’s salience and importance. First, we design a driver attention prediction (DAP) network with a 2D-3D mixed convolution encoder–decoder structure. Second, we fuse the DAP network with Faster R-CNN and YOLOv4 at the feature level, using a shared-bottom multi-task learning (MTL) architecture, and name the resulting models SOE-F and SOE-Y, respectively. By transferring spatial features along the time axis, we avoid repeatedly extracting the bottom features and give SOE-F and SOE-Y a uniform image-video input. Finally, the parameters of SOE-F and SOE-Y are divided into two categories, domain invariant and domain adaptive, and the domain-adaptive parameters are then trained and optimized. Experimental results on the DADA-2000 dataset demonstrate that the proposed method outperforms state-of-the-art methods on several evaluation metrics and predicts driver attention more accurately. In addition, driven by a human-like attention mechanism, SOE-F and SOE-Y can detect the salience, category, and location of objects, providing a risk-assessment and decision basis for autonomous driving systems.
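The shared-bottom design described in the abstract can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: all layer sizes, the module names (SharedBottom, AttentionHead), and the three-frame clip length are assumptions made for the example. It shows a 2D-3D mixed-convolution encoder shared across tasks, a decoder head producing a driver-attention map, and the domain-invariant/domain-adaptive parameter split in which only head parameters are optimized; a detection head (Faster R-CNN or YOLOv4 style) would share the same bottom.

```python
# Minimal sketch of a shared-bottom MTL model with a 2D-3D mixed-convolution
# encoder. Hypothetical layer sizes throughout; not the paper's actual network.
import torch
import torch.nn as nn

class SharedBottom(nn.Module):
    """Shared encoder: 2D convs extract per-frame spatial features,
    then a 3D conv fuses them along the time axis."""
    def __init__(self):
        super().__init__()
        self.spatial = nn.Sequential(                       # applied frame by frame
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Kernel of 3 on the time axis collapses a T=3 clip to a single step.
        self.temporal = nn.Conv3d(64, 64, (3, 3, 3), padding=(0, 1, 1))

    def forward(self, clip):                                # clip: (B, T, 3, H, W)
        b, t, c, h, w = clip.shape
        f = self.spatial(clip.reshape(b * t, c, h, w))      # per-frame 2D features
        f = f.reshape(b, t, 64, h // 4, w // 4).permute(0, 2, 1, 3, 4)
        return self.temporal(f).squeeze(2)                  # (B, 64, H/4, W/4)

class AttentionHead(nn.Module):
    """Decoder head producing a driver-attention (saliency) map.
    A detection head would attach to the same shared features."""
    def __init__(self):
        super().__init__()
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, feat):
        return self.decode(feat)                            # (B, 1, H, W) map

# Parameter split mirroring the two-category scheme: the shared bottom is
# treated as domain invariant (frozen), the head as domain adaptive (trained).
bottom, dap_head = SharedBottom(), AttentionHead()
for p in bottom.parameters():
    p.requires_grad = False                                 # domain invariant
optim = torch.optim.Adam(dap_head.parameters(), lr=1e-4)   # domain adaptive only

clip = torch.randn(2, 3, 3, 224, 224)                       # (batch, T=3, RGB, H, W)
saliency = dap_head(bottom(clip))                           # (2, 1, 224, 224)
```

Because the bottom consumes a clip of frames, a single image can be handled by repeating it along the time axis, which is one way to realize the uniform image-video input mentioned in the abstract.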
Funder
National Natural Science Foundation of China; S&T Program of Hebei
Subject
Electrical and Electronic Engineering, Industrial and Manufacturing Engineering, Control and Optimization, Mechanical Engineering, Computer Science (miscellaneous), Control and Systems Engineering