SalSAC: A Video Saliency Prediction Model with Shuffled Attentions and Correlation-Based ConvLSTM

Published:2020-04-03 Issue:07 Volume:34 Page:12410-12417
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Wu Xinyi,Wu Zhenyao,Zhang Jinglin,Ju Lili,Wang Song

Abstract

Convolutional neural networks (CNNs) have substantially improved the prediction of human fixations in videos. In this paper, we propose a novel end-to-end neural network, “SalSAC”, for video saliency prediction, which uses a CNN-LSTM-Attention design as the basic architecture and exploits both static and dynamic information. To better represent the static information of each frame, we first extract multi-level features of the same size from different layers of the encoder CNN and calculate the corresponding multi-level attentions; we then randomly shuffle these attention maps among levels and multiply them with the extracted multi-level features, respectively. In this way, we leverage the attention consistency across different layers to improve the robustness of the network. On the dynamic side, we propose a correlation-based ConvLSTM to appropriately balance the influence of the current and preceding frames on the prediction. Experimental results on the DHF1K, Hollywood2 and UCF-sports datasets show that SalSAC outperforms many existing state-of-the-art methods.
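
The shuffling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the attention maps are computed, so `spatial_attention` below is a hypothetical stand-in (sigmoid of the channel-wise mean) for what is presumably a learned module, and the function names, tensor shapes, and random inputs are assumptions made for illustration only.

```python
import numpy as np


def spatial_attention(feat):
    """Hypothetical stand-in for a learned attention module.

    feat has shape (C, H, W); the result is an (H, W) map in (0, 1),
    computed as the sigmoid of the channel-wise mean.
    """
    return 1.0 / (1.0 + np.exp(-feat.mean(axis=0)))


def shuffled_attention(features, rng):
    """Reweight each level's features with another level's attention map.

    features: list of arrays, one per encoder level, all sharing the
    same spatial size (H, W), as the abstract requires.
    Returns the reweighted features and the permutation that was drawn.
    """
    attentions = [spatial_attention(f) for f in features]
    # Randomly shuffle the attention maps among levels.
    perm = rng.permutation(len(features))
    # Level i is multiplied by the attention map computed at level perm[i];
    # the (H, W) map is broadcast over the channel axis.
    weighted = [f * attentions[j][None, :, :] for f, j in zip(features, perm)]
    return weighted, perm


rng = np.random.default_rng(0)
# Three assumed encoder levels, each with 4 channels on an 8x8 grid.
features = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
weighted, perm = shuffled_attention(features, rng)
```

Because every level's attention map has the same spatial size, any map can be applied to any level. The sketch only shows the mechanics of the swap; how the shuffle interacts with training is described in the paper itself, not here.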

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 29 articles.

1. tSPM-Net: A probabilistic spatio-temporal approach for scanpath prediction;Computers & Graphics;2024-08

2. Predicting 360° Video Saliency: A ConvLSTM Encoder-Decoder Network With Spatio-Temporal Consistency;IEEE Journal on Emerging and Selected Topics in Circuits and Systems;2024-06

3. Transformer-based multi-level attention integration network for video saliency prediction;Multimedia Tools and Applications;2024-05-25

4. Predicting the Noticeability of Dynamic Virtual Elements in Virtual Reality;Proceedings of the CHI Conference on Human Factors in Computing Systems;2024-05-11

5. Multi-Step Multidimensional Statistical Arbitrage Prediction Using PSO Deep-ConvLSTM: An Enhanced Approach for Forecasting Price Spreads;Applied Sciences;2024-04-29