Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection

Published: 2020-04-03 Issue: 07 Volume: 34 Page: 10869-10876
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Gu Yuchao, Wang Lijuan, Wang Ziqin, Liu Yun, Cheng Ming-Ming, Lu Shao-Ping

Abstract

Spatiotemporal information is essential for video salient object detection (VSOD) because object motion strongly attracts human attention. Previous VSOD methods usually rely on Long Short-Term Memory (LSTM) or 3D ConvNets (C3D), which can only encode motion information through step-by-step propagation in the temporal domain. The non-local mechanism was recently proposed to capture long-range dependencies directly. However, applying it to VSOD is not straightforward, because i) it fails to capture motion cues and tends to learn motion-independent global contexts; ii) its computation and memory costs are prohibitive for dense video prediction tasks such as VSOD. To address these problems, we design a Constrained Self-Attention (CSA) operation that captures motion cues, based on the prior that objects always move along a continuous trajectory. We group a set of CSA operations in pyramid structures (PCSA) to capture objects at various scales and speeds. Extensive experimental results demonstrate that our method outperforms previous state-of-the-art methods in both accuracy and speed (110 FPS on a single Titan Xp) on five challenging datasets. Our code is available at https://github.com/guyuchao/PyramidCSA.
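
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea: attention from one position is restricted to a small, optionally dilated window around that position in each frame, and several window settings are combined. The function names, the (radius, dilation) parameters, the reuse of features as both keys and values, and the concatenation step are all assumptions made for illustration; none of this is taken from the authors' code at the URL above.

```python
import math


def constrained_attention(frames, t, y, x, radius=1, dilation=1):
    """Attend from position (y, x) in frame t to a local window in every frame.

    frames: list of T frames; each frame is an H x W grid of feature vectors.
    Returns (output_vector, weights), where weights maps (frame, row, col)
    to its attention weight. Hypothetical helper, not the paper's API.
    """
    query = frames[t][y][x]
    height, width = len(frames[0]), len(frames[0][0])
    scale = math.sqrt(len(query))

    # Collect keys only from the dilated window around (y, x) in each frame.
    positions, scores = [], []
    for ft in range(len(frames)):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                ky, kx = y + dy * dilation, x + dx * dilation
                if 0 <= ky < height and 0 <= kx < width:
                    key = frames[ft][ky][kx]
                    positions.append((ft, ky, kx))
                    scores.append(sum(q * k for q, k in zip(query, key)) / scale)

    # Numerically stable softmax over the window.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the window's feature vectors (values = keys here).
    output = [0.0] * len(query)
    for w, (ft, ky, kx) in zip(weights, positions):
        for c, v in enumerate(frames[ft][ky][kx]):
            output[c] += w * v
    return output, dict(zip(positions, weights))


def pyramid_attention(frames, t, y, x, settings=((1, 1), (1, 2))):
    """Concatenate outputs from several assumed (radius, dilation) settings."""
    combined = []
    for radius, dilation in settings:
        out, _ = constrained_attention(frames, t, y, x, radius, dilation)
        combined.extend(out)
    return combined
```

Because each query scores only the (2 * radius + 1)^2 window positions per frame instead of every position in every frame, the cost per query no longer grows with the frame size, which is the kind of saving the abstract attributes to constraining the attention. A larger dilation widens the window's reach without adding positions, which is one plausible way to cover faster-moving objects.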

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 71 articles.

1. Collaborative spatial-temporal video salient object detection with cross attention transformer;Signal Processing;2024-11

2. Video Salient Object Detection Via Multi-level Spatiotemporal Bidirectional Network Using Multi-scale Transfer Learning;IETE Journal of Research;2024-08-05

3. Multi-temporal dependency handling in video smoke recognition: A holistic approach spanning spatial, short-term, and long-term perspectives;Expert Systems with Applications;2024-07

4. Multi-Involution Memory Network for Unsupervised Video Object Segmentation;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

5. Fie-net: spatiotemporal full-stage interaction enhancement network for video salient object detection;Signal, Image and Video Processing;2024-06-17