Salient object detection for RGBD video via spatial interaction and depth-based boundary refinement-Reference-Cited by-同舟云学术

Salient object detection for RGBD video via spatial interaction and depth-based boundary refinement

Published:2023-05-06 Issue:6 Volume:9 Page:6343-6358
ISSN:2199-4536
Container-title:Complex & Intelligent Systems
language:en
Short-container-title:Complex Intell. Syst.

Author:

Zhang Yujian,Zhang Ziyan,Zhang Ping,Xu Mengnan

Abstract

AbstractRecently proposed state-of-the-art saliency detection models rely heavily on labeled datasets and rarely focus on perfect RGBD feature fusion, which lowers their generalization ability. In this paper, we propose a depth-based interaction and refinement network (DIR-Net) to fully leverage the depth information provided with RGB images to generate and refine the corresponding saliency segmentation maps. In total, three modules are included in our framework. A depth-based refinement module (DRM) and an RGB module work in parallel while coordinating via interactive spatial guidance modules (ISGMs), which utilize spatial and channel attention computed from both depth features and RGB features. In each layer, the features in both modules are refined and guided by the spatial information obtained from the other module through ISGMs. In the RGB module, before sending the depth-guided feature map to the decoder, a convolutional gated recurrent unit (ConvGRU)-based block is introduced to handle temporal information. Thinking about the clear movement information in RGB features, the block also guides temporal information in DRM. By merging the results from both the DRM and RGB modules, a segmentation map with distinct boundaries is generated. Considering the lack of depth images in popular public datasets, we utilize a depth estimation network that incorporates manual postprocessing-based correction to generate depth images on the DAVIS and UVSD datasets. The state-of-the-art performance achieved on both the original and new datasets illustrates the advantage of our RGBD feature fusion strategy, with a real-time speed of 19 fps on a single GPU.

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s40747-023-01072-w.pdf

Reference60 articles.

1. Ren Z, Gao S, Chia L, Tsang IW (2013) Region-based saliency detection and its application in object recognition. IEEE Trans Circ Syst Video Technol 24(5):769–779. https://doi.org/10.1109/TCSVT.2013.2280096

2. Fan D, Wang W, Cheng M, Shen J (2019) Shifting more attention to video salient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8554–8564 . https://doi.org/10.1109/CVPR.2019.00875

3. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 . https://doi.org/10.1109/cvpr.2015.7298965

4. Borji A, Frintrop S, Sihite DN, Itti L (2012) Adaptive object tracking by learning background context. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 23–30 . IEEE. https://doi.org/10.1109/CVPRW.2012.6239191

5. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448. https://doi.org/10.1109/ICCV.2015.169

Cited by 2 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Enhancing learning on uncertain pixels in self-distillation for object segmentation;Complex & Intelligent Systems;2024-06-15

2. Consistency-constrained RGB-T crowd counting via mutual information maximization;Complex & Intelligent Systems;2024-04-15