Self-Supervised Pretraining for RGB-D Salient Object Detection

Published:2022-06-28 Issue:3 Volume:36 Page:3463-3471
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Zhao Xiaoqi,Pang Youwei,Zhang Lihe,Lu Huchuan,Ruan Xiang

Abstract

Existing CNN-based RGB-D salient object detection (SOD) networks all need to be pretrained on ImageNet to learn hierarchical features, which provides a good initialization. However, collecting and annotating large-scale datasets is time-consuming and expensive. In this paper, we use self-supervised representation learning (SSL) to design two pretext tasks: a cross-modal auto-encoder and depth-contour estimation. Our pretext tasks require only a small number of unlabeled RGB-D datasets for pretraining, which lets the network capture rich semantic context and reduces the gap between the two modalities, thereby providing an effective initialization for the downstream task. In addition, to address the inherent problem of cross-modal fusion in RGB-D SOD, we propose a consistency-difference aggregation (CDA) module that splits a single feature fusion into multi-path fusion to adequately perceive both consistent and differential information. The CDA module is general and suitable for both cross-modal and cross-level feature fusion. Extensive experiments on six benchmark datasets show that our self-supervised pretrained model performs favorably against most state-of-the-art methods pretrained on ImageNet. The source code will be publicly available at https://github.com/Xiaoqi-Zhao-DLUT/SSLSOD.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 27 articles.

1. Progressive cross-level fusion network for RGB-D salient object detection;Journal of Visual Communication and Image Representation;2024-10

2. MAGNet: Multi-scale Awareness and Global fusion Network for RGB-D salient object detection;Knowledge-Based Systems;2024-09

3. Learning Adaptive Fusion Bank for Multi-Modal Salient Object Detection;IEEE Transactions on Circuits and Systems for Video Technology;2024-08

4. Gated multi-modal edge refinement network for light field salient object detection;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-06-28

5. CMDCF: an effective cross-modal dense cooperative fusion network for RGB-D SOD;Neural Computing and Applications;2024-05-07