CAE-Net: Cross-Modal Attention Enhancement Network for RGB-T Salient Object Detection

Published: 2023-02-14 Volume: 12 Issue: 4 Page: 953
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Lv Chengtao¹, Wan Bin¹, Zhou Xiaofei¹, Sun Yaoqi¹,², Hu Ji¹,², Zhang Jiyong¹, Yan Chenggang¹

Affiliation:

1. School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China

2. Lishui Institute of Hangzhou Dianzi University, Lishui 323000, China

Abstract

RGB salient object detection (SOD) performs poorly in low-contrast scenes and against complex backgrounds. Thermal infrared images capture the heat distribution of a scene and thus provide information complementary to the RGB image, so RGB-T SOD has recently attracted increasing attention. Despite considerable research effort on RGB-T SOD, some problems remain unsolved. For example, defective samples and interfering information in the RGB or thermal image hinder the model from learning proper saliency features, and noisy low-level features lead to incomplete salient objects or false-positive detections. To address these problems, we design a cross-modal attention enhancement network (CAE-Net). First, we design a cross-modal fusion (CMF) module to fuse cross-modal features: a cross-attention unit (CAU) enhances the features of the two modalities, and channel attention dynamically weights and fuses them. Then, we design a joint-modality decoder (JMD) to fuse cross-level features, in which low-level features are purified by higher-level features and multi-scale features are sufficiently integrated. In addition, we add two single-modality decoder (SMD) branches to preserve more modality-specific information. Finally, we employ a multi-stream fusion (MSF) module to fuse the features of the three decoders. Comprehensive experiments on three RGB-T datasets show that CAE-Net is comparable to other methods.
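
The abstract says that channel attention is used to dynamically weigh and fuse the two modal features, but it does not give the formulation. The sketch below illustrates only that general idea and is not the paper's CMF module: it assumes a parameter-free variant in which each channel is summarised by global average pooling and a softmax over the two modalities yields the per-channel fusion weights. The function names, the list-of-channels data layout, and the absence of learned layers and of the cross-attention unit are all assumptions made for illustration.

```python
import math


def global_avg_pool(feat):
    """Return the mean of each channel.

    `feat` is a list of channels; each channel is a flat list of floats
    (a flattened H x W map). This layout is a hypothetical simplification.
    """
    return [sum(ch) / len(ch) for ch in feat]


def channel_attention_fuse(rgb_feat, thermal_feat):
    """Fuse two feature maps with per-channel softmax weights (illustrative only)."""
    if len(rgb_feat) != len(thermal_feat):
        raise ValueError("both modalities need the same number of channels")

    rgb_desc = global_avg_pool(rgb_feat)
    thermal_desc = global_avg_pool(thermal_feat)

    fused, weights = [], []
    for ch_rgb, ch_t, d_rgb, d_t in zip(rgb_feat, thermal_feat, rgb_desc, thermal_desc):
        # Softmax over the two modality descriptors, shifted by the max for stability.
        m = max(d_rgb, d_t)
        e_rgb, e_t = math.exp(d_rgb - m), math.exp(d_t - m)
        w_rgb = e_rgb / (e_rgb + e_t)
        w_t = 1.0 - w_rgb
        weights.append((w_rgb, w_t))
        # Weighted sum of the two modalities, element by element.
        fused.append([w_rgb * a + w_t * b for a, b in zip(ch_rgb, ch_t)])
    return fused, weights


# Toy input: channel 0 is identical in both modalities,
# channel 1 carries a response only in the thermal stream.
rgb = [[0.2, 0.4, 0.6, 0.8], [0.0, 0.0, 0.0, 0.0]]
thermal = [[0.2, 0.4, 0.6, 0.8], [1.0, 1.0, 1.0, 1.0]]
fused, weights = channel_attention_fuse(rgb, thermal)
```

In this toy input the first channel receives equal weights, while the second channel leans toward the thermal stream because its pooled response is larger. A trained network would normally learn the mapping from pooled descriptors to weights rather than apply a softmax to them directly.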

Funder

National Key Research and Development Program of China

Fundamental Research Funds for the Provincial Universities of Zhejiang

National Natural Science Foundation of China

“Pioneer” and “Leading Goose” R&D Program of Zhejiang Province

Zhejiang Province Nature Science Foundation of China

Hangzhou Dianzi University (HDU) and the China Electronics Corporation DATA (CECDATA) Joint Research Center of Big Data Technologies

111 Project

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/4/953/pdf

Cited by 1 article.

1. Real-time RGBT tracking via isometric feature encoding networking;2024-08-26