Attentive Excitation and Aggregation for Bilingual Referring Image Segmentation

Published:2021-04-30 Issue:2 Volume:12 Page:1-17
ISSN:2157-6904
Container-title:ACM Transactions on Intelligent Systems and Technology
language:en
Short-container-title:ACM Trans. Intell. Syst. Technol.

Author:

Zhou Qianli¹,Hui Tianrui²,Wang Rong¹,Hu Haimiao³,Liu Si³

Affiliation:

1. People’s Public Security University of China, Beijing, China

2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

3. Beihang University, Beijing, China

Abstract

The goal of referring image segmentation is to identify the object that matches an input natural language expression. Previous methods support only English descriptions, which limits the potential applications of this task, since Chinese is also widely used around the world. We therefore propose extending existing datasets with Chinese descriptions and preprocessing tools for training and evaluating bilingual referring segmentation models. In addition, previous methods cannot collaboratively learn channel-wise and spatial-wise cross-modal attention to align the visual and linguistic modalities well. To address these limitations, we propose a Linguistic Excitation module that excites image channels under the guidance of language information, and a Linguistic Aggregation module that aggregates multimodal information based on image-language relationships. Since different levels of features from the visual backbone encode rich visual information, we also propose a Cross-Level Attentive Fusion module that fuses multilevel features gated by language information. Extensive experiments on four English and Chinese benchmarks show that our bilingual referring image segmentation model outperforms previous methods.
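
As a rough illustration of what exciting image channels with language information can look like, the sketch below applies a squeeze-and-excitation style gate: a sentence vector is projected to one sigmoid gate per visual channel, and each channel is scaled by its gate. This is not the authors' implementation. The function names, the single linear projection, and the toy shapes are all assumptions made for illustration; the abstract does not specify how the Linguistic Excitation module is built.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linguistic_channel_gates(lang_vec, weight, bias):
    """Return one sigmoid gate per visual channel, computed from the sentence vector.

    weight has shape [C][D] and bias has shape [C]; both are hypothetical
    learned parameters, not taken from the paper.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, lang_vec)) + b)
        for row, b in zip(weight, bias)
    ]


def excite_channels(feature_map, gates):
    """Scale every spatial position of channel c by gates[c]."""
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]


# Toy input: 2 channels of a 2x2 feature map and a 3-d sentence embedding.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
lang_vec = [1.0, 0.0, -1.0]
weight = [[0.0, 0.0, 0.0], [2.0, 0.0, -2.0]]  # assumed projection, for illustration only
bias = [0.0, 0.0]

gates = linguistic_channel_gates(lang_vec, weight, bias)
excited = excite_channels(feature_map, gates)
```

In this toy setup the first channel receives a neutral gate of 0.5 because its projection row is all zeros, while the second channel is passed almost unchanged because the sentence vector aligns with its row. A real model would learn the projection jointly with the visual backbone.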

Funder

Operating Expenses of Basic Scientific Research Project of the People’s Public Security University of China

National Key Research and Development Program of China

Fundamental Research Funds for the Central Universities

Zhejiang Lab

Beijing Natural Science Foundation

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence,Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3446345

Reference 52 articles.

1. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473

2. MUTAN: Multimodal Tucker Fusion for Visual Question Answering

3. See-Through-Text Grouping for Referring Image Segmentation

4. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Cited by 3 articles.

1. Cross‐modal fusion encoder via graph neural network for referring image segmentation;IET Image Processing;2024-01-02

2. Construction and Application of English-Chinese Multimodal Emotional Corpus Based on Artificial Intelligence;International Journal of Human–Computer Interaction;2023-01-26

3. Cross-modality synergy network for referring expression comprehension and segmentation;Neurocomputing;2022-01