Universal Relocalizer for Weakly Supervised Referring Expression Grounding

Published: 2024-05-16
Volume: 20, Issue: 7, Pages: 1-23
ISSN: 1551-6857
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Abbreviated journal title: ACM Trans. Multimedia Comput. Commun. Appl.
Language: English

Authors:

Zhang Panpan¹, Liu Meng², Song Xuemeng¹, Cao Da³, Gao Zan⁴, Nie Liqiang⁵

Affiliations:

1. Shandong University, Qingdao, China

2. Shandong Jianzhu University, Jinan, China

3. Hunan University, Changsha, China

4. Qilu University of Technology, Jinan, China

5. Harbin Institute of Technology Shenzhen, Shenzhen, China

Abstract

This article introduces the Universal Relocalizer, a novel approach to weakly supervised referring expression grounding. The method aims to locate the region proposal that corresponds to a given query without requiring region-level annotations during training. To improve localization precision and enrich the semantic understanding of the target proposal, we design three modules: a category module, a color module, and a spatial relationship module. The category and color modules assign category and color labels to region proposals, from which category and color scores are computed. The spatial relationship module incorporates spatial cues, producing a spatial score for each proposal that further improves localization accuracy. Combining the category, color, and spatial scores yields a refined grounding score for every proposal. Comprehensive experiments on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate the effectiveness of the Universal Relocalizer, which performs strongly on all three benchmarks.
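The abstract does not say how the individual scores are computed or combined, so the Python sketch below only illustrates the general idea of re-scoring proposals with category, color, and spatial cues. It rests on assumptions that are not from the paper: each proposal carries a hypothetical base grounding score from an underlying weakly supervised model; the category, color, and direction words have already been parsed from the query; the category and color scores are simple label matches; the spatial score handles only "left" and "right" by ranking horizontal box centers; and the fusion is a weighted sum with hypothetical weights w_cat, w_col, and w_sp. All names (Proposal, relocalize, and so on) are illustrative and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    base_score: float  # hypothetical score from an underlying grounding model
    category: str      # label a category module might assign (assumption)
    color: str         # label a color module might assign (assumption)
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def category_score(p: Proposal, query_category: Optional[str]) -> float:
    # 1.0 if the proposal's label matches the category word parsed from the query.
    return float(query_category is not None and p.category == query_category)


def color_score(p: Proposal, query_color: Optional[str]) -> float:
    # 1.0 if the proposal's color label matches the color word in the query.
    return float(query_color is not None and p.color == query_color)


def spatial_scores(proposals: list[Proposal], direction: Optional[str]) -> list[float]:
    # Toy spatial cue: rank proposals by horizontal box center, scaled to [0, 1].
    if direction not in ("left", "right"):
        return [0.0] * len(proposals)
    centers = [(p.box[0] + p.box[2]) / 2 for p in proposals]
    lo, hi = min(centers), max(centers)
    if hi == lo:
        return [0.0] * len(proposals)  # no horizontal ordering to exploit
    norm = [(c - lo) / (hi - lo) for c in centers]  # 0 = leftmost, 1 = rightmost
    return [1.0 - n for n in norm] if direction == "left" else norm


def relocalize(
    proposals: list[Proposal],
    query_category: Optional[str],
    query_color: Optional[str],
    direction: Optional[str],
    w_cat: float = 1.0,  # hypothetical fusion weights
    w_col: float = 0.5,
    w_sp: float = 0.5,
) -> tuple[int, list[float]]:
    # Refined score = base score + weighted category, color, and spatial scores.
    sp = spatial_scores(proposals, direction)
    refined = [
        p.base_score
        + w_cat * category_score(p, query_category)
        + w_col * color_score(p, query_color)
        + w_sp * s
        for p, s in zip(proposals, sp)
    ]
    best = max(range(len(proposals)), key=refined.__getitem__)
    return best, refined


if __name__ == "__main__":
    # Toy query "the red car on the left", parsed as ("car", "red", "left").
    props = [
        Proposal(0.4, "car", "red", (0, 0, 10, 10)),
        Proposal(0.6, "car", "blue", (50, 0, 60, 10)),
        Proposal(0.5, "person", "red", (100, 0, 110, 10)),
    ]
    best, refined = relocalize(props, "car", "red", "left")
    print(best)  # 0: the leftmost red car, although proposal 1 has the higher base score
```

In this toy case the base scores alone would select the blue car, while the added cues move the selection to the red car on the left. A weighted sum is used only because it is the simplest way to show several cues adjusting one ranking; the paper's actual scoring and fusion may differ.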

Funders

National Natural Science Foundation of China

Shandong Provincial Natural Science Foundation

Science and Technology Innovation Program for Distinguished Young Scholars of Shandong Province Higher Education Institutions

Special Fund for Distinguished Professors of Shandong Jianzhu University

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3656045