Region-Focused Network for Dense Captioning

Author:

Huang Qingbao (1), Li Pijian (1), Huang Youji (1), Shuang Feng (2), Cai Yi (3)

Affiliation:

1. Guangxi University, Nanning, China

2. Guangxi University, Nanning, China and Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, Nanning, China

3. South China University of Technology, Guangzhou, China and Key Laboratory of Big Data and Intelligent Robot (SCUT), Ministry of Education, Guangzhou, China

Abstract

Dense captioning is an important but under-explored task that aims to detect localized regions of interest (RoIs) densely in a given image and describe each of them in natural language. Although recent studies have fused multi-scale features from different visual instances to generate more accurate descriptions, these methods do not explore the relational semantic information in images, which leads to less informative descriptions. Furthermore, indiscriminately fusing all visual instance features introduces redundant information, resulting in poor matching between descriptions and their corresponding regions. In this work, we propose a Region-Focused Network (RFN) to address these issues. Specifically, to fully comprehend the image, we first extract object-level features and encode the interaction and position relations between objects to enhance the object representations. Then, to reduce interference from information that is redundant for the target region, we extract the information most relevant to that region. Finally, a region-based Transformer composes and aligns the previously mined information and generates the corresponding descriptions. Extensive experiments on the Visual Genome V1.0 and V1.2 datasets show that our RFN model outperforms state-of-the-art methods, verifying its effectiveness. Our code is available at https://github.com/VILAN-Lab/DesCap .
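
The idea of keeping only the object information most relevant to a target region can be illustrated with a small sketch. The abstract does not say how relevance is computed, so the code below is not the RFN implementation: it assumes, purely for illustration, that relevance is cosine similarity between a region feature and each object feature, and that the top k objects are kept. The names `cosine`, `select_relevant`, and `k` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as having no similarity to anything.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_relevant(region_feat, object_feats, k=2):
    """Return indices of the k object features most similar to the region.

    Assumption for illustration only: relevance is cosine similarity.
    """
    ranked = sorted(
        range(len(object_feats)),
        key=lambda i: cosine(region_feat, object_feats[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D features; real features would come from a visual backbone.
region = [1.0, 0.0]
objects = [[0.9, 0.1], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
top = select_relevant(region, objects, k=2)
print(top)
```

In this toy setting, only the selected object features would be passed on to the captioning decoder, so features unrelated to the target region do not dilute its description.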

Funder

National Natural Science Foundation of China

Guangxi Natural Science Foundation

Guangxi Scientific and Technological Bases and Talents Special Projects

Bagui Scholar Program of Guangxi

Fundamental Research Funds for the Central Universities, SCUT

Innovation Project of Guangxi Graduate Education

Technology Planning Project of Guangdong Province

Publisher

Association for Computing Machinery (ACM)
