Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation

Published: 2021-05-18 Issue: 2 Volume: 35 Page: 1610-1618
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Huang Pin-Hao, Lee Han-Hung, Chen Hwann-Tzong, Liu Tyng-Luh

Abstract

This paper addresses a new task called referring 3D instance segmentation, which aims to segment the target instance in a 3D scene given a query sentence. Previous work on scene understanding has explored visual grounding with natural language guidance, yet the emphasis has been mostly confined to images and videos. We propose a Text-guided Graph Neural Network (TGNN) for referring 3D instance segmentation on point clouds. Given a query sentence and the point cloud of a 3D scene, our method learns to extract per-point features and predicts an offset to shift each point toward its object center. Based on the point features and the offsets, we cluster the points to produce fused features and coordinates for the candidate objects. The resulting clusters are modeled as nodes in a Graph Neural Network (GNN) to learn representations that encompass the relation structure for each candidate object. The GNN layers leverage each object's features and its relations with neighbors to generate an attention heatmap for the input sentence expression. Finally, the attention heatmap is used to "guide" the aggregation of information from neighboring nodes. Our method achieves state-of-the-art performance on referring 3D instance segmentation and 3D localization on the ScanRefer, Nr3D, and Sr3D benchmarks.
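
The pipeline described in the abstract (shift points by predicted offsets, cluster them into candidate objects, then aggregate neighbor information under text-derived attention) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the distance-threshold clustering, the k-nearest-neighbor graph, and the weight-free dot-product attention are all assumptions made for illustration, and the relative coordinates a real relation feature would likely use are omitted.

```python
import numpy as np

def shift_and_cluster(points, offsets, radius=0.3):
    """Shift points by their predicted offsets, then group shifted points
    that are connected through hops shorter than `radius`.
    (Assumed clustering rule; the abstract does not specify one.)"""
    shifted = points + offsets
    labels = -np.ones(len(shifted), dtype=int)
    next_label = 0
    for i in range(len(shifted)):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:  # grow the cluster from every newly added point
            j = stack.pop()
            dist = np.linalg.norm(shifted - shifted[j], axis=1)
            new = np.where((dist < radius) & (labels == -1))[0]
            labels[new] = next_label
            stack.extend(new.tolist())
        next_label += 1
    return shifted, labels

def pool_clusters(shifted, feats, labels):
    """Average coordinates and per-point features inside each cluster
    to get one graph node per candidate object."""
    n = labels.max() + 1
    centers = np.stack([shifted[labels == c].mean(axis=0) for c in range(n)])
    node_feats = np.stack([feats[labels == c].mean(axis=0) for c in range(n)])
    return centers, node_feats

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_layer(node_feats, centers, word_feats, k=2):
    """One illustrative message-passing step (hypothetical, no learned weights).
    Each node attends over the words of the query for every neighbor relation,
    and the attended text vector gates the message from that neighbor."""
    n = len(node_feats)
    k = min(k, n - 1)
    out = node_feats.copy()
    if k == 0:
        return out
    dist = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self from neighbors
    nbrs = np.argsort(dist, axis=1)[:, :k]    # (n, k) nearest candidate objects
    for i in range(n):
        relation = node_feats[nbrs[i]] - node_feats[i]      # (k, d)
        attn = softmax(relation @ word_feats.T, axis=-1)    # (k, T) heatmap over words
        context = attn @ word_feats                         # (k, d) text summary
        out[i] = node_feats[i] + (relation * context).mean(axis=0)
    return out
```

In this sketch `word_feats` stands for per-word embeddings of the query sentence with the same feature dimension as the node features. A real model would learn projections for both modalities and score each node against the sentence to select the referred instance.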

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 33 articles.

1. Revisiting 3D visual grounding with Context-aware Feature Aggregation;Neurocomputing;2024-10

2. ViewInfer3D: 3D Visual Grounding Based on Embodied Viewpoint Inference;IEEE Robotics and Automation Letters;2024-09

3. Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding;2024 IEEE International Conference on Multimedia and Expo Workshops (ICMEW);2024-07-15

4. Graph Neural Networks in Point Clouds: A Survey;Remote Sensing;2024-07-09

5. Talk2BEV: Language-enhanced Bird’s-eye View Maps for Autonomous Driving;2024 IEEE International Conference on Robotics and Automation (ICRA);2024-05-13