Visual-and-Language Multimodal Fusion for Sweeping Robot Navigation Based on CNN and GRU

Published: 2024-02-20  Volume: 36  Issue: 1  Pages: 1-21
ISSN: 1546-2234
Container-title: Journal of Organizational and End User Computing
Language: en

Authors:

Zhang Yiping¹, Wilker Kolja²

Affiliations:

1. Sino-German Institute of Design and Communication, Zhejiang Wanli University, China

2. DFI College of Communication Art and New Media, Germany

Abstract

Effectively fusing information from the visual and language modalities remains a significant challenge. To achieve deep integration of natural language and visual information, this research introduces a multimodal fusion neural network model that combines visual information (RGB images and depth maps) with language information (natural-language navigation instructions). First, the authors used Faster R-CNN and ResNet50 to extract image features, and an attention mechanism to further extract the most relevant information. Second, a GRU model was used to extract language features. Finally, another GRU model was used to fuse the visual and language features while retaining historical information, from which the robot's next action is determined. Experimental results demonstrate that the proposed method effectively addresses the localization and decision-making challenges faced by robotic vacuum cleaners.
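The abstract describes the pipeline only at a high level. The pure-Python sketch below shows one plausible way the stages could fit together; it is not the authors' implementation. Assumptions, all hypothetical: per-region visual features (standing in for Faster R-CNN / ResNet50 outputs, which are not computed here) and instruction word embeddings arrive as plain vectors; the attention step is scaled dot-product attention with the instruction encoding as the query; the fusion GRU's hidden state acts as the retained navigation history; and the four-action set, the dimensions, and the names `Navigator`, `attend`, and `GRUCell` are illustrative. Weights are random and untrained, so the chosen actions carry no meaning; only the data flow is illustrated.

```python
import math
import random

# Hypothetical discrete action set; the source does not list the robot's actions.
ACTIONS = ["forward", "turn_left", "turn_right", "stop"]


def rand_matrix(rows, cols, rng):
    """Random, untrained weights (illustrates shapes only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """GRU cell (Cho et al. form, biases omitted): update gate z,
    reset gate r, candidate n, new state h' = (1 - z) * n + z * h."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cat = input_dim + hidden_dim
        self.wz = rand_matrix(hidden_dim, cat, rng)
        self.wr = rand_matrix(hidden_dim, cat, rng)
        self.wn = rand_matrix(hidden_dim, cat, rng)

    def __call__(self, x, h):
        xh = x + h  # list concatenation [x; h]
        z = [sigmoid(a) for a in matvec(self.wz, xh)]
        r = [sigmoid(a) for a in matvec(self.wr, xh)]
        xrh = x + [ri * hi for ri, hi in zip(r, h)]  # [x; r * h]
        n = [math.tanh(a) for a in matvec(self.wn, xrh)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def attend(query, regions):
    """Scaled dot-product attention: softmax over query-region similarity,
    then a weighted sum of the region features."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, reg)) / math.sqrt(d) for reg in regions]
    m = max(scores)  # subtract max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * reg[i] for w, reg in zip(weights, regions)) for i in range(d)]
    return pooled, weights


class Navigator:
    """Hypothetical wiring: language GRU -> attention over regions ->
    fusion GRU whose hidden state is the carried history -> action."""

    def __init__(self, word_dim=6, feat_dim=8, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # Language state uses feat_dim so it can serve as the attention query.
        self.lang_gru = GRUCell(word_dim, feat_dim, rng)
        self.fuse_gru = GRUCell(2 * feat_dim, hidden_dim, rng)
        self.w_action = rand_matrix(len(ACTIONS), hidden_dim, rng)
        self.history = [0.0] * hidden_dim

    def encode_instruction(self, word_vectors):
        h = [0.0] * self.lang_gru.hidden_dim
        for w in word_vectors:
            h = self.lang_gru(w, h)
        return h

    def step(self, instruction_vec, region_feats):
        visual, weights = attend(instruction_vec, region_feats)
        self.history = self.fuse_gru(visual + instruction_vec, self.history)
        logits = matvec(self.w_action, self.history)
        return ACTIONS[logits.index(max(logits))], weights


# Usage with made-up inputs: 5 token embeddings, 4 RGB-D region features per step.
rng = random.Random(42)
words = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(5)]
nav = Navigator()
instr = nav.encode_instruction(words)
for t in range(3):
    regions = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
    action, weights = nav.step(instr, regions)
    print(t, action, [round(w, 2) for w in weights])
```

Carrying the fusion GRU's hidden state across calls to `step` is what lets each decision depend on earlier observations, which is one reading of the abstract's statement that history information is retained. The source does not say whether the instruction encoding is fed into the fusion step exactly this way.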

Publisher

IGI Global

References (43 articles)

1. Sim-to-real transfer for vision-and-language navigation. P. Anderson. Proceedings of the 2020 Conference on Robot Learning, 2021.

2. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments

3. Development of quadruped walking robots: A review

4. Chinese Word Segmentation based on Bidirectional GRU-CRF Model

5. Multi-feature fusion: Graph neural network and CNN combining for hyperspectral image classification

Cited by 1 article.

1. Multiple Unmanned Aerial Vehicle (multi-UAV) Reconnaissance and Search with Limited Communication Range Using Semantic Episodic Memory in Reinforcement Learning. Drones, 2024-08-14.