Video Captioning Based on Both Egocentric and Exocentric Views of Robot Vision for Human-Robot Interaction

Published: 2021-11-30
ISSN: 1875-4791
Container-title: International Journal of Social Robotics
Language: en
Short-container-title: Int J of Soc Robotics

Author:

Kang Soo-Han, Han Ji-Hyeong

Abstract

Robot vision provides the most important information to robots so that they can read the context and interact with human partners successfully. Moreover, to allow humans to recognize the robot’s visual understanding during human-robot interaction (HRI), the best way is for the robot to provide an explanation of its understanding in natural language. In this paper, we propose a new approach to interpreting robot vision from an egocentric standpoint and generating descriptions that explain egocentric videos, particularly for HRI. Because robot vision is equivalent to egocentric video on the robot’s side, it contains as much egocentric view information as exocentric view information. Thus, we propose a new dataset, referred to as the global, action, and interaction (GAI) dataset, which consists of egocentric video clips and GAI descriptions in natural language to represent both egocentric and exocentric information. An encoder-decoder based deep learning model is trained on the GAI dataset, and its performance on description generation assessments is evaluated. We also conduct experiments in actual environments to verify whether the GAI dataset and the trained deep learning model can improve a robot vision system.

Funder

National Research Foundation of Korea

Publisher

Springer Science and Business Media LLC

Subject

General Computer Science, Human-Computer Interaction, Philosophy, Electrical and Electronic Engineering, Control and Systems Engineering, Social Psychology

Link

https://link.springer.com/content/pdf/10.1007/s12369-021-00842-1.pdf

Reference: 38 articles.

1. Kong Yu, Fu Yun (2018) Human action recognition and prediction: A survey. arXiv preprint arXiv:1806.11230

2. McColl D, Hong A, Hatakeyama N, Nejat G, Benhabib B (2016) A survey of autonomous human affect detection methods for social robots engaged in natural HRI. J Intell Robot Syst 82(1):101–133

3. Ji Yanli, Yang Yang, Shen Fumin, Shen Heng Tao, Li Xuelong (2019) A survey of human action analysis in HRI applications. IEEE Transactions on Circuits and Systems for Video Technology

4. Lunghi Giacomo, Marin Raul, Di Castro Mario, Masi Alessandro, Sanz Pedro J (2019) Multimodal human-robot interface for accessible remote robotic interventions in hazardous environments. IEEE Access, 7:127290–127319

5. Ruiz Ariel Y Ramos, Rivera Luis J Figueroa, Chandrasekaran Balasubramaniyan (2019) A sensor fusion based robotic system architecture using human interaction for motion control. In: 2019 IEEE 9th annual computing and communication workshop and conference (CCWC), pages 0095–0100. IEEE

Cited by 6 articles.

1. A Deep Dive into Robot Vision - An Integrative Systematic Literature Review Methodologies and Research Endeavor Practices;ACM Computing Surveys;2024-04-25

2. Trends in Event Understanding and Caption Generation/Reconstruction in Dense Video: A Review;Computers, Materials & Continua;2024

3. Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26

4. Transformer-Based Disease Identification for Small-Scale Imbalanced Capsule Endoscopy Dataset;Electronics;2022-08-31

5. Video localized caption generation framework for industrial videos;Journal of Intelligent & Fuzzy Systems;2022-08-10