Affiliation:
1. The University of Texas at Austin, USA
Abstract
Human-robot interaction (HRI) in human social environments (HSEs) poses unique challenges for robot perception systems, which must combine asynchronous, heterogeneous data streams in real time. Multimodal perception systems are well-suited for HRI in HSEs and can provide richer, more robust interaction for robots operating amongst humans. In this article, we provide an overview of multimodal perception systems used in HSEs, intended as an introduction to the topic and a summary of relevant trends, techniques, resources, challenges, and terminology. We surveyed 15 peer-reviewed robotics and HRI publications over the past 10+ years, providing details about the data acquisition, processing, and fusion techniques used in 65 multimodal perception systems across various HRI domains. Our survey provides information about the hardware, software, datasets, and methods currently available for HRI perception research, as well as how these perception systems are being applied in HSEs. Based on the survey, we summarize trends, challenges, and limitations of multimodal human perception systems for robots, identify resources for researchers and developers, and propose future research areas to advance the field.
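The abstract's point about combining asynchronous, heterogeneous data streams in real time can be illustrated with a small timestamp-alignment sketch. This is not a method from the surveyed systems; the class name `ApproximateSync`, the `tolerance` parameter, and the example modalities are hypothetical, loosely resembling the approximate time synchronization commonly used in robotics middleware.

```python
# Minimal sketch: buffer timestamped samples from asynchronous modalities and
# emit a fused tuple once samples from every modality fall within a tolerance
# window. Names and the matching policy are illustrative assumptions.
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, Optional


@dataclass
class Sample:
    stamp: float   # sensor timestamp in seconds
    data: Any      # modality payload (e.g., audio frame, skeleton, image feature)


class ApproximateSync:
    """Aligns per-modality sample buffers by timestamp before fusion."""

    def __init__(self, modalities, tolerance=0.05, maxlen=50):
        self.buffers: Dict[str, Deque[Sample]] = {
            m: deque(maxlen=maxlen) for m in modalities
        }
        self.tolerance = tolerance

    def push(self, modality: str, sample: Sample) -> Optional[Dict[str, Sample]]:
        self.buffers[modality].append(sample)
        return self._try_match()

    def _try_match(self) -> Optional[Dict[str, Sample]]:
        if any(not buf for buf in self.buffers.values()):
            return None
        # Candidate set: for each modality, the sample closest to the newest stamp seen.
        pivot = max(buf[-1].stamp for buf in self.buffers.values())
        match = {m: min(buf, key=lambda s: abs(s.stamp - pivot))
                 for m, buf in self.buffers.items()}
        stamps = [s.stamp for s in match.values()]
        if max(stamps) - min(stamps) <= self.tolerance:
            # Consume matched samples and anything older, so they are not reused.
            for m, s in match.items():
                while self.buffers[m] and self.buffers[m][0].stamp <= s.stamp:
                    self.buffers[m].popleft()
            return match
        return None


if __name__ == "__main__":
    sync = ApproximateSync(["audio", "vision"], tolerance=0.05)
    sync.push("audio", Sample(stamp=10.00, data="speech frame"))
    fused = sync.push("vision", Sample(stamp=10.02, data="face detection"))
    if fused:
        print({m: s.stamp for m, s in fused.items()})  # modality -> aligned stamp
```

The fused dictionary would then feed whatever downstream fusion step a given system uses (early feature concatenation, late decision fusion, etc.); the survey itself covers those techniques, and this sketch only shows the temporal-alignment step that asynchronous streams make necessary.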
Publisher
Association for Computing Machinery (ACM)