Affiliation:
1. Austrian Research Institute for Artificial Intelligence (OFAI)
2. Tufts University
Abstract
Human instructors often refer to the objects and actions involved in a task description using both linguistic and non-linguistic means of communication. Hence, for robots to engage in natural human-robot interactions, we need to better understand the relevant aspects of human multi-modal task descriptions. We analyse reference resolution to objects in a data collection comprising two object-manipulation tasks (22 teacher-student interactions in Task 1 and 16 in Task 2) and find that 78.76% of all referring expressions to the objects relevant in Task 1, and 88.64% in Task 2, are verbally underspecified. The data strongly suggest that a language processing module for robots must be genuinely multi-modal, allowing for seamless integration of information transmitted in the verbal and the visual channel, with tracking of the speaker's eye gaze and gestures, as well as object recognition, as necessary preconditions.
Publisher
John Benjamins Publishing Company
Subject
Human-Computer Interaction, Linguistics and Language, Animal Science and Zoology, Language and Linguistics, Communication