Affiliation:
1. University of Roma Tor Vergata, Rome, Italy
Abstract
Situated natural language interaction between humans and robots is necessary for complex applications: communication here implies reference to the environment shared by the user and the robot. This paper proposes a transformer-based architecture that integrates spatial information (as a logical representation) from a semantic map of the environment with the input utterances. The generated interpretation is a logical form of the command that refers to the state of the world; it is produced through a single end-to-end process, driven at each interaction by an explicit linguistic description of the environment. In this work, the end-to-end capability of the transformer is studied in light of its multilingual applications, where the robot can be queried in different natural languages. The experimental results confirm the applicability of transformers to grounded human-robot interaction, with benefits in both portability across domains and achievable accuracy. Moreover, language-specific processing chains are shown to be preferable to large-scale multilingual models because of their better trade-off between accuracy and complexity. Overall, the proposed architecture outperforms previous approaches and paves the way for sustainable multilingual architectures.