Author:
Yin Hang, Melo Francisco, Billard Aude, Paiva Ana
Abstract
We contribute a learning-from-demonstration approach that enables robots to acquire skills from multi-modal, high-dimensional data. Latent representations of the different modalities and the associations between them are jointly learned through an adapted variational auto-encoder. The implementation and results are demonstrated in a robotic handwriting scenario, where the visual sensory input and the arm joint writing motion are learned and coupled. We show that the latent representations successfully construct a task manifold for the observed sensor modalities. Moreover, the learned associations can be exploited to synthesize arm joint handwriting motion directly from an image input in an end-to-end manner. The advantages of learning associative latent encodings are further highlighted with examples of inference on incomplete input images. A comparison with alternative methods demonstrates the superiority of the present approach in these challenging tasks.
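To make the associative encoding idea concrete, the following is a minimal sketch of a multimodal variational auto-encoder with a shared latent space, written in PyTorch. The architecture, dimensions, and the latent-alignment term are illustrative assumptions and not the paper's exact model: each modality (image, joint motion) gets its own encoder and decoder, the two posteriors are encouraged to agree, and cross-modal synthesis decodes motion from an image encoding.

```python
# Illustrative associative (multimodal) VAE sketch; not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AssociativeVAE(nn.Module):
    def __init__(self, image_dim=784, motion_dim=70, latent_dim=8, hidden=256):
        super().__init__()
        # Modality-specific encoders mapping into a shared latent Gaussian.
        self.enc_image = nn.Sequential(nn.Linear(image_dim, hidden), nn.ReLU())
        self.enc_motion = nn.Sequential(nn.Linear(motion_dim, hidden), nn.ReLU())
        self.mu_image = nn.Linear(hidden, latent_dim)
        self.logvar_image = nn.Linear(hidden, latent_dim)
        self.mu_motion = nn.Linear(hidden, latent_dim)
        self.logvar_motion = nn.Linear(hidden, latent_dim)
        # Modality-specific decoders from the shared latent space.
        self.dec_image = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, image_dim))
        self.dec_motion = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                        nn.Linear(hidden, motion_dim))

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, image, motion):
        # Encode each modality separately; training pulls the two posteriors together
        # so that either modality alone can index the shared task manifold.
        h_i, h_m = self.enc_image(image), self.enc_motion(motion)
        mu_i, lv_i = self.mu_image(h_i), self.logvar_image(h_i)
        mu_m, lv_m = self.mu_motion(h_m), self.logvar_motion(h_m)
        z_i = self.reparameterize(mu_i, lv_i)
        z_m = self.reparameterize(mu_m, lv_m)
        return {
            "image_recon": self.dec_image(z_i),
            "motion_recon": self.dec_motion(z_m),
            "motion_from_image": self.dec_motion(z_i),  # cross-modal synthesis
            "posteriors": (mu_i, lv_i, mu_m, lv_m),
        }


def loss_fn(out, image, motion, beta=1.0, gamma=1.0):
    mu_i, lv_i, mu_m, lv_m = out["posteriors"]
    recon = F.mse_loss(out["image_recon"], image) + F.mse_loss(out["motion_recon"], motion)
    # Standard VAE KL terms for both modality posteriors.
    kl = -0.5 * torch.mean(1 + lv_i - mu_i.pow(2) - lv_i.exp()) \
         - 0.5 * torch.mean(1 + lv_m - mu_m.pow(2) - lv_m.exp())
    # Association term (an assumption here): align the two latent means so an image
    # encoding can drive the motion decoder at test time.
    assoc = F.mse_loss(mu_i, mu_m)
    return recon + beta * kl + gamma * assoc


if __name__ == "__main__":
    model = AssociativeVAE()
    image, motion = torch.randn(16, 784), torch.randn(16, 70)
    out = model(image, motion)
    print(loss_fn(out, image, motion).item())
```

At test time, an incomplete or image-only input can be pushed through `enc_image` alone and decoded with `dec_motion`, which mirrors the end-to-end image-to-motion synthesis described in the abstract; the specific alignment loss used here is a simplification.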
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
3 articles.
1. Coupled Conditional Neural Movement Primitives;Neural Computing and Applications;2024-08-02
2. Audio-Visual Cross-Modal Generation with Multimodal Variational Generative Model;2024 IEEE International Symposium on Circuits and Systems (ISCAS);2024-05-19
3. Perceive, Represent, Generate: Translating Multimodal Information to Robotic Motion Trajectories;2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS);2022-10-23