Learning to summarize and answer questions about a virtual robot’s past actions

Published:2023-11-16 Issue:8 Volume:47 Page:1103-1118
ISSN:0929-5593
Container-title:Autonomous Robots
language:en
Short-container-title:Auton Robot

Author:

DeChant Chad,Akinola Iretiayo,Bauer Daniel

Abstract

When robots perform long action sequences, users will want to easily and reliably find out what they have done. We therefore demonstrate the task of learning to summarize and answer questions about a robot agent’s past actions using natural language alone. A single system with a large language model at its core is trained to both summarize and answer questions about action sequences given ego-centric video frames of a virtual robot and a question prompt. To enable training of question answering, we develop a method to automatically generate English-language questions and answers about objects, actions, and the temporal order in which actions occurred during episodes of robot action in the virtual environment. Training one model to both summarize and answer questions enables zero-shot transfer of representations of objects learned through question answering to improved action summarization.

Funder

Long Term Future Fund

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s10514-023-10134-4.pdf

Reference: 64 articles.

1. Anderson, P., Wu, Q., Teney, D., et al. (2018). Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3674–3683.

2. Antol, S., Agrawal, A., Lu, J., et al. (2015). VQA: Visual question answering. In Proceedings of the IEEE international conference on computer vision, pp. 2425–2433.

3. Apostolidis, E., Adamantidou, E., Metsai, A.I., et al. (2021). Video summarization using deep neural networks: A survey. arXiv preprint arXiv:2101.06072.

4. Bärmann, L., & Waibel, A. (2022). Where did I leave my keys? - Episodic-memory-based question answering on egocentric videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 1560–1568.

5. Barrett, D.P., Bronikowski, S.A., Yu, H., et al. (2015). Robot language learning, generation, and comprehension. arXiv preprint arXiv:1508.06161.

Cited by 1 article.

1. Avenues in IoT with advances in Artificial Intelligence; 2024 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops); 2024-03-11