Switch-Based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

Published: 2019-07-17 Volume: 33 Pages: 7289-7296
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Wu Yuexin, Li Xiujun, Liu Jingjing, Gao Jianfeng, Yang Yiming

Abstract

Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and thus can effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model, or implicitly, on the pre-specified ratio of real vs. simulated experiences used for Q-learning. To this end, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning for improving sample efficiency, by encouraging the world model to generate simulated experiences in the state-action space that the agent has not (fully) explored. Our results show that, by combining the switcher and active learning, the new framework, named Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvement over DDQ and Q-learning baselines in both simulation and human evaluations.
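
The idea in the abstract can be illustrated with a minimal tabular sketch in Python. This is not the authors' implementation. The toy chain environment, the visit-count threshold standing in for the switcher, and the inverse-visit sampling standing in for active learning are all assumptions made for illustration, as are the names `env_step`, `q_update`, and `train`.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4   # toy chain: states 0..4, reward on reaching state 4
ACTIONS = (-1, +1)      # move left or move right


def env_step(state, action):
    """Real environment (assumed): a deterministic chain with a goal reward."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    """One tabular Q-learning backup."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def train(episodes=200, planning_steps=5, min_visits=2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    model = {}                 # learned world model: (s, a) -> (r, s2, done)
    visits = defaultdict(int)  # count of real experiences per (s, a)
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 50:
            # Uniformly random behaviour policy, a simplification;
            # Q-learning is off-policy, so it still learns greedy values.
            a = rng.choice(ACTIONS)
            s2, r, done = env_step(s, a)
            q_update(q, s, a, r, s2, done)      # learn from real experience
            model[(s, a)] = (r, s2, done)       # update the world model
            visits[(s, a)] += 1
            # Stand-in "switcher" (assumption): simulate only from pairs
            # the model has observed at least min_visits times.
            trusted = [k for k in model if visits[k] >= min_visits]
            if trusted:
                # Stand-in "active" sampling (assumption): favour trusted
                # pairs with fewer real visits.
                weights = [1.0 / visits[k] for k in trusted]
                for ps, pa in rng.choices(trusted, weights=weights,
                                          k=planning_steps):
                    pr, ps2, pdone = model[(ps, pa)]
                    q_update(q, ps, pa, pr, ps2, pdone)  # simulated backup
            s, steps = s2, steps + 1
    return q


q = train()
greedy = [max(ACTIONS, key=lambda b: q[(s, b)]) for s in range(GOAL)]
```

Calling `train()` returns the Q-table, and `greedy` holds the learned action for each non-goal state. Raising `planning_steps` increases the share of simulated backups per real step, which is the ratio the abstract says the switcher is meant to control automatically.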

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 16 articles.

1. An anti-collision algorithm for robotic search-and-rescue tasks in unknown dynamic environments;Frontiers of Information Technology & Electronic Engineering;2024-04

2. Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning;IEEE Access;2024

3. Decomposed Deep Q-Network for Coherent Task-Oriented Dialogue Policy Learning;IEEE/ACM Transactions on Audio, Speech, and Language Processing;2024

4. Enhancing Exploration Efficiency of Fixed-Wing UAVs Through Intelligent Decision-Making and Advanced Control Integration;Lecture Notes in Electrical Engineering;2024

5. A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-oriented Dialogue Policy Learning;Machine Intelligence Research;2023-01-07