Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets

Published: 2008-12 Issue: 4 Volume: 34 Pages: 487-511
ISSN: 0891-2017
Container-title: Computational Linguistics
Language: en

Author:

Henderson James¹², Lemon Oliver¹², Georgila Kallirroi¹²

Affiliation:

1. Université de Genève, Département d'Informatique, Battelle-bâtiment A, 7 route de Drize, 1227 Carouge, Switzerland.

2. University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK.

Abstract

We propose a method for learning dialogue management policies from a fixed data set. The method addresses the challenges posed by Information State Update (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in a very large state space and a huge policy space. To address the problem that any fixed data set will only provide information about small portions of these state and policy spaces, we propose a hybrid model that combines reinforcement learning with supervised learning. The reinforcement learning is used to optimize a measure of dialogue reward, while the supervised learning is used to restrict the learned policy to the portions of these spaces for which we have data. We also use linear function approximation to address the need to generalize from a fixed amount of data to large state spaces. To demonstrate the effectiveness of this method on this challenging task, we trained this model on the COMMUNICATOR corpus, to which we have added annotations for user actions and Information States. When tested with a user simulation trained on a different part of the same data set, our hybrid model outperforms a pure supervised learning model and a pure reinforcement learning model. It also outperforms the hand-crafted systems on the COMMUNICATOR data, according to automatic evaluation measures, improving over the average COMMUNICATOR system policy by 10%. The proposed method will improve techniques for bootstrapping and automatic optimization of dialogue management policies from limited initial data sets.
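The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the SARSA-style update, the probability threshold, the fallback rule, and every name below (`sarsa_update`, `hybrid_action`, the action labels, the feature vectors) are assumptions made for illustration. The sketch estimates action values with a linear function of the state features, and lets a supervised estimate of the action distribution in the data restrict which actions the reinforcement-learned policy may choose.

```python
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def q_value(weights: Dict[str, List[float]], features: Vector, action: str) -> float:
    """Linear function approximation: Q(s, a) = w_a . f(s)."""
    return sum(w * x for w, x in zip(weights[action], features))


def sarsa_update(
    weights: Dict[str, List[float]],
    features: Vector,
    action: str,
    reward: float,
    next_features: Optional[Vector],
    next_action: Optional[str],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> None:
    """One SARSA-style step on a logged transition (assumed update rule).

    next_action is None when the logged dialogue ends at this step.
    """
    target = reward
    if next_action is not None and next_features is not None:
        target += gamma * q_value(weights, next_features, next_action)
    error = target - q_value(weights, features, action)
    weights[action] = [w + alpha * error * x for w, x in zip(weights[action], features)]


def hybrid_action(
    weights: Dict[str, List[float]],
    supervised_probs: Dict[str, float],
    features: Vector,
    threshold: float = 0.1,
) -> str:
    """Pick the highest-valued action among those the data supports.

    supervised_probs is a hypothetical supervised estimate of P(action | state).
    Actions below the threshold are treated as unsupported by the data; if no
    action is supported, fall back to the most likely supervised action.
    """
    supported = [a for a, p in supervised_probs.items() if p >= threshold]
    if not supported:
        return max(supervised_probs, key=lambda a: supervised_probs[a])
    return max(supported, key=lambda a: q_value(weights, features, a))


if __name__ == "__main__":
    weights = {"ask_city": [0.0, 0.0], "confirm": [0.0, 0.0], "hang_up": [0.0, 0.0]}
    state = [1.0, 0.0]
    # A logged dialogue that ended with reward 1.0 after "confirm".
    sarsa_update(weights, state, "confirm", 1.0, None, None)
    probs = {"ask_city": 0.6, "confirm": 0.35, "hang_up": 0.05}
    print(hybrid_action(weights, probs, state))
```

In this sketch an action that is rare in the data (here `hang_up`) is never selected through its value estimate, however large that estimate becomes, which mirrors the abstract's point that the supervised component keeps the policy within the portions of the state and policy spaces covered by the data.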

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/coli.2008.07-028-R2-05-82

References: 9 articles.

1. Information state and dialogue management in the TRINDI dialogue move engine toolkit

2. A stochastic model of human-machine interaction for learning dialog strategies

3. A probabilistic framework for dialog simulation and optimal strategy learning

4. A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies

Cited by 52 articles.

1. ORLEP: an efficient offline reinforcement learning evaluation platform;Multimedia Tools and Applications;2023-09-22

2. A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-oriented Dialogue Policy Learning;Machine Intelligence Research;2023-01-07

3. Lost in Dialogue: A Review and Categorisation of Current Dialogue System Approaches and Technical Solutions;KI 2023: Advances in Artificial Intelligence;2023

4. Conversational QA over Knowledge Bases;Neural Approaches to Conversational Information Retrieval;2023

5. Conversational AI for multi-agent communication in Natural Language;AI Communications;2022-09-30