Author:
Lowe Ryan, Pow Nissan, Serban Iulian Vlad, Charlin Laurent, Liu Chia-Wei, Pineau Joelle
Abstract
In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, a process that can be both time-consuming and expensive. We provide baselines in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as on other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and for end-to-end dialogue systems in general.
Publisher
University of Illinois Libraries
Subject
Linguistics and Language, Communication, Language and Linguistics
Cited by
43 articles.
1. Evaluating the Experience of LGBTQ+ People Using Large Language Model Based Chatbots for Mental Health Support;Proceedings of the CHI Conference on Human Factors in Computing Systems;2024-05-11
2. Toward an end-to-end implicit addressee modeling for dialogue disentanglement;Multimedia Tools and Applications;2024-02-06
3. Introduction;Synthesis Lectures on Human Language Technologies;2024
4. Disentangling User Conversations with Voice Assistants for Online Shopping;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18
5. PICO: An Approach to Build Dataset for Customer Service Chatbot;2023 IEEE International Conference on Real-time Computing and Robotics (RCAR);2023-07-17