Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Published:2000-09 Issue:3 Volume:26 Page:339-373
ISSN:0891-2017
Container-title:Computational Linguistics
language:en
Short-container-title:Computational Linguistics

Author:

Stolcke Andreas¹, Ries Klaus², Coccaro Noah³, Shriberg Elizabeth⁴, Bates Rebecca⁵, Jurafsky Daniel³, Taylor Paul⁶, Martin Rachel⁷, Van Ess-Dykema Carol⁸, Meteer Marie⁹

Affiliation:

1. Speech Technology and Research Laboratory, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025, 1-650-859-2544.

2. Carnegie Mellon University and University of Karlsruhe

3. University of Colorado at Boulder

4. SRI International

5. University of Washington

6. University of Edinburgh

7. Johns Hopkins University

8. U.S. Department of Defense

9. BBN Technologies

Abstract

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, DISAGREEMENT, and APOLOGY. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.
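
The combination of a dialogue act n-gram with per-utterance evidence described in the abstract can be illustrated with a small Viterbi decoder. This is only a minimal sketch, not the authors' implementation: it assumes a bigram dialogue grammar, takes the per-utterance likelihoods as given (standing in for the word n-gram, decision tree, or neural network scores), and uses a three-act inventory with invented probabilities. The names `ACTS`, `TRANSITIONS`, `EXAMPLE_LIKELIHOODS`, and `viterbi` are hypothetical.

```python
import math

# Toy dialogue act inventory; all probabilities below are invented for illustration.
ACTS = ["STATEMENT", "QUESTION", "BACKCHANNEL"]

# Hypothetical bigram dialogue grammar: P(act_t | act_{t-1}).
TRANSITIONS = {
    "<start>":     {"STATEMENT": 0.6, "QUESTION": 0.3, "BACKCHANNEL": 0.1},
    "STATEMENT":   {"STATEMENT": 0.5, "QUESTION": 0.2, "BACKCHANNEL": 0.3},
    "QUESTION":    {"STATEMENT": 0.7, "QUESTION": 0.1, "BACKCHANNEL": 0.2},
    "BACKCHANNEL": {"STATEMENT": 0.6, "QUESTION": 0.2, "BACKCHANNEL": 0.2},
}

# Assumed per-utterance likelihoods P(evidence_t | act), one dict per utterance.
EXAMPLE_LIKELIHOODS = [
    {"STATEMENT": 0.2, "QUESTION": 0.7, "BACKCHANNEL": 0.1},
    {"STATEMENT": 0.6, "QUESTION": 0.1, "BACKCHANNEL": 0.3},
    {"STATEMENT": 0.1, "QUESTION": 0.1, "BACKCHANNEL": 0.8},
]


def viterbi(likelihoods):
    """Return the most probable dialogue act sequence under the toy model."""
    # Initialise with the start transition and the first utterance's evidence.
    scores = {
        act: math.log(TRANSITIONS["<start>"][act]) + math.log(likelihoods[0][act])
        for act in ACTS
    }
    backpointers = []
    for evidence in likelihoods[1:]:
        new_scores, pointers = {}, {}
        for act in ACTS:
            # Best predecessor for this act according to the dialogue grammar.
            best_prev = max(
                ACTS, key=lambda prev: scores[prev] + math.log(TRANSITIONS[prev][act])
            )
            new_scores[act] = (
                scores[best_prev]
                + math.log(TRANSITIONS[best_prev][act])
                + math.log(evidence[act])
            )
            pointers[act] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final act.
    path = [max(ACTS, key=lambda act: scores[act])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi(EXAMPLE_LIKELIHOODS))
```

With these toy numbers the decoder labels the three utterances QUESTION, STATEMENT, BACKCHANNEL. Working in log space avoids numerical underflow on longer conversations.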

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/089120100561737

References: 22 articles.

1. The HCRC Map Task Corpus

2. A Maximum Likelihood Approach to Continuous Speech Recognition

3. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains

Cited by 411 articles.

1. Towards Modeling and Evaluating Instructional Explanations in Teacher-Student Dialogues;Proceedings of the 2024 International Conference on Information Technology for Social Good;2024-09-04

2. Task-based dialogue policy learning based on diffusion models;Applied Intelligence;2024-09-02

3. Multimodal Dialog Act Classification for Digital Character Conversations;ACM Conversational User Interfaces 2024;2024-07-08

4. Applying Large Language Models to Enhance Dialogue and Communication Analysis for Adaptive Team Training;2024-07-01

5. Prediction Models of Collaborative Behaviors in Dyadic Interactions: An Application for Inclusive Teamwork Training in Virtual Environments;Signals;2024-06-03