Authors:
Ishii Ryo, Ren Xutong, Muszynski Michal, Morency Louis-Philippe
Abstract
Participants in a conversation must carefully monitor the turn-management (speaking and listening) willingness of their conversational partners and adjust their turn-changing behaviors accordingly to have a smooth conversation. Many studies have focused on developing actual turn-changing (i.e., next-speaker or end-of-turn) models that can predict whether turn-keeping or turn-changing will occur. Participants' verbal and non-verbal behaviors have been used as input features for these predictive models. To the best of our knowledge, such studies model only the relationship between participant behavior and turn-changing; no existing model takes into account participants' willingness to acquire a turn (turn-management willingness). In this paper, we address the challenge of building models that predict the willingness of both speakers and listeners. First, we find that dissonance exists between willingness and actual turn-changing. Second, we propose predictive models based on trimodal inputs, including acoustic, linguistic, and visual cues distilled from conversations. Additionally, we study whether modeling willingness helps improve turn-changing prediction. To do so, we introduce a dyadic conversation corpus with annotated scores of speaker/listener turn-management willingness. Our results show that using all three modalities (acoustic, linguistic, and visual cues) of the speaker and listener is critically important for predicting turn-management willingness. Furthermore, explicitly adding willingness as a prediction task improves the performance of turn-changing prediction. Moreover, turn-management willingness prediction becomes more accurate when turn-management willingness and turn-changing are predicted jointly with multi-task learning techniques.
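The abstract describes joint prediction of turn-changing and turn-management willingness from trimodal inputs via multi-task learning. The following is a minimal sketch of that idea, not the authors' implementation: it assumes PyTorch, per-modality GRU encoders over one participant's features (the paper uses both speaker and listener cues), illustrative feature dimensions, and an illustrative loss weighting.

```python
# Minimal sketch (assumptions, not the authors' model): fuse acoustic,
# linguistic, and visual feature sequences and jointly predict
# turn-changing (binary) and a turn-management willingness score.
import torch
import torch.nn as nn


class TrimodalMultiTaskModel(nn.Module):
    def __init__(self, acoustic_dim=40, linguistic_dim=300, visual_dim=128,
                 hidden_dim=128):
        super().__init__()
        # One GRU encoder per modality; inputs are per-frame feature sequences.
        self.acoustic_enc = nn.GRU(acoustic_dim, hidden_dim, batch_first=True)
        self.linguistic_enc = nn.GRU(linguistic_dim, hidden_dim, batch_first=True)
        self.visual_enc = nn.GRU(visual_dim, hidden_dim, batch_first=True)
        # Shared layer feeding two task-specific heads (multi-task learning).
        self.shared = nn.Sequential(nn.Linear(hidden_dim * 3, hidden_dim), nn.ReLU())
        self.turn_change_head = nn.Linear(hidden_dim, 1)  # turn-keeping vs. turn-changing
        self.willingness_head = nn.Linear(hidden_dim, 1)  # willingness score regression

    def forward(self, acoustic, linguistic, visual):
        # Use each encoder's final hidden state as the modality summary.
        _, h_a = self.acoustic_enc(acoustic)
        _, h_l = self.linguistic_enc(linguistic)
        _, h_v = self.visual_enc(visual)
        fused = torch.cat([h_a[-1], h_l[-1], h_v[-1]], dim=-1)
        z = self.shared(fused)
        return self.turn_change_head(z), self.willingness_head(z)


# Joint training objective: weighted sum of the two task losses.
model = TrimodalMultiTaskModel()
acoustic = torch.randn(8, 50, 40)      # batch of 8, 50 frames, 40-dim acoustic features
linguistic = torch.randn(8, 50, 300)
visual = torch.randn(8, 50, 128)
turn_logit, willingness = model(acoustic, linguistic, visual)
turn_labels = torch.randint(0, 2, (8, 1)).float()
willingness_labels = torch.rand(8, 1)
loss = (nn.BCEWithLogitsLoss()(turn_logit, turn_labels)
        + 0.5 * nn.MSELoss()(willingness, willingness_labels))
loss.backward()
```

The shared representation with separate heads is what lets the willingness task act as an auxiliary signal for turn-changing prediction, consistent with the multi-task setup described in the abstract; the 0.5 loss weight is purely illustrative.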
Cited by 2 articles.