Authors:
Ishii Ryo, Otsuka Kazuhiro, Kumano Shiro, Higashinaka Ryuichiro, Tomita Junji
Abstract
We investigated the mouth-opening transition pattern (MOTP), which represents the change in the degree of mouth opening near the end of an utterance, and used it to predict the next speaker and the utterance interval, i.e., the time between the end of the current speaker’s utterance and the start of the next speaker’s utterance, in multi-party conversation. We first collected verbal and nonverbal data from four-person conversations, including speech and manually annotated degrees of mouth opening (closed, narrow-open, wide-open) for each participant. A key finding of the MOTP analysis is that the current speaker often keeps her mouth narrow-open during turn-keeping, whereas during turn-changing she tends either to start closing her mouth after opening it narrowly or to keep opening it widely. The next speaker often starts to open her mouth narrowly after closing it during turn-changing. Moreover, when the current speaker starts to close her mouth after opening it narrowly during turn-keeping, the utterance interval tends to be short. In contrast, when the current speaker and the listeners open their mouths narrowly after opening them narrowly and then widely, the utterance interval tends to be long. On the basis of these results, we implemented models that predict the next speaker and the utterance interval using MOTPs. As a multimodal-feature fusion, we also implemented models that use eye-gaze behavior, which our previous study found to be one of the most useful cues for predicting the next speaker and the utterance interval, in addition to MOTPs. The evaluation results suggest that the MOTPs of the current speaker and the listeners are effective for predicting the next speaker and the utterance interval in multi-party conversation. Our multimodal-feature fusion model using both MOTPs and eye-gaze behavior predicts the next speaker and the utterance interval better than models using either cue alone.
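As a rough illustration of the kind of multimodal-feature fusion described in the abstract, the sketch below encodes a current speaker's MOTP as a categorical pattern, concatenates it with a simple gaze feature, and trains a turn-keeping vs. turn-changing classifier. The data, feature coding, and classifier (scikit-learn's LogisticRegression) are illustrative assumptions; the abstract does not specify the authors' actual features or model.

```python
# Minimal sketch (not the authors' implementation): early fusion of
# mouth-opening transition patterns (MOTPs) with a gaze feature to
# predict turn-keeping vs. turn-changing. All data here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Each row is one utterance end. The current speaker's MOTP is coded as a
# short state string (C = closed, N = narrow-open, W = wide-open); the gaze
# feature marks whether the current speaker is gazing at a listener.
motp_current = np.array([["NN"], ["NC"], ["NW"], ["NN"], ["CN"], ["NC"]])
gaze_at_listener = np.array([[0], [1], [1], [0], [1], [0]])

# 0 = turn-keeping (current speaker continues), 1 = turn-changing.
labels = np.array([0, 1, 1, 0, 1, 1])

# One-hot encode the categorical MOTP and concatenate it with the gaze
# feature (simple early fusion of the two modalities).
encoder = OneHotEncoder(handle_unknown="ignore")
motp_onehot = encoder.fit_transform(motp_current).toarray()
features = np.hstack([motp_onehot, gaze_at_listener])

model = LogisticRegression().fit(features, labels)
print(model.predict(features))  # predicted turn-keeping/changing per sample
```

A real system would of course use MOTPs of all participants, time-aligned gaze transitions, and a separate regressor for the utterance interval; this sketch only shows the feature-fusion idea.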
Subject
Computer Networks and Communications, Computer Science Applications, Human-Computer Interaction, Neuroscience (miscellaneous)
Cited by: 13 articles.