Abstract
In this paper we propose a deep learning based approach to generate realistic three-party head and eye motions from novel acoustic speech input together with speaker marking (i.e., the speaking time of each interlocutor). Specifically, we first acquire a high-quality three-party conversational motion dataset. Based on this dataset, we then train a deep learning framework to automatically predict the dynamic head and eye directions of all interlocutors from the speech signal. Combined with existing lip-sync and speech-driven hand/body gesture generation algorithms, our method produces realistic three-party conversational animations. Through extensive experiments and comparative user studies, we demonstrate that our approach generates realistic three-party head-and-eye motions for novel speech recorded from new subjects of different genders and ethnicities.
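To make the described pipeline concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a model that maps per-frame speech features and a speaker-marking vector to head and eye directions for all three interlocutors. The feature dimensions, the GRU backbone, and the yaw/pitch output parameterization are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class GazeHeadPredictor(nn.Module):
    """Hypothetical sketch: per-frame speech features plus a speaker-marking
    vector in, head and eye directions for all three interlocutors out."""
    def __init__(self, n_audio_feats=40, n_parties=3, hidden=256):
        super().__init__()
        # Input per frame: audio features + one speaking indicator per party.
        self.rnn = nn.GRU(n_audio_feats + n_parties, hidden,
                          num_layers=2, batch_first=True, bidirectional=True)
        # Output per frame: (head yaw/pitch + eye yaw/pitch) for each party.
        self.head = nn.Linear(2 * hidden, n_parties * 4)

    def forward(self, audio_feats, speaker_mark):
        # audio_feats: (B, T, n_audio_feats); speaker_mark: (B, T, n_parties)
        x = torch.cat([audio_feats, speaker_mark], dim=-1)
        h, _ = self.rnn(x)
        return self.head(h)  # (B, T, n_parties * 4)
```

The predicted per-frame directions could then be retargeted onto character rigs and combined with separate lip-sync and gesture generators, as the abstract describes.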
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Computer Science Applications
Cited by
11 articles.
1. S3: Speech, Script and Scene driven Head and Eye Animation;ACM Transactions on Graphics;2024-07-19
2. Real-Time Conversational Gaze Synthesis for Avatars;ACM SIGGRAPH Conference on Motion, Interaction and Games;2023-11-15
3. Multimodal Turn Analysis and Prediction for Multi-party Conversations;International Conference on Multimodal Interaction;2023-10-09
4. S2M-Net: Speech Driven Three-party Conversational Motion Synthesis Networks;Proceedings of the 15th ACM SIGGRAPH Conference on Motion, Interaction and Games;2022-11-03
5. Social robots as eating companions;Frontiers in Computer Science;2022-08-31