Rhythmic Gesticulator


Ao Tenglong (1), Gao Qingzhe (2), Lou Yuke (1), Chen Baoquan (1), Liu Libin (1)


1. Peking University, China

2. Shandong University and Peking University, China


Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in the creation of artificial embodied agents. Previous systems mainly generate gestures in an end-to-end manner, which makes it difficult to capture clear rhythm and semantics because of the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results on both rhythm and semantics. For rhythm, our system contains a robust rhythm-based segmentation pipeline that explicitly ensures temporal coherence between vocalization and gestures. For gesture semantics, we devise a mechanism, based on linguistic theory, to effectively disentangle low- and high-level neural embeddings of speech and motion. The high-level embedding corresponds to semantics, while the low-level embedding relates to subtle variations. Lastly, we build correspondence between the hierarchical embeddings of speech and motion, resulting in rhythm- and semantics-aware gesture synthesis. Evaluations with existing objective metrics, a newly proposed rhythmic metric, and human feedback show that our method outperforms state-of-the-art systems by a clear margin.
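The abstract does not spell out how the rhythm-based segmentation works, so the following is only a minimal sketch of the general idea of beat-aligned segmentation, not the authors' pipeline. It assumes a precomputed per-frame speech energy envelope, treats thresholded local peaks as "beats", and cuts the frame range at those beats; the function names `detect_beats` and `segment_by_beats`, the peak rule, and the threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def detect_beats(energy: List[float], threshold: float) -> List[int]:
    """Return frame indices that are local peaks of the energy envelope.

    Assumption: a 'beat' is a frame whose energy rises above the previous
    frame, is not exceeded by the next frame, and reaches `threshold`.
    """
    beats = []
    for i in range(1, len(energy) - 1):
        is_peak = energy[i] > energy[i - 1] and energy[i] >= energy[i + 1]
        if is_peak and energy[i] >= threshold:
            beats.append(i)
    return beats


def segment_by_beats(num_frames: int, beats: List[int]) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into half-open segments bounded by beats.

    `beats` is expected in ascending order, as produced by detect_beats.
    """
    boundaries = [0] + [b for b in beats if 0 < b < num_frames] + [num_frames]
    return [(s, e) for s, e in zip(boundaries, boundaries[1:]) if e > s]


# Toy envelope with two clear peaks, at frames 1 and 4.
energy = [0.1, 0.9, 0.2, 0.1, 0.8, 0.3, 0.1, 0.05]
beats = detect_beats(energy, threshold=0.5)
segments = segment_by_beats(len(energy), beats)
```

In a setup like this, the same segment boundaries could be applied to the motion frames so that speech and gesture segments share start and end times, which is one simple way to make temporal alignment explicit rather than leaving it to an end-to-end model.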


Association for Computing Machinery (ACM)


Computer Graphics and Computer-Aided Design


Cited by 19 articles.

1. UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26

2. The KCL-SAIR team's entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker;International Conference on Multimodal Interaction;2023-10-09

3. DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models;International Conference on Multimodal Interaction;2023-10-09

4. Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment;International Conference on Multimodal Interaction;2023-10-09

5. Large language models in textual analysis for gesture selection;International Conference on Multimodal Interaction;2023-10-09







