Rhythmic Gesticulator

Published:2022-11-30 Issue:6 Volume:41 Page:1-19
ISSN:0730-0301
Container-title:ACM Transactions on Graphics
language:en
Short-container-title:ACM Trans. Graph.

Author:

Ao Tenglong¹, Gao Qingzhe², Lou Yuke¹, Chen Baoquan¹, Liu Libin¹

Affiliation:

1. Peking University, China

2. Shandong University and Peking University, China

Abstract

Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in artificial embodied agent creation. Previous systems mainly focus on generating gestures in an end-to-end manner, which leads to difficulties in mining the clear rhythm and semantics due to the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results both on the rhythm and semantics. For the rhythm, our system contains a robust rhythm-based segmentation pipeline to ensure the temporal coherence between the vocalization and gestures explicitly. For the gesture semantics, we devise a mechanism to effectively disentangle both low- and high-level neural embeddings of speech and motion based on linguistic theory. The high-level embedding corresponds to semantics, while the low-level embedding relates to subtle variations. Lastly, we build correspondence between the hierarchical embeddings of the speech and the motion, resulting in rhythm- and semantics-aware gesture synthesis. Evaluations with existing objective metrics, a newly proposed rhythmic metric, and human feedback show that our method outperforms state-of-the-art systems by a clear margin.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design

Link

https://dl.acm.org/doi/pdf/10.1145/3550454.3555435

Reference: 74 articles.

1. Style‐Controllable Speech‐Driven Gesture Synthesis Using Normalising Flows

2. Deep motifs and motion signatures

3. Andreas Aristidou, Anastasios Yiannakidis, Kfir Aberman, Daniel Cohen-Or, Ariel Shamir, and Yiorgos Chrysanthou. 2022. Rhythm is a Dancer: Music-Driven Motion Synthesis with Global Structure. IEEE Transactions on Visualization and Computer Graphics (2022), 1-1.

4. Alexei Baevski, Steffen Schneider, and Michael Auli. 2020. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations. In International Conference on Learning Representations.

5. Multimodal Machine Learning: A Survey and Taxonomy

Cited by 19 articles.

1. UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26

2. The KCL-SAIR team's entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker;International Conference on Multimodal Interaction;2023-10-09

3. DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models;International Conference on Multimodal Interaction;2023-10-09

4. Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment;International Conference on Multimodal Interaction;2023-10-09

5. Large language models in textual analysis for gesture selection;International Conference on Multimodal Interaction;2023-10-09