1. Simon Alexanderson, Rajmund Nagy, Jonas Beskow, and Gustav Eje Henter. 2022. Listen, denoise, action! Audio-driven motion synthesis with diffusion models. arXiv preprint arXiv:2211.09707 (2022).
2. Tenglong Ao, Zeyi Zhang, and Libin Liu. 2023. GestureDiffuCLIP: Gesture diffusion model with CLIP latents. arXiv preprint arXiv:2303.14613 (2023).
3. Rohan Badlani, Adrian Łańcucki, Kevin J Shih, Rafael Valle, Wei Ping, and Bryan Catanzaro. 2022. One TTS alignment to rule them all. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6092–6096.
4. Multimodal Machine Learning: A Survey and Taxonomy
5. The IVI Lab entry to the GENEA Challenge 2022 – A Tacotron2 Based Method for Co-Speech Gesture Generation With Locality-Constraint Attention Mechanism