LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild

Published:2024-02 Volume:157 Page:103028
ISSN:0167-6393
Container-title:Speech Communication
Language:en
Short-container-title:Speech Communication

Author:

Chen Zhipeng, Wang Xinheng, Xie Lun, Yuan Haijie, Pan Hang

Funder

Beijing Natural Science Foundation

Publisher

Elsevier BV

Subject

Computer Science Applications, Computer Vision and Pattern Recognition, Linguistics and Language, Language and Linguistics, Communication, Modeling and Simulation, Software

References: 36 articles.

1. Afouras, 2018. LRS3-TED: a large-scale dataset for visual speech recognition.

2. Agarwal, M., et al., 2023. Audio-visual face reenactment. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 5178–5187.

3. Chan, K.C., Wang, X., Yu, K., Dong, C., Loy, C.C., 2021. BasicVSR: The search for essential components in video super-resolution and beyond. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4947–4956. http://dx.doi.org/10.1109/CVPR46437.2021.00491.

4. Chen, L., Maddox, R.K., Duan, Z., Xu, C., 2019. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7832–7841. http://dx.doi.org/10.1109/CVPR.2019.00802.

5. Cheng, K., Cun, X., Zhang, Y., Xia, M., Yin, F., Zhu, M., Wang, N., 2022. VideoReTalking: Audio-based lip synchronization for Talking Head Video Editing In the Wild. In: SIGGRAPH Asia 2022 Conference Papers. pp. 1–9. http://dx.doi.org/10.1145/3550469.3555399.