Brand-new Speech Animation Technology based on First Order Motion Model and MelGAN-VC

Published:2021-02-01 Issue:1 Volume:1828 Page:012029
ISSN:1742-6588
Container-title:Journal of Physics: Conference Series
Short-container-title:J. Phys.: Conf. Ser.

Author:

Chen Shaomin,Gao Xinyi,Wang Jiangtao,Xiao Yu,Zhang Yueling,Xu Gang

Abstract

Speech animation has great application potential in instant messaging and entertainment media, for example in videophones, virtual meetings, and audio and video chat. Traditional voice-driven speech animation adapts to only a single language, while performance-driven speech animation requires costly capture equipment and is difficult to mass-produce. To address these problems, we propose a new method of speech animation generation: given a static portrait of a person and a face-driven video, the method generates a face animation video of the character in the given portrait. The conversion system consists of two parts: face conversion and voice conversion. We observed that the generated face animation video suffers from low definition, playback that is not smooth, and metallic-sounding audio. To improve on this, this article proposes adding animation enhancement and replacing the encoder. Comparative experiments show that these measures are effective.

Publisher

IOP Publishing

Subject

General Physics and Astronomy

Link

https://iopscience.iop.org/article/10.1088/1742-6596/1828/1/012029/pdf

References (14 articles)

1. Speech-to-video synthesis using MPEG-4 compliant visual features;Aleksic;IEEE Transactions on Circuits and Systems for Video Technology,2004

2. Reading between the dots: combining 3D markers and FACS classification for high-quality blendshape facial animation;Ravikumar;Proc. Graphics Interface,2016

3. Generating realistic videos from keyframes with concatenated GANs;Wen;IEEE Transactions on Circuits and Systems for Video Technology,2019