1. VR facial animation via multiview image translation
2. What makes training multi-modal classification networks hard?
3. Conditional image generation with PixelCNN decoders;van den Oord;Advances in Neural Information Processing Systems,2016
4. WaveNet: A generative model for raw audio;van den Oord;ISCA Speech Synthesis Workshop,2016
5. Neural voice puppetry: Audio-driven facial reenactment;Thies;ECCV,2020