1. Deep Speech 2: end-to-end speech recognition in English and Mandarin;Amodei,2016
2. Blended diffusion for text-driven editing of natural images;Avrahami,2022
3. wav2vec 2.0: a framework for self-supervised learning of speech representations;Baevski;Adv. Neural Inf. Process. Syst.,2020
4. Conditional image generation with score-based diffusion models;Batzolis;arXiv,2021
5. Realistic talking face animation with speech-induced head motion;Biswas,2021