1. Neural machine translation by jointly learning to align and translate;Bahdanau,2015
2. IEMOCAP: Interactive emotional dyadic motion capture database;Busso;Lang. Resou. Eval.,2008
3. Listen, attend and spell;Chan,2015
4. Word-level speech recognition with a letter to word encoder;Collobert,2020
5. Jukebox: A generative model for music;Dhariwal,2020