1. Mead: A large-scale audio-visual dataset for emotional talking-face generation;wang;European Conference on Computer Vision,2020
2. Patch to the Future: Unsupervised Visual Prediction
3. Scaling autoregressive video models;weissenborn;International Conference on Learning Representations,2020