1. Taming Transformers for High-Resolution Image Synthesis
2. Neural discrete representation learning;van den Oord;Advances in Neural Information Processing Systems,2017
3. Vid2speech: speech reconstruction from silent video;Ephrat;In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),2017
4. FoleyAutomatic
5. Learning to separate object sounds by watching unlabeled video;Gao;In Proceedings of the European Conference on Computer Vision (ECCV),2018