1. Adversarial Generative Grammars for Human Activity Prediction
2. Episodic Transformer for Vision-and-Language Navigation
3. It is not the journey but the destination: Endpoint conditioned trajectory prediction;mangalam;Proceedings of the European Conference on Computer Vision (ECCV),0
4. UniVL: A unified video and language pre-training model for multimodal understanding and generation;luo;ArXiv Preprint,2020
5. ViL-BERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks;lu;Advances in Neural Information Processing Systems,0