1. Dense Monocular Depth Estimation in Complex Dynamic Scenes
2. Im2text: Describing images using 1 million captioned photographs;ordonez;Advances in neural information processing systems,2011
3. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks;lu;Advances in neural information processing systems,2019
4. Microsoft coco: Common objects in context;lin;European Conference on Computer Vision,2014
5. A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images