1. Unsupervised learning from narrated instruction videos;Alayrac,2016
2. Techniques and systems for image and video retrieval;Aslandogan;IEEE Trans. Knowl. Data Eng.,1999
3. Neural machine translation by jointly learning to align and translate;Bahdanau,2015
4. Weakly-supervised alignment of video with text;Bojanowski,2015
5. Active object localization with deep reinforcement learning;Caicedo,2015