1. A. Frome et al., DeViSE: A deep visual-semantic embedding model, Adv. Neural Inf. Process. Syst., 2013.
2. T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, In ICLR, 2013, arXiv:1301.3781.
3. R. Kiros, R. Salakhutdinov, and R. S. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, arXiv preprint, 2014, arXiv:1411.2539.
4. S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput., 1997.
5. B. Klein et al., Associating neural word embeddings with deep image representations using Fisher vectors, 2015.