1. Structured multimodal feature embedding and alignment for image-sentence retrieval;ge;ACMMM,0
2. Attention is all you need;vaswani;NeurIPS,0
3. De-vise: A deep visual-semantic embedding model;frome;NeurIPS,0
4. Visual word ambiguity;van gemert;PAMI,2009
5. Deep Residual Learning for Image Recognition