1. Multimodal deep learning;Ngiam,2011
2. Multimodal learning with deep Boltzmann machines;Srivastava,2012
3. Look, imagine and match: improving textual-visual cross-modal retrieval with generative models;Gu,2018
4. A low rank structural large margin method for cross-modal ranking;Lu,2013
5. Stacked cross attention for image-text matching;Lee,2018