1. VQA: Visual question answering;Antol,2015
2. Translating embeddings for modeling multi-relational data;Bordes,2013
3. Multi-channel graph neural network for entity alignment;Cao,2019
4. Cross-modal retrieval in the cooking context: Learning semantic text-image embeddings;Carvalho,2018
5. MEAformer: Multi-modal entity alignment transformer for meta modality hybrid;Chen,2023