1. Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. Retrieval-augmented multimodal language modeling. arXiv preprint arXiv:2211.12561, 2022.
2. Feng Duoduo. Multimodal knowledge enhanced visual-semantic embedding for image-text retrieval. ACM Transactions on Multimedia Computing, Communications and Applications, 2023.
3. Audio–text retrieval based on contrastive learning and collaborative attention mechanism
4. TIAR: Text-Image-Audio Retrieval with weighted multimodal re-ranking
5. Content-Based Music-Image Retrieval Using Self- and Cross-Modal Feature Embedding Memory