Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Artificial Intelligence, Computer Networks and Communications, Computer Science Applications, Computer Vision and Pattern Recognition, Modeling and Simulation, Signal Processing, Control and Systems Engineering