Funder
National Key R&D Program of China
National Natural Science Foundation of China
Major Project of Anhui Province
Publisher
Springer Science and Business Media LLC
References
87 articles.
1. Afouras, T., Owens, A., Chung, J. S., & Zisserman, A. (2020). Self-supervised learning of audio-visual objects from video. In Proceedings of the European conference on computer vision (ECCV) (pp. 208–224).
2. Alayrac, J. B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., et al. (2022). Flamingo: A visual language model for few-shot learning. arXiv:2204.14198.
3. Arandjelovic, R., & Zisserman, A. (2017). Look, listen and learn. In Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 609–617).
4. Arandjelovic, R., & Zisserman, A. (2018). Objects that sound. In Proceedings of the European conference on computer vision (ECCV) (pp. 435–451).
5. Barraco, M., Cornia, M., Cascianelli, S., Baraldi, L., & Cucchiara, R. (2022). The unreasonable effectiveness of CLIP features for image captioning: An experimental analysis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops (pp. 4662–4670).
Cited by
1 article.