Publisher
Springer Nature Switzerland