Publisher: Springer Nature Switzerland
References (37 articles)
1. Al-Malla, M.A., Jafar, A., Ghneim, N.: Image captioning model using attention and object features to mimic human image understanding. J. Big Data 9(1), 1–16 (2022)
2. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
3. Chen, X., Hsieh, C.J., Gong, B.: When vision Transformers outperform ResNets without pre-training or strong data augmentations. arXiv preprint arXiv:2106.01548 (2021)
4. Chen, X., Zitnick, C.L.: Learning a recurrent visual representation for image caption generation. arXiv preprint arXiv:1411.5654 (2014)
5. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807 (2017)
Cited by (1 article)
1. Deep Vision Transformer and T5-Based for Image Captioning. In: 2023 RIVF International Conference on Computing and Communication Technologies (RIVF) (2023-12-23)