1. Meshed-memory transformer for image captioning;Cornia,2020
2. The unreasonable effectiveness of CLIP features for image captioning: an experimental analysis;Barraco,2022
3. ClipCap: CLIP prefix for image captioning;Mokady,2021
4. Show, attend and tell: Neural image caption generation with visual attention;Xu,2015
5. Reasoning like humans: On dynamic attention prior in image captioning;Wang;Knowl.-Based Syst.,2021