1. Bottom-up and top-down attention for image captioning and visual question answering; Anderson, 2018
2. Layer normalization; Ba, 2016
3. Image captioning model using part-of-speech guidance module for description with diverse vocabulary; Bae; IEEE Access, 2022
4. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments; Banerjee, 2005
5. Cross-modal memory networks for radiology report generation; Chen, 2021