1. Bottom-up and top-down attention for image captioning and visual question answering
2. Layer normalization;Ba,2016
3. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments;Banerjee
4. BEiT: BERT pre-training of image transformers;Bao
5. Microsoft COCO captions: Data collection and evaluation server;Chen,2015