1. VQA: Visual Question Answering
2. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 65–72.
3. Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020).
4. Describing Clothing by Semantic Attributes
5. Jaemin Cho, Jie Lei, Hao Tan, and Mohit Bansal. 2021. Unifying Vision-and-Language Tasks via Text Generation. In ICML.