1. History for Visual Dialog: Do we really need it?
2. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
3. VQA: Visual Question Answering
4. Multimodal Machine Learning: A Survey and Taxonomy
5. Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, and Hsiao-Wuen Hon. 2020. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119), Hal Daumé III and Aarti Singh (Eds.). PMLR, 642–652. https://proceedings.mlr.press/v119/bao20a.html