1. Devlin. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
2. Hermann. Teaching machines to read and comprehend. Advances in Neural Information Processing Systems, 2015.
3. Sanabria. How2: A large-scale dataset for multimodal language understanding. arXiv preprint arXiv:1811.00347, 2018.
4. A neural attention model for abstractive sentence summarization.