1. Nayyer Aafaq, Naveed Akhtar, Wei Liu, Syed Zulqarnain Gilani, and Ajmal Mian. 2019. Spatio-temporal dynamics and semantic attribute enriched visual encoding for video captioning. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 12487–12496.
2. Maximiliana Behnke and Kenneth Heafield. 2020. Losing heads in the lottery: Pruning transformer attention in neural machine translation. In Proc. Conf. Empirical Methods Natural Lang. Process. 2664–2674.
3. Cynthia L. Bennett, Jane E, Martez E. Mott, Edward Cutrell, and Meredith Ringel Morris. 2018. How teens with visual impairments take, edit, and share photos on social media. In Proc. CHI Conf. Hum. Factors Comput. Syst. 76.
4. João Carreira and Andrew Zisserman. 2017. Quo Vadis, action recognition? A new model and the kinetics dataset. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 4724–4733.
5. David L. Chen and William B. Dolan. 2011. Collecting highly parallel data for paraphrase evaluation. In Proc. Annu. Meeting Assoc. Comput. Linguistics. 190–200.