1. Memory Aware Synapses: Learning What (not) to Forget
2. Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).
3. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
4. Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, et al. 2016. MS MARCO: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268 (2016).
5. Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, et al. 2020. UniLMv2: Pseudo-masked language models for unified language model pre-training. In International Conference on Machine Learning. PMLR, 642–652.