1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;arXiv preprint arXiv:1810.04805,2018
2. Unified language model pre-training for natural language understanding and generation;Dong;Advances in Neural Information Processing Systems,2019
3. Language models are unsupervised multitask learners;Radford;OpenAI blog,2019
4. Code prediction by feeding trees to transformers;Kim;2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE),2021
5. CodeFill: Multi-token code completion by jointly learning from structure and naming sequences;Izadi;Proceedings of the 44th International Conference on Software Engineering (ICSE),2022