1. Devlin, J., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding.
2. Wang, W., et al. (2019). StructBERT: Incorporating language structures into pre-training for deep language understanding.
3. Brown, T. B., et al. (2020). Language models are few-shot learners.
4. Chomsky, N. (1980). Rules and Representations.
5. Chomsky, N. (1965). Aspects of the Theory of Syntax.