1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;arXiv,2018
2. Natural language processing (almost) from scratch;Collobert;J. Mach. Learn. Res.,2011
3. Bidirectional LSTM-CRF models for sequence tagging;Huang;arXiv,2015