1. Unsupervised Cross-lingual Representation Learning at Scale
2. Devlin, J. , Chang, M. , Lee, K. and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.
3. Evang, K. , Basile, V. , Chrupala, G. and Bos, J. (2013). Elephant: Sequence labeling for word and sentence segmentation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA. Association for Computational Linguistics, pp. 1422–1426.
4. Hashimoto, K. , Xiong, C. , Tsuruoka, Y. and Socher, R. (2016). A joint many-task model: Growing a neural network for multiple NLP tasks. CoRR, abs/1611.01587.
5. Neudecker, C. (2016). An open corpus for named entity recognition in historic newspapers. In Calzolari N., Choukri K., Declerck T., Goggi S., Grobelnik M., Maegaard B., Mariani J., Mazo H., Moreno A., Odijk J. and Piperidis S. (eds), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. European Language Resources Association (ELRA), pp. 4348–4352.