1. Comparing partitions
2. Comparing clusterings—an information based distance
3. Learning phrase representations using RNN encoder-decoder for statistical machine translation;cho;CoRR,2014
4. BERT: Pre-training of deep bidirectional transformers for language understanding;devlin;Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers),2019
5. Adam: A method for stochastic optimization;kingma;3rd International Conference on Learning Representations ICLR 2015,2015