1. Xiong et al., "On Layer Normalization in the Transformer Architecture," ICML, 2020.
2. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," NAACL, 2019.
3. Lin et al., "A Structured Self-Attentive Sentence Embedding," ICLR, 2017.
4. Hsu et al., "Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training," Interspeech, 2021.
5. Snyder et al., "Deep Neural Network Embeddings for Text-Independent Speaker Verification," Interspeech, 2017.