1. Layer normalization;ba;arXiv preprint arXiv:1607.06450,2016
2. Deep Residual Learning for Image Recognition
3. BERT: Pre-training of deep bidirectional transformers for language understanding;devlin;arXiv preprint arXiv:1810.04805,2018
4. Augmented neural ODEs;dupont;NeurIPS,2019
5. Visualizing data using t-SNE;maaten;Journal of Machine Learning Research,2008