1. Branch-train-merge: Embarrassingly parallel training of expert language models;li;CoRR,2022
2. ERNIE 2.0: A continual pre-training framework for language understanding;sun;The Thirty-Fourth AAAI Conference on Artificial Intelligence AAAI 2020 The Thirty-Second Innovative Applications of Artificial Intelligence Conference IAAI 2020 The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence EAAI 2020 New York NY USA February 7-12 2020,2020
3. GShard: Scaling giant models with conditional computation and automatic sharding;lepikhin;9th International Conference on Learning Representations ICLR 2021 Virtual Event Austria May 3-7 2021,2021
4. ERNIE: Enhanced representation through knowledge integration;sun;CoRR,2019
5. Decoupled weight decay regularization;loshchilov;7th International Conference on Learning Representations ICLR 2019 New Orleans LA USA May 6-9 2019,2019