1. Adam: A Method for Stochastic Optimization;kingma;ICLR,2015
2. SGDR: Stochastic Gradient Descent with Warm Restarts;loshchilov;ICLR,2017
3. The Elements of Statistical Learning;hastie;Springer,2009
4. RoBERTa: A Robustly Optimized BERT Pretraining Approach;liu;arXiv:1907.11692,2019