1. Exploring the limits of large scale pre-training; Abnar, 2021
2. Intrinsic dimensionality explains the effectiveness of language model fine-tuning; Aghajanyan, 2020
3. Constrained optimization and Lagrange multiplier methods; Bertsekas, 2014
4. Numerical comparison of augmented Lagrangian algorithms for nonconvex problems; Birgin; Computational Optimization and Applications, 2005
5. Distributed optimization and statistical learning via the alternating direction method of multipliers; Boyd, 2011