1. J. Hestness et al., Deep learning scaling is predictable, empirically. arXiv [Preprint] (2017). https://doi.org/10.48550/arXiv.1712.00409 (Accessed 1 January 2021).
2. J. Kaplan et al., Scaling laws for neural language models. arXiv [Preprint] (2020). https://doi.org/10.48550/arXiv.2001.08361 (Accessed 1 January 2021).
3. J. S. Rosenfeld, A. Rosenfeld, Y. Belinkov, N. Shavit, “A constructive prediction of the generalization error across scales” in International Conference on Learning Representations (2020).
4. T. Henighan et al., Scaling laws for autoregressive generative modeling. arXiv [Preprint] (2020). https://doi.org/10.48550/arXiv.2010.14701 (Accessed 1 January 2021).
5. J. S. Rosenfeld, J. Frankle, M. Carbin, N. Shavit, “On the predictability of pruning across scales” in International Conference on Machine Learning (PMLR, 2021), pp. 9075–9083.