1. TensorFlow: A system for large-scale machine learning; Abadi, 2016
2. Arora, S., Cohen, N., Golowich, N., Hu, W., 2019a. A convergence analysis of gradient descent for deep linear neural networks, in: International Conference on Learning Representations. https://openreview.net/forum?id=SkMQg3C5K7.
3. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks; Arora, 2019
4. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al., 2020. Language models are few-shot learners, in: Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
5. A dynamical view on optimization algorithms of overparameterized neural networks; Bu, 2021