1. Andrychowicz, M., et al.: Learning to learn by gradient descent by gradient descent. In: Advances in Neural Information Processing Systems, pp. 3981–3989 (2016)
2. Bansal, N., Chen, X., Wang, Z.: Can we gain more from orthogonality regularizations in training deep CNNs? arXiv preprint arXiv:1810.09102 (2018)
3. Chen, P.H., Reddi, S., Kumar, S., Hsieh, C.J.: Learning to learn with better convergence (2020). https://openreview.net/forum?id=S1xGCAVKvr
4. Chen, T., et al.: Training stronger baselines for learning to optimize. In: Advances in Neural Information Processing Systems 33 (2020)
5. Cotter, N.E., Conwell, P.R.: Fixed-weight networks can learn. In: 1990 IJCNN International Joint Conference on Neural Networks, pp. 553–559. IEEE (1990)