1. Aberdam, A., Litman, R., Tsiper, S., et al., 2022. Sequence-to-Sequence Contrastive Learning for Text Recognition. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 15302–15312.
2. Ben Arous, G., Gheissari, R., Jagannath, A., 2021. Online Stochastic Gradient Descent on Non-Convex Losses from High-Dimensional Inference. Journal of Machine Learning Research 22.
3. Balles, L., Hennig, P., 2018. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients. In: Proceedings of the 35th International Conference on Machine Learning. pp. 413–422.
4. Chen, X., Liu, S., Sun, R., Hong, M., 2019. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization. In: International Conference on Learning Representations. https://openreview.net/forum?id=H1x-x309tm.
5. Diakonikolas, J., Orecchia, L., 2018. Alternating Randomized Block Coordinate Descent. In: Proceedings of the 35th International Conference on Machine Learning. pp. 1232–1240.