1. Understanding gradient descent on the edge of stability in deep learning;Arora,2022
2. Decentralized deep learning using momentum-accelerated consensus;Balu,2021
3. EMNIST: extending MNIST to handwritten letters;Cohen,2017
4. Momentum improves normalized SGD;Cutkosky,2020