1. Alistarh, D., Grubic, D., Li, J. Z., Tomioka, R., & Vojnovic, M. (2017). QSGD: Communication-efficient SGD via gradient quantization and encoding. In Proceedings of the 31st international conference on neural information processing systems (NIPS’17) (pp. 1707–1718). Curran Associates Inc.
2. Arora, S., Ge, R., Neyshabur, B., & Zhang, Y. (2018). Stronger generalization bounds for deep nets via a compression approach. In International conference on machine learning, PMLR (pp. 254–263).
3. Ashbrock, J., & Powell, A. M. (2021). Stochastic Markov gradient descent and training low-bit neural networks. Sampling Theory, Signal Processing, and Data Analysis, 19(15), 1.
4. Bǎlcan, M.-F., & Blum, A. (2010). A discriminative model for semi-supervised learning. Journal of the ACM, 57(3), 1.
5. Bartlett, P. L., Bousquet, O., & Mendelson, S. (2005). Local Rademacher complexities. The Annals of Statistics, 33(4), 1497–1537.