1. Austin, J., Johnson, D.D., Ho, J., Tarlow, D., Van Den Berg, R.: Structured denoising diffusion models in discrete state-spaces. Adv. Neural Inform. Process. Syst. 34, 17981–17993 (2021)
2. Bachlechner, T., Majumder, B.P., Mao, H., Cottrell, G., McAuley, J.: ReZero is all you need: fast convergence at large depth. arXiv preprint arXiv:2003.04887 (2020)
3. Buhler, J.P.: Lecture Notes in Mathematics (1993)
4. Draguns, A., Ozoliņš, E., Šostaks, A., Apinis, M., Freivalds, K.: Residual shuffle-exchange networks for fast processing of long sequences. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 7245–7253 (2021)
5. Freivalds, K., Liepins, R.: Improving the neural GPU architecture for algorithm learning. In: ICML Workshop on Neural Abstract Machines and Program Induction v2 (NAMPI 2018) (2018)