1. Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).
2. Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, and Julian McAuley. 2020. ReZero is all you need: Fast convergence at large depth. arXiv preprint arXiv:2003.04887 (2020).
3. Philip Bachman, R Devon Hjelm, and William Buchwalter. 2019. Learning representations by maximizing mutual information across views. In Advances in Neural Information Processing Systems. 15535–15545.
4. Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. 2018. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 (2018).
5. Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning. PMLR, 1597–1607.