[1] M. Tan and Q. Le, “EfficientNetV2: Smaller models and faster training,” in Proceedings of the 38th International Conference on Machine Learning (ICML), vol. 139, pp. 10096–10106, July 2021.
[2] A. Vaswani et al., “Attention Is All You Need,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), vol. 30, pp. 6000–6010, December 2017.
[3] J. Kaplan et al., “Scaling Laws for Neural Language Models,” arXiv preprint, January 2020. DOI: 10.48550/arXiv.2001.08361
[4] D. Silver et al., “Mastering the game of Go without human knowledge,” Nature, vol. 550, pp. 354–359, October 2017. DOI: 10.1038/nature24270
[5] N. P. Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit,” in Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), June 2017. DOI: 10.1145/3079856.3080246