1. B. L. Edelman, S. Goel, S. Kakade, and C. Zhang, “Inductive Biases and Variable Creation in Self-Attention Mechanisms,” in Proceedings of the 39th International Conference on Machine Learning, Proceedings of Machine Learning Research, pp. 5793-5831, 2022.
2. S. Garg, D. Tsipras, P. S. Liang, and G. Valiant, “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes,” Advances in Neural Information Processing Systems, vol. 35, pp. 30583-30598, 2022.