1. Z. Azerbayev, B. Piotrowski, H. Schoelkopf, E. W. Ayers, D. Radev, and J. Avigad, ProofNet: Autoformalizing and formally proving undergraduate-level mathematics, Preprint, arXiv:2302.12433, (2023).
2. Y. Benchekroun, M. Dervishi, M. Ibrahim, J.-B. Gaya, X. Martinet, and G. Mialon, WorldSense: A synthetic benchmark for grounded reasoning in large language models, Preprint, arXiv:2311.15930, (2023).
3. D. G. Bobrow, A question-answering system for high school algebra word problems, Proceedings of the fall joint computer conference, Part I, October 27–29, 1964, (591–614).
4. R. Bommasani et al., On the opportunities and risks of foundation models, Preprint, arXiv:2108.07258, (2021).
5. Meta Fundamental AI Research Diplomacy Team, Human-level play in the game of Diplomacy by combining language models with strategic reasoning, Science, (2022).