A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level

Authors:

Drori Iddo (1, 2), Zhang Sarah (3), Shuttleworth Reece (1), Tang Leonard (4), Lu Albert (1), Ke Elizabeth (1), Liu Kevin (1), Chen Linda (1), Tran Sunny (1), Cheng Newman (2), Wang Roman (2), Singh Nikhil (5), Patti Taylor L. (6), Lynch Jayson (7), Shporer Avi (8), Verma Nakul (2), Wu Eugene (2), Strang Gilbert (3)

Affiliations:

1. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America

2. Department of Computer Science, Columbia University, New York, NY 10027, United States of America

3. Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America

4. Department of Mathematics, Harvard University, Cambridge, MA 02138, United States of America

5. Media Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America

6. Department of Physics, Harvard University, Cambridge, MA 02138, United States of America

7. School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada

8. Department of Physics and Kavli Institute for Astrophysics and Space Research, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America

Abstract

We demonstrate that a neural network pretrained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI’s Codex transformer, then execute them to solve course problems at 81% automatic accuracy. We curate a dataset of questions from the Massachusetts Institute of Technology (MIT)’s largest mathematics courses (Single Variable and Multivariable Calculus, Differential Equations, Introduction to Probability and Statistics, Linear Algebra, and Mathematics for Computer Science) and from Columbia University’s Computational Linear Algebra. We also solve questions from the MATH dataset (Prealgebra, Algebra, Counting and Probability, Intermediate Algebra, Number Theory, and Precalculus), the latest benchmark of advanced mathematics problems designed to assess mathematical reasoning. We randomly sample questions and generate solutions in multiple modalities, including numbers, equations, and plots. The latest GPT-3 language model, pretrained on text, automatically solves only 18.8% of these university questions using zero-shot learning and 30.8% using few-shot learning with the most recent chain-of-thought prompting. In contrast, program synthesis with few-shot learning using Codex, fine-tuned on code, generates programs that automatically solve 81% of these questions. Our approach improves the previous state-of-the-art automatic solution accuracy on the benchmark topics from 8.8% to 81.1%. We perform a survey to evaluate the quality and difficulty of the generated questions. This work automatically solves university-level mathematics course questions at a human level and explains and generates such questions at scale, a milestone for higher education.
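The core loop the abstract describes (a few-shot prompt of worked examples, a code model that writes a program for a new question, then execution of that program to obtain the answer) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the prompt layout, the convention that a program stores its result in a variable named `answer`, the exact-match scoring, and `stub_generate` (a hypothetical offline stand-in for a call to Codex, returning canned programs) are all assumptions made for illustration.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

# Hypothetical few-shot examples: (question, program) pairs. Each program
# stores its result in a variable named `answer` (an assumed convention).
FEW_SHOT: List[Tuple[str, str]] = [
    ("How many ways are there to choose 3 books from 5?",
     "from math import comb\nanswer = comb(5, 3)"),
    ("What is 2 to the power 10?",
     "answer = 2 ** 10"),
]


def build_prompt(question: str, examples: List[Tuple[str, str]]) -> str:
    """Concatenate the worked examples, then the new question, so the
    model's continuation is expected to be a program for that question."""
    parts = [f'"""{q}"""\n{prog}\n' for q, prog in examples]
    parts.append(f'"""{question}"""\n')
    return "\n".join(parts)


def run_program(program: str) -> object:
    """Execute a generated program and return the value bound to `answer`.

    WARNING: exec() on model output is unsafe; use a sandbox in practice.
    """
    namespace: Dict[str, object] = {}
    exec(program, namespace)
    return namespace["answer"]


def solve(question: str, generate: Callable[[str], str]) -> object:
    """Few-shot prompt -> generated program -> executed answer."""
    return run_program(generate(build_prompt(question, FEW_SHOT)))


# Hypothetical canned programs standing in for model output.
CANNED: Dict[str, str] = {
    "Two fair dice are rolled. What is the probability that the sum is 7?": (
        "from fractions import Fraction\n"
        "from itertools import product\n"
        "outcomes = list(product(range(1, 7), repeat=2))\n"
        "answer = Fraction(sum(1 for a, b in outcomes if a + b == 7), len(outcomes))"
    ),
    "What is the sum of the integers from 1 to 100?": "answer = sum(range(1, 101))",
}


def stub_generate(prompt: str) -> str:
    """Offline stand-in for a code model such as Codex (hypothetical):
    returns a canned program for the last quoted question in the prompt."""
    last_question = prompt.rstrip().split('"""')[-2]
    return CANNED.get(last_question, "answer = None")


def accuracy(items: List[Tuple[str, object]], generate: Callable[[str], str]) -> float:
    """Fraction of questions whose executed answer exactly equals the expected one."""
    correct = 0
    for question, expected in items:
        try:
            if solve(question, generate) == expected:
                correct += 1
        except Exception:
            pass  # a program that crashes counts as incorrect
    return correct / len(items)


ITEMS: List[Tuple[str, object]] = [
    ("Two fair dice are rolled. What is the probability that the sum is 7?", Fraction(1, 6)),
    ("What is the sum of the integers from 1 to 100?", 5050),
    ("What is the 10th Fibonacci number?", 55),  # no canned program, so scored wrong
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(ITEMS, stub_generate):.3f}")
```

Replacing `stub_generate` with a real model call is what turns this into program synthesis proper; the design point the sketch shows is that the language model never states the final answer itself, it only writes code, and the answer comes from running that code. How the paper grades answers may differ from the simple exact-match check used here.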

Publisher

Proceedings of the National Academy of Sciences

Subject

Multidisciplinary

References (20 articles)

1. 7 Revealing Ways AIs Fail: Neural Networks can be Disastrously Brittle, Forgetful, and Surprisingly Bad at Math

2. A. Vaswani et al., “Attention is all you need” in Proceedings of Advances in Neural Information Processing Systems (2017), eds. I. Guyon et al. (Curran Associates Inc., Long Beach, CA), vol. 30.

3. T. B. Brown et al., “Language models are few-shot learners” in Proceedings of Advances in Neural Information Processing Systems (2020), eds. H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, H. T. Lin (Curran Associates Inc., Virtual), vol. 33, pp. 1877–1901.

4. D. Hendrycks et al., “Measuring massive multitask language understanding” in Proceedings of the International Conference on Learning Representations (2021), eds. A. Oh, N. Murray, I. Titov (Virtual).

5. D. Hendrycks et al., “Measuring mathematical problem solving with the MATH dataset” in Proceedings of Advances in Neural Information Processing Systems: Datasets and Benchmarks (2021), eds. J. Vanschoren, S. Yeung (Virtual), vol. 1.
