Improving mathematics assessment readability: Do large language models help?

Authors:

Patel Nirmal¹, Nagpal Pooja², Shah Tirth¹, Sharma Aditya¹, Malvi Shrey¹, Lomas Derek³

Affiliation:

1. Playpower Labs, Gujarat, India

2. Central Square Foundation, Delhi, India

3. Industrial Design Engineering, Delft University of Technology, Delft, Netherlands

Abstract

Background

Readability metrics provide an objective and efficient way to assess the quality of educational texts. Readability measures can be used to find assessment items that are difficult to read for a given grade level. Hard-to-read math word problems can put students who are behind in their literacy learning at a disadvantage: despite their math abilities, these students can perform poorly on difficult-to-read word problems because of weak reading skills. Less readable math tests can create equity issues for students who are relatively new to the language of assessment, and less readable test items can also affect an assessment's construct validity by partially measuring reading comprehension.

Objectives

This study shows how large language models can help improve the readability of math assessment items.

Methods

We analysed 250 test items from grades 3 to 5 of EngageNY, an open-source curriculum. We used the GPT-3 AI system to simplify the text of these math word problems, using text prompts and the few-shot learning method for the simplification task.

Results and Conclusions

On average, GPT-3 produced output passages with improved readability metrics, but the outputs contained a large amount of noise and were often unrelated to the input. We used thresholds over text similarity metrics and changes in readability measures to filter out the noise, and found meaningful simplifications that can be given to item authors as suggestions for improvement.

Takeaways

GPT-3 is capable of simplifying hard-to-read math word problems, but it generates noisy simplifications with both text prompts and few-shot learning. The noise can be filtered using text similarity and readability measures. The meaningful simplifications the AI produces are sound but not ready to be used as direct replacements for the original items. To improve test quality, simplifications can be suggested to item authors at the time of digital question authoring.
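The noise-filtering step described in the abstract can be illustrated with a minimal sketch. The paper does not publish its exact metrics or thresholds, so the following assumes the Flesch-Kincaid grade-level formula as the readability measure, `difflib.SequenceMatcher` as a stand-in text similarity metric, and illustrative threshold values; the GPT-3 call that produces candidate simplifications is omitted.

```python
import re
from difflib import SequenceMatcher


def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels (minimum 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59


def keep_simplification(original: str, candidate: str,
                        min_similarity: float = 0.5,
                        min_grade_drop: float = 1.0) -> bool:
    # Reject candidates that drift too far from the original item (noise)
    # or that do not actually lower the estimated reading grade level.
    # Threshold values here are illustrative assumptions, not the paper's.
    similarity = SequenceMatcher(None, original.lower(), candidate.lower()).ratio()
    grade_drop = fk_grade(original) - fk_grade(candidate)
    return similarity >= min_similarity and grade_drop >= min_grade_drop


if __name__ == "__main__":
    original = "Determine the aggregate number of apples that Maria possesses."
    simpler = "Determine the total number of apples that Maria has."
    print(keep_simplification(original, simpler))   # related and easier to read
    print(keep_simplification(original, original))  # no readability gain
```

Surviving candidates would then be surfaced to item authors as suggestions rather than applied automatically, matching the workflow the abstract recommends.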

Publisher

Wiley

Subject

Computer Science Applications, Education

Cited by 8 articles.
