Abstract
Due to the rapidly improving capability of large language models such as Generative Pre-trained Transformer models (GPT), artificial intelligence (AI) based tools have entered use in education at scale. However, empirical data are largely lacking on the effects of AI tools on learning. Here, we determine the impact of four GPT-based tools on college-aged participants’ comprehension of standardized American College Test (ACT)-derived passages and associated tests using a randomized cross-over online study (n = 195). The four tools studied were AI-generated summaries, AI-generated outlines, a question-and-answer tutor chatbot, and a Socratic discussion chatbot. Consistent with our pre-registered hypotheses, we found a differential effect of AI tools based on baseline reading comprehension ability. AI tools significantly improved comprehension in lower performing participants and significantly worsened comprehension in higher performing participants. With respect to specific tools, low performers were most benefited by the Socratic chatbot while high performers were worsened most by the summary tool. These findings suggest that while AI tools have massive potential to enhance learning, blanket implementation may cause unintended harm to higher-performing students, calling for caution and further empirical study from developers and educators. Introduction