Practical and ethical challenges of large language models in education: A systematic scoping review

Author:

Yan Lixiang¹, Sha Lele¹, Zhao Linxuan¹, Li Yuheng¹, Martinez‐Maldonado Roberto¹, Chen Guanliang¹, Li Xinyu¹, Jin Yueqiao¹, Gašević Dragan¹

Affiliation:

1. Centre for Learning Analytics at Monash, Faculty of Information Technology, Monash University, Clayton, Victoria, Australia

Abstract

Educational technology innovations leveraging large language models (LLMs) have shown the potential to automate the laborious process of generating and analysing textual content. While various innovations have been developed to automate a range of educational tasks (e.g., question generation, feedback provision, and essay grading), there are concerns regarding the practicality and ethicality of these innovations. Such concerns may hinder future research and the adoption of LLM‐based innovations in authentic educational contexts. To address this, we conducted a systematic scoping review of 118 peer‐reviewed papers published since 2017 to pinpoint the current state of research on using LLMs to automate and support educational tasks. The findings revealed 53 use cases for LLMs in automating education tasks, categorised into nine main categories: profiling/labelling, detection, grading, teaching support, prediction, knowledge representation, feedback, content generation, and recommendation. Additionally, we identified several practical and ethical challenges, including low technological readiness, lack of replicability and transparency, and insufficient privacy and beneficence considerations. The findings were summarised into three recommendations for future studies: updating existing innovations with state‐of‐the‐art models (e.g., GPT‐3/4), embracing the initiative of open‐sourcing models/systems, and adopting a human‐centred approach throughout the developmental process. As the intersection of AI and education continues to evolve, the findings of this study can serve as an essential reference point for researchers, allowing them to leverage the strengths, learn from the limitations, and uncover potential research opportunities enabled by ChatGPT and other generative AI models.

Practitioner notes

What is currently known about this topic

- Generating and analysing text‐based content are time‐consuming and laborious tasks.
- Large language models are capable of efficiently analysing an unprecedented amount of textual content and completing complex natural language processing and generation tasks.
- Large language models have been increasingly used to develop educational technologies that aim to automate the generation and analysis of textual content, such as automated question generation and essay scoring.

What this paper adds

- A comprehensive list of different educational tasks that could potentially benefit from LLM‐based innovations through automation.
- A structured assessment of the practicality and ethicality of existing LLM‐based innovations from seven important aspects using established frameworks.
- Three recommendations that could potentially support future studies to develop LLM‐based innovations that are practical and ethical to implement in authentic educational contexts.

Implications for practice and/or policy

- Updating existing innovations with state‐of‐the‐art models may further reduce the amount of manual effort required for adapting existing models to different educational tasks.
- The reporting standards of empirical research that aims to develop educational technologies using large language models need to be improved.
- Adopting a human‐centred approach throughout the developmental process could contribute to resolving the practical and ethical challenges of large language models in education.

Funder

Australian Research Council

Jacobs Foundation

Publisher

Wiley

Subject

Education


Cited by 78 articles.
