Açık Uçlu Maddelerin Puanlanmasında ChatGPT ve Gerçek Puanlayıcıların Puanlayıcılar Arası Güvenirlik Bakımından İncelenmesi [An Examination of ChatGPT and Human Raters in Terms of Inter-Rater Reliability in the Scoring of Open-Ended Items]

Author:

DEMİR, Seda¹

Affiliation:

1. Tokat Gaziosmanpaşa University

Abstract

The aim of this study is to examine inter-rater reliability when responses to open-ended items are scored, according to scoring keys, by ChatGPT, an artificial intelligence-based tool, and by two human raters. The study group consists of 30 students, aged between 13 and 15, attending school in Eskişehir province in the 2022-2023 academic year. The data were collected face-to-face using 16 open-ended items selected from the sample reading-skills questions released by the Programme for International Student Assessment (PISA). Correlation, percentage of agreement, and Generalizability (G) theory were used to determine inter-rater reliability: SPSS 25 for the correlation analyses, Excel for the percentage-of-agreement analyses, and EduG 6.1 for the G-theory analyses. The results showed that the correlations between raters were positive and high, that the raters showed a high level of agreement, and that the reliability (G) coefficients obtained from G theory were lower than the correlation values and percentages of agreement. In addition, all raters showed excellent positive correlation and full agreement with one another when scoring short-answer items whose answers appeared directly in the text. Furthermore, according to the G-theory results, the item effect (i) explained the largest share of total variance among the main effects, and the student-item interaction (s×i) explained the largest share among the interaction effects. Consequently, educators can be advised to seek support from artificial intelligence-based tools such as ChatGPT when scoring open-ended items that take a long time to score, especially in crowded classes or when time is limited.
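Two of the inter-rater indices named in the abstract, Pearson correlation and percentage of agreement, can be sketched for a pair of raters as follows. This is a minimal Python illustration with hypothetical scores, not the study's data; the G coefficients additionally require a full G-theory variance-component decomposition (as produced by EduG) and are not reproduced here.

```python
# Illustrative sketch: two simple inter-rater reliability indices
# for a pair of raters scoring the same set of open-ended items.
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def percent_agreement(x, y):
    """Proportion of items on which the raters assigned identical scores."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

# Hypothetical scores from two raters on ten items (0-2 rubric).
rater_a = [2, 1, 0, 2, 2, 1, 0, 1, 2, 0]
rater_b = [2, 1, 0, 2, 1, 1, 0, 1, 2, 0]

print(round(pearson_r(rater_a, rater_b), 3))   # → 0.933
print(percent_agreement(rater_a, rater_b))     # → 0.9
```

A high correlation with a lower percentage of agreement can occur when raters rank students similarly but differ in absolute severity, which is one reason the study also reports G coefficients.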

Publisher

Gaziosmanpasa University

Subject

General Medicine

