The future of standardised assessment: Validity and trust in algorithms for assessment and scoring

Published:2023-01-17 Issue:1 Volume:58 Page:98-110
ISSN:0141-8211
Container-title:European Journal of Education
Language:en
Short-container-title:Euro J of Education

Author:

Aloisi Cesare¹

Affiliation:

1. AQA, Manchester, UK

Abstract

This article considers the challenges of using artificial intelligence (AI) and machine learning (ML) to assist high‐stakes standardised assessment. It focuses on the detrimental effect that even state‐of‐the‐art AI and ML systems could have on the validity of national exams of secondary education, and how lower validity would negatively affect trust in the system. To reach this conclusion, three unresolved issues in AI (unreliability, low explainability and bias) are addressed, to show how each of them would compromise the interpretations and uses of exam results (i.e., exam validity). Furthermore, the article relates validity to trust, and specifically to the ABI+ model of trust. Evidence gathered as part of exam validation supports each of the four trust‐enabling components of the ABI+ model (ability, benevolence, integrity and predictability). It is argued, therefore, that the three AI barriers to exam validity limit the extent to which an AI‐assisted exam system could be trusted. The article suggests that addressing the issues of AI unreliability, low explainability and bias should be sufficient to put AI‐assisted exams on par with traditional ones, but might not go as far as fully reassuring the public. To achieve this, it is argued that changes to the quality assurance mechanisms of the exam system will be required. This may involve, for example, integrating principled AI frameworks in assessment policy and regulation.

Publisher

Wiley

Subject

Education

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/ejed.12542

References: 72 articles.

1. Persistent Anti-Muslim Bias in Large Language Models

2. Improving the Reliability of Deep Neural Networks in NLP: A Review

Cited by 8 articles.

1. Understanding validity criteria in technology-enhanced learning: A systematic literature review;Computers & Education;2024-10

2. Measuring scientific inquiry ability related to hands-on practice: An automated approach based on multimodal data analysis;Education and Information Technologies;2024-08-28

3. Introducing technologies into national large-scale testing: Are we ready?;Education Policy Analysis Archives;2024-04-02

4. On the Dynamic Generation of Items Within an Assessment Test Using Genetic Algorithms;Lecture Notes in Computer Science;2024

5. Enhancing English Proficiency Test Evaluation: Leveraging Artificial Intelligence for Result Classification;2023 10th International Conference on Soft Computing & Machine Intelligence (ISCMI);2023-11-25