Is AI Ground Truth Really True? The Dangers of Training and Evaluating AI Tools Based on Experts’ Know-What

Published: 2021-09-01  Volume: 45  Issue: 3  Pages: 1501-1526
ISSN: 0276-7783
Container-title: MIS Quarterly
Short-container-title: MISQ

Authors:

Lebovitz, Sarah; Levina, Natalia; Lifshitz-Assa, Hila

Abstract

Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools outperform human experts. Yet measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major U.S. hospital, observing how managers evaluated five different machine-learning (ML) based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying these tools out in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts’ know-what knowledge captured in the ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI’s know-what and experts’ know-how enabled managers to better understand the risks and benefits of each tool. This study shows the dangers of treating the ground truth labels used in ML models as objective when the underlying knowledge is uncertain. We outline implications of our study for developing, training, and evaluating AI for knowledge work.

Publisher

MIS Quarterly

Subject

Information Systems and Management, Computer Science Applications, Information Systems, Management Information Systems

Cited by 113 articles.

1. Digital Healthcare; Legal Medicine; 2025

2. Fusing domain knowledge with machine learning: A public sector perspective; The Journal of Strategic Information Systems; 2024-09

3. Artificial intelligence-based virtual assistant and employee engagement: an empirical investigation; Personnel Review; 2024-08-16

4. Impact of Gold-Standard Label Errors on Evaluating Performance of Deep Learning Models in Diabetic Retinopathy Screening: Nationwide Real-World Validation Study; Journal of Medical Internet Research; 2024-08-14

5. AI and mental health: evaluating supervised machine learning models trained on diagnostic classifications; AI & SOCIETY; 2024-08-02