Abstract
The quality of Machine Learning (ML) applications is commonly assessed by quantifying how well an algorithm fits its respective training data. Yet, a perfect model that learns from and reproduces erroneous data will always be flawed in its real-world application. Hence, a comprehensive assessment of ML quality must include an additional data perspective, especially for models trained on human-annotated data. Best practices for collecting human-annotated training data often do not exist, leaving researchers to make arbitrary decisions when collecting annotations. Decisions about the selection of annotators or label options may affect training data quality and model performance.

In this paper, I outline and summarize previous research and approaches to the collection of annotated training data. I look at data annotation and its quality confounders from two perspectives: the set of annotators and the strategy of data collection. The paper highlights the various implementations of text and image annotation collection and stresses the importance of careful task construction. I conclude by illustrating the consequences for future research and applications of data annotation. The paper is intended to give readers a starting point on annotated data quality research and to stress to researchers and practitioners the necessity of thoughtful consideration of the annotation collection process.
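The abstract argues that label quality should be checked independently of model fit. As a minimal sketch of that data perspective (not taken from the paper itself), the snippet below computes pairwise inter-annotator agreement with Cohen's kappa via scikit-learn's cohen_kappa_score; the annotator names and label arrays are hypothetical.

```python
# Minimal sketch: quantify annotation quality via pairwise inter-annotator
# agreement before treating the labels as ground truth.
# The annotator names and label arrays below are invented for illustration.
from itertools import combinations

from sklearn.metrics import cohen_kappa_score

# Labels assigned by three hypothetical annotators to the same 10 items.
annotations = {
    "annotator_a": [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    "annotator_b": [1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
    "annotator_c": [0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
}

# Cohen's kappa corrects raw agreement for agreement expected by chance;
# values near 0 signal noisy labels even when raw agreement looks high.
for (name_a, labels_a), (name_b, labels_b) in combinations(annotations.items(), 2):
    kappa = cohen_kappa_score(labels_a, labels_b)
    print(f"{name_a} vs {name_b}: kappa = {kappa:.2f}")
```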
Funder
Ludwig-Maximilians-Universität München
Publisher
Springer Science and Business Media LLC
Subject
General Economics, Econometrics and Finance; General Social Sciences; Statistics and Probability
References: 80 articles.
1. Al Kuwatly H, Wich M, Groh G (2020) Identifying and measuring annotator bias based on annotators’ demographic characteristics. In: Association for Computational Linguistics (ed) Proceedings of the fourth workshop on online abuse and harms, pp 184–190
2. Antin J, Shaw A (2012) Social desirability bias and self-reports of motivation: a study of Amazon Mechanical Turk in the US and India. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp 2925–2934
3. Arhin K, Baldini I, Wei D et al (2021) Ground-truth, whose truth?—examining the challenges with annotating toxic text datasets
4. Beatty PC, Willis GB (2007) Research synthesis: the practice of cognitive interviewing. Public Opin Q 71:287–311. https://doi.org/10.1093/poq/nfm006
5. Beck J, Eckman S, Chew R, Kreuter F (2022) Improving labeling through social science insights: results and research agenda. In: Chen JYC, Fragomeni G, Degen H, Ntoa S (eds) HCI international 2022—late breaking papers: interacting with eXtended reality and artificial intelligence. Springer Nature Switzerland, Cham, pp 245–261
Cited by
1 article.
1. Editorial issue 3 + 4, 2023. AStA Wirtschafts- und Sozialstatistisches Archiv, 2023-12