Annotation Curricula to Implicitly Train Non-Expert Annotators

Author:

Ji-Ung Lee1, Jan-Christoph Klie2, Iryna Gurevych3

Affiliation:

1. UKP Lab / TU Darmstadt. lee@ukp.informatik.tu-darmstadt.de

2. UKP Lab / TU Darmstadt. klie@ukp.informatik.tu-darmstadt.de

3. UKP Lab / TU Darmstadt. gurevych@ukp.informatik.tu-darmstadt.de

Abstract

Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming at the outset, mentally taxing, and can induce errors into the resulting annotations, especially in citizen science or crowdsourcing scenarios where domain expertise is not required. To alleviate these issues, this work proposes annotation curricula, a novel approach to implicitly train annotators. The goal is to gradually introduce annotators to the task by ordering the instances to be annotated according to a learning curriculum. To do so, this work formalizes annotation curricula for sentence- and paragraph-level annotation tasks, defines an ordering strategy, and identifies well-performing heuristics and interactively trained models on three existing English datasets. Finally, we provide a proof of concept for annotation curricula in a carefully designed user study with 40 volunteer participants who are asked to identify the most fitting misconception for English tweets about the Covid-19 pandemic. The results indicate that using a simple heuristic to order instances can already significantly reduce the total annotation time while preserving high annotation quality. Annotation curricula thus are a promising research direction for improving data collection. To facilitate future research—for instance, adapting annotation curricula to specific tasks and expert annotation scenarios—all code and data from the user study, consisting of 2,400 annotations, are made available.
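The core idea described in the abstract—presenting easier instances first and harder ones later—can be illustrated with a minimal sketch. The difficulty proxy below (token count) is a placeholder assumption for illustration only; the paper evaluates its own heuristics and interactively trained models, which may differ.

```python
# Minimal sketch of an annotation curriculum: order instances easy-to-hard
# according to a simple surface-level difficulty proxy (here: token count).
# The heuristic is an illustrative assumption, not the paper's actual method.

def curriculum_order(instances):
    """Return instances sorted from easy to hard by token count."""
    return sorted(instances, key=lambda text: len(text.split()))

# Hypothetical example instances in the spirit of the user study
# (misconception-bearing tweets about Covid-19).
tweets = [
    "Masks do not work because the virus is smaller than the mesh.",
    "5G towers spread the virus.",
    "Vaccines alter your DNA permanently, experts warn.",
]

ordered = curriculum_order(tweets)
# Annotators would then see the shortest (presumably easiest) tweet first.
```

Any real curriculum would replace the token-count proxy with a stronger difficulty estimate, but the ordering machinery stays the same.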

Publisher

MIT Press

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics


Cited by 4 articles.

1. FARPLS: A Feature-Augmented Robot Trajectory Preference Labeling System to Assist Human Labelers’ Preference Elicitation;Proceedings of the 29th International Conference on Intelligent User Interfaces;2024-03-18

2. Annotation Protocol for Textbook Enrichment with Prerequisite Knowledge Graph;Technology, Knowledge and Learning;2023-09-21

3. Efficient Methods for Natural Language Processing: A Survey;Transactions of the Association for Computational Linguistics;2023

4. Erratum: Annotation Curricula to Implicitly Train Non-Expert Annotators;Computational Linguistics;2022
