Inter-annotator agreement in spoken language annotation: Applying uα-family coefficients to discourse segmentation

Published:2021-12-15 Issue:2 Volume:25 Page:478-506
ISSN:2686-8024
Container-title:Russian Journal of Linguistics
Short-container-title:Russian Journal of Linguistics

Author:

Pons Bordería Salvador, Pascual Aliaga Elena

Abstract

As databases make Corpus Linguistics a common tool for most linguists, corpus annotation becomes an increasingly important process. Corpus users need not only raw data but also annotated data, which have been tagged or parsed following annotation protocols. One problem with corpus annotation lies in its reliability, that is, in the probability that its results can be replicated by independent researchers. Inter-annotator agreement (IAA) evaluates the probability that different annotators applying the same protocol reach similar results, and it is measured with different statistical metrics. This study applies IAA for the first time to the Valencia Español Coloquial (Val.Es.Co.) discourse segmentation model, designed for segmenting and labelling spoken language into discourse units. Whereas most IAA studies merely label a set of predefined units, the Val.Es.Co. protocol involves a more complex two-fold process: first, the speech continuum needs to be divided into units; second, the units have to be labelled. Krippendorff's uα-family statistical metrics (Krippendorff et al. 2016) allow measuring IAA in both segmentation and labelling tasks. Three expert annotators segmented a spontaneous conversation into subacts, the minimal discursive unit of the Val.Es.Co. model, and labelled the resulting units according to a set of 10 subact categories. Krippendorff's uα coefficients were applied in several rounds to elucidate whether including a larger number of categories, and distinguishing among them, had an impact on the agreement results. The results show high levels of IAA, especially in the annotation of procedural subact categories, where coefficients exceed 0.8. This study validates the Val.Es.Co. model as an optimal method to fully analyze a conversation into pragmatically-based discourse units.

Publisher

Peoples' Friendship University of Russia

Subject

Linguistics and Language,Language and Linguistics

Cited by 5 articles.

1. Variety and functional diversity of modern discourse in cognitive perspective;Russian Journal of Linguistics;2023-12-15

2. Verb database: Structure, clusters and options;Russian Journal of Linguistics;2023-12-15

3. Los corpus sincrónicos del español. Descripción y potencialidades para la investigación teórica y aplicada de la lengua;Journal of Spanish Language Teaching;2022-07-03

4. Computational linguistics and discourse complexology: Paradigms and research methods;Russian Journal of Linguistics;2022-06-29

5. Computational linguistics and discourse complexology: Paradigms and research methods;RUSS J LINGUIST;2022