Affiliation:
1. SOAS University of London, London, UK
Abstract
Several methods have recently been proposed for automatically annotating rhymes in large rhymed corpora. Such methods, for example Baley (2022b), greatly reduce the cost of annotating rhymed material and allow historical linguists to focus on analysing the rhyme patterns. However, evidence for the quality of these annotations has so far been anecdotal, consisting of a handful of case studies of individual poems. This paper addresses the issue. First, we discuss previously proposed metrics for evaluating the quality of an annotator’s output against a ground-truth annotation (List, Hill, and Foster 2019) and propose an alternative metric that is better suited to the task. Then, we sample from Baley’s published annotated corpus, re-annotate the sample by hand, and use it to demonstrate the lacunae in the original approach and show how to fix them. Finally, the hand-annotated sample and the source code are published as additional data, so that other researchers can compare the performance of their own annotators.
Subject
Linguistics and Language, Language and Linguistics
References (19 articles)
1. Amigó, Enrique, Julio Gonzalo, Javier Artiles, and Felisa Verdejo. 2009. ‘A Comparison of Extrinsic Clustering Evaluation Metrics Based on Formal Constraints’. Information Retrieval 12 (4): 461–486. https://doi.org/10.1007/s10791-008-9066-8.
2. Bagga, Amit, and Breck Baldwin. 1998. ‘Entity-Based Cross-Document Coreferencing Using the Vector Space Model’. In ACL ’98/COLING ’98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, vol. 1: 79–85. https://doi.org/10.3115/980845.980859.
3. Baley, Julien. 2022a. ‘Automatically Annotated Quan Tang Shi and Quan Song Shi’. Zenodo. https://doi.org/10.5281/zenodo.7138623.
4. Baley, Julien. 2022b. ‘Leveraging Graph Algorithms to Speed up the Annotation of Large Rhymed Corpora’. Cahiers de Linguistique Asie Orientale 51 (1): 46–80. https://doi.org/10.1163/19606028-bja10019.
5. Baley, Julien. 2022c. ‘Hand-Annotated Sample of Tang and Song Poems for Rhyme Judgement Evaluation’. Zenodo. https://doi.org/10.5281/zenodo.7139353.