A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language

Published: 2022-09-19 Issue: 24 Volume: 34 Page: 22493-22518
ISSN: 0941-0643
Container-title: Neural Computing and Applications
Language: en
Short-container-title: Neural Comput & Applic

Author:

Minutolo Aniello, Guarasci Raffaele, Damiano Emanuele, De Pietro Giuseppe, Fujita Hamido, Esposito Massimo

Abstract

In the last decade, the demand for readily accessible corpora has touched all areas of natural language processing, including coreference resolution. However, it is one of the least considered sub-fields in recent developments. Moreover, almost all existing resources are available only for the English language. To address this gap, this work proposes a methodology to create a corpus for coreference resolution in Italian by exploiting knowledge from annotated resources in other languages. Starting from OntoNotes, the methodology translates and refines English utterances to obtain utterances that respect Italian grammar, deal with language-specific phenomena, and preserve coreference and mentions. A quantitative and qualitative evaluation is performed to assess the well-formedness of the generated utterances, considering readability, grammaticality, and acceptability indexes. The results confirm the effectiveness of the methodology in generating a good dataset for coreference resolution starting from an existing one. The quality of the dataset is also assessed by training a coreference resolution model based on the BERT language model, achieving promising results. Although the methodology has been tailored to English and Italian, it has a general basis that is easily extendable to other languages by adapting a small number of language-dependent rules to cover most of the linguistic phenomena of the language under examination.

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

Link

https://link.springer.com/content/pdf/10.1007/s00521-022-07641-3.pdf

References (74 articles)

1. Sukthanker R, Poria S, Cambria E, Thirunavukarasu R (2020) Anaphora and coreference resolution: a review. Inform Fusion 59:139–162

2. Antunes J, Lins RD, Lima R, Oliveira H, Riss M, Simske SJ (2018) Automatic cohesive summarization with pronominal anaphora resolution. Comput Speech Lang 52:141–164

3. Sikdar UK, Ekbal A, Saha S (2016) A generalized framework for anaphora resolution in Indian languages. Knowl Based Syst 109:147–159

4. Blackwell SE (2001) Testing the Neo-Gricean pragmatic theory of anaphora: the influence of consistency constraints on interpretations of coreference in Spanish. J Pragmat 33(6):901–941

5. Lee C, Jung S, Park C-E (2017) Anaphora resolution with pointer networks. Pattern Recogn Lett 95:1–7

Cited by 5 articles.

1. Analysis and Development of a New Method for Defining Path Reliability in WebGIS Based on Fuzzy Logic and Dispersion Indices;Lecture Notes on Data Engineering and Communications Technologies;2024

2. Probing Cross-lingual Transfer of XLM Multi-language Model;Lecture Notes on Data Engineering and Communications Technologies;2024

3. Towards the Automated Population of Thesauri Using BERT: A Use Case on the Cybersecurity Domain;Lecture Notes on Data Engineering and Communications Technologies;2024

4. Data generalization processing and fusion machine translation system based on virtual reality technology;Second International Conference on Electronic Information Technology (EIT 2023);2023-08-15

5. Narrowing the language gap: domain adaptation guided cross-lingual passage re-ranking;Neural Computing and Applications;2023-07-25