Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?

Published:2024-03-27 Issue:7 Volume:113 Page:4285-4314
ISSN:0885-6125
Container-title:Machine Learning
language:en
Short-container-title:Mach Learn

Author:

Tran Hanh Thi Hong, Martinc Matej, Repar Andraz, Ljubešić Nikola, Doucet Antoine, Pollak Senja

Abstract

Automatic term extraction (ATE) is a natural language processing task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. In this paper, we treat ATE as a sequence-labeling task and explore the efficacy of XLMR in evaluating cross-lingual and multilingual learning against monolingual learning in the cross-domain ATE context. Additionally, we introduce NOBI, a novel annotation mechanism enabling the labeling of single-word nested terms. Our experiments are conducted on the ACTER corpus, encompassing four domains and three languages (English, French, and Dutch), as well as the RSDO5 Slovenian corpus, encompassing four additional domains. Results indicate that cross-lingual and multilingual models outperform monolingual settings, showcasing improved F1-scores for all languages within the ACTER dataset. When incorporating an additional Slovenian corpus into the training set, the multilingual model exhibits superior performance compared to state-of-the-art approaches in specific scenarios. Moreover, the newly introduced NOBI labeling mechanism significantly enhances the classifier’s capacity to extract short nested terms, leading to substantial improvements in Recall for the ACTER dataset and consequently boosting the overall F1-score performance.
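To make the sequence-labeling framing concrete, the following is a minimal sketch of conventional flat B/I/O labeling for term spans. It is not the NOBI scheme, whose details the abstract does not give. The function name `bio_labels`, the example sentence, and the term list are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def bio_labels(tokens: List[str], terms: List[Tuple[str, ...]]) -> List[str]:
    """Assign flat B/I/O labels by matching each term (a tuple of lowercase tokens)."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Match longer terms first so a multi-word term is labeled as one span.
    for term in sorted(terms, key=len, reverse=True):
        n = len(term)
        for start in range(len(tokens) - n + 1):
            window = tuple(lowered[start:start + n])
            # Flat scheme: only label positions that are still unlabeled.
            if window == term and all(l == "O" for l in labels[start:start + n]):
                labels[start] = "B"
                for i in range(start + 1, start + n):
                    labels[i] = "I"
    return labels


# Hypothetical example: "energy" is a single-word term nested in "wind energy".
tokens = "Wind energy reduces carbon emissions".split()
terms = [("wind", "energy"), ("energy",), ("carbon", "emissions")]
print(list(zip(tokens, bio_labels(tokens, terms))))
```

In this flat scheme the nested single-word term "energy" receives only an `I` label as part of "wind energy" and is never marked as a term in its own right, which is the kind of gap the abstract says NOBI is meant to address.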

Funder

Javna Agencija za Raziskovalno Dejavnost RS

Republic of Slovenia and the European Union

Région Nouvelle Aquitaine

Campus France

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10994-023-06506-7.pdf

Cited by 2 articles.

1. CoastTerm: A Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature;Lecture Notes in Computer Science;2024

2. Is Prompting What Term Extraction Needs?;Lecture Notes in Computer Science;2024