Transfer Learning for Low-Resource Multilingual Relation Classification

Published:2023-02-28 Issue:2 Volume:22 Page:1-24
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Nag Arijit¹, Samanta Bidisha¹, Mukherjee Animesh¹, Ganguly Niloy¹, Chakrabarti Soumen¹

Affiliation:

1. Indian Institute of Technology, Kharagpur, India

Abstract

Relation classification (sometimes called relation extraction) requires trustworthy datasets for fine-tuning large language models, as well as for evaluation. Data collection is challenging for Indian languages, because they are syntactically and morphologically diverse, as well as different from resource-rich languages like English. Despite recent interest in deep generative models for Indian languages, relation classification is still not well served by public datasets. In response, we present IndoRE, a dataset with 21K entity- and relation-tagged gold sentences in three Indian languages (Bengali, Hindi, and Telugu), plus English. We start with a multilingual BERT (mBERT)-based system that captures entity span positions and type information, and provides competitive performance on monolingual relation classification. Using this baseline system, we explore transfer mechanisms between languages and the scope to reduce expensive data annotation while achieving reasonable relation extraction performance. Specifically, we

(a) study the accuracy-efficiency trade-off between expensive, manually labeled gold instances vs. automatically translated and aligned silver instances to train a relation extractor,

(b) devise a simple mechanism for budgeted gold data annotation by intelligently converting distant-supervised silver training instances to gold training instances with human annotators using active learning, and finally

We release the dataset for future research.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3554734

References: 97 articles.

1. Dana Angluin. 1988. Queries and concept learning. Machine Learning.

2. Jordan T. Ash et al. 2019. Deep batch active learning by diverse, uncertain gradient lower bounds. arXiv preprint arXiv:1906.03671.

3. Maria-Florina Balcan et al. 2009. Agnostic active learning. Journal of Computer and System Sciences.

4. Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski. 2019. Matching the blanks: Distributional similarity for relation learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2895–2905. 10.18653/v1/P19-1279

5. Anson Bastos, Abhishek Nadgeri, Kuldeep Singh, Isaiah Onando Mulang, Saeedeh Shekarpour, Johannes Hoffart, and Manohar Kaul. 2020. RECON: Relation extraction using knowledge graph context in a graph neural network. In Proceedings of the Web Conference 2021 (WWW’21). 10.48550/ARXIV.2009.08694

Cited by 2 articles.

1. Offloading the computational complexity of transfer learning with generic features;PeerJ Computer Science;2024-03-25

2. User-aware multilingual abusive content detection in social media;Information Processing & Management;2023-09