Affiliation:
1. University of Tsukuba, Tsukuba, Japan
Abstract
This paper proposes a method for Named-Entity Recognition (NER) in a low-resource language, Tigrinya, using a pre-trained language model. Tigrinya is a morphologically rich language, yet it remains one of the most underrepresented languages in the field of NLP, mainly due to the limited amount of annotated data available. To address this problem, we present the first publicly available NER datasets for Tigrinya, comprising two manually annotated versions, V1 and V2, which contain 69,309 and 40,627 tokens, respectively; the annotations follow the CoNLL 2003 Beginning, Inside, and Outside (BIO) tagging schema. Specifically, we develop a new pre-trained language model for Tigrinya based on RoBERTa, which we refer to as TigRoBERTa. The model is then fine-tuned on the downstream NER and POS tagging tasks with limited data. Finally, we further enhance performance by applying semi-supervised self-training using unlabeled data. The experimental results show that the method achieves an 84% F1-score for NER and 92% accuracy for POS tagging, which is better than or comparable to the baseline method based on a CNN-BiLSTM-CRF.
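For illustration, the sketch below shows how a RoBERTa-style encoder can be fine-tuned for token classification with BIO tags using the Hugging Face transformers library, the general approach the abstract describes. The model name "tigroberta-base", the entity label set, and the example sentence are placeholders, not the paper's actual checkpoint or data.

```python
# A minimal sketch of fine-tuning a RoBERTa-style encoder for BIO-tagged NER.
# "tigroberta-base" is a placeholder model name; labels and the sentence are
# illustrative only, not taken from the paper's datasets.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
label2id = {l: i for i, l in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("tigroberta-base")  # placeholder
model = AutoModelForTokenClassification.from_pretrained(
    "tigroberta-base", num_labels=len(labels)
)

# One BIO-annotated sentence, already split into words (CoNLL-style).
words = ["Abraham", "visited", "Asmara", "."]
word_labels = ["B-PER", "O", "B-LOC", "O"]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align word-level BIO tags to subword tokens: label only the first subword
# of each word; mask special tokens and continuation subwords with -100.
aligned, prev = [], None
for wid in enc.word_ids(batch_index=0):
    if wid is None or wid == prev:
        aligned.append(-100)
    else:
        aligned.append(label2id[word_labels[wid]])
    prev = wid

loss = model(**enc, labels=torch.tensor([aligned])).loss
loss.backward()  # an optimizer step would follow in a full training loop
```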
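The self-training step can be summarized schematically as follows: the fine-tuned model pseudo-labels unlabeled sentences, only high-confidence predictions are kept, and the model is retrained on the enlarged set. The confidence threshold and the train / predict_with_confidence helpers below are hypothetical stand-ins; the paper's actual selection criterion is not specified here.

```python
# A schematic sketch of semi-supervised self-training for sequence labeling.
# train() and predict_with_confidence() are hypothetical helpers, and the
# threshold is an assumed value, shown only to convey the loop's structure.
CONFIDENCE_THRESHOLD = 0.95  # assumed; the paper's criterion may differ

def self_train(model, labeled_data, unlabeled_data, rounds=3):
    for _ in range(rounds):
        model = train(model, labeled_data)  # supervised fine-tuning
        pseudo_labeled = []
        for sentence in unlabeled_data:
            tags, confidence = predict_with_confidence(model, sentence)
            if confidence >= CONFIDENCE_THRESHOLD:
                pseudo_labeled.append((sentence, tags))
        # Augment the training set with confident pseudo-labels and repeat.
        labeled_data = labeled_data + pseudo_labeled
    return model
```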
Publisher
Association for Computing Machinery (ACM)
Subject
Industrial and Manufacturing Engineering
Cited by
3 articles.
1. Optimizing Named Entity Recognition for Improving Logical Formulae Abstraction from Technical Requirements Documents. 2023 10th International Conference on Dependable Systems and Their Applications (DSA), 2023-08-10.
2. Long Text Classification Using Pre-trained Language Model for a Low-Resource Language. 2023 6th International Conference on Information and Computer Technologies (ICICT), 2023-03.
3. Self-Attention-based Data Augmentation Method for Text Classification. Proceedings of the 2023 15th International Conference on Machine Learning and Computing, 2023-02-17.