Toward an Effective Igbo Part-of-Speech Tagger

Published: 2019-12-31 Issue: 4 Volume: 18 Page: 1-26
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Onyenwe Ikechukwu E.¹, Hepple Mark¹, Chinedu Uchechukwu², Ezeani Ignatius¹

Affiliation:

1. University of Sheffield, South Yorkshire, UK

2. Nnamdi Azikiwe University, Awka, Anambra, Nigeria

Abstract

Part-of-speech (POS) tagging is a well-established technology for most Western European languages and a few other world languages, but it has not been evaluated on Igbo, an agglutinative African language. This article presents POS tagging experiments that use an Igbo corpus as a test bed for identifying the POS taggers and machine learning (ML) methods that can achieve good performance with the small dataset available for the language. The experiments use several well-known POS taggers developed for English or other European languages, along with different training data styles and sizes. Igbo has a number of language-specific characteristics that present a challenge for effective POS tagging. One interesting case is the wide use of verbs (and nominalizations thereof) that have an inherent noun complement, which form “linked pairs” in the POS tagging scheme but may appear discontinuously. Another issue is Igbo’s highly productive agglutinative morphology, which can produce many variant word forms from a given root. This productivity is a key cause of the out-of-vocabulary (OOV) words observed during Igbo tagging. We report results of experiments on a promising direction for improving tagging performance on such morphologically inflected OOV words.

Funder

Tertiary Educational Trust Fund (TETFund) Nigeria

Nnamdi Azikiwe University

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3314942

Cited by 3 articles.

1. A State-of-the-Art Review of Nigerian Languages Natural Language Processing Research;Research Anthology on Applied Linguistics and Language Practices;2022-04-01

2. Parts of Speech Tagging: A Setswana Relative;Journal of Physics: Conference Series;2022-02-01

3. A State-of-the-Art Review of Nigerian Languages Natural Language Processing Research;Advances in IT Standards and Standardization Research;2021