NERA 2.0: Improving coverage and performance of rule-based named entity recognition for Arabic

Published:2016-05-06 Issue:3 Volume:23 Page:441-472
ISSN:1351-3249
Container-title:Natural Language Engineering
language:en
Short-container-title:Nat. Lang. Eng.

Author:

OUDAH MAI, SHAALAN KHALED

Abstract

Named Entity Recognition (NER) is an essential task for many natural language processing systems, which makes use of various linguistic resources. NER becomes more complicated when the language in use is morphologically rich and structurally complex, such as Arabic. This language has a set of characteristics that make it particularly challenging to handle. In previous work, we proposed an Arabic NER system that follows the hybrid approach, i.e. integrates both rule-based and machine learning-based NER approaches. Our hybrid NER system is the state of the art in Arabic NER according to its performance on standard evaluation datasets. In this article, we discuss a novel methodology for overcoming the coverage drawback of rule-based NER systems in order to improve their performance and allow for automated rule update. The presented mechanism utilizes the recognition decisions made by the hybrid NER system in order to identify the weaknesses of the rule-based component and derive new linguistic rules that enhance the rule base, which will help in achieving more reliable and accurate results. We used the standard ACE 2004 Newswire dataset as a resource for extracting and analyzing new linguistic rules for recognizing person, location and organization names. We formulate each new rule based on two distinctive feature groups, i.e. gazetteers for each type of named entity and Part-of-Speech (POS) tags, in particular noun and proper noun. Fourteen new patterns are derived, formulated as grammar rules, and evaluated in terms of coverage. The conducted experiments exploit a POS-tagged version of the ACE 2004 NW dataset. The empirical results show that the enhanced rule-based system, i.e. NERA 2.0, improves the coverage of the previously misclassified person, location and organization named entity types by 69.93 per cent, 57.09 per cent and 54.28 per cent, respectively.
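
As a rough illustration of the kind of rule the abstract describes (a gazetteer lookup combined with POS tags), the following Python sketch may help. It is not one of the fourteen NERA 2.0 patterns, which the abstract does not list. The trigger words, the tag names `NOUN` and `PROPN`, the transliterated tokens, and the functions `match_person_rule` and `coverage_gain` are all hypothetical, and reading "coverage" as the share of previously missed entities that a new rule recovers is an assumption.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]   # (surface form, POS tag)
Span = Tuple[int, int]    # (start index, end index exclusive)

# Hypothetical gazetteer of words that often introduce a person name
# (transliterated for readability; not taken from the paper).
PERSON_TRIGGERS = {"al-ra'is", "al-duktur"}


def match_person_rule(tokens: List[Token]) -> List[Span]:
    """Return spans of proper-noun runs that directly follow a trigger word."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        word, _tag = tokens[i]
        if word in PERSON_TRIGGERS:
            j = i + 1
            # Extend over consecutive proper nouns after the trigger.
            while j < len(tokens) and tokens[j][1] == "PROPN":
                j += 1
            if j > i + 1:
                spans.append((i + 1, j))
            i = j
        else:
            i += 1
    return spans


def coverage_gain(missed: Set[Span], recovered: Set[Span]) -> float:
    """Percentage of previously missed entities that are now recognized
    (an assumed reading of the coverage measure)."""
    if not missed:
        return 0.0
    return 100.0 * len(missed & recovered) / len(missed)


sentence = [
    ("qala", "VERB"),
    ("al-ra'is", "NOUN"),
    ("Mahmud", "PROPN"),
    ("Abbas", "PROPN"),
    ("ams", "NOUN"),
]
found = match_person_rule(sentence)
print(found)  # [(2, 4)]
```

A real rule base would draw its trigger lists from curated gazetteers and its tags from an Arabic POS tagger; the sketch only shows how the two feature groups can combine in a single pattern.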

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References: 59 articles.

1. Abouenour L., Bouzoubaa K. and Rosso P. 2012. IDRAAQ: new Arabic question answering system based on query expansion and passage retrieval. CLEF (Online Working Notes/Labs/Workshop).

2. ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH

3. Arabic Information Retrieval

4. Integrating Rule-Based System with Classification for Arabic Named Entity Recognition

5. Alias I. 2008. ‘LingPipe 4.1.0’. http://alias-i.com/lingpipe (accessed October 2012).

Cited by 25 articles.

1. Named Entity Recognition of Tunisian Arabic Using the Bi-LSTM-CRF Model;International Journal on Artificial Intelligence Tools;2023-09-28

2. Chinese Named Entity Recognition in Football Based on ALBERT-BiLSTM Model;Applied Sciences;2023-09-28

3. Evaluation on Network Social Media Named Entity Recognition Model Based on Active Learning;ACM Transactions on Asian and Low-Resource Language Information Processing;2023-09-05

4. Comparing Open Arabic Named Entity Recognition Tools;2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI);2023-08

5. Challenges and Solutions for Arabic Natural Language Processing in Social Media;Business Intelligence and Information Technology;2023