A deep learning approach for Named Entity Recognition in Urdu language

Published:2024-03-28 Issue:3 Volume:19 Page:e0300725
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Anam Rimsha, Anwar Muhammad Waqas, Jamal Muhammad Hasan, Bajwa Usama Ijaz, Diez Isabel de la Torre, Alvarado Eduardo Silva, Flores Emmanuel Soriano, Ashraf Imran

Abstract

Named Entity Recognition (NER) is a natural language processing task that has been widely explored for many languages over the past decade, but it remains under-researched for Urdu because of the language's rich morphology and complexity. Existing state-of-the-art studies on Urdu NER use various deep learning approaches with automatic feature selection through word embeddings. This paper presents a deep learning approach for Urdu NER that uses FastText and Floret word embeddings to capture contextual information from the words surrounding each token for improved feature extraction. Publicly available pre-trained FastText and Floret embeddings for Urdu are used to generate feature vectors for four benchmark Urdu datasets. These features are then used as input to train various combinations of Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Conditional Random Field (CRF) deep learning models. The results show that the proposed approach significantly outperforms existing state-of-the-art studies on Urdu NER, achieving an F-score of up to 0.98 when using BiLSTM+GRU with Floret embeddings. Error analysis shows a low classification error rate, ranging from 1.24% to 3.63% across the datasets, which indicates the robustness of the approach.
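
As a rough illustration of the subword idea behind FastText-style embeddings mentioned in the abstract, the sketch below splits a word into character n-grams, hashes each n-gram into a row of a small table, and averages those rows into a word vector. It is not the paper's pipeline: the function names, the `crc32` hash, the table size, and the random table values are all assumptions made for illustration, and the separate whole-word vector that FastText also keeps is omitted. In practice the pre-trained Urdu vectors would be loaded instead of a random table.

```python
import random
import zlib


def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def bucket_ids(grams, num_buckets):
    """Map each n-gram to a table row; crc32 is a stand-in hash (assumption)."""
    return [zlib.crc32(g.encode("utf-8")) % num_buckets for g in grams]


def word_vector(word, table):
    """Average the table rows selected by the word's n-grams."""
    ids = bucket_ids(char_ngrams(word), len(table))
    dim = len(table[0])
    if not ids:
        # Word too short to yield any n-gram: fall back to a zero vector.
        return [0.0] * dim
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dim)]


# Toy table of random values (assumption); real embeddings would be loaded here.
rng = random.Random(0)
TABLE = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(1000)]

vec = word_vector("پاکستان", TABLE)
print(len(vec))
```

Because the vector is built from subwords, a word that never appeared in training still receives a representation from the n-grams it shares with known words, which is one reason this family of embeddings is often used for morphologically rich languages such as Urdu.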

Funder

the European University of Atlantic

Publisher

Public Library of Science (PLoS)
