Urdu Named Entity Recognition System Using Deep Learning Approaches

Author:

Haq Rafiul (1), Zhang Xiaowang (1, 2), Khan Wahab (3), Feng Zhiyong (1)

Affiliation:

1. College of Intelligence and Computing, Tianjin University, Tianjin, 300350 China

2. Tianjin University-Aishu Data Intelligence Joint Laboratory, Tianjin, China

3. Department of Computer Science, University of Science and Technology, Bannu, PK 28100

Abstract

Named entity recognition (NER) is a fundamental component of natural language processing tasks such as information retrieval, question answering and machine translation. Research on English NER systems has already achieved considerable success. Urdu NER, however, is still in its infancy because of the complexity and morphological richness of the Urdu language. Existing Urdu NER systems depend heavily on manual feature engineering and on word embeddings to capture similarity, and their performance degrades on previously unseen or infrequent words. Feature-based models additionally suffer from complicated feature engineering and often rely heavily on external resources. To overcome these limitations, this study presents several deep neural approaches that learn features automatically from the data and eliminate manual feature engineering. Our extension uses a convolutional neural network to extract character-level features and combines them with word embeddings to handle out-of-vocabulary words. The study also presents a dataset of Urdu tweets, manually annotated for five named entity classes. The effectiveness of the deep learning approaches is demonstrated on four benchmark datasets. The proposed method improves on current state-of-the-art Urdu NER approaches, with a gain of 6.26% in F1 score.
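The idea of combining character-level convolutional features with word embeddings can be illustrated with a minimal sketch. This is not the authors' architecture: the dimensions, the filter width, the random (untrained) weights, the zero-vector fallback for unknown words, and the function names `char_cnn_features` and `word_representation` are all assumptions made for illustration. It only shows why an out-of-vocabulary word still receives a non-trivial representation.

```python
import random

def char_cnn_features(word, char_emb, filters, width=3):
    """1D convolution over character embeddings with max-over-time pooling."""
    dim = len(next(iter(char_emb.values())))
    zero = [0.0] * dim
    pad = [zero] * (width - 1)
    # Unknown characters fall back to a zero vector (an assumption of this sketch).
    rows = pad + [char_emb.get(ch, zero) for ch in word] + pad
    pooled = []
    for filt in filters:  # each filter is a flat list of width * dim weights
        best = float("-inf")
        for start in range(len(rows) - width + 1):
            window = [x for row in rows[start:start + width] for x in row]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        pooled.append(best)  # keep the strongest response per filter
    return pooled

def word_representation(word, word_emb, char_emb, filters, word_dim, width=3):
    """Concatenate the word embedding with the character-level features."""
    # Out-of-vocabulary words get a zero word vector (hypothetical choice).
    word_vec = word_emb.get(word, [0.0] * word_dim)
    return word_vec + char_cnn_features(word, char_emb, filters, width)

# Toy setup with random weights; in a trained model these would be learned.
rng = random.Random(0)
WORD_DIM, CHAR_DIM, N_FILTERS, WIDTH = 4, 3, 5, 3
known_word, oov_word = "لاہور", "پشاور"  # only the first is in the word vocabulary
word_emb = {known_word: [rng.gauss(0, 1) for _ in range(WORD_DIM)]}
char_emb = {ch: [rng.gauss(0, 1) for _ in range(CHAR_DIM)]
            for ch in sorted(set(known_word + oov_word))}
filters = [[rng.gauss(0, 1) for _ in range(WIDTH * CHAR_DIM)]
           for _ in range(N_FILTERS)]

vec_known = word_representation(known_word, word_emb, char_emb, filters, WORD_DIM, WIDTH)
vec_oov = word_representation(oov_word, word_emb, char_emb, filters, WORD_DIM, WIDTH)
print(len(vec_known), len(vec_oov))
```

In this sketch the out-of-vocabulary word has an all-zero word-embedding part, but its character-level part still carries signal, which is the property the abstract relies on for unseen or infrequent words.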

Funder

National Natural Science Foundation of China

Peiyang Young Scholars in Tianjin University

Publisher

Oxford University Press (OUP)

Subject

General Computer Science


