Urdu Named Entity Recognition

Published:2020-01-31 Issue:1 Volume:19 Page:1-13
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Kanwal Safia¹, Malik Kamran¹, Shahzad Khurram¹, Aslam Faisal¹, Nawaz Zubair¹

Affiliation:

1. Punjab University College of Information Technology, Lahore, Pakistan

Abstract

Named Entity Recognition (NER) plays a pivotal role in various natural language processing tasks, such as machine translation and automatic question-answering systems. Given the importance of NER, a plethora of NER techniques have been developed for Western and Asian languages. However, despite there being over 490 million Urdu speakers worldwide, NER resources for Urdu are either non-existent or inadequate. To fill this gap, this article makes four key contributions. First, we have developed the largest Urdu NER corpus, which contains 926,776 tokens and 99,718 carefully annotated NEs. The developed corpus contains at least twice as many manually tagged NEs as any existing Urdu NER corpus. Second, we have generated six new word embeddings using three different techniques, fastText, Word2vec, and GloVe, on two corpora of Urdu text. These are the only publicly available embeddings for the Urdu language, besides the recently released Urdu word embeddings by Facebook. Third, we have pioneered the application of deep learning techniques, NN and RNN, to Urdu named entity recognition. Finally, we have performed 32 different experiments, each with 10 folds, using combinations of a traditional supervised learning technique and deep learning techniques, seven types of word embeddings, and two different Urdu NER datasets. Based on the analysis of the results, several valuable insights are provided about the effectiveness of deep learning techniques, the impact of word embeddings, and variations of datasets.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3329710

Reference 36 articles.

1. Nita Patil, Ajay S. Patil, and B. V. Pawar. 2016. Survey of named entity recognition systems with respect to Indian and foreign languages. International Journal of Computer Applications (0975-8887) 134, 16 (2016), 6.

2. Weighted Vote-Based Classifier Ensemble for Named Entity Recognition

3. Approaches to named entity recognition: A survey;Potey A.;International Journal of Innovative Research in Computer and Communication Engineering (An ISO,2015

4. Beyond the hype: Big data concepts, methods, and analytics

Cited by 38 articles.

1. Mono-lingual text reuse detection for the Urdu language at lexical level;Engineering Applications of Artificial Intelligence;2024-10

2. Extracting emotion from resource poor language through transfer learning;Multimedia Tools and Applications;2024-07-30

3. Enriching Urdu NER with BERT Embedding, Data Augmentation, and Hybrid Encoder-CNN Architecture;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-04-15

4. A deep learning approach for Named Entity Recognition in Urdu language;PLOS ONE;2024-03-28

5. SEEUNRS: Semantically Enriched Entity-Based Urdu News Recommendation System;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-03-09