Named Entity Recognition and Classification for Punjabi Shahmukhi

Published:2020-07-31 Issue:4 Volume:19 Page:1-13
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Ahmad Muhammad Tayyab¹, Malik Muhammad Kamran¹, Shahzad Khurram¹, Aslam Faisal¹, Iqbal Asif¹, Nawaz Zubair¹, Bukhari Faisal¹

Affiliation:

1. Punjab University College of Information Technology, Lahore, Pakistan

Abstract

Named entity recognition (NER) refers to identifying proper nouns in natural language text and classifying them into named entity types, such as person, location, and organization. Due to the widespread applications of NER, numerous NER techniques and benchmark datasets have been developed for both Western and Asian languages. Even though the Shahmukhi script of the Punjabi language is used by nearly three-fourths of Punjabi speakers worldwide, Gurmukhi has been the main focus of research activities. Specifically, no benchmark NER corpus exists for Shahmukhi, which has held back NER research for the Shahmukhi script. To this end, this article presents the development and specifications of the first-ever NER corpus for Shahmukhi. The newly developed corpus is composed of 318,275 tokens and 16,300 named entities, including 11,147 persons, 3,140 locations, and 2,013 organizations. To establish the strength of our corpus, we have compared its specifications with those of its Gurmukhi counterparts. Furthermore, we have demonstrated the usability of our corpus using five supervised learning techniques, including two state-of-the-art deep learning techniques. The results are compared, and valuable insights about the behavior of the most effective technique are discussed.
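
The corpus figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function name `entity_summary` and the "entities per 100 tokens" measure are assumptions introduced here, and the measure counts each entity once, however many tokens it spans.

```python
# Corpus figures as quoted in the abstract.
TOTAL_TOKENS = 318_275
ENTITY_COUNTS = {"person": 11_147, "location": 3_140, "organization": 2_013}


def entity_summary(counts, total_tokens):
    """Return (total entities, share of each type, entities per 100 tokens).

    Hypothetical helper for illustration; not part of the paper.
    """
    total_entities = sum(counts.values())
    shares = {label: n / total_entities for label, n in counts.items()}
    density = 100 * total_entities / total_tokens
    return total_entities, shares, density


total, shares, density = entity_summary(ENTITY_COUNTS, TOTAL_TOKENS)
print(f"total entities: {total}")  # 16300, matching the abstract
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"entities per 100 tokens: {density:.2f}")
```

The per-type counts sum to the stated 16,300 entities, and the shares show that person names make up roughly two-thirds of the annotated entities.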

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3383306

Reference: 28 articles.

1. Recognition of online handwritten Gurmukhi characters based on zone and stroke identification

2. Weighted Vote-Based Classifier Ensemble for Named Entity Recognition

3. A Survey of Arabic Named Entity Recognition and Classification

Cited by 15 articles.

1. End-to-end framework for agricultural entity extraction – A hybrid model with transformer;Computers and Electronics in Agriculture;2024-10

2. Knowledge-Enriched Prompt for Low-Resource Named Entity Recognition;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-05-10

3. End-to-End Framework for Agricultural Entity Extraction - a Hybrid Model with Transformers;2024

4. Shahmukhi named entity recognition by using contextualized word embeddings;Expert Systems with Applications;2023-11

5. Comparing Open Arabic Named Entity Recognition Tools;2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI);2023-08