Affiliation:
1. Department of Computer Science, University of Engineering and Technology Lahore, Punjab, Pakistan
Abstract
Named entity recognition (NER) is the task of identifying proper nouns in natural language text and classifying them into types such as location, person, and organization. Because NER supports many natural language processing (NLP) tasks, numerous NER approaches and benchmark datasets have been proposed. However, NER development for low-resource languages remains limited by the absence of substantial training datasets. Punjabi is a classic example of a low-resource language. Although various researchers have worked on Punjabi, they focused on the Gurmukhi script. To address the challenges of developing NER for the Shahmukhi script, we present an improved technique for Punjabi NER in Shahmukhi. We first extend the existing dataset by adding new NER classes, leveraging a novel Pool of Words data augmentation strategy. Our extended dataset has 11,31,509 tokens and 1,25,789 labeled entities, with more named entities (NEs) than the older dataset. Next, we fine-tune a transformer model, Bidirectional Encoder Representations from Transformers (BERT), for the NER task. We performed experiments with the proposed approach on both the new and the older versions of the dataset, showing that our method achieves competitive results.
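The abstract does not spell out how the Pool of Words strategy works. As a rough illustration only, one common form of pool-based augmentation for NER replaces each labeled entity span with another mention of the same type drawn from a pool, then regenerates the BIO tags. The sketch below assumes that reading; the `POOLS` dictionary, the `augment` function, and the Latin-script sample tokens are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical pools of entity mentions keyed by entity type.
# Each mention is a list of tokens. Illustrative data, not from the paper.
POOLS = {
    "PER": [["Ali"], ["Sara", "Khan"]],
    "LOC": [["Lahore"], ["Multan"]],
}


def augment(tokens, tags, pools, rng):
    """Return a new (tokens, tags) pair in which every BIO entity span is
    replaced by a randomly chosen mention of the same type from `pools`.
    Spans whose type has no pool are copied unchanged."""
    out_tokens, out_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            etype = tag[2:]
            # Find the end of the current entity span.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            # Swap in a pooled mention, or keep the original span.
            if pools.get(etype):
                mention = rng.choice(pools[etype])
            else:
                mention = tokens[i:j]
            out_tokens.extend(mention)
            out_tags.extend(["B-" + etype] + ["I-" + etype] * (len(mention) - 1))
            i = j
        else:
            # Non-entity tokens (and any stray I- tags) are copied as-is.
            out_tokens.append(tokens[i])
            out_tags.append(tag)
            i += 1
    return out_tokens, out_tags


sample_tokens = ["Ali", "lives", "in", "Lahore"]
sample_tags = ["B-PER", "O", "O", "B-LOC"]
new_tokens, new_tags = augment(sample_tokens, sample_tags, POOLS, random.Random(0))
```

Each call produces one augmented sentence with the same context words and the same number of entities, so repeated calls with different seeds could grow a small annotated corpus. Whether the authors' strategy works this way is an assumption.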
Publisher
Association for Computing Machinery (ACM)
Cited by
1 article.
1. Advancing NLP for Punjabi Language: A Comprehensive Review of Language Processing Challenges and Opportunities;2024 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT);2024-01-04