Abstract
Named entity recognition (NER) is a fundamental step in many natural language processing tasks, so improvements in NER performance benefit the tasks that depend on it. Because few resources are available, NER for South Asian languages like Telugu is a challenging problem. This paper attempts to improve NER performance for Telugu using gazetteer-related features, which are generated automatically from Wikipedia pages. We use these gazetteer features along with other well-known features, such as contextual, word-level, and corpus features, to build NER models. The models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve performance, and the MIRA-based NER model outperformed its SVM and CRF counterparts.
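As a rough illustration of how gazetteer membership can be combined with word-level and contextual features, the sketch below builds one feature dictionary per token. It is not the authors' implementation: the gazetteer contents, the feature names, the romanized example tokens, and the helper functions `token_features` and `sentence_features` are all hypothetical, and the paper's corpus features are omitted. Dictionaries of this shape are a common input format for CRF, SVM, or MIRA-style sequence classifiers.

```python
from typing import Dict, List, Set

# Hypothetical gazetteers with romanized placeholder entries.
# The paper derives its lists automatically from Wikipedia pages.
GAZETTEERS: Dict[str, Set[str]] = {
    "PERSON": {"Ramudu"},
    "LOCATION": {"Hyderabad", "Vijayawada"},
}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dictionary for the token at position i."""
    word = tokens[i]
    feats: Dict[str, object] = {
        # Word-level features (assumed for illustration).
        "word": word,
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "length": len(word),
        # Contextual features: the neighbouring tokens, with boundary markers.
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }
    # Gazetteer features: one boolean per list, true on an exact match.
    for label, entries in GAZETTEERS.items():
        feats[f"in_gaz_{label}"] = word in entries
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["nenu", "Hyderabad", "vellanu"]):
        print(feats)
```

Exact-match lookup is the simplest choice here; for real Telugu text in its native script, affix features would need to respect grapheme clusters rather than raw code points, and a morphologically rich language may call for matching on stems instead of surface forms.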
Cited by
4 articles.
1. Discovering Named Entities at Scale of a Data Mining Perspective on Large Text Corpora;2024 IEEE 3rd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA);2024-02-27
2. Telugu named entity recognition using bert;International Journal of Data Science and Analytics;2022-01-18
3. Chinese Named Entity Recognition Method in History and Culture Field Based on BERT;International Journal of Computational Intelligence Systems;2021-09-22
4. The first named entity recognizer in Maithili: Resource creation and system development;Journal of Intelligent & Fuzzy Systems;2021-08-11