Affiliations:
1. Department of Computer Science and Engineering, Government Polytechnic, Krishnarajpet, India
2. Department of Computer Science and Engineering, JSS Science and Technology University, Mysuru, India
Abstract
Part-of-speech (POS) tagging assigns a POS tag to every word in a text, and named entity recognition (NER) identifies the proper nouns in a text and classifies them into predefined categories. This work proposes a POS tagger and an NER system for Kannada text based on conditional random fields (CRFs). The POS tagging dataset consists of 147K tokens, of which 103K are used for training and the remaining 44K for testing. The proposed CRF model for Kannada POS tagging achieved a precision of 91.3%, a recall of 91.6%, and an F-score of 91.4%. To develop the Kannada NER system, the required data was annotated manually using a modified tag set containing 40 labels. The NER dataset consists of 16.5K tokens, of which 70% are used for training and the remaining 30% for testing. The developed NER model achieved a precision of 94%, a recall of 93.9%, and an F1-measure of 93.9%.
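The abstract does not state the feature templates, tag set, or toolkit used, so the following is only a minimal sketch, under stated assumptions, of how a trained linear-chain CRF assigns tags at test time: Viterbi decoding over per-token (emission) and tag-to-tag (transition) log-space weights. The tag names `NNP`, `NN`, and `VM`, the example sentence, the helper names, and all weights are hypothetical and hand-set for illustration; in a real tagger the weights would be learned from the annotated Kannada corpus. The `f_score` helper recomputes the harmonic mean of precision and recall as a consistency check on the reported figures.

```python
from typing import Dict, List, Sequence, Tuple


def f_score(precision: float, recall: float) -> float:
    """F1 measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def viterbi(
    tokens: Sequence[str],
    tags: Sequence[str],
    emission: Dict[Tuple[str, str], float],
    transition: Dict[Tuple[str, str], float],
) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF (decoding only).

    emission[(tag, token)] and transition[(prev_tag, tag)] are log-space
    weights; missing entries score 0.0. "<s>" marks the start state.
    """
    if not tokens:
        return []
    # best[i][t]: best score of any tag path over tokens[:i+1] ending in tag t
    best: List[Dict[str, float]] = []
    back: List[Dict[str, str]] = []
    for i, tok in enumerate(tokens):
        best.append({})
        back.append({})
        for t in tags:
            emit = emission.get((t, tok), 0.0)
            if i == 0:
                best[i][t] = transition.get(("<s>", t), 0.0) + emit
                back[i][t] = "<s>"
            else:
                # Choose the predecessor tag that maximizes path + transition score
                prev, score = max(
                    ((p, best[i - 1][p] + transition.get((p, t), 0.0)) for p in tags),
                    key=lambda pair: pair[1],
                )
                best[i][t] = score + emit
                back[i][t] = prev
    # Trace back from the best-scoring final tag
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model: "ರಾಮ ಮನೆಗೆ ಹೋದನು" ("Rama went home"), illustrative tags
TOY_TAGS = ["NNP", "NN", "VM"]
TOY_EMISSION = {("NNP", "ರಾಮ"): 2.0, ("NN", "ಮನೆಗೆ"): 2.0, ("VM", "ಹೋದನು"): 2.0}
TOY_TRANSITION = {("NNP", "NN"): 0.5, ("NN", "VM"): 0.5}

if __name__ == "__main__":
    sentence = ["ರಾಮ", "ಮನೆಗೆ", "ಹೋದನು"]
    print(viterbi(sentence, TOY_TAGS, TOY_EMISSION, TOY_TRANSITION))  # ['NNP', 'NN', 'VM']
    print(round(f_score(91.3, 91.6), 1))  # POS: 91.4
    print(round(f_score(94.0, 93.9), 1))  # NER: 93.9
```

The reported scores are internally consistent: the harmonic means of the POS and NER precision/recall pairs are 91.45 and 93.95 to two decimals, both just below the rounding boundary, so they round to the reported 91.4% and 93.9%.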