Affiliation:
1. School of Information Science and Engineering, Lanzhou University, China
2. Key Lab of China’s National Linguistic Information Technology, Northwest Minzu University
Abstract
Tibetan word segmentation and part-of-speech (POS) tagging are fundamental tasks in Tibetan natural language processing. Most existing methods for Tibetan word segmentation and POS tagging are based on rules and statistics, which require manually constructed features. In addition, joint models have shown stronger capabilities for word segmentation and POS tagging and have attracted great interest. In this paper, we propose Bi-LSTM+IDCNN+CRF, a simple yet effective end-to-end neural network model for joint Tibetan word segmentation and POS tagging. We conduct step-by-step and joint experiments on Tibetan datasets. The results demonstrate that the Bi-LSTM+IDCNN+CRF model performs best in both the step-by-step and the joint mode. We obtain state-of-the-art performance in the joint tagging mode: the F1 score of the word segmentation task reaches 92.31%, and the F1 score of the POS tagging task reaches 81.26%.
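To illustrate what "joint" tagging means here, the sketch below shows one common way to cast segmentation and POS tagging as a single sequence-labeling problem: each syllable receives a combined label made of a boundary marker and a POS tag. The B/M/E/S boundary scheme, the tag names, the placeholder syllables, and the helper names `to_joint_tags` and `from_joint_tags` are assumptions for illustration only; the abstract does not state which label scheme the paper uses, and the neural model itself is not shown.

```python
from typing import List, Tuple

Word = Tuple[List[str], str]  # (syllables of the word, POS tag)


def to_joint_tags(words: List[Word]) -> List[str]:
    """Turn segmented, POS-tagged words into one joint label per syllable.

    Assumed scheme: B = begin, M = middle, E = end, S = single-syllable word,
    joined to the POS tag with a hyphen (e.g. "B-n").
    """
    tags: List[str] = []
    for syllables, pos in words:
        n = len(syllables)
        if n == 1:
            tags.append(f"S-{pos}")
        else:
            tags.append(f"B-{pos}")
            tags.extend(f"M-{pos}" for _ in range(n - 2))
            tags.append(f"E-{pos}")
    return tags


def from_joint_tags(syllables: List[str], tags: List[str]) -> List[Word]:
    """Recover words and POS tags from per-syllable joint labels."""
    words: List[Word] = []
    current: List[str] = []
    pos = ""
    for syllable, tag in zip(syllables, tags):
        boundary, pos = tag.split("-", 1)
        current.append(syllable)
        # A word is complete when its last (or only) syllable is seen.
        if boundary in ("E", "S"):
            words.append((current, pos))
            current = []
    if current:
        # Flush a word left unterminated by a malformed label sequence.
        words.append((current, pos))
    return words


# Placeholder syllables stand in for real Tibetan syllables.
gold = [(["s1", "s2"], "n"), (["s3"], "v"), (["s4", "s5", "s6"], "n")]
joint = to_joint_tags(gold)
flat = [s for syllables, _ in gold for s in syllables]
restored = from_joint_tags(flat, joint)
```

With labels of this form, a single tagger predicts boundaries and POS tags together, whereas the step-by-step mode would first predict boundaries only and then tag the resulting words.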
Funder
National Key R&D Program of China
Ministry of Education - China Mobile Research Foundation
Fundamental Research Funds for the Central Universities
National Natural Science Foundation of China
Major National Project of High Resolution Earth Observation System
State Grid Corporation of China Science and Technology Project
Program for New Century Excellent Talents in University
Strategic Priority Research Program of the Chinese Academy of Sciences
Google Research Awards and Google Faculty Award, Science and Technology Plan of Qinghai Province
Publisher
Association for Computing Machinery (ACM)
Reference
53 articles.
Cited by
3 articles.