Abstract
New word identification is one of the difficult problems in Chinese information processing. This paper presents a new method for identifying new words. First, the text is segmented using an N-gram model; then PPM (prediction by partial matching) is used to identify the new words in the text; finally, the newly identified words are added to the dictionary, which is updated using an LRU (least recently used) policy. In a comparison with three well-known word segmentation systems, the experimental results show that this method improves the precision and recall of new word identification to a certain extent.
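The abstract does not say how the LRU update is carried out, so the following is only a minimal sketch of one plausible reading: a dictionary of bounded size in which every use of a word marks it as recent, and the least recently used word is evicted when the capacity is exceeded. The class name `LRUDictionary`, the method names, the `capacity` parameter, and the sample words are all hypothetical and are not taken from the paper.

```python
from collections import OrderedDict


class LRUDictionary:
    """Bounded word dictionary that evicts the least recently used entry.

    Hypothetical illustration; the paper's actual update rule is not
    described in the abstract.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._words = OrderedDict()  # word -> hit count, oldest entry first

    def __contains__(self, word):
        return word in self._words

    def touch(self, word):
        """Record a use of `word`, inserting it if it is absent."""
        count = self._words.pop(word, 0) + 1
        self._words[word] = count  # re-insert at the most recent end
        if len(self._words) > self.capacity:
            self._words.popitem(last=False)  # drop the least recently used

    def words(self):
        """Return words ordered from least to most recently used."""
        return list(self._words)


# Illustrative usage with made-up candidate words.
lexicon = LRUDictionary(capacity=3)
for candidate in ["给力", "微博", "团购", "给力", "山寨"]:
    lexicon.touch(candidate)
print(lexicon.words())  # ['团购', '给力', '山寨']
```

In this sketch "微博" is evicted because it is the only word not used again before the capacity of three is exceeded. A bounded, recency-ordered dictionary of this kind would let short-lived words age out while frequently seen ones persist, which is one reason an LRU policy might suit new word tracking.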
Publisher
Trans Tech Publications, Ltd.
Cited by
1 article.