Abstract
We investigate the use of semantic information for morphological segmentation, on the premise that words derived from one another remain semantically related. We use mathematical models based on maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation that incorporate semantic information obtained from dense word vector representations. Our approach does not require any annotated data, which makes it fully unsupervised; training requires only a small amount of raw data together with pretrained word embeddings. The results show that dense vector representations help in morphological segmentation, especially for low-resource languages. We present results for Turkish, English, and German. Our semantic MLE model outperforms other unsupervised models for Turkish. Our proposed models could also be used for any other low-resource language with concatenative morphology.
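The intuition that a derived word stays semantically close to its stem can be illustrated with a minimal sketch. This is not the paper's MLE or MAP model: it is a simplified greedy heuristic, and the function names (`cosine`, `segment`), the similarity threshold, and the toy three-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from pretrained word embeddings.

```python
import math

# Hypothetical toy embeddings; real ones would be loaded from pretrained vectors.
TOY_VECTORS = {
    "evlerde": [0.9, 0.1, 0.2],  # Turkish: "in the houses"
    "evler":   [0.8, 0.2, 0.1],  # "houses"
    "ev":      [0.7, 0.1, 0.1],  # "house"
    "car":     [0.1, 0.9, 0.0],
    "care":    [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def segment(word, vectors, threshold=0.5):
    """Greedily strip suffixes while the shorter form stays semantically close.

    Illustrative heuristic only (assumed, not the paper's probabilistic model):
    a split is accepted when both the current form and the candidate stem have
    vectors and their cosine similarity is at least `threshold`.
    """
    suffixes = []
    current = word
    while current in vectors:
        split_at = None
        # Try the longest candidate stem first, i.e. the shortest suffix.
        for i in range(len(current) - 1, 0, -1):
            stem = current[:i]
            if stem in vectors and cosine(vectors[current], vectors[stem]) >= threshold:
                split_at = i
                break
        if split_at is None:
            break
        suffixes.insert(0, current[split_at:])
        current = current[:split_at]
    return [current] + suffixes
```

With these toy vectors, `segment("evlerde", TOY_VECTORS)` splits the word into `ev`, `ler`, and `de`, while `segment("care", TOY_VECTORS)` leaves the word whole, because `car` is a substring of `care` but is not semantically close to it.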
Publisher
Cambridge University Press (CUP)
Subject
Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software