Incorporating word embeddings in unsupervised morphological segmentation

Published:2020-07-10 Issue:5 Volume:27 Page:609-629
ISSN:1351-3249
Container-title:Natural Language Engineering
language:en
Short-container-title:Nat. Lang. Eng.

Author:

Üstün Ahmet, Can Burcu

Abstract

We investigate the use of semantic information for morphological segmentation, since words that are derived from each other will remain semantically related. We use mathematical models such as maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation, incorporating semantic information obtained from dense word vector representations. Our approach does not require any annotated data, which makes it fully unsupervised, and requires only a small amount of raw data together with pretrained word embeddings for training. The results show that using dense vector representations helps in morphological segmentation, especially for low-resource languages. We present results for Turkish, English, and German. Our semantic MLE model outperforms other unsupervised models for Turkish. Our proposed models could also be used for any other low-resource language with concatenative morphology.

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References (27 articles)

1. Morphological Word-Embeddings

2. Unsupervised Morphology Induction Using Word Embeddings

3. Goldwater, S., Johnson, M. and Griffiths, T.L. (2006). Interpolating between types and tokens by estimating power-law generators. In Proceedings of the Advances in Neural Information Processing Systems 18. MIT Press, pp. 459–466.

4. Morpheme Boundaries within Words: Report on a Computer Test

5. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

Cited by 2 articles.

1. Word segmentation of Chinese texts in the geoscience domain using the BERT model;2022-04-19

2. Gender bias in legal corpora and debiasing it;Natural Language Engineering;2022-03-30