Turkish lexicon expansion by using finite state automata

Published: 2019-03-01
Pages: 1012-1027
ISSN: 1303-6203
Journal: Turkish Journal of Electrical Engineering & Computer Sciences (Turk J Elec Eng & Comp Sci)
Language: English

Authors:

Mustafa Burak Öztürk, Burcu Can Buğlalılar

Abstract

Turkish is an agglutinative language with rich morphology: a single Turkish verb can have thousands of different word forms. As a result, data sparsity is an issue in many Turkish natural language processing (NLP) applications. This article presents a model for Turkish lexicon expansion. We expand the lexicon by reversing a morphological segmentation system, which turns the segmentation task into a generation task. Our model uses finite-state automata (FSA) to incorporate orthographic features and morphotactic rules. We extracted the orthographic features by capturing the phonological operations that are applied to a word whenever a suffix is added. Each FSA state corresponds to either a stem or a suffix category. Stems are clustered by part of speech (i.e. noun, verb, or adjective), and suffixes are clustered by their allomorphic features. Using only a few thousand Turkish stems, we generated approximately 1 million word forms with an accuracy of 82.36%. This can help reduce the number of out-of-vocabulary words in other NLP applications. Although our experiments are performed on Turkish, the same model is also applicable to other agglutinative languages such as Hungarian and Finnish.
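The generation idea described in the abstract can be illustrated with a toy sketch. This is not the authors' model. The state names, the archiphoneme suffix templates (`lAr`, `DA`, `mA`, `DI`), the five-stem lexicon, and the helper names (`add_suffix`, `generate`, `expand`) are hypothetical. Only two orthographic rules are modeled: two-way and four-way vowel harmony, and d/t alternation after voiceless consonants. Stem states are named after parts of speech. Each suffix is stored once as a template that stands for its whole allomorph cluster, and word forms are produced by enumerating accepting paths through the automaton.

```python
# Toy FSA-based word-form generator (hypothetical sketch, not the paper's model).
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
ROUNDED_VOWELS = set("oöuü")
VOWELS = BACK_VOWELS | FRONT_VOWELS
VOICELESS = set("fstkçşhp")  # consonants that turn a following D into t

# Hypothetical morphotactics: state -> list of (next_state, suffix_template).
# Archiphonemes: A = a/e, I = ı/i/u/ü, D = d/t.
TRANSITIONS = {
    "NOUN": [("PLURAL", "lAr"), ("CASE", "DA")],  # plural, locative
    "PLURAL": [("CASE", "DA")],
    "CASE": [],
    "VERB": [("NEG", "mA"), ("TENSE", "DI")],     # negation, past tense
    "NEG": [("TENSE", "DI")],
    "TENSE": [],
}
# In this toy grammar a verb is accepted only once it carries a tense suffix.
FINAL = {"NOUN", "PLURAL", "CASE", "TENSE"}

# Hypothetical lexicon, with stems clustered by part of speech.
LEXICON = {"NOUN": ["ev", "kitap", "okul"], "VERB": ["gel", "yap"]}


def last_vowel(word):
    """Return the last vowel of the word, or None if it has no vowel."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    return None


def add_suffix(word, template):
    """Realize a suffix template after `word`, applying harmony and d/t alternation."""
    for ch in template:
        v = last_vowel(word)
        if ch == "A":    # two-way harmony: back -> a, front -> e
            word += "a" if v in BACK_VOWELS else "e"
        elif ch == "I":  # four-way harmony: backness x rounding
            if v in BACK_VOWELS:
                word += "u" if v in ROUNDED_VOWELS else "ı"
            else:
                word += "ü" if v in ROUNDED_VOWELS else "i"
        elif ch == "D":  # d becomes t after a voiceless consonant
            word += "t" if word and word[-1] in VOICELESS else "d"
        else:
            word += ch
    return word


def generate(stem, state):
    """Depth-first enumeration of every accepted form reachable from a stem state."""
    results = []

    def walk(word, current):
        if current in FINAL:
            results.append(word)
        for next_state, template in TRANSITIONS[current]:
            walk(add_suffix(word, template), next_state)

    walk(stem, state)
    return results


def expand(lexicon):
    """Generate word forms for every stem in the lexicon."""
    forms = []
    for pos, stems in lexicon.items():
        for stem in stems:
            forms.extend(generate(stem, pos))
    return forms


if __name__ == "__main__":
    print(generate("kitap", "NOUN"))  # ['kitap', 'kitaplar', 'kitaplarda', 'kitapta']
    print(generate("yap", "VERB"))    # ['yapmadı', 'yaptı']
```

Enumerating every accepting path for every stem is what turns a short stem list into a much larger set of word forms. Here five stems yield 16 forms. The sketch omits other orthographic operations, such as consonant softening (kitap → kitabı) and buffer consonants. Those would need additional rules or states.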

Publisher

The Scientific and Technological Research Council of Turkey (TUBITAK-ULAKBIM)

Subject

Electrical and Electronic Engineering, General Computer Science

Cited by 2 articles.

1. Madurese language learning system for beginners, design and build using finite state automata (FSA) to support the preservation of Indonesian regional languages; AIP Conference Proceedings; 2024

2. Modeling financial statements for small and medium businesses in Worm-Made Fertilizer Using Finite State Automata (FSA); Journal of Physics: Conference Series; 2021-06-01