Diacritics restoration based on word n-grams for Slovak texts

Published:2021-01-01 Issue:1 Volume:11 Page:180-189
ISSN:2299-1093
Container-title:Open Computer Science
language:en

Author:

Toth Štefan¹, Zaymus Emanuel¹, Ďuračík Michal¹, Hrkút Patrik¹, Meško Matej¹

Affiliation:

1. Department of Software Technologies, Faculty of Management Science and Informatics, University of Žilina, Žilina, Slovakia

Abstract

Despite the modern boom in technology, people still write texts without diacritics. There are two main reasons for this. The first is historical: in the past, using diacritics was troublesome, so people wrote text without them. The second is speed: typing without diacritics is usually faster. People can easily understand text without diacritics, but in some types of documents missing diacritics can cause problems. They are also an issue when computers process such text. In this paper, we propose an algorithm based on word n-grams (contiguous sequences of n words) that restores diacritics in text written in the Slovak language. We also evaluate our results and compare them with other algorithms developed for Slovak text.
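The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of restoring diacritics with word n-grams, not the authors' method. It assumes bigrams (n = 2), a toy three-sentence corpus, and a simple scoring rule (bigram count with the previous restored word, then unigram count as a tie-breaker); the function names `strip_diacritics`, `train`, and `restore` are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. 'súd' -> 'sud'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(sentences):
    """Collect candidate forms and bigram counts from text WITH diacritics."""
    # candidates maps a stripped word to a Counter of its diacritized forms.
    candidates = defaultdict(Counter)
    bigrams = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for word in words:
            candidates[strip_diacritics(word)][word] += 1
        bigrams.update(zip(words, words[1:]))
    return candidates, bigrams


def restore(text, candidates, bigrams):
    """Greedy left-to-right restoration (an assumed, simplified strategy)."""
    restored = []
    for word in text.lower().split():
        options = candidates.get(word)
        if not options:
            # Unknown word: leave it unchanged.
            restored.append(word)
            continue
        prev = restored[-1] if restored else None
        # Prefer the form seen most often after the previous word;
        # fall back to the overall (unigram) frequency of the form.
        best = max(options, key=lambda c: (bigrams[(prev, c)], options[c]))
        restored.append(best)
    return " ".join(restored)


# Toy corpus (illustrative only): 'sud' is ambiguous between
# 'súd' (court) and 'sud' (barrel).
corpus = ["krajský súd rozhodol", "drevený sud vína", "súd rozhodol"]
candidates, bigrams = train(corpus)

print(restore("krajsky sud rozhodol", candidates, bigrams))  # krajský súd rozhodol
print(restore("dreveny sud vina", candidates, bigrams))      # drevený sud vína
```

In this sketch the preceding word decides between "súd" and "sud"; a practical system would need a much larger corpus, longer n-grams, and handling of capitalization and punctuation, none of which is covered here.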

Publisher

Walter de Gruyter GmbH

Subject

General Computer Science

Link

https://www.degruyter.com/document/doi/10.1515/comp-2020-0143/pdf

Cited by 1 article.

1. Correcting Diacritics and Typos with a ByT5 Transformer Model;Applied Sciences;2022-03-03