Affiliation:
1. Department of Software Technologies, Faculty of Management Science and Informatics, University of Žilina, Žilina, Slovakia
Abstract
Despite the modern boom in technology, people still often write text without diacritics. There are two main reasons for this. The first is historical: in the past, using diacritics was troublesome, so people wrote text without them. The second is speed: typing without diacritics is usually faster. People can easily understand text without diacritics, but in some types of documents missing diacritics can cause problems, and they are also an issue when computers process such text. In this paper, we propose an algorithm based on word n-grams (contiguous sequences of n words) that restores diacritics in text written in Slovak. We also evaluate our results and compare them with other algorithms developed for Slovak text.
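To make the idea of n-gram-based restoration concrete, here is a minimal sketch using word bigrams (n = 2). It is not the algorithm proposed in the paper, whose details are not given in this abstract. The function names (`strip_diacritics`, `train`, `restore`), the `<s>` sentence-start marker, and the two toy training sentences are assumptions made purely for illustration.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text):
    """Remove combining marks, e.g. 'súd' -> 'sud'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(sentences):
    """Collect candidate forms and bigram counts from text WITH diacritics."""
    candidates = defaultdict(Counter)  # stripped form -> counts of original forms
    bigrams = Counter()                # (previous word, word) -> count
    for sentence in sentences:
        prev = "<s>"                   # hypothetical sentence-start marker
        for word in sentence.lower().split():
            candidates[strip_diacritics(word)][word] += 1
            bigrams[(prev, word)] += 1
            prev = word
    return candidates, bigrams


def restore(text, candidates, bigrams):
    """Greedy left-to-right restoration.

    For each token, pick the candidate with the highest bigram count given
    the previously restored word; ties fall back to the unigram count.
    Unknown tokens are left unchanged.
    """
    restored = []
    prev = "<s>"
    for token in text.lower().split():
        options = candidates.get(token)
        if options:
            best = max(options, key=lambda w: (bigrams[(prev, w)], options[w]))
        else:
            best = token
        restored.append(best)
        prev = best
    return " ".join(restored)


# Toy corpus (illustrative only): 'sud' can be 'súd' (court) or 'sud' (barrel).
corpus = ["najvyšší súd rozhodol", "plný sud vína"]
candidates, bigrams = train(corpus)
print(restore("najvyssi sud rozhodol", candidates, bigrams))
print(restore("plny sud vina", candidates, bigrams))
```

In this sketch the preceding word decides between "súd" and "sud". A practical system would need a large corpus, smoothing for unseen n-grams, and handling of capitalization and punctuation, none of which are shown here.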