Affiliation:
1. EACH / USP
2. Institute of Computing / Unicamp
Abstract
Fifty years after Damerau set up his statistics for the distribution of errors in typed texts, his findings are still used in a range of different languages. Because these statistics were derived from texts in English, the question of whether they actually apply to other languages has been raised. We address this issue through the analysis of a set of typed texts in Brazilian Portuguese, deriving statistics tailored to this language. Results show that diacritical marks play a major role, as indicated by the frequency of mistakes involving them, which renders Damerau's original findings mostly unfit for spelling correction systems, although they remain useful if such marks are set aside. Furthermore, a comparison between these results and those published for Spanish shows no statistically significant differences between the two languages, an indication that the distribution of spelling errors depends on the adopted character set rather than on the language itself.
Subject
Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics