Ontologies and Bigram-based approach for Isolated Non-word Errors Correction in OCR System
Published:2015-12-01
Issue:6
Volume:5
Page:1458
ISSN:2088-8708
Container-title:International Journal of Electrical and Computer Engineering (IJECE)
Short-container-title:IJECE
Author:
Eutamene Aicha, Kholladi Mohamed Khireddine, Belhadef Hacene
Abstract
In this paper, we describe a new and original approach for the post-processing step of an OCR system. The approach relies on a new spelling-correction method that automatically corrects misspelled words produced by the character recognition step on scanned documents. It combines ontologies and bigram coding in order to build a robust system able to automatically resolve the anomalies of classical approaches. The proposed approach is a hybrid method spread over two stages: the first is character recognition using the ontological model, and the second is word recognition based on a spelling-correction approach that uses bigram codification to detect and correct errors. Spelling errors are broadly classified into two categories, namely non-word errors and real-word errors. In this paper, we are interested only in the detection and correction of non-word errors, because this is the only type of error treated by an OCR system. In addition, the use of an online external resource such as WordNet proves necessary to improve the system's performance.
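The abstract does not specify how the bigram codification is computed. The following is a minimal sketch that assumes character bigrams compared with the Dice coefficient. That is a common bigram similarity measure swapped in here for illustration and is not necessarily the authors' method. A small hypothetical in-memory lexicon (`LEXICON`) stands in for an external resource such as WordNet. A token absent from the lexicon is flagged as a non-word error, and lexicon entries are then ranked as correction candidates by bigram similarity. The helper names `bigrams`, `dice` and `correct` are illustrative only.

```python
# Sketch only: character-bigram Dice similarity is an assumed stand-in
# for the paper's "bigram codification"; LEXICON is a hypothetical toy
# word list standing in for an external resource such as WordNet.
LEXICON = {"recognition", "correction", "document", "ontology", "character"}


def bigrams(word):
    """Return the set of adjacent character pairs, with '#' boundary markers."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two words (0.0 to 1.0)."""
    ba, bb = bigrams(a), bigrams(b)
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def correct(token, lexicon, top_n=3):
    """Detect a non-word error and rank lexicon words as candidate corrections.

    A token found in the lexicon is treated as correct and returned unchanged;
    otherwise the top_n most similar lexicon entries are returned, ties broken
    alphabetically.
    """
    if token.lower() in lexicon:
        return [token]
    ranked = sorted(lexicon, key=lambda w: (-dice(token, w), w))
    return ranked[:top_n]


if __name__ == "__main__":
    print(correct("docunent", LEXICON))  # OCR-style substitution error
    print(correct("ontology", LEXICON))  # valid word, left as-is
```

Using sets of bigrams ignores repeated pairs, which keeps the sketch short. A fuller implementation might count bigram multiplicities or weight candidates with word frequencies.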
Publisher
Institute of Advanced Engineering and Science
Subject
Electrical and Electronic Engineering, General Computer Science
Cited by
3 articles.