POST-EDITING THROUGH APPROXIMATION AND GLOBAL CORRECTION

Journal: International Journal of Pattern Recognition and Artificial Intelligence (Int. J. Patt. Recogn. Artif. Intell.)
Published: 1995-12
Volume: 9, Issue: 6, Pages: 911-923
ISSN: 0218-0014
Language: English

Authors:

Kazem Taghva¹, Julie Borsack¹, Bryan Bullard¹, Allen Condit¹

Affiliation:

1. Information Science Research Institute, University of Nevada, Las Vegas, USA

Abstract

This paper describes a new automatic spelling-correction program for errors generated by OCR. The method is based on three principles:

1. Approximate string matching between the misspellings and the terms occurring in the database, as opposed to the entire dictionary.
2. Local information obtained from the individual documents.
3. A confusion matrix, which contains information specific to the kinds of errors made by the particular OCR device.

The system was then used to process approximately 10,000 pages of OCR-generated documents. About 87% of the misspellings discovered by the algorithm were corrected.
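The three principles in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the substitution costs in CONFUSION_COST, the max_dist threshold, and the rule of breaking distance ties by how often a candidate occurs in the current document are all hypothetical choices made for illustration. Candidates come only from the collection's own vocabulary (principle 1), substitutions that the OCR device is known to confuse are cheap (principle 3), and frequency in the current document acts as local evidence (principle 2).

```python
from collections import Counter

# Hypothetical OCR confusion costs: (ocr_char, true_char) -> cost below 1.0.
# A real system would estimate these from the errors of the particular OCR device.
CONFUSION_COST = {
    ("1", "l"): 0.2,
    ("0", "o"): 0.2,
    ("c", "e"): 0.3,
    ("i", "l"): 0.4,
}


def sub_cost(ocr_char: str, true_char: str) -> float:
    """Cost of reading true_char as ocr_char; cheap for known OCR confusions."""
    if ocr_char == true_char:
        return 0.0
    return CONFUSION_COST.get((ocr_char, true_char), 1.0)


def weighted_edit_distance(ocr_word: str, term: str) -> float:
    """Levenshtein distance whose substitution costs come from the confusion table."""
    m, n = len(ocr_word), len(term)
    prev = [float(j) for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [float(i)] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1.0,      # delete ocr_word[i-1]
                cur[j - 1] + 1.0,   # insert term[j-1]
                prev[j - 1] + sub_cost(ocr_word[i - 1], term[j - 1]),
            )
        prev = cur
    return prev[n]


def correct(ocr_word, collection_terms, document_words, max_dist=1.0):
    """Pick the closest collection term; ties go to the term more frequent in this document.

    Returns None when no term is within max_dist (hypothetical threshold).
    """
    local_counts = Counter(document_words)
    best, best_key = None, None
    for term in collection_terms:
        dist = weighted_edit_distance(ocr_word, term)
        if dist > max_dist:
            continue
        # Rank by distance, then by local frequency (higher is better), then alphabetically.
        key = (dist, -local_counts[term], term)
        if best_key is None or key < best_key:
            best, best_key = term, key
    return best


if __name__ == "__main__":
    terms = {"hello", "help", "world", "cell"}
    doc = "the world said hello hello".split()
    print(correct("he1lo", terms, doc))  # the 1/l confusion makes "hello" the closest term
```

Restricting candidates to terms that actually occur in the collection keeps the search small and avoids "correcting" a word into a dictionary entry that never appears in the documents. When two candidates are equally close, for example "cat" and "cot" for the OCR output "cxt", the sketch prefers whichever one already appears more often in the same document.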

Publisher

World Scientific Publishing Co. Pte. Ltd.

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218001495000377

Cited by 5 articles:

1. A Contrastive Study on Linguistic Features between HT and MT based on NLPIR-ICTCLAS. 2021 5th International Conference on Natural Language Processing and Information Retrieval (NLPIR), 2021-12-17.

2. Using the Google Web 1T 5-Gram Corpus for OCR Error Correction. 16th International Conference on Information Technology-New Generations (ITNG 2019), 2019.

3. Aligning Ground Truth Text with OCR Degraded Text. Advances in Intelligent Systems and Computing, 2019.

4. Incorporating linguistic post-processing into whole-book recognition. Document Recognition and Retrieval XVII, 2010-01-17.

5. Autotag: A tool for creating structured document collections from printed materials. Electronic Publishing, Artistic Imaging, and Digital Typography, 1998.