MULTILINGUAL MACHINE PRINTED OCR

Published: 2001-02; Issue: 01; Volume: 15; Pages: 43-63
ISSN: 0218-0014
Container title: International Journal of Pattern Recognition and Artificial Intelligence
Language: en
Short container title: Int. J. Patt. Recogn. Artif. Intell.

Authors:

NATARAJAN PREMKUMAR¹, LU ZHIDONG¹, SCHWARTZ RICHARD¹, BAZZI ISSAM¹, MAKHOUL JOHN¹

Affiliation:

1. BBN Technologies, Verizon, Cambridge, MA 02138, USA

Abstract

This paper presents a script-independent methodology for optical character recognition (OCR) based on hidden Markov models (HMMs). The feature extraction, training and recognition components of the system are all designed to be script independent. The training and recognition components were taken without modification from a continuous speech recognition system; the only OCR-specific component is feature extraction. To port the system to a new language, all that is needed is text image training data in that language, along with ground truth that gives the sequence of characters on each line of each text image, without specifying where the characters are located in the image. The parameters of the character HMMs are estimated automatically from the training data, without the need for laborious hand-written rules. The system does not require presegmentation of the data at either the word level or the character level, so it handles languages with connected characters in a straightforward manner. The script independence of the system is demonstrated on three languages with different types of script: Arabic, English, and Chinese. The robustness of the system is further demonstrated by testing it on fax data. An unsupervised adaptation method is then described that improves performance under degraded conditions.
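The abstract's central point is that character HMMs can be trained from line-level transcripts alone, with no character positions supplied. The following is a minimal sketch of that idea under stated assumptions, not the BBN system's code. It concatenates toy left-to-right character HMMs in transcript order and runs a Viterbi forced alignment, which yields a character segmentation as a by-product. The class `CharHMM`, the function `force_align`, the single Gaussian per state with a fixed shared variance, the 0.6 self-loop probability and the one-dimensional frame features are all hypothetical choices for illustration. Viterbi alignment is used here as a simpler stand-in for full Baum-Welch embedded training.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Tuple[float, ...]


class CharHMM:
    """Toy left-to-right HMM for one character (hypothetical parameters).

    Each state has a diagonal Gaussian with a shared, fixed variance and may
    either stay in place or advance to the next state.
    """

    def __init__(self, means: Sequence[Frame], var: float = 1.0, self_loop: float = 0.6):
        self.means = list(means)
        self.var = var
        self.log_stay = math.log(self_loop)
        self.log_next = math.log(1.0 - self_loop)

    def log_emit(self, state: int, x: Frame) -> float:
        # Log density of frame x under this state's diagonal Gaussian.
        mu = self.means[state]
        return sum(
            -0.5 * math.log(2 * math.pi * self.var) - (xi - mi) ** 2 / (2 * self.var)
            for xi, mi in zip(x, mu)
        )


def force_align(transcript: str, models: Dict[str, CharHMM],
                frames: List[Frame]) -> Tuple[List[int], float]:
    """Viterbi-align frames to the concatenated HMM of `transcript`.

    Returns, for every frame, the index of the transcript character it was
    assigned to, plus the log score of the best path.
    """
    # Build the line model: the states of each character HMM, in transcript order.
    states = [(pos, ch, s)
              for pos, ch in enumerate(transcript)
              for s in range(len(models[ch].means))]
    n, T = len(states), len(frames)
    NEG = float("-inf")
    delta = [[NEG] * n for _ in range(T)]
    back = [[0] * n for _ in range(T)]

    # The path must start in the first state of the first character.
    _, ch0, s0 = states[0]
    delta[0][0] = models[ch0].log_emit(s0, frames[0])

    for t in range(1, T):
        for j, (_, ch, s) in enumerate(states):
            model = models[ch]
            best, arg = delta[t - 1][j] + model.log_stay, j  # stay in state j
            if j > 0:
                prev_ch = states[j - 1][1]
                move = delta[t - 1][j - 1] + models[prev_ch].log_next  # advance
                if move > best:
                    best, arg = move, j - 1
            if best > NEG:
                delta[t][j] = best + model.log_emit(s, frames[t])
                back[t][j] = arg

    # The path must end in the last state of the last character.
    if delta[T - 1][n - 1] == NEG:
        raise ValueError("not enough frames to pass through every state")
    path = [n - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [states[j][0] for j in path], delta[T - 1][n - 1]


if __name__ == "__main__":
    models = {"a": CharHMM([(0.0,), (1.0,)]), "b": CharHMM([(5.0,), (6.0,)])}
    frames = [(0.0,), (0.1,), (1.0,), (0.9,), (5.0,), (5.2,), (6.1,)]
    positions, score = force_align("ab", models, frames)
    print(positions)  # [0, 0, 0, 0, 1, 1, 1]
```

Only the transcript "ab" is given, yet the alignment assigns the first four frames to "a" and the last three to "b". In a full training loop, the frames assigned to each state would be used to re-estimate that state's parameters, and the align-then-re-estimate cycle would repeat. Recognition would replace the fixed transcript with a search over character sequences.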

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218001401000745

References: 45 articles.

1. Text page recognition using Grey-level features and hidden Markov models

2. Hidden Markov model based optical character recognition in the presence of deterministic transformations

3. Survey and bibliography of Arabic optical text recognition

4. Segmentation versus segmentation-free for recognizing Arabic text

5. HIDDEN MARKOV MODELS IN TEXT RECOGNITION

Cited by 36 articles.

1. Efficient CRNN: Towards end-to-end low resource Urdu text recognition using depthwise separable convolutions and gated recurrent units; Information Processing & Management; 2024-01

2. Printed Ottoman text recognition using synthetic data and data augmentation; International Journal on Document Analysis and Recognition (IJDAR); 2023-05-24

3. Combining Convolutional Neural Networks and LSTMs for Segmentation-Free OCR; 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR); 2017-11

4. Multi-font Telugu Text Recognition Using Hidden Markov Models and Akshara Bi-grams; Computer Vision, Graphics, and Image Processing; 2017

5. Conservative preprocessing of document images; International Journal on Document Analysis and Recognition (IJDAR); 2016-09-20