Optical Character Recognition of Amharic Documents

Published: 2007-08-13 Issue: 2 Volume: 3
ISSN:1449-2679
Container-title:African Journal of Information & Communication Technology
Short-container-title:African Journal of ICT

Author:

Meshesha Million,Jawahar C V

Abstract

Around 2,500 languages are spoken in Africa, and some of them have their own indigenous scripts. As a result, a large volume of printed documents is held in libraries, information centers, museums and offices. Digitizing these documents makes it possible to apply existing information technologies to local information needs and development. This paper presents an Optical Character Recognition (OCR) system for converting digitized documents in local languages. An extensive literature survey indicates that this is the first attempt to report the challenges of recognizing indigenous African scripts and a possible solution for the Amharic script. Research on the recognition of African indigenous scripts faces major challenges due to (i) the large number of characters used in the writing system and (ii) the existence of a large set of visually similar characters. In this paper, we propose a novel feature extraction scheme using principal component analysis and linear discriminant analysis, followed by a decision directed acyclic graph (DDAG) based support vector machine classifier. Recognition results are presented on real-life degraded documents such as books, magazines and newspapers to demonstrate the performance of the recognizer.
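
The decision directed acyclic graph named in the abstract is a way of combining pairwise (one-vs-one) binary classifiers into a multi-class decision: each node tests two candidate classes and eliminates the loser, so N classes need only N - 1 binary evaluations per sample. The following is a minimal sketch of that elimination step only, not the authors' implementation. It assumes nearest-centroid deciders as stand-ins for trained binary SVMs, and the labels, centroids and 2-D feature vectors are hypothetical; in the paper's setting the inputs would be features produced by the principal component and linear discriminant analysis stage.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# A pairwise decider maps a feature vector to the winning label of its two classes.
PairwiseDecider = Callable[[Vector], str]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_centroid_decider(label_a: str, centroid_a: Vector,
                          label_b: str, centroid_b: Vector) -> PairwiseDecider:
    """Hypothetical stand-in for a trained binary SVM: pick the nearer centroid."""
    def decide(x: Vector) -> str:
        if squared_distance(x, centroid_a) <= squared_distance(x, centroid_b):
            return label_a
        return label_b
    return decide


def ddag_predict(x: Vector, labels: List[str],
                 deciders: Dict[Tuple[str, str], PairwiseDecider]) -> Tuple[str, int]:
    """Walk the DDAG: test the first candidate against the last, drop the loser.

    Returns the surviving label and the number of binary evaluations used.
    """
    candidates = list(labels)
    evaluations = 0
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        winner = deciders[(first, last)](x)
        evaluations += 1
        if winner == first:
            candidates.pop()    # the last candidate is eliminated
        else:
            candidates.pop(0)   # the first candidate is eliminated
    return candidates[0], evaluations


# Hypothetical class centroids in a 2-D feature space (illustration only).
CENTROIDS: Dict[str, Vector] = {"A": (0.0, 0.0), "B": (5.0, 0.0), "C": (0.0, 5.0)}
LABELS: List[str] = list(CENTROIDS)

# One decider per unordered pair, keyed in the same order as LABELS.
DECIDERS: Dict[Tuple[str, str], PairwiseDecider] = {
    (a, b): make_centroid_decider(a, CENTROIDS[a], b, CENTROIDS[b])
    for i, a in enumerate(LABELS)
    for b in LABELS[i + 1:]
}

if __name__ == "__main__":
    label, used = ddag_predict((4.5, 0.5), LABELS, DECIDERS)
    print(label, used)
```

With three classes the sample is resolved in two pairwise tests. The saving over evaluating every pair grows with the number of classes, which matters for a script with a large character set.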

Publisher

University of Technology, Sydney (UTS)

Cited by 9 articles.

1. How can we detect news surrounding community safety crisis incidents in the internet? Experiments using attention-based Bi-LSTM models;International Journal of Information Management Data Insights;2024-04

2. Handwritten Amharic Word Recognition With Additive Attention Mechanism;IEEE Access;2024

3. Typewritten OCR Model for Ethiopic Characters;Communications in Computer and Information Science;2024

4. Challenges to Prepare the Parallel Corpus for Luganda Language;Studies in Computational Intelligence;2024

5. OCR System for the Recognition of Ethiopic Real-Life Documents;Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering;2022