iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing

Published: 2021-01-30 Issue: 2 Volume: 49 Page: 253-284
ISSN: 0885-7458
Container-title: International Journal of Parallel Programming
Language: en
Short-container-title: Int J Parallel Prog

Author:

Tekleyohannes Menbere Kina, Rybalkin Vladimir, Ghaffar Muhammad Mohsin, Varela Javier Alejandro, Wehn Norbert, Dengel Andreas

Abstract

In recent years, optical character recognition (OCR) systems have been used to digitally preserve historical archives. To transcribe historical archives into a machine-readable form, first, the documents are scanned, then an OCR is applied. In order to digitize documents without the need to remove them from where they are archived, it is valuable to have a portable device that combines scanning and OCR capabilities. Nowadays, there exist many commercial and open-source document digitization techniques, which are optimized for contemporary documents. However, they fail to give sufficient text recognition accuracy for transcribing historical documents due to the severe quality degradation of such documents. On the contrary, the anyOCR system, which is designed to mainly digitize historical documents, provides high accuracy. However, this comes at a cost of high computational complexity resulting in long runtime and high power consumption. To tackle these challenges, we propose a low power energy-efficient accelerator with real-time capabilities called iDocChip, which is a configurable hybrid hardware-software programmable System-on-Chip (SoC) based on anyOCR for digitizing historical documents. In this paper, we focus on one of the most crucial processing steps in the anyOCR system: Text and Image Segmentation, which makes use of a multi-resolution morphology-based algorithm. Moreover, an optimized FPGA-based hybrid architecture of this anyOCR step along with its optimized software implementations are presented. We demonstrate our results on multiple embedded and general-purpose platforms with respect to runtime and power consumption. The resulting hardware accelerator outperforms the existing anyOCR by 6.2×, while achieving 207× higher energy-efficiency and maintaining its high accuracy.

Funder

Projekt DEAL

Publisher

Springer Science and Business Media LLC

Subject

Information Systems, Theoretical Computer Science, Software

Link

http://link.springer.com/content/pdf/10.1007/s10766-020-00690-y.pdf

References: 45 articles.

1. ABBYY. https://www.abbyy.com/en-eu/. Accessed 24 Apr 2020

2. Omnipage. https://www.kofax.com/Products/omnipage?source=nuance. Accessed 24 Apr 2020

3. OCRopus. https://github.com/tmbarchive/ocropy. Accessed 24 Apr 2020

4. Tesseract. https://github.com/tesseract-ocr. Accessed 24 Apr 2020

5. Bukhari, S. S., Kadi, A., Jouneh, M. A., Mir, F. M., Dengel, A.: anyOCR: An open-source OCR system for historical archives. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 305–310. IEEE (2017)

Cited by 4 articles.

1. Digitizing History: Transitioning Historical Paper Documents to Digital Content for Information Retrieval and Mining—A Comprehensive Survey;IEEE Transactions on Computational Social Systems;2024

2. High-Performance Matrix Eigenvalue Decomposition Using the Parallel Jacobi Algorithm on FPGA;Circuits, Systems, and Signal Processing;2022-09-27

3. Adaptive Threshold-Based Database Preparation Method for Handwritten Image Classification;Communications in Computer and Information Science;2022

4. iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing;Journal of Imaging;2021-09-03