A comparison of deep transfer learning backbone architecture techniques for printed text detection of different font styles from unstructured documents

Published: 2024-02-23
Volume: 10
Page: e1769
ISSN: 2376-5992
Journal: PeerJ Computer Science
Language: English

Authors:

Mahadevkar Supriya¹, Patil Shruti², Kotecha Ketan², Abraham Ajith³,⁴

Affiliations:

1. Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India

2. Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India

3. School of Computer Science Engineering & Technology, Bennett University, Greater Noida, Uttar Pradesh, India

4. Innopolis University, Innopolis, Republic of Tatarstan, Russia

Abstract

Object detection methods based on deep learning have been used in a variety of sectors, including banking, healthcare, e-governance, and academia. In recent years, research on text detection and recognition from scene images and from images of unstructured documents has attracted considerable attention. The article's novelty lies in the detailed discussion and implementation of various transfer learning-based backbone architectures for printed text recognition. In this research article, we compared the ResNet50, ResNet50V2, ResNet152V2, Inception, Xception, and VGG19 backbone architectures on a standard OCR Kaggle dataset, using preprocessing techniques such as data resizing, normalization, and noise removal. The top three backbone architectures were then selected based on the accuracy achieved, and hyperparameter tuning was performed on them to obtain more accurate results. Xception outperformed the ResNet, Inception, VGG19, and MobileNet architectures, achieving an accuracy of 98.90% and a minimum loss of 0.19. To date, transfer learning-based backbone architectures for printed or handwritten text recognition are not well represented in the literature. We split the dataset into 80% for training and 20% for testing, trained each backbone architecture for the same number of epochs, and found that the Xception architecture achieved higher accuracy than the others. In addition, the ResNet50V2 model achieved higher accuracy (96.92%) than the ResNet152V2 model (96.34%).
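The abstract names three preprocessing steps (resizing, normalization, noise removal) and an 80/20 train/test split, but does not state the target image size, the resizing method, or the denoising filter. The sketch below is one minimal way to implement those steps with NumPy, assuming 8-bit grayscale images, a hypothetical 32x32 target size, nearest-neighbour resizing, and a 3x3 median filter as the noise-removal step. All function names are hypothetical; this is an illustration under those assumptions, not the authors' code.

```python
import numpy as np

# Hypothetical target size; the abstract does not state the input resolution.
TARGET_H, TARGET_W = 32, 32


def resize_nearest(img: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D grayscale image (assumed method)."""
    rows = (np.arange(height) * img.shape[0]) // height  # source row per output row
    cols = (np.arange(width) * img.shape[1]) // width    # source column per output column
    return img[rows[:, None], cols[None, :]]


def normalize(img: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] into [0.0, 1.0]."""
    return img.astype(np.float32) / 255.0


def median_denoise(img: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k median filter (k odd) with edge padding; an assumed stand-in for 'noise removal'."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    # Collect the k*k shifted views of the padded image; their per-pixel
    # median equals the median of each pixel's k x k neighbourhood.
    windows = [padded[i:i + h, j:j + w] for i in range(k) for j in range(k)]
    return np.median(np.stack(windows), axis=0)


def preprocess(img: np.ndarray) -> np.ndarray:
    """Resize, normalize, then denoise, in the order the abstract lists them."""
    return median_denoise(normalize(resize_nearest(img, TARGET_H, TARGET_W)))


def split_80_20(n_samples: int, seed: int = 0):
    """Shuffle sample indices and return (train_idx, test_idx) in an 80/20 ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = int(round(0.8 * n_samples))
    return idx[:cut], idx[cut:]


if __name__ == "__main__":
    page = np.full((64, 64), 255, dtype=np.uint8)  # synthetic white "page"
    page[10, 10] = 0                               # one isolated dark noise pixel
    out = preprocess(page)
    print(out.shape, float(out.min()))  # -> (32, 32) 1.0
    train_idx, test_idx = split_80_20(100)
    print(len(train_idx), len(test_idx))  # -> 80 20
```

A median filter is used here because it removes isolated salt-and-pepper pixels while blurring character strokes less than a mean filter would; the paper's actual noise-removal method may differ. Arrays produced this way would still need further adaptation (for example, replication to three channels) before being fed to an ImageNet-pretrained backbone such as Xception.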

Funder

The Analytical Center for the Government of Russian Federation

Publisher

PeerJ

Link

https://peerj.com/articles/cs-1769.pdf
