Manuscripts Character Recognition Using Machine Learning and Deep Learning

Published: 2023-04-04 Volume: 4 Issue: 2 Page: 168-188
ISSN: 2673-3951
Container-title: Modelling
Language: en

Author:

Islam Mohammad Anwarul¹, Iacob Ionut E.¹

Affiliation:

1. Mathematical Sciences, Georgia Southern University, Statesboro, GA 30458, USA

Abstract

The automatic character recognition of historic documents has recently gained more attention from scholars, owing to major improvements in computer vision, image processing, and digitization. While neural networks, the current state-of-the-art models for image recognition, perform very well, they typically require large amounts of training data. In our study, we manually built our own relatively small dataset of 404 characters by cropping letter images from a popular historic manuscript, the Electronic Beowulf. To compensate for the small dataset, we used ImageDataGenerator, an image augmentation utility from the Keras Python library, to augment our Beowulf manuscript dataset. The training dataset was augmented once, twice, and thrice, which we call resampling 1, resampling 2, and resampling 3, respectively. To classify the manuscript's character images efficiently, we developed a customized Convolutional Neural Network (CNN) model. We compared the results achieved by our proposed model with those of other machine learning (ML) models: support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), and XGBoost. For these ML models, we used the pretrained models VGG16, MobileNet, and ResNet50 to extract features from the character images, then trained and tested each ML model on the extracted features and recorded the results. Moreover, we validated our proposed CNN model against the well-established MNIST dataset. Our proposed CNN model achieves recognition accuracies of 88.67%, 90.91%, and 98.86% for resampling 1, resampling 2, and resampling 3, respectively, on the Beowulf manuscript data. Additionally, our CNN model achieves a benchmark recognition accuracy of 99.03% on the MNIST dataset.
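
The resampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that "resampling k" means appending k randomly transformed copies of every training image to the original set, it substitutes a simple random pixel shift for the transforms that ImageDataGenerator would apply, and it runs on synthetic 28x28 arrays rather than the Beowulf character images. The names `random_shift`, `resample`, `max_shift`, and `rounds` are hypothetical.

```python
import numpy as np


def random_shift(image, max_shift, rng):
    """Return a copy of a 2-D image shifted by a random offset, zero-filled at the edges."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Part of the source image that stays inside the frame after the shift.
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted


def resample(images, labels, rounds, max_shift=2, seed=0):
    """Append `rounds` augmented copies of every image (assumed meaning of 'resampling k')."""
    rng = np.random.default_rng(seed)
    all_images, all_labels = [images], [labels]
    for _ in range(rounds):
        augmented = np.stack([random_shift(img, max_shift, rng) for img in images])
        all_images.append(augmented)
        all_labels.append(labels)  # augmentation does not change the class label
    return np.concatenate(all_images), np.concatenate(all_labels)


# Synthetic stand-in data (assumption): 8 grayscale images in 4 classes.
data_rng = np.random.default_rng(42)
train_images = data_rng.random((8, 28, 28))
train_labels = np.arange(8) % 4

for rounds in (1, 2, 3):
    aug_images, aug_labels = resample(train_images, train_labels, rounds)
    print(f"resampling {rounds}: {aug_images.shape[0]} training images")
```

Under this assumption the training set grows linearly with the number of rounds, while the test set is left untouched so that reported accuracy still reflects unseen original images.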

Publisher

MDPI AG

Subject

Multidisciplinary

Link

https://www.mdpi.com/2673-3951/4/2/10/pdf


Cited by 3 articles.

1. Distributed Image Classification on Big Data Platforms: A Gradient Boosted Trees Approach;2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE);2024-05-09

2. Digital Forensics to Identify Damaged Part of Palm Leaf Manuscript;2023 6th International Conference of Computer and Informatics Engineering (IC2IE);2023-09-14

3. Enhancing Handwritten Alphabet Prediction with Real-time IoT Sensor Integration in Machine Learning for Image;Journal of Smart Internet of Things;2022-12-01