Affiliation:
1. Mathematical Sciences, Georgia Southern University, Statesboro, GA 30458, USA
Abstract
The automatic character recognition of historic documents has recently gained more attention from scholars, owing to major improvements in computer vision, image processing, and digitization. While neural networks, the current state-of-the-art models for image recognition, perform very well, they typically require large amounts of training data. In our study, we manually built a relatively small dataset of 404 characters by cropping letter images from a popular historic manuscript, the Electronic Beowulf. To compensate for the small dataset, we used the ImageDataGenerator utility from the Keras Python library to augment our Beowulf manuscript’s dataset. The training dataset was augmented once, twice, and thrice, which we call resampling 1, resampling 2, and resampling 3, respectively. To classify the manuscript’s character images efficiently, we developed a customized convolutional neural network (CNN) model. We compared the results of our proposed model with those of other machine learning (ML) models: support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), and XGBoost. For these ML models, we used pretrained models (VGG16, MobileNet, and ResNet50) to extract features from the character images, then trained and tested the ML models on those features and recorded the results. Moreover, we validated our proposed CNN model against the well-established MNIST dataset. Our proposed CNN model achieves recognition accuracies of 88.67%, 90.91%, and 98.86% for resampling 1, resampling 2, and resampling 3, respectively, on the Beowulf manuscript’s data. Additionally, our CNN model achieves a recognition accuracy of 99.03% on the MNIST benchmark dataset.
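The resampling scheme described above can be illustrated with a minimal sketch. It rests on several assumptions: it uses a plain NumPy random shift as a stand-in for the Keras ImageDataGenerator transformations (which the abstract names but does not parameterize), it reads "resampling k" as the original training images plus k augmented copies, and the function names `random_shift` and `resample` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def random_shift(image, rng, max_shift=2):
    """Shift a 2-D grayscale image by a random offset, padding with zeros.

    Stand-in for one augmentation transform; the real study used Keras'
    ImageDataGenerator, whose settings are not given in the abstract.
    """
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Part of the source image that remains visible after the shift.
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted

def resample(images, labels, rounds, seed=0):
    """Return the original set plus `rounds` augmented copies (assumed meaning
    of resampling 1, 2, and 3)."""
    rng = np.random.default_rng(seed)
    all_images, all_labels = [images], [labels]
    for _ in range(rounds):
        augmented = np.stack([random_shift(img, rng) for img in images])
        all_images.append(augmented)
        all_labels.append(labels)  # augmentation keeps the class label
    return np.concatenate(all_images), np.concatenate(all_labels)

if __name__ == "__main__":
    # Toy data with hypothetical sizes: 10 grayscale images of 28x28 pixels.
    demo_rng = np.random.default_rng(42)
    x_train = demo_rng.random((10, 28, 28))
    y_train = np.arange(10)
    for k in (1, 2, 3):
        x_aug, y_aug = resample(x_train, y_train, rounds=k)
        print(f"resampling {k}: {x_aug.shape[0]} training images")
```

Under this reading, each extra round of resampling grows the training set by one more augmented copy, which is consistent with the accuracy rising from resampling 1 to resampling 3 as the model sees more varied examples.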