AENet: Image Retrieval of Kazakh Handwritten Documents Based on Attention Mechanism and Feature Aggregation

Published:2024-07-02 Issue:10 Volume:19 Page:1640-1651
ISSN:1931-4973
Container-title:IEEJ Transactions on Electrical and Electronic Engineering
language:en
Short-container-title:IEEJ Transactions Elec Engng

Author:

Chen Gang¹, Xu Xuebin¹, Wang Jiaoyan¹, Mamat Hornisa¹, Ubul Kurban¹

Affiliation:

1. Xinjiang Key Laboratory of Multilingual Information Technology, School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China

Abstract

Kazakh is one of the languages spoken in multilingual China and is widely used in some areas of Xinjiang. In written Kazakh, several characters are joined together to form a continuous word with a unique shape and complex structural relationships. This paper explores a solution for offline image retrieval of handwritten Kazakh words. The task is challenging: because relevant datasets are lacking and Kazakh has a distinctive writing morphology, traditional text image retrieval algorithms often struggle to achieve satisfactory results on handwriting that varies in style and has connected characters. Therefore, a dataset of offline Kazakh handwritten document images was created in this paper. The dataset contains 300 pages of document images with 20,500 words. Then, a new model called ‘AENet’ is proposed. The model uses an attention mechanism to focus more finely on key regions of handwritten word images, such as centers, inflection points, and contours, and to capture important local features at different scales. Spatial pyramid pooling fusion, feature aggregation, encoding operations, and feature downscaling and reconstruction are then used to build more representative features, from local to global, that capture the overall information in the word images. Experimental evaluation on the Kazak‐80, Zilla‐64, and HWDB1.1‐375 datasets verifies that the method significantly improves the mAP for image retrieval of handwritten words and is especially applicable to languages written in connected script, such as Kazakh. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
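
The abstract reports results as mAP (mean average precision) for word image retrieval. As a rough illustration only, and not the authors' evaluation code, the sketch below shows one common way to compute mAP from ranked retrieval results. The function names are hypothetical, and it assumes that each ranked list contains every relevant item for its query, so average precision is normalised by the number of relevant items found in the list.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    `relevance[i]` is True when the item returned at rank i + 1 matches
    the query word. Assumption: the list holds all relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision over all queries."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a query whose relevant items appear at ranks 1 and 3 has an average precision of (1/1 + 2/3) / 2, and mAP averages such values over all query words.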

Funder

Natural Science Foundation of Xinjiang Uygur Autonomous Region

National Natural Science Foundation of China

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/tee.24122

References: 39 articles.

1. Benabdelaziz R, Gaceb D, Haddad M. Word‐spotting approach using transfer deep learning of a CNN network. Proceedings of 2020 1st International Conference on Communications Control Systems and Signal Processing (CCSSP), 219–224. 2020.

2. Improving OCR Accuracy for Kazakh Handwriting Recognition Using GAN Models

3. Bao‐she Y. Research on input method of Kazakh language for mobile phone. Computer Technology and Development. 2013.

4. A Document Image Retrieval System

5. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778. 2016.