PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition

Published: 2022-09-28 Issue: Volume: 2022 Page: 1-11
ISSN: 1687-5273
Container-title: Computational Intelligence and Neuroscience
Language: en
Short-container-title: Computational Intelligence and Neuroscience

Author:

Dan Yongping¹, Zhu Zongnan¹, Jin Weishou¹, Li Zhuo¹

Affiliation:

1. School of Electronic Information, Zhongyuan University of Technology, Zhengzhou 450007, Henan, China

Abstract

Recently, the Vision Transformer (ViT) has been widely used in image recognition. However, the ViT model stacks 12 encoder layers, which results in heavy computation, a large parameter count, and slow training, making it difficult to deploy on mobile devices. To reduce the computational complexity of the model and improve training speed, a parallel and fast Vision Transformer method for offline handwritten Chinese character recognition is proposed. The method adds parallel branches of the encoder module to the structure of the Vision Transformer model; the parallel modes include two-way, four-way, and seven-way parallel. The original image is fed to the encoder module after flattening and linear embedding. The core step in the encoder is the multihead attention mechanism: multihead self-attention can learn the interdependence between image sequence blocks. In addition, data expansion strategies are used to increase the diversity of the data. In the two-way parallel experiment, when the model reaches 98.1% accuracy on the dataset, the number of parameters and the number of FLOPs are 43.11 million and 4.32 G, respectively. Compared with the ViT model, whose parameters and FLOPs are 86 million and 16.8 G, respectively, the two-way parallel model has a 50.1% decrease in parameters and a 34.6% decrease in FLOPs. The method is shown to reduce the computational complexity of the model while indirectly improving image recognition speed.
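The comparison in the abstract can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name `relative_reduction` and the constant names are hypothetical, and it assumes the usual definition of a percentage decrease, (baseline - reduced) / baseline. The only inputs are the four figures quoted in the abstract.

```python
def relative_reduction(baseline: float, reduced: float) -> float:
    """Return the percentage decrease from `baseline` to `reduced`.

    Assumes the conventional definition (baseline - reduced) / baseline;
    the paper may define its percentages differently.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - reduced) / baseline * 100.0


# Figures quoted in the abstract (parameters in millions, FLOPs in G).
VIT_PARAMS_M = 86.0
TWO_WAY_PARAMS_M = 43.11
VIT_FLOPS_G = 16.8
TWO_WAY_FLOPS_G = 4.32

param_drop = relative_reduction(VIT_PARAMS_M, TWO_WAY_PARAMS_M)
flops_drop = relative_reduction(VIT_FLOPS_G, TWO_WAY_FLOPS_G)

print(f"parameters: {param_drop:.1f}% fewer than ViT")
print(f"FLOPs:      {flops_drop:.1f}% fewer than ViT")
```

Under this definition the quoted figures give roughly a 49.9% decrease in parameters and a 74.3% decrease in FLOPs. The abstract states 50.1% and 34.6%, so its percentages appear to follow a different convention or baseline (50.1% matches the fraction of parameters retained, 43.11 / 86); the full paper should be consulted before reusing either number.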

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science

Link

http://downloads.hindawi.com/journals/cin/2022/8255763.pdf

References: 47 articles.

1. Vision Transformers for Remote Sensing Image Classification

2. Optical recognition of handwritten Chinese characters: Advances since 1980

3. Online recognition of Chinese characters: the state-of-the-art

4. Chinese character recognition: history, status and prospects

Cited by 4 articles.

1. Optical Character Recognition Using Optimized Convolutional Networks; 2023 Eighth International Conference on Fog and Mobile Edge Computing (FMEC); 2023-09-18

2. LW-ViT: The Lightweight Vision Transformer Model Applied in Offline Handwritten Chinese Character Recognition; Electronics; 2023-04-03

3. Particle Swarm Optimization-Based Convolutional Neural Network for Handwritten Chinese Character Recognition; Journal of Advanced Computational Intelligence and Intelligent Informatics; 2023-03-20

4. A novel multilevel stacked SqueezeNet model for handwritten Chinese character recognition; Computer Science and Information Systems; 2023