Scene Text Detection Based on Multi-Headed Self-Attention Using Shifted Windows

Published: 2023-03-20 Issue: 6 Volume: 13 Page: 3928
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Huang Baohua¹, Feng Xiaoru¹

Affiliation:

1. School of Computer and Electronic Information, Guangxi University, Nanning 530004, China

Abstract

Scene text detection has become a popular topic in computer vision research. Most current approaches are based on deep learning and use Convolutional Neural Networks (CNNs) to extract visual features from images. However, because convolution kernels are small, CNNs extract only local features with small receptive fields and capture little global context. In this paper, to improve the accuracy of scene text detection, a feature enhancement module is added to the text detection model. This module acquires global features of an image by computing multi-headed self-attention over the feature map. The improved model extracts local features with CNNs and global features with the feature enhancement module, then fuses the two so that visual features at different levels of the image are represented. Self-attention is computed within shifted windows, which reduces the computational complexity from quadratic to linear in the width-height product of the input image. Experiments are conducted on the multi-oriented text dataset ICDAR2015 and the multi-language text dataset MSRA-TD500. Compared with the baseline method DBNet, the F1-score improves by 0.5% and 3.5% on ICDAR2015 and MSRA-TD500, respectively, indicating the effectiveness of the model improvement.
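
The complexity claim in the abstract can be illustrated with a small sketch. The operation counts below are the ones given in the Swin Transformer paper for global and window-based multi-head self-attention; whether this article's feature enhancement module uses exactly the same counts, window size, and shift size is an assumption. The function names, and the helpers `cyclic_shift` and `window_partition`, are hypothetical and use plain Python lists only for illustration.

```python
def global_msa_flops(h, w, c):
    """Approximate multiply-adds for global self-attention over h*w tokens
    with c channels: 4*hw*c^2 (projections) + 2*(hw)^2*c (attention)."""
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def window_msa_flops(h, w, c, m):
    """Approximate multiply-adds when attention is restricted to
    non-overlapping m x m windows: 4*hw*c^2 + 2*m^2*hw*c."""
    n = h * w
    return 4 * n * c * c + 2 * m * m * n * c


def cyclic_shift(grid, s):
    """Roll a 2-D grid up and left by s positions, so that the next
    window partition straddles the previous window borders."""
    rows = grid[s:] + grid[:s]
    return [row[s:] + row[:s] for row in rows]


def window_partition(grid, m):
    """Split an h x w grid into non-overlapping m x m windows.
    Assumes h and w are divisible by m."""
    h, w = len(grid), len(grid[0])
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([grid[r][left:left + m] for r in range(top, top + m)])
    return windows


if __name__ == "__main__":
    # Doubling the side length quadruples hw: the global attention term
    # grows with (hw)^2, the windowed term only with hw.
    for side in (16, 32, 64):
        g = global_msa_flops(side, side, 96)
        wnd = window_msa_flops(side, side, 96, 8)
        print(side, g, wnd)

    # Token indices of a 4 x 4 feature map, partitioned before and after a shift.
    grid = [[r * 4 + col for col in range(4)] for r in range(4)]
    print(window_partition(grid, 2)[0])
    print(window_partition(cyclic_shift(grid, 1), 2)[0])
```

With a fixed window size `m`, the windowed count grows in proportion to `h * w`, while the global count contains a term proportional to `(h * w) ** 2`. The cyclic shift only changes which tokens share a window, so information can pass between neighbouring windows across successive layers without increasing the cost.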

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/6/3928/pdf

Cited by 1 article.

1. A Hybrid Model of Conformer and LSTM for Ocean Wave Height Prediction; Applied Sciences; 2024-07-15