Few‐shot learning for word‐level scene text script identification

Author:

Naosekpam Veronica1ORCID,Sahu Nilkanta2

Affiliation:

1. Artificial Intelligence Lab Indian Institute of Information Technology Guwahati Guwahati Assam India

2. Computer Vision and Biometrics Lab Indian Institute of Information Technology Guwahati Guwahati Assam India

Abstract

AbstractScript identification of text in scene images has attracted massive attention recently. However, the existing techniques primarily emphasize on scripts where data are available abundantly, such as English, European, or East Asian. Although these methods are robust in dealing with high‐resource data, how these techniques will work on low‐resource scripts has yet to be discovered. For example, in India, there is a disparity among the text scripts across the country's demographic. To bridge the research gap for resource‐constraint script identification, we present a few‐shot learning network called the TextScriptFSLNet. This network does not require huge training data while achieving state‐of‐the‐art performance on benchmark datasets. Our proposed method acts in accordance with a ‐way ‐shot paradigm by splitting the train set as support and query set, respectively. The support set learns representative knowledge of each class and creates its prototypes. We use multi‐kernel spatial attention fused 2‐layer convolutional neural network and averaging technique to generate the prototype of each class. This spatial attention aids in grasping important information in an image and enriches the feature representation. To the best of our knowledge, the proposed work is the first of its kind in the scene text understanding domain. Additionally, we created a dataset called Indic‐FSL2023 comprising 10 of the 22 officially recognized Indian scripts. The proposed method achieves the highest accuracy among the tested methods on the newly created Indic‐FSL2023. Experiments are also conducted on MLe2e to demonstrate its versatility. Furthermore, we also showed how our proposed model performed concerning illumination changes and blur on scene text script images.

Publisher

Wiley

Subject

Artificial Intelligence,Computational Mathematics

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3