Rethinking Multilingual Scene Text Spotting: A Novel Benchmark and a Character-Level Feature Based Approach

Author:

Ma Siliang1, Xu Yong2

Affiliation:

1. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

2. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Pengcheng Laboratory, Shenzhen, China

Abstract

End-to-end multilingual scene text spotting aims to integrate scene text detection and recognition into a unified framework. In practice, the accuracy of text recognition depends heavily on the accuracy of text detection. Because benchmarks with adequate, high-quality character-level annotations for multilingual scene text spotting are lacking, most existing methods are trained on benchmarks that provide only word-level annotations. However, models trained on these existing benchmarks perform unsatisfactorily at multilingual scene text spotting, especially on images with unusual layouts or out-of-vocabulary words. In this paper, we propose a simple YOLO-like baseline named CMSTR that performs character-level multilingual scene text spotting, detecting and recognizing text simultaneously and efficiently. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing through a single decoder, the point queries encode the requisite text semantics and locations, and can thus be decoded in parallel into the center line, boundary, script, and confidence of the text via very simple prediction heads. Furthermore, we show that our method extends surprisingly well across character classes, language types, and tasks. On the one hand, DeepSolo not only performs well in English scenes but also masters Chinese transcription, with its complex glyph structures and thousands of character classes. On the other hand, building on the extensibility of DeepSolo, we introduce DeepSolo++ for multilingual text spotting, taking a further step toward letting a Transformer decoder with explicit points go solo on multilingual text detection, recognition, and script identification all at once.
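The abstract describes each text instance as a sequence of ordered points whose decoded query embeddings are read out by several very simple heads in parallel. The following is a minimal, hypothetical sketch of that read-out step only, written in pure Python with randomly initialised weights. The embedding size, number of points, class counts, the names `ParallelHeads` and `linear_forward`, the extra "no character" class, and the use of mean pooling for the instance-level script and confidence heads are all assumptions made for illustration; this is not the authors' implementation, and the decoder itself is omitted.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration.
EMBED_DIM = 8      # width of each decoded point-query embedding
NUM_POINTS = 4     # ordered points per text instance
NUM_CHARS = 5      # character classes; one extra "no character" class is added (assumed)
NUM_SCRIPTS = 3    # number of scripts to identify (assumed)


def linear(in_dim, out_dim, rng):
    """Return a randomly initialised linear layer as (weights, bias)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear_forward(layer, x):
    """Compute W @ x + b for a single input vector x."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(v):
    """Squash each value into (0, 1), e.g. for normalised coordinates."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def argmax(v):
    """Index of the largest value in v."""
    return max(range(len(v)), key=v.__getitem__)


class ParallelHeads:
    """Hypothetical per-point and per-instance heads that share one input."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.center = linear(EMBED_DIM, 2, rng)            # (x, y) on the center line
        self.boundary = linear(EMBED_DIM, 4, rng)          # upper and lower boundary (x, y)
        self.char = linear(EMBED_DIM, NUM_CHARS + 1, rng)  # character logits per point
        self.script = linear(EMBED_DIM, NUM_SCRIPTS, rng)  # script logits per instance
        self.conf = linear(EMBED_DIM, 1, rng)              # text confidence per instance

    def __call__(self, point_queries):
        # Per-point heads: every ordered point query is read out independently.
        centers = [sigmoid(linear_forward(self.center, q)) for q in point_queries]
        boundaries = [sigmoid(linear_forward(self.boundary, q)) for q in point_queries]
        chars = [argmax(linear_forward(self.char, q)) for q in point_queries]
        # Instance-level heads: mean-pool the point queries first (an assumption).
        pooled = [sum(col) / len(point_queries) for col in zip(*point_queries)]
        script = argmax(linear_forward(self.script, pooled))
        confidence = sigmoid(linear_forward(self.conf, pooled))[0]
        return {"center_line": centers, "boundary": boundaries,
                "chars": chars, "script": script, "confidence": confidence}


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-in for decoder output: one embedding per ordered point.
    queries = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_POINTS)]
    out = ParallelHeads()(queries)
    print(len(out["center_line"]), out["chars"], out["script"], round(out["confidence"], 3))
```

Because every head reads the same decoded point queries and none depends on another head's output, all of them can be evaluated side by side, which is the parallel-decoding property the abstract emphasises.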

Publisher

Science Publishing Group
