Affiliation:
1. Computer Science & Engineering Department, University of South Carolina, Columbia, SC 29201, USA
Abstract
Scene text spotting is a challenging multi-task problem: texts must be both located and recognized in complex scenes. Existing end-to-end text spotters generally adopt sequentially decoupled multi-task pipelines consisting of a text detection module followed by a text recognition module. Although customized modules have been designed to connect the tasks closely, the tasks do not interact with each other, so compatible information that could benefit overall text spotting is lost. Moreover, this independent, sequential pipeline is unidirectional, so errors made in earlier tasks accumulate in later ones. In this paper, we propose CommuSpotter, which enhances multi-task communication by explicitly and concurrently sharing compatible information across the whole scene text spotting process. To address task-specific inconsistencies, we propose a Conversation Mechanism (CM) that extracts the expertise of each task and exchanges it with the other tasks. Specifically, the text detection task is rectified by the text recognition task to filter out duplicated results and false positives, while the text recognition task is corrected by the rectified text detection task to recover missing characters and reduce interference from non-text regions. Consequently, this communication supplies the missing interaction information and breaks the sequential chain of error propagation. In addition, we adopt text semantic segmentation in the text recognition task, which avoids complex customized modules and the extra annotations they require. Experimental results show that, compared with state-of-the-art methods, our method achieves competitive results while remaining computationally efficient.
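The idea of recognition filtering detection output can be illustrated with a small post-processing sketch. This is not CommuSpotter's Conversation Mechanism, which the abstract describes as exchanging information between tasks inside the network; it is a hypothetical stand-in that only shows the recognition-to-detection direction. The `Spot` type, the `iou` and `recognition_guided_filter` helpers, the axis-aligned boxes, and the thresholds `min_rec_conf` and `dup_iou` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (simplifying assumption)


@dataclass
class Spot:
    """Hypothetical joint output of a detector and a recognizer for one text instance."""
    box: Box
    det_score: float  # detector confidence
    text: str         # recognized transcription
    rec_conf: float   # recognizer confidence


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recognition_guided_filter(spots: List[Spot],
                              min_rec_conf: float = 0.5,
                              dup_iou: float = 0.5) -> List[Spot]:
    """Use recognition results to prune detections (illustrative only)."""
    # 1. Likely false positives: the recognizer reads nothing or reads with low confidence.
    kept = [s for s in spots if s.text and s.rec_conf >= min_rec_conf]
    # 2. Rank the remaining spots by a joint detection/recognition score.
    kept.sort(key=lambda s: s.det_score * s.rec_conf, reverse=True)
    result: List[Spot] = []
    for s in kept:
        # 3. Duplicates: a higher-ranked spot overlaps heavily AND reads the same text.
        is_dup = any(
            iou(s.box, r.box) >= dup_iou and s.text.lower() == r.text.lower()
            for r in result
        )
        if not is_dup:
            result.append(s)
    return result


if __name__ == "__main__":
    spots = [
        Spot((0, 0, 10, 5), 0.90, "EXIT", 0.95),
        Spot((1, 0, 11, 5), 0.85, "EXIT", 0.90),   # overlapping duplicate of the first
        Spot((20, 20, 30, 25), 0.80, "", 0.10),    # unreadable region: false positive
        Spot((40, 0, 50, 5), 0.70, "OPEN", 0.30),  # low recognition confidence
    ]
    print([s.text for s in recognition_guided_filter(spots)])
```

Suppressing an overlapping box only when its transcription also agrees is one possible design choice: it lets recognition veto geometric duplicates, whereas plain non-maximum suppression would rely on box overlap alone. The reverse direction described in the abstract (rectified detection correcting recognition) is not modeled here.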
Funder
XSEDE Program of the National Science Foundation
Aspire-II Research Program at the University of South Carolina
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science