Synthetized Multilanguage OCR Using CRNN and SVTR Models for Realtime Collaborative Tools

Published:2023-03-30 Issue:7 Volume:13 Page:4419
ISSN:2076-3417
Container-title:Applied Sciences
language:en

Author:

Biró Attila¹²³, Cuesta-Vargas Antonio Ignacio²³⁴, Martín-Martín Jaime³⁵, Szilágyi László⁶⁷, Szilágyi Sándor Miklós¹

Affiliation:

1. Department of Electrical Engineering and Information Technology, George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Targu Mures, Str. Nicolae Iorga, Nr. 1, 540088 Targu Mures, Romania

2. Department of Physiotherapy, University of Malaga, 29071 Malaga, Spain

3. Biomedical Research Institute of Malaga (IBIMA), 29590 Malaga, Spain

4. Faculty of Health Science, School of Clinical Science, Queensland University Technology, Brisbane 4000, Australia

5. Legal and Forensic Medicine Area, Department of Human Anatomy, Legal Medicine and History of Science, Faculty of Medicine, University of Malaga, 29071 Malaga, Spain

6. Computational Intelligence Research Group, Sapientia Hungarian University of Transylvania, 540485 Targu Mures, Romania

7. Physiological Controls Research Center, Óbuda University, 1034 Budapest, Hungary

Abstract

Background: Remote diagnosis using collaborative tools has led to multilingual joint working sessions in various domains, including comprehensive health care, resulting in more inclusive health care services. One of the main challenges is providing a real-time solution for shared documents and presentations on display to improve the efficacy of noninvasive, safe, and far-reaching collaborative models. Classic optical character recognition (OCR) solutions fail when there is a mixture of languages or dialects, or when participants have different technical levels and skills. Due to the risk of misunderstandings caused by mistranslations or a lack of domain knowledge among the interpreters involved, the technological pipeline also needs artificial intelligence (AI)-supported improvements on the OCR side. This study examines the feasibility of machine learning-supported OCR in a multilingual environment. The novelty of our method is that it provides a solution not only for different spoken languages but also for a mixture of technological languages, using an artificially created vocabulary and a custom training data generation approach.

Methods: A novel hybrid language vocabulary creation method is used in the OCR training process in combination with convolutional recurrent neural networks (CRNNs) and a single visual model for scene text recognition within the patch-wise image tokenization framework (SVTR).

Data: We used a dedicated Python-based data generator built on collaborative tool-based templates to cover and simulate the real-life variance of remote diagnosis and co-working collaborative sessions with high accuracy. The generated training datasets ranged from 66 k to 8.5 M in size. Twenty-one research results were analyzed.

Instruments: Training was conducted using tuned PaddleOCR with CRNN and SVTR modeling and a domain-specific, customized vocabulary. The Weights & Biases (WANDB) machine learning (ML) platform was used for experiment tracking, dataset versioning, and model evaluation. Based on the evaluations, the training dataset was adjusted by using a different language corpus and/or by modifying the templates.

Results: The machine learning models recognized the multilanguage/hybrid texts with high accuracy. The highest precision scores achieved are 90.25%, 91.35%, and 93.89%.

Conclusions: Machine learning models for special multilanguages, including languages with artificially made vocabulary, perform consistently with high accuracy.
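The abstract mentions a hybrid vocabulary and a custom training data generator but does not describe either in detail. The following is only a minimal sketch of the general idea, not the authors' method: it merges the characters of several small corpora into one recognition vocabulary and samples mixed-language text lines that a renderer could later turn into training images. The function names `build_vocabulary` and `make_hybrid_samples`, the toy `corpora` dictionary, and the word-level mixing strategy are all assumptions made for illustration.

```python
import random
from collections import Counter


def build_vocabulary(corpora, min_count=1):
    """Merge the characters of all corpora into one sorted, duplicate-free list.

    Hypothetical helper: whitespace is skipped, and characters seen fewer
    than `min_count` times are dropped.
    """
    counts = Counter()
    for lines in corpora.values():
        for line in lines:
            counts.update(ch for ch in line if not ch.isspace())
    return sorted(ch for ch, n in counts.items() if n >= min_count)


def make_hybrid_samples(corpora, n_samples, words_per_sample=4, seed=0):
    """Sample text lines whose words are drawn from all corpora at once.

    Hypothetical helper: mixing at word level is an assumed strategy,
    chosen only to illustrate multilanguage/hybrid training text.
    """
    rng = random.Random(seed)
    pool = [
        word
        for lines in corpora.values()
        for line in lines
        for word in line.split()
    ]
    return [
        " ".join(rng.choice(pool) for _ in range(words_per_sample))
        for _ in range(n_samples)
    ]


# Toy corpora (assumed content): spoken languages plus "technological" text.
corpora = {
    "en": ["remote diagnosis session", "shared screen"],
    "hu": ["távoli diagnózis", "megosztott képernyő"],
    "es": ["sesión de diagnóstico", "pantalla compartida"],
    "tech": ["git commit", "HTTP 404"],
}

vocab = build_vocabulary(corpora)
samples = make_hybrid_samples(corpora, n_samples=5)

for text in samples:
    print(text)
```

In a real pipeline, the vocabulary would typically be written out as a character dictionary for the recognizer, and each sampled line would be rendered onto a template image with its text stored as the label. Both steps are left out here because the abstract does not specify them.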

Funder

ITware, Hungary

University of Malaga

Consolidator Excellence Researcher Program of Óbuda University, Budapest Hungary

Sapientia Institute for Research Programs, Romania

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/7/4419/pdf

References: 45 articles.

1. The factors affecting team effectiveness in hospitals: The mediating role of using electronic collaborative tools;Qaddumi;J. Interprofessional Educ. Pract.,2021

2. Biró, A., Jánosi-Rancz, K.T., Szilágyi, L., Cuesta-Vargas, A.I., Martín-Martín, J., and Szilágyi, S.M. (2022). Visual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools. Appl. Sci., 12.

3. Huang, J., Pang, G., Kovvuri, R., Toh, M., Liang, K.J., Krishnan, P., Yin, X., and Hassner, T. (2021, January 20–25). A Multiplexed Network for End-to-End, Multilingual OCR. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.

4. An End-to-End OCR Text Reorganization Sequence Learning for Rich-text Detail Image Comprehension. European Conference on Computer Vision;Li;LNCS,2020

5. Du, Y.N., Li, C.X., Guo, R.Y., Yin, X.T., Liu, W.W., Zhou, J., Bai, Y.F., Yu, Z.L., Yang, Y.H., and Dang, Q.Q. (2020). PP-OCR: A Practical Ultra Lightweight OCR System. arXiv.

Cited by 5 articles.

1. Navigator: A Decentralized Scheduler for Latency-Sensitive AI Workflows;2024 IEEE International Conference on Edge Computing and Communications (EDGE);2024-07-07

2. Research on a Web System Data-Filling Method Based on Optical Character Recognition and Multi-Text Similarity;Applied Sciences;2024-01-25

3. Optimal Training Dataset Preparation for AI-Supported Multilanguage Real-Time OCRs Using Visual Methods;Applied Sciences;2023-12-08

4. Farsi Optical Character Recognition Using a Transformer-based Model;2023 13th International Conference on Computer and Knowledge Engineering (ICCKE);2023-11-01

5. Detection and Recognition of Tilted Characters on Railroad Wagon Wheelsets Based on Deep Learning;Sensors;2023-09-07