Synthetized Multilanguage OCR Using CRNN and SVTR Models for Realtime Collaborative Tools
Published: 2023-03-30
Issue: 7
Volume: 13
Page: 4419
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Biró Attila (1, 2, 3), Cuesta-Vargas Antonio Ignacio (2, 3, 4), Martín-Martín Jaime (3, 5), Szilágyi László (6, 7), Szilágyi Sándor Miklós (1)
Affiliation:
1. Department of Electrical Engineering and Information Technology, George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Targu Mures, Str. Nicolae Iorga, Nr. 1, 540088 Targu Mures, Romania
2. Department of Physiotherapy, University of Malaga, 29071 Malaga, Spain
3. Biomedical Research Institute of Malaga (IBIMA), 29590 Malaga, Spain
4. Faculty of Health Science, School of Clinical Science, Queensland University Technology, Brisbane 4000, Australia
5. Legal and Forensic Medicine Area, Department of Human Anatomy, Legal Medicine and History of Science, Faculty of Medicine, University of Malaga, 29071 Malaga, Spain
6. Computational Intelligence Research Group, Sapientia Hungarian University of Transylvania, 540485 Targu Mures, Romania
7. Physiological Controls Research Center, Óbuda University, 1034 Budapest, Hungary
Abstract
Background: Remote diagnosis using collaborative tools has led to multilingual joint working sessions in various domains, including comprehensive health care, resulting in more inclusive health care services. One of the main challenges is providing a real-time solution for shared documents and presentations on display to improve the efficacy of noninvasive, safe, and far-reaching collaborative models. Classic optical character recognition (OCR) solutions fail when there is a mixture of languages or dialects, or when participants have different technical levels and skills. Due to the risk of misunderstandings caused by mistranslations or lack of domain knowledge of the interpreters involved, the technological pipeline also needs artificial intelligence (AI)-supported improvements on the OCR side. This study examines the feasibility of machine learning-supported OCR in a multilingual environment. The novelty of our method is that it provides a solution not only for different spoken languages but also for a mixture of technological languages, using artificially created vocabulary and a custom training data generation approach.
Methods: A novel hybrid language vocabulary creation method is utilized in the OCR training process in combination with convolutional recurrent neural networks (CRNNs) and a single visual model for scene text recognition within the patch-wise image tokenization framework (SVTR).
Data: In this research, we used a dedicated Python-based data generator built on collaborative tool-based templates to cover and simulate the real-life variances of remote diagnosis and co-working collaborative sessions with high accuracy. The generated training datasets ranged from 66 k to 8.5 M in size. Twenty-one research results were analyzed.
Instruments: Training was conducted using tuned PaddleOCR with CRNN and SVTR modeling and a domain-specific, customized vocabulary. The Weights & Biases (WANDB) machine learning (ML) platform was used for experiment tracking, dataset versioning, and model evaluation. Based on the evaluations, the training dataset was adjusted by using a different language corpus and/or by modifying the templates.
Results: The machine learning models recognized the multilanguage/hybrid texts with high accuracy. The highest precision scores achieved were 90.25%, 91.35%, and 93.89%.
Conclusions: Machine learning models for special multilanguages, including languages with artificially made vocabulary, perform consistently with high accuracy.
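The abstract describes a hybrid vocabulary and a custom training data generator but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' generator: it merges several word lists into one character dictionary and samples mixed-language text lines that could later be rendered onto templates. The corpora, the function names `build_char_dict` and `sample_hybrid_line`, and the sample words are all hypothetical, and image rendering is omitted.

```python
import random


def build_char_dict(corpora):
    """Return the sorted list of unique characters found in all corpora."""
    chars = set()
    for words in corpora.values():
        for word in words:
            chars.update(word)
    return sorted(chars)


def sample_hybrid_line(corpora, n_words, rng):
    """Join n_words words, each drawn from a randomly chosen corpus."""
    names = sorted(corpora)  # fixed order keeps sampling reproducible
    picked = []
    for _ in range(n_words):
        corpus_name = rng.choice(names)
        picked.append(rng.choice(corpora[corpus_name]))
    return " ".join(picked)


# Hypothetical toy corpora: two spoken languages plus technical tokens.
corpora = {
    "en": ["patient", "session", "report"],
    "hu": ["egészség", "orvos", "képernyő"],
    "tech": ["git_commit", "HTTP/2", "npm-run"],
}

rng = random.Random(0)  # seeded so the generated labels are repeatable
char_dict = build_char_dict(corpora)
lines = [sample_hybrid_line(corpora, 4, rng) for _ in range(3)]

for line in lines:
    print(line)
```

Building the character dictionary from the same corpora that feed the sampler ensures that every generated label can be encoded by the recognizer's vocabulary, which is the property a hybrid vocabulary needs regardless of the model used.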
Funder
ITware, Hungary
University of Malaga
Consolidator Excellence Researcher Program of Óbuda University, Budapest, Hungary
Sapientia Institute for Research Programs, Romania
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
5 articles.