Visual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools

Published:2022-06-12 Issue:12 Volume:12 Page:5977
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Biró Attila, Jánosi-Rancz Katalin Tünde, Szilágyi László, Cuesta-Vargas Antonio Ignacio, Martín-Martín Jaime, Szilágyi Sándor Miklós

Abstract

Real-time multilingual phrase detection during online video presentations, intended to support instant remote diagnostics, requires near real-time detection of visual (textual) objects and preprocessing for further analysis. Connecting remote specialists and sharing specific ideas is most effective when each participant uses their native language. The main objective of this paper is to analyze DEtection TRansformer (DETR) models, architectures, and hyperparameters, and to propose recommendations and specific procedures with simplified methods that achieve reasonable accuracy in real-time textual object detection for further analysis. The development of real-time video conference translation based on AI-supported solutions has a relevant impact on the health sector, especially on clinical practice through better video consultation (VC) or remote diagnosis. The COVID-19 pandemic increased the importance of this development. The challenge stems from the variety of languages and dialects spoken by the specialists involved, which usually requires human translators as intermediaries; these can be replaced by AI-enabled technological pipelines. The sensitivity of visual textual element localization depends directly on the complexity, quality, and variety of the collected training data sets. In this research, we investigated the DETR model with several variations. The research highlights the differences among the most prominent real-time object detectors (YOLO4, DETR, and Detectron2) and brings AI-based novelty to collaborative solutions combined with OCR. The performance of the procedures was evaluated in two research phases: a training data set of 248/512 records (Phase 1/Phase 2), with 55/110 validated data instances for 7/10 application categories and 3/3 object categories, using the same object categories for annotation. The achieved scores exceeded the expected values for visual text detection, giving high detection accuracy for textual data, with the mean average precision ranging from 0.4 to 0.65.
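The abstract reports results as mean average precision (mAP). As a rough illustration of what that metric measures, the sketch below computes intersection over union (IoU) for two boxes and a simplified, non-interpolated average precision for one object category on one image. It is not the authors' evaluation code: the function names, the (x_min, y_min, x_max, y_max) box format, the 0.5 IoU threshold, and the greedy matching rule are all assumptions made for this example, and standard benchmarks use interpolated variants averaged over categories and thresholds.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],  # (confidence score, predicted box)
    truths: List[Box],                    # ground-truth boxes, one category
    iou_threshold: float = 0.5,           # assumed threshold
) -> float:
    """Non-interpolated AP: mean of the precision at each true positive."""
    if not truths:
        return 0.0
    matched = [False] * len(truths)
    true_positives = 0
    precision_sum = 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    for rank, (_, box) in enumerate(ranked, start=1):
        # Greedily pick the best still-unmatched ground-truth box.
        best_iou, best_idx = 0.0, -1
        for idx, truth in enumerate(truths):
            if matched[idx]:
                continue
            overlap = iou(box, truth)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(truths)


def mean_average_precision(per_category_ap: List[float]) -> float:
    """mAP is the plain mean of the per-category AP values."""
    return sum(per_category_ap) / len(per_category_ap) if per_category_ap else 0.0
```

With three object categories, as in the study, mAP under this simplified definition would be the mean of three such per-category AP values.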

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/12/5977/pdf

References: 43 articles.

1. Spillover of COVID-19: Impact on the Global Economy

2. Conducting remote medical asylum evaluations in the United States during COVID-19: Clinicians’ perspectives on acceptability, challenges and opportunities

3. Object Detection With Deep Learning: A Review

4. Salient object detection based on global to local visual search guidance

5. Diagnostic accuracy in remote expert consultation using standard video-conference technology

Cited by 9 articles.

1. AI-controlled training method for performance hardening or injury recovery in sports; 2024 IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics (SAMI); 2024-01-25

2. Real-time Artificial Intelligence Text Analysis for Identifying Burnout Syndromes in High-Performance Athletes; 2024 IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics (SAMI); 2024-01-25

3. Optimal Training Dataset Preparation for AI-Supported Multilanguage Real-Time OCRs Using Visual Methods; Applied Sciences; 2023-12-08

4. Precognition of mental health and neurogenerative disorders using AI-parsed text and sentiment analysis; Acta Universitatis Sapientiae, Informatica; 2023-12-01

5. Galaxy detection and classification in sky images with neural network; 2023 IEEE 23rd International Symposium on Computational Intelligence and Informatics (CINTI); 2023-11-20