Abstract
Speech is a commonly used interaction technique in edutainment systems and a key technology for smooth learning and user–system interaction. However, its use in real environments is limited by ambient noise. In this study, a multimodal interaction system based on audio and visual information is proposed that makes speech-driven virtual aquarium systems robust to ambient noise. For audio-based speech recognition, the list of words recognized by a speech API is expressed as word vectors using a pretrained model. Vision-based speech recognition uses a composite end-to-end deep neural network. The vectors derived from the API and from vision are then concatenated and classified. The signal-to-noise ratio of the proposed system was determined based on data from four types of noise environments. Furthermore, the system was tested for accuracy and efficiency against existing single-mode strategies for visual feature extraction and audio speech recognition. The average recognition rate was 91.42% when only speech was used and improved by 6.7 percentage points to 98.12% when audio and visual information were combined. This method can be helpful in real-world settings where speech recognition is regularly used, such as cafés, museums, music halls, and kiosks.
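The concatenate-then-classify step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vector sizes, the number of commands, the random weights, and the linear softmax classifier are all assumptions chosen only to show how an audio-derived word vector and a visual feature vector might be joined before classification.

```python
import math
import random

# Hypothetical sizes; the abstract does not state the real dimensions.
AUDIO_DIM = 8      # assumed length of the word vector built from the speech API output
VISUAL_DIM = 6     # assumed length of the visual (lip-reading) feature vector
NUM_COMMANDS = 4   # assumed number of command classes


def fuse(audio_vec, visual_vec):
    """Concatenate the audio and visual vectors into one fused vector."""
    return list(audio_vec) + list(visual_vec)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Apply a linear layer plus softmax; return (predicted index, probabilities)."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Random stand-ins for real features and trained parameters (assumptions).
rng = random.Random(0)
audio_vec = [rng.gauss(0.0, 1.0) for _ in range(AUDIO_DIM)]
visual_vec = [rng.gauss(0.0, 1.0) for _ in range(VISUAL_DIM)]
weights = [
    [rng.gauss(0.0, 0.1) for _ in range(AUDIO_DIM + VISUAL_DIM)]
    for _ in range(NUM_COMMANDS)
]
bias = [0.0] * NUM_COMMANDS

fused = fuse(audio_vec, visual_vec)
label, probs = classify(fused, weights, bias)
print(label, [round(p, 3) for p in probs])
```

In a real system the classifier would be trained on fused vectors; the point of the sketch is only that the classifier sees both modalities at once, so the visual features can still carry the decision when noise degrades the audio-derived vector.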
Funder
National Research Foundation of Korea (NRF) grant funded by the Korea government
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
5 articles.