Generative Adversarial Networks (GANs) for Audio-Visual Speech Recognition in Artificial Intelligence IoT

Published: 2023-10-19
Volume: 14, Issue: 10, Page: 575
ISSN: 2078-2489
Journal: Information
Language: English

Authors:

He Yibo¹, Seng Kah Phooi¹²³, Ang Li Minn³

Affiliations:

1. School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou 215000, China

2. School of Computer Science, Queensland University of Technology, Brisbane City, QLD 4000, Australia

3. School of Science, Technology and Engineering, University of the Sunshine Coast, Sippy Downs, QLD 4556, Australia

Abstract

This paper proposes a novel multimodal generative adversarial network (GAN) architecture for audio-visual speech recognition (AVSR), termed the multimodal AVSR GAN, to improve both the energy efficiency and the AVSR classification accuracy of artificial intelligence Internet of Things (IoT) applications. AVSR is a classical multimodal task that is commonly used in IoT and embedded systems. Examples of suitable IoT applications include in-cabin speech recognition for driving systems, AVSR in augmented reality environments, and interactive applications such as virtual aquariums. Using multimodal sensor data in IoT applications requires efficient information processing to meet the hardware constraints of IoT devices. The proposed multimodal AVSR GAN consists of a discriminator and a generator, each of which is a two-stream network with one stream for the audio information and one for the visual information. To validate this approach, we trained on augmented data from the well-known LRS2 (Lip Reading Sentences 2) and LRS3 datasets and tested on the original data. The experimental results showed that the proposed multimodal AVSR GAN architecture improved AVSR classification accuracy. Furthermore, we discuss the GAN domain and provide a concise summary of proposed GAN models.
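The two-stream generator/discriminator layout described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' architecture: the abstract does not specify layer types, sizes, fusion strategy, or losses. The assumptions here are one ReLU hidden layer per stream, concatenation fusion in the discriminator, and the standard GAN discriminator loss with the non-saturating generator loss. All class and function names (`TwoStreamGenerator`, `TwoStreamDiscriminator`, `discriminator_loss`, `generator_loss`) are hypothetical. Random vectors stand in for real audio and lip features, and only the forward pass and loss values are shown, with no training loop.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight matrix, bias vector)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases for a fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Vector, layer: Layer) -> Vector:
    """Affine map y_j = sum_i W[j][i] * x[i] + b[j]."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class TwoStreamGenerator:
    """Hypothetical generator: separate audio and visual streams, each mapping
    a noise vector through one ReLU hidden layer to a feature vector."""

    def __init__(self, noise_dim, audio_dim, visual_dim, hidden, rng):
        self.audio_layers = [init_layer(noise_dim, hidden, rng), init_layer(hidden, audio_dim, rng)]
        self.visual_layers = [init_layer(noise_dim, hidden, rng), init_layer(hidden, visual_dim, rng)]

    @staticmethod
    def _stream(z: Vector, layers: List[Layer]) -> Vector:
        hidden_layer, out_layer = layers
        return dense(relu(dense(z, hidden_layer)), out_layer)  # linear output layer

    def __call__(self, z: Vector) -> Tuple[Vector, Vector]:
        return self._stream(z, self.audio_layers), self._stream(z, self.visual_layers)


class TwoStreamDiscriminator:
    """Hypothetical discriminator: one ReLU stream per modality, features fused
    by concatenation (an assumption), then a logistic output P(real)."""

    def __init__(self, audio_dim, visual_dim, hidden, rng):
        self.audio_layer = init_layer(audio_dim, hidden, rng)
        self.visual_layer = init_layer(visual_dim, hidden, rng)
        self.fusion_layer = init_layer(2 * hidden, 1, rng)

    def __call__(self, audio: Vector, visual: Vector) -> float:
        # List "+" concatenates the audio and visual stream features.
        fused = relu(dense(audio, self.audio_layer)) + relu(dense(visual, self.visual_layer))
        return sigmoid(dense(fused, self.fusion_layer)[0])


def discriminator_loss(d_real: float, d_fake: float, eps: float = 1e-12) -> float:
    """Standard GAN discriminator loss: -[log D(x) + log(1 - D(G(z)))]."""
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))


def generator_loss(d_fake: float, eps: float = 1e-12) -> float:
    """Non-saturating generator loss: -log D(G(z))."""
    return -math.log(d_fake + eps)


rng = random.Random(0)
G = TwoStreamGenerator(noise_dim=4, audio_dim=6, visual_dim=8, hidden=5, rng=rng)
D = TwoStreamDiscriminator(audio_dim=6, visual_dim=8, hidden=5, rng=rng)

z = [rng.gauss(0.0, 1.0) for _ in range(4)]
fake_audio, fake_visual = G(z)
real_audio = [rng.gauss(0.0, 1.0) for _ in range(6)]   # stand-in for real audio features
real_visual = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for real lip features

d_real, d_fake = D(real_audio, real_visual), D(fake_audio, fake_visual)
print(len(fake_audio), len(fake_visual))  # 6 8
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

Keeping a separate parameter set per modality lets each stream specialize in its own input type. In this sketch the two modalities interact only at the discriminator's fusion layer, which is one possible design among several.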

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/14/10/575/pdf

References: 45 articles

1. Atzori. The internet of things: A survey. Comput. Netw., 2010.

2. Zhao. Interference alignment and game-theoretic power allocation in MIMO heterogeneous sensor networks communications. Signal Process., 2016.

3. Roberts. Radio frequency identification (RFID). Comput. Secur., 2006.

4. Stergiou. Recent advances delivered by mobile cloud computing and internet of things for big data applications: A survey. Int. J. Netw. Manag., 2017.

5. Tiippana. What is the McGurk effect? Front. Psychol., 2014.

Cited by 7 articles.

1. HNet: A deep learning based hybrid network for speaker dependent visual speech recognition. International Journal of Hybrid Intelligent Systems, 2024-06-03.

2. Enhancing Network Analysis Through Computational Intelligence in GANs. Advances in Information Security, Privacy, and Ethics, 2024-05-16.

3. Deep learning and content-based filtering techniques for improving plant disease identification and treatment recommendations: A comprehensive review. Heliyon, 2024-05.

4. Enhancing Arabic Handwritten Recognition System-Based CNN-BLSTM Using Generative Adversarial Networks. European Journal of Artificial Intelligence and Machine Learning, 2024-04-02.

5. Generative Künstliche Intelligenz – die neue Ära der kreativen Maschinen [Generative artificial intelligence: the new era of creative machines]. HMD Praxis der Wirtschaftsinformatik, 2024-03-28.