End-to-End Lip-Reading Open Cloud-Based Speech Architecture

Published: 2022-04-12 Issue: 8 Volume: 22 Page: 2938
ISSN: 1424-8220
Container-title: Sensors
Language: en

Authors:

Jeon Sanghun, Kim Mun Sang

Abstract

Deep learning technology has encouraged research on noise-robust automatic speech recognition (ASR). The combination of cloud computing technologies and artificial intelligence has significantly improved the performance of open cloud-based speech recognition application programming interfaces (OCSR APIs), and noise-robust ASR systems are being developed for use in different environments. This study proposes noise-robust OCSR APIs based on an end-to-end lip-reading architecture for practical applications in various environments. Several OCSR APIs, including those from Google, Microsoft, Amazon, and Naver, were evaluated on the Google Voice Command Dataset v2 to identify the best-performing one. Based on this evaluation, the Microsoft API was integrated with Google's pretrained word2vec model to enrich the recognized keywords with more complete semantic information. The extracted word vector was then combined with the proposed lip-reading architecture for audio-visual speech recognition. The lip-reading architecture uses three forms of convolutional neural network: a 3D CNN, a 3D dense connection CNN, and a multilayer 3D CNN. The vectors extracted from the API and from the visual stream were concatenated and then classified. Evaluated with standard ASR measures across signal-to-noise ratios, the proposed architecture improved the average accuracy of the OCSR API by 14.42%. The proposed model exhibits improved performance in various noise settings, increasing the dependability of OCSR APIs for practical applications.
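The abstract states that the API word vector and the visual feature vector are concatenated before classification. The following is a minimal sketch of that kind of concatenation-based fusion in plain Python, assuming a single linear layer with softmax as the classifier. The function names, toy vector sizes, weights, and labels are all hypothetical and are not taken from the paper; in the described system the word vector would come from word2vec (Google's pretrained model produces 300-dimensional vectors) and the visual vector from one of the 3D CNN variants.

```python
import math
from typing import List, Sequence


def concat_fusion(audio_vec: Sequence[float], visual_vec: Sequence[float]) -> List[float]:
    """Fuse two modalities by concatenation: [a_1..a_m, v_1..v_n]."""
    return list(audio_vec) + list(visual_vec)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> List[float]:
    """Hypothetical linear classifier: one weight row and one bias per class."""
    if len(weights) != len(biases):
        raise ValueError("need one bias per weight row")
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused vector length")
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def predict(audio_vec, visual_vec, weights, biases, labels):
    """Concatenate the two vectors, classify, and return the top label and probabilities."""
    probs = classify(concat_fusion(audio_vec, visual_vec), weights, biases)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


if __name__ == "__main__":
    # Toy stand-ins: a 3-d "word vector" and a 2-d "visual feature vector".
    audio = [0.2, -0.1, 0.5]
    visual = [1.0, 0.0]
    W = [[1.0, 0.0, 1.0, 0.0, 0.0],   # hypothetical class "yes"
         [0.0, 1.0, 0.0, 1.0, 1.0]]   # hypothetical class "no"
    b = [0.0, 0.0]
    label, probs = predict(audio, visual, W, b, ["yes", "no"])
    print(label, [round(p, 3) for p in probs])  # prints: no [0.45, 0.55]
```

Concatenation leaves each modality's features unchanged and defers the weighting of audio versus visual evidence to the classifier, whose weights and biases would be learned during training rather than hand-set as in this toy example.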

Funder

National Research Foundation of Korea

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/22/8/2938/pdf
