Automatic Lip Reading Using Convolution Neural Network and Bidirectional Long Short-term Memory

Published: 2019-05-24; Volume: 34; Issue: 01; Page: 2054003
ISSN: 0218-0014
Container title: International Journal of Pattern Recognition and Artificial Intelligence
Language: English
Short container title: Int. J. Patt. Recogn. Artif. Intell.

Author:

Lu Yuanyao¹, Yan Jie¹

Affiliation:

1. School of Electronic and Information Engineering, North China University of Technology, Beijing, P. R. China

Abstract

Traditional automatic lip-reading systems generally consist of two stages, feature extraction and recognition; however, handcrafted features are designed empirically and cannot adequately capture the correlations within a lip-movement sequence. Recently, deep learning approaches have attracted increasing attention, notably the significant improvements achieved by convolutional neural networks (CNNs) in image classification and by long short-term memory (LSTM) networks in speech recognition, video processing and text analysis. In this paper, we propose a hybrid neural network architecture that integrates a CNN with a bidirectional LSTM (BiLSTM) for lip reading. First, we extract key frames from each isolated video clip and use five key points to locate the mouth region. Then, features are extracted from the raw mouth images by an eight-layer CNN; these features offer stronger robustness and fault tolerance. Finally, a BiLSTM captures the correlations in the sequence of frame features in both temporal directions, and a softmax function predicts the final recognition result. The proposed method thus extracts local features through convolution and uncovers hidden correlations in the temporal information of lip image sequences. Lip-reading recognition experiments show that the proposed method outperforms conventional approaches such as the active contour model (ACM) and the hidden Markov model (HMM).
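The recognition stage summarized above (per-frame CNN features read in both temporal directions by a BiLSTM, followed by a softmax over word classes) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The eight-layer CNN is replaced by random stand-in feature vectors, all weights are untrained random values, and all sizes and function names (`lstm_pass`, `bilstm_classify`, and so on) are hypothetical. The clip is summarized by concatenating the final forward state with the final backward state. This is one common choice that the abstract does not specify.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def init_lstm(rng, in_dim, hidden, scale=0.1):
    """Random (untrained) weights for one LSTM direction: W (4H, D), U (4H, H), b (4H,)."""
    W = rng.normal(0.0, scale, (4 * hidden, in_dim))
    U = rng.normal(0.0, scale, (4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    return W, U, b


def lstm_pass(frames, W, U, b):
    """Run one LSTM direction over frames of shape (T, D); return hidden states (T, H)."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    outputs = []
    for x in frames:
        z = W @ x + U @ h + b          # stacked gate pre-activations, shape (4H,)
        i = sigmoid(z[0:H])            # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        o = sigmoid(z[2 * H:3 * H])    # output gate
        g = np.tanh(z[3 * H:])         # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm_classify(features, fwd, bwd, W_out, b_out):
    """Class probabilities for one clip of per-frame features of shape (T, D)."""
    h_fwd = lstm_pass(features, *fwd)              # reads frames 0 .. T-1
    h_bwd = lstm_pass(features[::-1], *bwd)[::-1]  # reads frames T-1 .. 0, re-aligned to time order
    # Both endpoint states have seen the whole clip: the forward pass ends at
    # frame T-1, the backward pass ends at frame 0 (index 0 after re-alignment).
    summary = np.concatenate([h_fwd[-1], h_bwd[0]])
    return softmax(W_out @ summary + b_out)


# Hypothetical sizes: T frames, D-dim CNN features, H hidden units, C word classes.
T, D, H, C = 12, 256, 64, 10
rng = np.random.default_rng(0)
features = rng.normal(size=(T, D))  # stand-in for the per-frame CNN features
fwd = init_lstm(rng, D, H)
bwd = init_lstm(rng, D, H)
W_out = rng.normal(0.0, 0.1, (C, 2 * H))
b_out = np.zeros(C)

probs = bilstm_classify(features, fwd, bwd, W_out, b_out)
print("predicted class:", int(probs.argmax()), "sum of probabilities:", round(float(probs.sum()), 6))
```

Running the backward LSTM on the reversed sequence and then reversing its outputs keeps both directions aligned frame by frame. The per-frame outputs could therefore also be concatenated or averaged over time, instead of using only the endpoint states.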

Publisher

World Scientific Publishing Co. Pte. Ltd.

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218001420540038

References: 27 articles

1. Comparison of low- and high-level visual features for audio-visual continuous automatic speech recognition

2. Multiple cameras audio visual speech recognition using active appearance model visual features in car environment

3. Automatic Lip Reading in the Dutch Language Using Active Appearance Models on High Speed Recordings

4. Long Short-Term Memory

Cited by: 9 articles

1. AI LipReader-Transcribing Speech from Lip Movements. 2024 International Conference on Emerging Smart Computing and Informatics (ESCI), 2024-03-05.

2. Lip-Reading Advancements: A 3D Convolutional Neural Network/Long Short-Term Memory Fusion for Precise Word Recognition. BioMedInformatics, 2024-02-04.

3. A Review on Deep Learning-Based Automatic Lipreading. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2023.

4. Gabor-based Audiovisual Fusion for Mandarin Chinese Speech Recognition. 2022 30th European Signal Processing Conference (EUSIPCO), 2022-08-29.

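5. Derin Öğrenme ile Dudak Okuma Üzerine Detaylı Bir Araştırma [A Detailed Study on Lip Reading with Deep Learning]. Uluslararası Muhendislik Arastirma ve Gelistirme Dergisi [International Journal of Engineering Research and Development], 2022-07-31.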