Affiliation:
1. School of Electronic and Information Engineering, North China University of Technology, Beijing, P. R. China
Abstract
Traditional automatic lip-reading systems generally consist of two stages, feature extraction and recognition, but their handcrafted features are empirical and cannot sufficiently capture the correlations within a lip movement sequence. Recently, deep learning approaches have attracted increasing attention, driven in particular by the significant improvements achieved by convolutional neural networks (CNNs) in image classification and by long short-term memory (LSTM) networks in speech recognition, video processing and text analysis. In this paper, we propose a hybrid neural network architecture that integrates a CNN and a bidirectional LSTM (BiLSTM) for lip reading. First, we extract key frames from each isolated video clip and use five key points to locate the mouth region. Then, features are extracted from the raw mouth images using an eight-layer CNN. The extracted features are characterized by stronger robustness and fault tolerance. Finally, we use the BiLSTM to capture the correlations among frame features in both temporal directions and a softmax function to predict the final recognition result. The proposed method extracts local features through convolution operations and finds hidden correlations in the temporal information of lip image sequences. Lip-reading recognition experiments demonstrate that the proposed method outperforms conventional approaches such as the active contour model (ACM) and the hidden Markov model (HMM).
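The recognition stage described in the abstract (per-frame features, a BiLSTM over the frame sequence, then softmax) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the eight-layer CNN is replaced by random placeholder feature vectors, the weights are untrained, all sizes and function names (`make_lstm_params`, `run_lstm`, `bilstm_classify`) are hypothetical, and classifying from the concatenated final hidden states of the two directions is an assumption the abstract does not confirm.

```python
import math
import random


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_lstm_params(input_size, hidden_size, rng):
    # One stacked weight matrix for the four gates: input, forget, cell, output.
    return {
        "W": random_matrix(4 * hidden_size, input_size + hidden_size, rng),
        "b": [0.0] * (4 * hidden_size),
        "hidden_size": hidden_size,
    }


def lstm_step(params, x, h, c):
    n = params["hidden_size"]
    xh = x + h  # concatenate the frame features with the previous hidden state
    z = [sum(w * v for w, v in zip(row, xh)) + b
         for row, b in zip(params["W"], params["b"])]
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    g = [math.tanh(v) for v in z[2 * n:3 * n]]
    o = [sigmoid(v) for v in z[3 * n:4 * n]]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, frames):
    # Process the frame sequence in the given order; return the last hidden state.
    n = params["hidden_size"]
    h, c = [0.0] * n, [0.0] * n
    for x in frames:
        h, c = lstm_step(params, x, h, c)
    return h


def bilstm_classify(frames, fwd, bwd, w_out, b_out):
    h_fwd = run_lstm(fwd, frames)        # first frame to last frame
    h_bwd = run_lstm(bwd, frames[::-1])  # last frame to first frame
    joint = h_fwd + h_bwd                # concatenate both directions
    scores = [sum(w * v for w, v in zip(row, joint)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(scores)


# Hypothetical sizes; the random vectors stand in for CNN features of key frames.
rng = random.Random(0)
feature_size, hidden_size, num_classes, num_frames = 8, 4, 3, 5
frames = [[rng.uniform(-1.0, 1.0) for _ in range(feature_size)]
          for _ in range(num_frames)]
fwd = make_lstm_params(feature_size, hidden_size, rng)
bwd = make_lstm_params(feature_size, hidden_size, rng)
w_out = random_matrix(num_classes, 2 * hidden_size, rng)
b_out = [0.0] * num_classes
probs = bilstm_classify(frames, fwd, bwd, w_out, b_out)
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how the forward and backward passes over the same frame sequence are combined before the softmax.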
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
9 articles.
1. AI LipReader-Transcribing Speech from Lip Movements;2024 International Conference on Emerging Smart Computing and Informatics (ESCI);2024-03-05
2. Lip-Reading Advancements: A 3D Convolutional Neural Network/Long Short-Term Memory Fusion for Precise Word Recognition;BioMedInformatics;2024-02-04
3. A Review on Deep Learning-Based Automatic Lipreading;Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering;2023
4. Gabor-based Audiovisual Fusion for Mandarin Chinese Speech Recognition;2022 30th European Signal Processing Conference (EUSIPCO);2022-08-29
5. Derin Öğrenme ile Dudak Okuma Üzerine Detaylı Bir Araştırma;Uluslararası Muhendislik Arastirma ve Gelistirme Dergisi;2022-07-31