Affiliation:
1. Maharaj Vijayaram Gajapathi Raj College of Engineering
Abstract
Automatic lip-reading, the process of decoding spoken language through visual analysis of lip movements, presents a promising avenue for advancing human-computer interaction and accessibility. This research proposes a model that integrates 3D Convolutional Neural Networks (3D-CNN) and Long Short-Term Memory (LSTM) networks to improve the accuracy and efficiency of lip-reading systems while addressing challenges posed by lighting variations, speaker articulation, and linguistic diversity. Unlike traditional 2D-CNNs, which capture only spatial information and therefore miss the temporal cues vital to accurate lip-reading, the 3D-CNN front end models motion across consecutive frames and the LSTM captures longer-range temporal dependencies, yielding higher recognition accuracy and a more comprehensive understanding of speech nuances. Extensive training on a diverse dataset and the exploration of transfer learning techniques further contribute to the robustness and generalization of the model.
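
As a concrete illustration of the 3D-CNN + LSTM pipeline described above, the following minimal PyTorch sketch stacks a 3D-convolutional front end over an LSTM classifier. The layer widths, kernel sizes, frame count, and ten-class output are illustrative assumptions, not the configuration used in this work.

```python
# Minimal sketch of a 3D-CNN + LSTM lip-reading model.
# All layer sizes and the vocabulary size are illustrative assumptions.
import torch
import torch.nn as nn


class LipReadingNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # 3D convolutions capture joint spatial-temporal lip motion.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=(3, 3, 3), padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # LSTM models longer-range temporal dependencies across frames.
        self.lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels=1, frames, height, width)
        feats = self.frontend(x)               # (B, C, T, H, W)
        feats = feats.mean(dim=(3, 4))         # global spatial pooling -> (B, C, T)
        feats = feats.permute(0, 2, 1)         # (B, T, C) for the LSTM
        out, _ = self.lstm(feats)
        return self.classifier(out[:, -1, :])  # classify from the final time step


# Example: a batch of 2 clips, each 16 grayscale frames of 64x64 mouth crops.
model = LipReadingNet(num_classes=10)
logits = model(torch.randn(2, 1, 16, 64, 64))
print(logits.shape)  # torch.Size([2, 10])
```

In this sketch the 3D convolutions extract short-range motion features, spatial pooling collapses each frame to a feature vector, and the LSTM aggregates the resulting sequence before classification, mirroring the division of labour the abstract attributes to the 3D-CNN and LSTM components.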