Affiliation:
1. SASTRA University, India
Abstract
Visual speech recognition (VSR) uses image processing techniques to extract speech or text from facial features. Like audio speech recognition systems, lip reading (LR) systems are hampered by variation in facial characteristics, speaking rates, skin tones, and pronunciations. An LR system can be synchronised with an audio speech recognition system. Lip movement data, also called lip features or visemes, are obtained from an input video clip stored in the cloud; the system extracts and stores the lip features of each frame. Training with a varying number of frames prevents the training dataset from yielding suitable text matches. The system has two parts: a feature extraction method that turns lip features into a visual feature cube, and a Conv3D algorithm that matches words to their associated visemes. Word-level precision is about 89%. On the MIRACL-VC1 dataset, the 3D-CNN therefore outperforms the prior system and offers higher classification accuracy.
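The abstract notes that a varying number of frames per clip hurts training, and that lip features are packed into a visual feature cube. The sketch below illustrates one possible way to bring clips to a fixed length before stacking them into a (frames, height, width) cube. It is an assumption made for illustration only: the abstract does not state how frame counts are normalised, and the names `resample_indices` and `build_feature_cube` are hypothetical.

```python
def resample_indices(num_frames, target_len):
    """Return target_len frame indices spread evenly over a clip.

    Assumption: uniform index resampling; the paper may use a
    different scheme (padding, truncation, interpolation).
    """
    if num_frames <= 0 or target_len <= 0:
        raise ValueError("num_frames and target_len must be positive")
    if target_len == 1:
        return [0]
    # Integer arithmetic keeps the result deterministic; the first and
    # last indices always map to the first and last frames of the clip.
    return [i * (num_frames - 1) // (target_len - 1) for i in range(target_len)]


def build_feature_cube(frames, target_len):
    """Stack per-frame lip crops into a fixed-length cube.

    frames: list of 2D lists (height x width pixel intensities),
            one per video frame; a stand-in for real lip crops.
    Returns a list of target_len frames, i.e. a (T, H, W) nested list.
    """
    indices = resample_indices(len(frames), target_len)
    return [frames[i] for i in indices]


# Usage: a 5-frame clip of 2x2 "lip crops" reduced to 3 frames.
clip = [[[t, t], [t, t]] for t in range(5)]
cube = build_feature_cube(clip, 3)
```

A cube of this shape is the kind of input a Conv3D layer operates on, with the temporal axis treated as a third spatial dimension; the classifier itself is not sketched here.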
Cited by
1 article.
1. Script Generation for Silent Speech in E-Learning; Advances in Educational Technologies and Instructional Design; 2024-06-03