Diverse Pose Lip-Reading Framework

Author:

Akhter Naheed, Ali Mushtaq, Hussain Lal, Shah Mohsin, Mahmood Toqeer, Ali Amjad, Al-Fuqaha Ala

Abstract

Lip-reading is a technique for understanding speech by observing a speaker's lip movements. It has numerous applications; for example, it helps people with hearing impairments and supports speech understanding in noisy environments. Most previous work on lip-reading focused on frontal and near-frontal faces, and some targeted multiple poses in high-quality videos. However, the results of these methods are not satisfactory on low-quality videos containing multiple poses. In this work, a lip-reading framework is proposed to improve the recognition rate in low-quality videos, and a Multiple Pose (MP) dataset of low-quality videos containing multiple extreme poses is built. The proposed framework decomposes the input video into frames and enhances them with the Contrast Limited Adaptive Histogram Equalization (CLAHE) method. Next, faces are detected in the enhanced frames, and the multiple poses are frontalized using the face frontalization Generative Adversarial Network (FF-GAN). After face frontalization, the mouth region is extracted. The mouth regions extracted across the whole video, together with their corresponding sentences, are then provided to a ResNet during training. The proposed framework achieved a sentence prediction accuracy of 90% on a testing dataset of 100 silent low-quality videos with multiple poses, which is better than state-of-the-art methods.
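The enhancement step named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it shows only the clip-limited histogram equalization of a single grayscale tile, which is the core operation of CLAHE. A full CLAHE additionally splits each frame into a grid of tiles and interpolates between the per-tile mappings. The function name `clip_limited_equalize` and the absolute per-bin `clip_limit` are assumptions made for illustration.

```python
def clip_limited_equalize(tile, clip_limit=4, levels=256):
    """Equalize one grayscale tile with a clipped histogram.

    tile: list of rows, each a list of ints in [0, levels).
    clip_limit: hypothetical absolute cap on the count in any histogram bin.
    """
    pixels = [p for row in tile for p in row]
    n = len(pixels)

    # Build the intensity histogram of the tile.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Spread the excess uniformly over all bins (fractional counts are fine),
    # so the histogram still sums to n.
    bonus = excess / levels
    hist = [h + bonus for h in hist]

    # Cumulative distribution, rescaled to the output intensity range.
    lut = []
    running = 0.0
    scale = (levels - 1) / n
    for h in hist:
        running += h
        lut.append(min(levels - 1, int(round(running * scale))))

    # Map each pixel through the lookup table.
    return [[lut[p] for p in row] for row in tile]


if __name__ == "__main__":
    low_contrast = [[100, 101], [102, 103]]
    print(clip_limited_equalize(low_contrast))
```

Clipping limits how steep the intensity mapping can become, which is what keeps noise from being amplified in flat regions of a low-quality frame. A lower `clip_limit` gives a gentler contrast stretch.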

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

