Speechreading using Modified Visual Feature Vectors

Published: 2013
Page: 292-315
Container-title: Emerging Applications of Natural Language Processing

Author:

Singh Preety¹, Laxmi Vijay¹, Gaur M. S.¹

Affiliation:

1. Malaviya National Institute of Technology, India

Abstract

Audio-Visual Speech Recognition (AVSR) is an emerging technology that improves machine perception of speech by taking into account the bimodality of human speech. This approach is inspired by the fact that human beings subconsciously use visual cues to interpret speech. This chapter surveys techniques for audio-visual speech recognition and, through this survey, discusses the steps involved in a robust mechanism for speech perception in human-computer interaction. The main emphasis is on visual speech recognition, which takes only the visual cues into account. Previous research has shown that visual-only speech recognition systems pose many challenges. The authors present a speech recognition system in which only the visual modality is used to recognize the spoken word. Significant features are extracted from lip images and used to build n-gram feature vectors. Classifying speech with these modified feature vectors improves the accuracy with which the spoken word is recognized.
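
The abstract does not say how the n-gram feature vectors are built. One plausible reading, offered only as an illustration and not as the authors' method, is that the features of n consecutive lip-image frames are concatenated into a single vector. The function name `build_ngram_vectors` and the two per-frame features (lip width and height) in the sketch below are hypothetical.

```python
from typing import List, Sequence


def build_ngram_vectors(
    frames: Sequence[Sequence[float]], n: int = 2
) -> List[List[float]]:
    """Concatenate the features of n consecutive frames into one vector.

    ASSUMPTION: this sliding-window concatenation is one possible way to
    form "n-gram" feature vectors; the chapter may define them differently.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    vectors = []
    # Slide a window of n frames over the sequence, one frame at a time.
    for start in range(len(frames) - n + 1):
        window = frames[start:start + n]
        # Flatten the window: frame 1 features, then frame 2 features, ...
        vectors.append([value for frame in window for value in frame])
    return vectors


# Hypothetical per-frame lip features: [width, height].
frames = [[4.0, 1.0], [4.5, 1.5], [5.0, 2.0]]
bigrams = build_ngram_vectors(frames, n=2)
```

Under this assumption, three frames with two features each yield two bigram vectors of four values each, which could then be passed to any classifier in place of the single-frame features.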

Publisher

IGI Global
