Affiliation:
1. Malaviya National Institute of Technology, India
Abstract
Audio-Visual Speech Recognition (AVSR) is an emerging technology that helps in improved machine perception of speech by taking into account the bimodality of human speech. Automated speech is inspired from the fact that human beings subconsciously use visual cues to interpret speech. This chapter surveys the techniques for audio-visual speech recognition. Through this survey, the authors discuss the steps involved in a robust mechanism for perception of speech for human-computer interaction. The main emphasis is on visual speech recognition taking only the visual cues into account. Previous research has shown that visual-only speech recognition systems pose many challenges. The authors present a speech recognition system where only the visual modality is used for recognition of the spoken word. Significant features are extracted from lip images. These features are used to build n-gram feature vectors. Classification of speech using these modified feature vectors results in improved accuracy of the spoken word.