Audio and Visual Speech Recognition Recent Trends

Page: 42-86
Container-title: Intelligent Image and Video Interpretation

Author:

Wei Lee Hao¹, Phooi Seng Kah¹, Li-Minn Ang²

Affiliation:

1. Sunway University, Malaysia

2. Edith Cowan University, Australia

Abstract

This chapter gives a brief introduction to the origins of audio-visual speech recognition and to the techniques commonly used by researchers in the field. It discusses background theory for commonly used feature extraction and classification methods in both audio and visual processing, highlighting Mel-Frequency Cepstral Coefficients (MFCCs) and contour/geometry-based lip feature extraction with the corresponding tracking methods (Yingjie, Haiyan, Yingjie, & Jinyang, 2011; Liu & Cheung, 2011). The proposed solution concepts include time derivatives of MFCCs for audio feature extraction, chroma-colour-based (YCbCr) face segmentation, feature point extraction, a localized active contour tracking algorithm, and Hidden Markov Models incorporating the Viterbi algorithm. The chapter is intended to be informative for newcomers to speech processing; it does not by itself provide mastery-level knowledge. The suggested additional reading should help readers progress toward mastery of the field.

Publisher

IGI Global

References: 82 articles.

1. A, B., & Hogg, D. (1994). Learning flexible models from image sequence. In Proceedings in Euro Conference Computing Visual Journal, (pp. 299-308). IEEE.

2. Anusuya, M. A. (2009). Speech recognition by machine: A review. International Journal of Computer Science and Information Security.

3. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking

4. Barbu, T. (2011). An automatic face detection system for RGB images. International Journal of Computers, Communications & Control.

5. Active Contours