Author:
Sinha Shweta, Agrawal S S, Jain Aruna
Abstract
State-of-the-art automatic speech recognition systems use Mel frequency cepstral coefficients as the feature extractor along with Gaussian mixture models for acoustic modeling, but there is no standard value for the number of mixture components in the speech recognition process. The current choice of mixture components is arbitrary, with little justification. Moreover, the standard set for European languages cannot be used in Hindi speech recognition because of the mismatch in database size between the languages. Parameter estimation with too many or too few components may estimate the mixture model inappropriately. The number of mixtures is therefore important for the initial estimation in the expectation-maximization process. In this research work, the authors estimate the number of Gaussian mixture components for a Hindi database based on the size of the vocabulary. Mel frequency cepstral features and perceptual linear predictive features, along with their extended variations with delta-delta-delta features, have been used to evaluate this number based on the optimal recognition score of the system. A comparative analysis of recognition performance for both feature extraction methods on a medium-size Hindi database is also presented in this paper. Heteroscedastic linear discriminant analysis (HLDA) has been used as the feature reduction technique, and its impact on the recognition score is also highlighted.
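As an illustration of the extended feature sets mentioned in the abstract, the sketch below appends first-, second-, and third-order (delta-delta-delta) dynamic features to a sequence of static feature vectors. It uses the common regression formula for deltas. The function names, the window of 2 frames, and the 13-dimensional base vector are assumptions made for the example and are not taken from the paper.

```python
def delta(frames, window=2):
    """Regression-based delta of a list of feature vectors.

    Frames beyond either end of the sequence are replaced by the
    nearest edge frame (an assumed, commonly used convention).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        vec = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out


def stack_with_deltas(frames, order=3, window=2):
    """Append deltas up to `order` (3 = delta-delta-delta) to each frame."""
    streams = [frames]
    for _ in range(order):
        # Each higher-order stream is the delta of the previous one.
        streams.append(delta(streams[-1], window))
    # Concatenate static and dynamic vectors frame by frame.
    return [sum((s[t] for s in streams), []) for t in range(len(frames))]


# Hypothetical input: 5 frames of 13 static coefficients each.
static = [[0.1 * t + 0.01 * k for k in range(13)] for t in range(5)]
extended = stack_with_deltas(static, order=3)
print(len(extended), len(extended[0]))
```

Under these assumptions each frame grows from 13 to 52 dimensions, which is the kind of enlarged vector that a reduction step such as HLDA would then project to a lower dimension.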
Publisher
Springer Science and Business Media LLC
Cited by
3 articles.