Sparse coding of the modulation spectrum for noise-robust automatic speech recognition-Reference-Cited by-同舟云学术

Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

Published:2014-10-21 Issue:1 Volume:2014 Page:
ISSN:1687-4722
Container-title:EURASIP Journal on Audio, Speech, and Music Processing
language:en
Short-container-title:J AUDIO SPEECH MUSIC PROC.

Author:

Ahmadi Sara,Ahadi Seyed Mohammad,Cranen Bert,Boves Lou

Abstract

Abstract The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the posterior probability that a frame of an unknown speech signal is related to a specific state. In this paper we use the raw output of a modulation spectrum analyser in combination with sparse coding as a means for obtaining state posterior probabilities. The modulation spectrum analyser uses 15 gammatone filters. The Hilbert envelope of the output of these filters is then processed by nine modulation frequency filters, with bandwidths up to 16 Hz. Experiments using the AURORA-2 task show that the novel approach is promising. We found that the representation of medium-term dynamics in the modulation spectrum analyser must be improved. We also found that we should move towards sparse classification, by modifying the cost function in sparse coding such that the class(es) represented by the exemplars weigh in, in addition to the accuracy with which unknown observations are reconstructed. This creates two challenges: (1) developing a method for dictionary learning that takes the class occupancy of exemplars into account and (2) developing a method for learning a mapping from exemplar activations to state posterior probabilities that keeps the generalization to unseen conditions that is one of the strongest advantages of sparse coding.

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering,Acoustics and Ultrasonics

Link

http://link.springer.com/content/pdf/10.1186/s13636-014-0036-3.pdf

Reference41 articles.

1. Drullman R, Festen JM, Plomp R: Effect of temporal envelope smearing on speech reception. J. Acoust. Soc. Am 1994, 95: 1053-1064. 10.1121/1.408467

2. H Hermansky, in Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding. The modulation spectrum in the automatic recognition of speech (Santa Barbara, 14–17 December 1997), pp. 140–147.

3. Xiao X, Chng ES, Li H: Normalization of the speech modulation spectra for robust speech recognition. IEEE Trans. Audio Speech Lang. Process 2008, 16(8):1662-1674. 10.1109/TASL.2008.2002082

4. JK Thompson, LE Atlas, in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, 5. A non-uniform modulation transform for audio coding with increased time resolution (Hong Kong, 6–10 April 2003), pp. 397–400.

5. Paliwal K, Schwerin B, Wójcicki K: Role of modulation magnitude and phase spectrum towards speech intelligibility. Speech Commun 2011, 53(3):327-339. 10.1016/j.specom.2010.10.004

Cited by 10 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Acoustic scene classification with multi-temporal complex modulation spectrogram features and a convolutional LSTM network;Multimedia Tools and Applications;2022-11-11

2. Complementary models for audio-visual speech classification;International Journal of Speech Technology;2022-01-07

3. Early-Onset Familial Alzheimer Disease Variant PSEN2 N141I Heterozygosity is Associated with Altered Microglia Phenotype;Journal of Alzheimer's Disease;2020-09-15

4. Disambiguating Conflicting Classification Results in AVSR;Intelligent Speech Signal Processing;2019

5. Robust front-end for audio, visual and audio–visual speech classification;International Journal of Speech Technology;2018-04-13