1. T Chi, P Ru, S Shamma, Multiresolution spectrotemporal analysis of complex sounds. J Acoust Soc Am. 118, 887–906 (2005).
2. Y LeCun, Y Bengio, in The Handbook of Brain Theory and Neural Networks, ed. by MA Arbib. Convolutional networks for images, speech and time series (MIT Press, Cambridge, 1995), pp. 255–258.
3. A Mohamed, GE Dahl, G Hinton, Acoustic modeling using deep belief networks. IEEE Trans ASLP. 20(1), 14–22 (2012).
4. GE Dahl, D Yu, L Deng, A Acero, Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans ASLP. 20(1), 30–42 (2012).
5. F Seide, G Li, L Chen, D Yu, in Proc ASRU. Feature engineering in context-dependent deep neural networks for conversational speech transcription, (2011), pp. 24–29.