1. Kaldi Toolkit: http://kaldi.sourceforge.net
2. Dahl, G., Yu, D., Deng, L., & Acero, A. (2012). Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 30–42.
3. Deepak, K.T., Sarma, B.D., & Prasanna, S.R.M. (2012). Foreground speech segmentation using zero frequency filtered signal. In Proc. Interspeech.
4. Glass, J.R. (1999). Challenges for spoken dialogue systems. In Proc. IEEE ASRU Workshop.
5. Hinton, G.E., Deng, L., Yu, D., Dahl, G., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6), 82–97.