1. Alim, S. A., & Rashid, N. K. A. (2018). Some commonly used speech feature extraction algorithms (pp. 2–19). IntechOpen.
2. Chen, M., & Zhao, X. (2020). A multi-scale fusion framework for bimodal speech emotion recognition. In Proceedings of INTERSPEECH, October 2020 (pp. 374–378).
3. Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2011). High-performance neural networks for visual object classification. arXiv preprint arXiv:1102.0183.
4. Ogar, J., Slama, H., Dronkers, N., Amici, S., & Gorno-Tempini, M. L. (2005). Apraxia of speech: An overview. Neurocase, 11(6), 427–432.
5. Eshky, A., Ribeiro, M. S., Cleland, J., Richmond, K., Roxburgh, Z., Scobbie, J., & Wrench, A. (2019). UltraSuite: A repository of ultrasound and acoustic data from child speech therapy sessions. arXiv preprint arXiv:1907.00835.