1. Amodei D, Anubhai R, Battenberg E, Case C, Casper J, Catanzaro B, Chen J, Chrzanowski M, Coates A, Diamos G, Elsen E, Engel J, Fan L, Fougner C, Hannun AY, Jun B, Han T, LeGresley P, Li X, Lin L, Narang S, Ng AY, Ozair S, Prenger R, Qian S, Raiman J, Satheesh S, Seetapun D, Sengupta S, Wang C, Wang Y, Wang Z, Xiao B, Xie Y, Yogatama D, Zhan J, Zhu Z (2016) Deep speech 2: End-to-end speech recognition in English and Mandarin. In: Proceedings of the 33rd international conference on machine learning, pp 173–182
2. Boureau Y-L, Ponce J, LeCun Y (2010) A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the 27th international conference on machine learning, pp 111–118
3. Bouvrie J (2006) Notes on convolutional neural networks. Neural Nets 2006:1–8
4. Caruana R (1997) Multitask learning. Mach Learn 28(1):41–75
5. Chen L, Mao X, Xue Y-L, Cheng LL (2012) Speech emotion recognition: features and classification models. Digital Signal Process 22(6):1154–1160