1. Learning deep architectures for AI;Bengio;Foundations and Trends in Machine Learning,2009
2. Learning long-term dependencies with gradient descent is difficult;Bengio;IEEE Transactions on Neural Networks,1994
3. Pattern Recognition and Machine Learning;Bishop,2006
4. Supervised and unsupervised co-training of adaptive activation functions in neural nets;Castelli,2012
5. Semi-unsupervised weighted maximum-likelihood estimation of joint densities for the co-training of adaptive activation functions;Castelli,2012