1. Learning representations by backpropagating errors; Rumelhart; Nature, 1986
2. ADADELTA: an adaptive learning rate method; Zeiler, 2012
3. Preventing over-fitting during model selection via Bayesian regularisation of the hyper-parameters; Cawley; J. Mach. Learn. Res., 2007
4. Gradient-based optimization of hyperparameters; Bengio; Neural Comput., 2000
5. What size neural network gives optimal generalization? Convergence properties of backpropagation; Lawrence, 1996