1. Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. 2017. Constrained policy optimization. arXiv preprint arXiv:1705.10528 (2017).
2. Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning, Vol. 47, 2--3 (2002), 235--256. 10.1023/A:1013689704352
3. Yoshua Bengio, Jean-Sébastien Senécal, et al. 2003. Quick Training of Probabilistic Neural Nets by Importance Sampling. In AISTATS. 1--9.