1. Aiolli, F., & Sperduti, A. (2010). A preference optimization based unifying framework for supervised learning problems. In J. Fürnkranz & E. Hüllermeier (Eds.), Preference learning (pp. 19–42). Berlin: Springer.
2. Audibert, J. Y., Munos, R., & Szepesvári, C. (2009). Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19), 1876–1902.
3. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. (1995). Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th annual symposium on foundations of computer science (pp. 322–331). Los Alamitos: IEEE Computer Society Press.
4. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. (2002). The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1), 48–77.
5. Benbouzid, D., Busa-Fekete, R., & Kégl, B. (2011). MDDAG: learning deep decision DAGs in a Markov decision process setup. In NIPS’11 workshop on deep learning and unsupervised feature learning.