1. Abbeel (2004). Apprenticeship learning via inverse reinforcement learning.
2. Afsar (2021). Reinforcement learning based recommender systems: A survey. ACM Computing Surveys.
3. Agarwal (2020). An optimistic perspective on offline reinforcement learning.
4. Agrawal, S., & Goyal, N. (2012). Analysis of Thompson Sampling for the Multi-armed Bandit Problem. In COLT 2012 - the 25th annual conference on learning theory (pp. 39.1–39.26).
5. Agrawal, S., & Goyal, N. (2013). Thompson Sampling for Contextual Bandits with Linear Payoffs. In ICML (3) (pp. 127–135).