1. Abe N, Pednault E, Wang H, et al. (2002) Empirical comparison of various reinforcement learning strategies for sequential targeted marketing. In: IEEE international conference on data mining (eds V Kumar, S Tsumoto, N Zhong, et al.), Maebashi City, Japan, 9–12 December, pp.3–10. New York, NY: IEEE.
2. Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. In: Conference on learning theory, Vol. 23 (eds S Mannor, N Srebro and RC Williamson), Edinburgh, Scotland, 25–27 June, pp.39.1–39.26. PMLR.
3. Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. In: International conference on machine learning (eds S Dasgupta and D McAllester), Atlanta, USA, 16–21 June, pp.127–135. PMLR.