Variance-Minimizing Augmentation Logging for Counterfactual Evaluation in Contextual Bandits

Published: 2023-02-27
Container-title: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

Author:

Tucker Aaron D.¹, Joachims Thorsten¹

Affiliation:

1. Cornell University, Ithaca, NY, USA

Funder

NSF

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3539597.3570452

References: 32 articles.

1. A. Agarwal, S. Basu, T. Schnabel, and T. Joachims. 2017. Effective Evaluation using Logged Bandit Feedback from Multiple Loggers. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).

2. Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. 2014. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. In Proceedings of the 31st International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 32), Eric P. Xing and Tony Jebara (Eds.). PMLR, Beijing, China, 1638--1646. http://proceedings.mlr.press/v32/agarwalb14.html

3. Shipra Agrawal and Navin Goyal. 2013. Thompson Sampling for Contextual Bandits with Linear Payoffs. In Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 28), Sanjoy Dasgupta and David McAllester (Eds.). PMLR, Atlanta, Georgia, USA, 127--135. http://proceedings.mlr.press/v28/agrawal13.html

4. Alina Beygelzimer and John Langford. 2009. The offset tree for learning with partial labels. In KDD. ACM, 129--138.

5. Alberto Bietti, Alekh Agarwal, and John Langford. 2018. A contextual bandit bake-off. arXiv preprint arXiv:1802.04064 (2018).

Cited by 1 article.

1. On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2024-08-24.