Affiliation:
1. IBM Research
2. Orange Labs
3. IBM
4. University of Montreal
Abstract
In various recommender system applications, from medical diagnosis to dialog systems, due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent has a freedom to choose which variables to observe. In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating advantages of the proposed approach over several baseline methods on a variety of real-life datasets.
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. A Tutorial on Multi-Armed Bandit Applications for Large Language Models;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24
2. Mixtron: Bandit Online Multiclass Prediction with Implicit Feedback;2023 IEEE International Conference on Data Mining (ICDM);2023-12-01
3. Dialogue System with Missing Observation;ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2023-06-04
4. Question Answering System with Sparse and Noisy Feedback;ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2023-06-04