Affiliation:
1. Texas A&M University, TX, USA
2. Rice University, TX, USA
3. Samsung Electronics, USA
Abstract
Real-time bidding (RTB) has become a major paradigm of display advertising. Each ad impression generated from a user visit is auctioned in real time. A demand-side platform (DSP) automatically provides a bid price, usually by first estimating the ad impression value and then determining the optimal bid price. However, current bid strategies overlook the randomness of user behaviors (e.g., clicks) and the cost uncertainty caused by auction competition. In this work, we propose a novel adaptive risk-aware bidding algorithm with a budget constraint via reinforcement learning. It is the first to simultaneously consider estimation uncertainty and the dynamic risk tendency of a DSP. Specifically, we explicitly factor in the uncertainty of estimated ad impression values and model the risk preference of a DSP under a specific state and market environment via a sequential decision process. Additionally, we theoretically unveil the intrinsic relation between the uncertainty and the risk tendency based on value at risk (VaR). Consequently, we propose two instantiations to model risk tendency: an expert knowledge-based formulation embracing three essential properties, and an adaptive learning method based on self-supervised reinforcement learning. We conduct experiments on public datasets and show that the proposed framework achieves better performance in terms of the number of clicks under different budget constraints.
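The abstract does not give the paper's formulation, so the following is only a rough illustration of how a VaR-style quantile can tie estimation uncertainty to a risk tendency. It assumes, purely for illustration, that the estimated impression value is Gaussian with mean `mean_value` and standard deviation `std_value`. The function `risk_aware_bid` and its `risk_level` parameter are hypothetical names: a level below 0.5 bids a lower quantile of the value (risk-averse), 0.5 bids the mean (risk-neutral), and above 0.5 bids an upper quantile (risk-seeking). Capping by the remaining budget is a simplifying assumption. In the paper, the risk tendency is instead produced by an expert formulation or learned by reinforcement learning.

```python
from statistics import NormalDist


def risk_aware_bid(mean_value: float, std_value: float, risk_level: float,
                   remaining_budget: float) -> float:
    """Hypothetical VaR-style bid: a quantile of the estimated value distribution.

    Assumption (illustrative only): value ~ Normal(mean_value, std_value**2).
    risk_level in (0, 1): < 0.5 risk-averse, 0.5 risk-neutral, > 0.5 risk-seeking.
    """
    if not 0.0 < risk_level < 1.0:
        raise ValueError("risk_level must lie strictly between 0 and 1")
    if std_value <= 0.0:
        # No estimation uncertainty: every quantile equals the mean.
        value = mean_value
    else:
        # The risk_level-quantile of the value estimate; at low levels this is
        # the value-at-risk threshold the estimate falls below with that probability.
        value = NormalDist(mean_value, std_value).inv_cdf(risk_level)
    # Never bid a negative amount, and never bid more than the budget left.
    return max(0.0, min(value, remaining_budget))


if __name__ == "__main__":
    for level in (0.05, 0.5, 0.95):
        print(level, round(risk_aware_bid(10.0, 2.0, level, 100.0), 3))
```

With the same estimate (mean 10, standard deviation 2), a risk-averse level of 0.05 bids about 6.71, the risk-neutral level bids 10.0, and a risk-seeking level of 0.95 bids about 13.29. Larger uncertainty widens this spread, which is the intuition behind linking uncertainty and risk tendency through VaR.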
Publisher
Association for Computing Machinery (ACM)