Abstract
We consider a risk-aware multi-armed bandit framework with the goal of avoiding catastrophic risk. Such a framework has multiple applications in financial risk management. We introduce a new conditional value-at-risk (CVaR) estimation procedure combining extreme value theory with automated threshold selection by ordered goodness-of-fit tests, and we apply this procedure to a pure exploration best-arm identification problem under a fixed budget. We empirically compare our results with the commonly used sample average estimator of the CVaR, and we show a significant performance improvement when the underlying arm distributions are heavy-tailed.
Funder
Natural Sciences and Engineering Research Council