Thompson sampling algorithms for mean-variance bandits

Q Zhu, V Tan - International Conference on Machine …, 2020 - proceedings.mlr.press
The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the
exploration-exploitation tradeoff. However, standard formulations do not take into account …
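As a rough illustration of the approach (a sketch, not the paper's exact MVTS algorithm): with Gaussian arms one can sample a mean and a precision for each arm from Normal-Gamma posteriors and play the arm whose sampled mean-variance index σ² − ρμ is smallest. The vague priors and the update below are assumptions made for the sketch.

```python
import numpy as np

def mv_thompson_step(counts, sums, sumsq, rho=1.0, rng=None):
    """One round of a mean-variance Thompson sampling sketch.

    For each arm, sample a precision tau from a Gamma posterior and a
    mean from the matching Normal posterior, then play the arm whose
    sampled index sigma^2 - rho * mu is smallest (lower risk is better).
    """
    rng = rng or np.random.default_rng()
    scores = np.full(len(counts), np.inf)
    for i, n in enumerate(counts):
        if n < 2:                        # force initial exploration
            return i
        mean = sums[i] / n
        var = max(sumsq[i] / n - mean ** 2, 1e-12)
        # Normal-Gamma posterior under vague priors (an assumption here)
        tau = rng.gamma(shape=(n - 1) / 2, scale=2.0 / (n * var))
        mu = rng.normal(mean, 1.0 / np.sqrt(n * tau))
        scores[i] = 1.0 / tau - rho * mu   # sampled mean-variance index
    return int(np.argmin(scores))
```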

Concentration of risk measures: A Wasserstein distance approach

SP Bhat, LA Prashanth - Advances in neural information processing …, 2019 - proceedings.neurips.cc
Known finite-sample concentration bounds for the Wasserstein distance between the
empirical and true distribution of a random variable are used to derive a two-sided …
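In one dimension the Wasserstein distance between the empirical and true distributions is just the L1 distance between their CDFs, which makes the concentration phenomenon easy to see numerically. A small simulation (illustrative only; it does not reproduce the constants in the paper's two-sided bounds) uses a large sample as a stand-in for the true distribution:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
true_big = rng.normal(size=200_000)   # stand-in for the true distribution

for n in [100, 1_000, 10_000]:
    dists = [wasserstein_distance(rng.normal(size=n), true_big)
             for _ in range(20)]
    print(n, np.mean(dists))          # shrinks roughly like 1/sqrt(n)
```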

Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards

A Kagrecha, J Nair, KP Jagannathan - NeurIPS, 2019 - proceedings.neurips.cc
Classical multi-armed bandit problems use the expected value of an arm as a metric to
evaluate its goodness. However, the expected value is a risk-neutral metric. In many …
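One small piece of that setting is easy to sketch: a truncation-based empirical CVaR estimator, where clipping the samples tames heavy tails. The truncation schedule b = n^(1/4) below is an illustrative choice, not the paper's tuning.

```python
import numpy as np

def truncated_cvar(samples, alpha=0.95, trunc=None):
    """Empirical CVaR of losses at level alpha, after clipping to [-b, b].

    Truncation controls heavy tails; b = n**(1/4) is an illustrative
    schedule, not the tuning analyzed in the paper.
    """
    x = np.asarray(samples, dtype=float)
    b = trunc if trunc is not None else len(x) ** 0.25
    x = np.clip(x, -b, b)
    var = np.quantile(x, alpha)            # empirical value-at-risk
    return x[x >= var].mean()              # mean of the worst tail
```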

𝒳-Armed Bandits: Optimizing Quantiles, CVaR and Other Risks

L Torossian, A Garivier… - Asian Conference on …, 2019 - proceedings.mlr.press
We propose and analyze StoROO, an algorithm derived from StoOO for risk optimization
of stochastic black-box functions. Motivated by risk-averse decision making fields like …
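StoROO itself hierarchically partitions the input space; a much smaller building block one can sketch is a distribution-free confidence interval on a quantile via the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, the kind of estimate such quantile-optimization methods rely on (the confidence level delta and index mapping are assumptions of the sketch):

```python
import numpy as np

def quantile_ci(samples, q=0.9, delta=0.05):
    """Two-sided confidence interval for the q-quantile via the DKW
    bound sup |F_n - F| <= sqrt(log(2/delta) / (2n)) w.p. 1 - delta."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # DKW deviation
    lo = x[int(np.floor(max(q - eps, 0.0) * (n - 1)))]
    hi = x[int(np.ceil(min(q + eps, 1.0) * (n - 1)))]
    return lo, hi
```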

Estimation of spectral risk measures

AK Pandey, LA Prashanth, SP Bhat - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
We consider the problem of estimating a spectral risk measure (SRM) from iid samples, and
propose a novel method that is based on numerical integration. We show that our SRM …
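A spectral risk measure can be written as SRM = ∫₀¹ φ(u) q(u) du, where q is the quantile function and φ a nonnegative, nondecreasing weight integrating to one. A sketch of the numerical-integration idea applied to the empirical quantile function (the midpoint rule and grid size here are assumptions, not the paper's scheme):

```python
import numpy as np

def srm_estimate(samples, phi, grid=1000):
    """Estimate SRM = int_0^1 phi(u) q(u) du by numerically integrating
    the empirical quantile function against the risk spectrum phi."""
    u = (np.arange(grid) + 0.5) / grid     # midpoint rule on (0, 1)
    q = np.quantile(samples, u)            # empirical quantile function
    return np.mean(phi(u) * q)             # (1/grid) * sum approximates the integral

# CVaR_alpha is the special case phi(u) = 1/(1 - alpha) on (alpha, 1]
alpha = 0.95
cvar_spectrum = lambda u: (u > alpha) / (1.0 - alpha)
```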

Risk-averse explore-then-commit algorithms for finite-time bandits

A Yekkehkhany, E Arian… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org
In this paper, we study multi-armed bandit problems in an explore-then-commit setting. In
our proposed explore-then-commit setting, the goal is to identify the best arm after a pure …
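The explore-then-commit template is simple to sketch: pull every arm a fixed number of times, then commit to the arm with the best empirical risk-adjusted score for the rest of the horizon. Here `pull` is a hypothetical callback returning one reward for an arm, and the mean − ρ·variance criterion is an illustrative risk metric, not necessarily the one analyzed in the paper.

```python
import numpy as np

def risk_averse_etc(pull, k, n_explore, rho=1.0, horizon=10_000):
    """Explore-then-commit sketch: uniform exploration, then commit to
    the arm with the best empirical mean - rho * variance score."""
    samples = [[pull(a) for _ in range(n_explore)] for a in range(k)]
    scores = [np.mean(s) - rho * np.var(s) for s in samples]
    best = int(np.argmax(scores))
    # Commit phase: play the chosen arm for the remaining budget.
    rewards = [pull(best) for _ in range(horizon - k * n_explore)]
    return best, sum(map(sum, samples)) + sum(rewards)
```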

A cost-based analysis for risk-averse explore-then-commit finite-time bandits

A Yekkehkhany, E Arian, R Nagi, I Shomorony - IISE Transactions, 2021 - Taylor & Francis
In this article, a multi-armed bandit problem is studied in an explore-then-commit setting
where the cost of pulling an arm in the experimentation (exploration) phase may not be …

Risk averse non-stationary multi-armed bandits

L Benac, F Godin - arXiv preprint arXiv:2109.13977, 2021 - arxiv.org
This paper tackles the risk-averse multi-armed bandit problem when incurred losses are
non-stationary. The conditional value-at-risk (CVaR) is used as the objective function. Two …
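A standard device for non-stationarity is a sliding window; a sketch of a windowed empirical CVaR of losses (the window length and the higher-is-worse loss convention are assumptions, not the paper's exact estimator):

```python
from collections import deque
import numpy as np

class SlidingCVaR:
    """Empirical CVaR of the most recent W losses (higher = worse)."""

    def __init__(self, window=500, alpha=0.95):
        self.buf = deque(maxlen=window)    # old losses fall out automatically
        self.alpha = alpha

    def update(self, loss):
        self.buf.append(float(loss))

    def value(self):
        if not self.buf:
            return float("nan")
        x = np.asarray(self.buf)
        var = np.quantile(x, self.alpha)   # windowed value-at-risk
        return x[x >= var].mean()          # mean of the worst recent losses
```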

Risk-averse multi-armed bandits and game theory

A Yekkehkhany - 2020 - ideals.illinois.edu
The multi-armed bandit (MAB) and game theory literature is mainly focused on the expected
cumulative reward and the expected payoffs in a game, respectively. In contrast, the rewards …

Online Resource Allocation and its Applications

Q Zhu - 2022 - search.proquest.com