Thompson sampling algorithms for mean-variance bandits
The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the
exploration-exploitation tradeoff. However, standard formulations do not take into account …
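As a quick illustration of the idea in this abstract, here is a minimal sketch of Thompson sampling under a mean-variance objective. It assumes Gaussian arms with a Normal-Inverse-Gamma posterior and scores each arm by a sampled index rho * mu - sigma^2 (one common convention; the cited paper's exact index and updates may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def mv_thompson_sampling(true_means, true_stds, horizon=5000, rho=1.0):
    """Sketch: per-arm Normal-Inverse-Gamma posterior; pick the arm
    maximizing a sampled mean-variance index rho * mu - sigma^2."""
    k = len(true_means)
    # NIG hyperparameters per arm: m (mean), kappa, alpha, beta
    m = np.zeros(k); kappa = np.full(k, 1e-2)
    alpha = np.ones(k); beta = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # posterior draws: sigma^2 ~ InvGamma(alpha, beta), mu ~ N(m, sigma^2 / kappa)
        var = beta / rng.gamma(alpha)            # inverse-gamma draw
        mu = rng.normal(m, np.sqrt(var / kappa))
        arm = int(np.argmax(rho * mu - var))     # sampled mean-variance index
        x = rng.normal(true_means[arm], true_stds[arm])
        # standard NIG conjugate update for a single observation
        kappa_new = kappa[arm] + 1.0
        m_new = (kappa[arm] * m[arm] + x) / kappa_new
        alpha[arm] += 0.5
        beta[arm] += 0.5 * kappa[arm] * (x - m[arm]) ** 2 / kappa_new
        m[arm], kappa[arm] = m_new, kappa_new
        pulls[arm] += 1
    return pulls

# Arm 1 has a higher mean but much higher variance; the MV index with
# rho = 1 should favor arm 0 in the long run.
print(mv_thompson_sampling([0.5, 0.6], [0.1, 1.0]))
```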
Concentration of risk measures: A Wasserstein distance approach
Known finite-sample concentration bounds for the Wasserstein distance between the
empirical and true distribution of a random variable are used to derive a two-sided …
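The mechanism behind such bounds is that many risk measures are Lipschitz in the 1-Wasserstein distance; for example, CVaR at level alpha is (1/(1-alpha))-Lipschitz, so W1 concentration of the empirical distribution transfers directly to the risk estimate. A small numerical check (using scipy's wasserstein_distance; the samples here are illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)

def empirical_cvar(x, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of outcomes (losses)."""
    x = np.sort(x)
    return x[int(np.ceil(alpha * len(x))):].mean()

# |CVaR_alpha(F) - CVaR_alpha(G)| <= W1(F, G) / (1 - alpha), since
# CVaR_alpha(F) = (1/(1-alpha)) * integral_alpha^1 F^{-1}(u) du and
# W1(F, G) = integral_0^1 |F^{-1}(u) - G^{-1}(u)| du.
a = rng.normal(0, 1, 2000)
b = rng.normal(0, 1, 2000)
alpha = 0.95
gap = abs(empirical_cvar(a, alpha) - empirical_cvar(b, alpha))
bound = wasserstein_distance(a, b) / (1 - alpha)
print(f"CVaR gap {gap:.4f} <= W1 bound {bound:.4f}")
```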
Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards
Classical multi-armed bandit problems use the expected value of an arm as a metric to
evaluate its goodness. However, the expected value is a risk-neutral metric. In many …
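To see why a risk-neutral metric can mislead, here is a small illustration (not the paper's estimator): with heavy-tailed rewards, ranking arms by empirical mean and by empirical CVaR of losses can disagree. The arm distributions below are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(2)

def empirical_cvar_loss(losses, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of losses."""
    losses = np.sort(losses)
    return losses[int(np.ceil(alpha * len(losses))):].mean()

n = 100_000
# Arm A: light-tailed rewards. Arm B: higher typical reward, but rare
# heavy-tailed crashes (Pareto-distributed losses).
arm_a = rng.normal(1.0, 0.5, n)
arm_b = rng.normal(1.4, 0.5, n) - (rng.random(n) < 0.02) * (5 + 5 * rng.pareto(1.5, n))

print("empirical means:  ", arm_a.mean(), arm_b.mean())    # B looks better
print("CVaR_0.95 of loss:", empirical_cvar_loss(-arm_a),   # A looks better
      empirical_cvar_loss(-arm_b))                         # loss = -reward
```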
X-Armed Bandits: Optimizing Quantiles, CVaR and Other Risks
We propose and analyze StoROO, an algorithm for risk optimization on stochastic black-box
functions derived from StoOO. Motivated by risk-averse decision making fields like …
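For intuition about risk optimization on a stochastic black-box function, here is a naive grid baseline (not StoROO itself, which adaptively partitions the domain using optimistic bounds on the target quantile). The objective and noise model below are hypothetical; heteroscedastic noise makes the quantile-optimal point differ from the mean-optimal one:

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_f(x):
    # made-up objective: mean minimized at x = 0.7, noise shrinking with x
    return (x - 0.7) ** 2 + rng.normal(0.0, 1.5 * (1.0 - x))

grid = np.linspace(0.0, 1.0, 21)
samples = np.array([[noisy_f(x) for _ in range(2000)] for x in grid])
q90 = np.quantile(samples, 0.9, axis=1)  # per-point empirical 0.9-quantile
print("0.9-quantile minimizer:", grid[np.argmin(q90)])                   # pushed toward x = 1
print("mean minimizer:        ", grid[np.argmin(samples.mean(axis=1))])  # near x = 0.7
```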
Estimation of spectral risk measures
We consider the problem of estimating a spectral risk measure (SRM) from iid samples, and
propose a novel method that is based on numerical integration. We show that our SRM …
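A spectral risk measure has the form SRM_phi(X) = \int_0^1 phi(u) F^{-1}(u) du for a nonnegative, nondecreasing spectrum phi integrating to 1. A plug-in sketch in the spirit of the abstract (the paper's actual quadrature scheme may differ): substitute the empirical quantile function and integrate numerically.

```python
import numpy as np

rng = np.random.default_rng(4)

def srm_estimate(samples, phi, n_grid=10_000):
    """Plug-in SRM estimator: midpoint rule on a uniform u-grid applied to
    phi(u) * Q_n(u), where Q_n is the empirical quantile function."""
    u = np.linspace(0.0, 1.0, n_grid, endpoint=False) + 0.5 / n_grid
    q = np.quantile(samples, u)     # empirical quantile function on the grid
    return np.mean(phi(u) * q)      # approximates the integral over (0, 1)

# CVaR at level alpha is the SRM with phi(u) = 1{u >= alpha} / (1 - alpha).
alpha = 0.95
phi_cvar = lambda u: (u >= alpha) / (1 - alpha)
x = rng.normal(0, 1, 100_000)
print("estimated CVaR_0.95:", srm_estimate(x, phi_cvar))
print("exact standard-Gaussian CVaR_0.95: about 2.0627")
```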
Risk-averse explore-then-commit algorithms for finite-time bandits
In this paper, we study multi-armed bandit problems in an explore-then-commit setting. In
our proposed explore-then-commit setting, the goal is to identify the best arm after a pure …
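The explore-then-commit template is simple to sketch. The version below is a hedged illustration, not the paper's algorithm: pull every arm a fixed number of times, estimate each arm's CVaR of losses, and commit to the empirically least risky arm for the remaining rounds (the paper's estimator and exploration length may differ).

```python
import numpy as np

rng = np.random.default_rng(5)

def risk_averse_etc(arms, n_explore, horizon, alpha=0.9):
    """Uniform exploration, then commit to the arm with smallest
    empirical CVaR of losses."""
    k = len(arms)
    history = [np.array([arms[i]() for _ in range(n_explore)]) for i in range(k)]

    def cvar(losses):
        losses = np.sort(losses)
        return losses[int(np.ceil(alpha * len(losses))):].mean()

    best = min(range(k), key=lambda i: cvar(history[i]))
    committed = [arms[best]() for _ in range(horizon - k * n_explore)]
    return best, float(np.mean(committed))

# Two hypothetical loss distributions: same mean, different dispersion.
arms = [lambda: rng.normal(1.0, 0.2),
        lambda: rng.normal(1.0, 2.0)]
print(risk_averse_etc(arms, n_explore=500, horizon=5000))
```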
A cost-based analysis for risk-averse explore-then-commit finite-time bandits
In this article, a multi-armed bandit problem is studied in an explore-then-commit setting
where the cost of pulling an arm in the experimentation (exploration) phase may not be …
Risk-averse non-stationary multi-armed bandits
This paper tackles the risk-averse multi-armed bandit problem when incurred losses are
non-stationary. The conditional value-at-risk (CVaR) is used as the objective function. Two …
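One standard device for non-stationarity is to estimate CVaR over a sliding window of recent losses so that old observations are forgotten. The sketch below shows only that estimation component under assumed drifting Gaussian losses; the paper's actual algorithms may combine such an estimator with exploration bonuses.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(6)

class SlidingWindowCVaR:
    """Empirical CVaR of losses computed over the most recent `window` rounds."""
    def __init__(self, window=200, alpha=0.9):
        self.buf = deque(maxlen=window)
        self.alpha = alpha

    def update(self, loss):
        self.buf.append(loss)

    def estimate(self):
        losses = np.sort(self.buf)
        return losses[int(np.ceil(self.alpha * len(losses))):].mean()

est = SlidingWindowCVaR()
for t in range(1000):
    # loss distribution drifts upward over time
    est.update(rng.normal(1.0 + t / 500, 0.5))
print("windowed CVaR estimate:", est.estimate())  # tracks the recent regime
```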
Risk-averse multi-armed bandits and game theory
A Yekkehkhany - 2020 - ideals.illinois.edu
The multi-armed bandit (MAB) and game theory literature is mainly focused on the expected
cumulative reward and the expected payoffs in a game, respectively. In contrast, the rewards …
Online Resource Allocation and its Applications
Q Zhu - 2022 - search.proquest.com