Stochastic multi-armed-bandit problem with non-stationary rewards
In a multi-armed bandit (MAB) problem, a gambler must choose, at each round of play,
one of K arms, each characterized by an unknown reward distribution. Reward realizations …
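A standard way to let a bandit algorithm track non-stationary rewards is to base its estimates only on recent observations. The sketch below is a minimal sliding-window UCB in Python; it is a generic illustration of that idea, not the algorithm of the paper above, and the window length W, bonus constant c, and the toy pull function are our own assumptions.

```python
import math
import random
from collections import deque

def sliding_window_ucb(pull, K, T, W=200, c=2.0):
    """Sliding-window UCB: estimate each arm's mean from only the last W
    rounds, so rewards from a drifted past eventually expire.
    `pull(arm, t)` returns a reward in [0, 1]; W and c are tuning knobs."""
    history = deque()  # (round, arm, reward) tuples inside the window
    for t in range(1, T + 1):
        while history and history[0][0] <= t - W:   # evict stale data
            history.popleft()
        counts, sums = [0] * K, [0.0] * K
        for _, a, r in history:
            counts[a] += 1
            sums[a] += r
        untried = [a for a in range(K) if counts[a] == 0]
        if untried:                                 # play unexplored arms first
            arm = random.choice(untried)
        else:                                       # otherwise maximize the UCB index
            arm = max(range(K), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(c * math.log(min(t, W)) / counts[a]))
        history.append((t, arm, pull(arm, t)))

# Toy instance whose best arm switches halfway through the horizon.
def pull(arm, t):
    means = [0.9, 0.2] if t < 5000 else [0.2, 0.9]
    return 1.0 if random.random() < means[arm] else 0.0

sliding_window_ucb(pull, K=2, T=10000)
```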
Learning to optimize under non-stationarity
We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-
stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic …
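The benchmark throughout this literature is dynamic regret, which compares the learner to the per-round optimum rather than to one fixed action. For a linear bandit with a drifting parameter $\theta_t$ and action set $\mathcal{X}_t$, the standard definition (our notation, not copied from the paper) is

$$\mathrm{Reg}_{\mathrm{dyn}}(T) = \sum_{t=1}^{T} \Big( \max_{x \in \mathcal{X}_t} \langle \theta_t, x \rangle - \langle \theta_t, x_t \rangle \Big),$$

where $x_t$ is the action played at round $t$; when $\theta_t \equiv \theta$ this collapses to the usual static regret.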
Reinforcement learning for non-stationary Markov decision processes: The blessing of (more) optimism
We consider undiscounted reinforcement learning (RL) in Markov decision processes
(MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions …
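Drifting non-stationarity is usually quantified by variation budgets that cap the total change in rewards and transitions over the horizon; in the standard notation (ours, stated as the common formulation rather than quoted from the paper),

$$B_r = \sum_{t=1}^{T-1} \max_{s,a} \big| r_{t+1}(s,a) - r_t(s,a) \big|, \qquad B_p = \sum_{t=1}^{T-1} \max_{s,a} \big\| p_{t+1}(\cdot \mid s,a) - p_t(\cdot \mid s,a) \big\|_1,$$

and dynamic regret bounds are then expressed in terms of $B_r$, $B_p$, and $T$.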
A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free
We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal
in terms of dynamic regret. Specifically, our algorithm achieves $\mathcal{O}(\min\{\sqrt …
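When the degree of non-stationarity is unknown, a common building block is to restart a base learner on a schedule so it never conditions on data that may have gone stale. The wrapper below is a minimal Python illustration of that restart idea only; it is not the parameter-free algorithm of this paper, and the base learner `EpsGreedy` and epoch length `H` are hypothetical choices made for readability.

```python
import random

class EpsGreedy:
    """A tiny base learner: epsilon-greedy over K arms."""
    def __init__(self, K, eps=0.1):
        self.K, self.eps = K, eps
        self.counts = [0] * K
        self.sums = [0.0] * K

    def act(self):
        if random.random() < self.eps or 0 in self.counts:
            return random.randrange(self.K)
        return max(range(self.K), key=lambda a: self.sums[a] / self.counts[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def run_with_restarts(pull, K, T, H=500):
    """Reset the learner every H rounds so estimates from before a
    drift cannot dominate forever; the price is re-exploration."""
    learner = EpsGreedy(K)
    for t in range(1, T + 1):
        if t % H == 0:
            learner = EpsGreedy(K)   # scheduled restart
        arm = learner.act()
        learner.update(arm, pull(arm, t))
```

On the switching toy instance from the first sketch, restarts let the learner recover after the change point at the cost of re-exploring in every epoch; parameter-free methods automate the choice that `H` hard-codes here.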
Near-optimal model-free reinforcement learning in non-stationary episodic MDPs
We consider model-free reinforcement learning (RL) in non-stationary Markov decision
processes. Both the reward functions and the state transition functions are allowed to vary …
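Here model-free means value estimates are maintained directly, without an estimated transition model. The template such methods build on is optimistic Q-learning, whose episodic update in standard notation (our sketch of the generic recursion, not this paper's exact one) is

$$Q_h(s,a) \leftarrow (1-\alpha_k)\, Q_h(s,a) + \alpha_k \Big( r_h(s,a) + \max_{a'} Q_{h+1}(s',a') + b_k \Big),$$

where $\alpha_k$ is a step size decaying in the visit count $k$ and $b_k$ is an optimism bonus; non-stationary variants additionally forget or restart so the estimates can track the drift.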
Hedging the drift: Learning to optimize under nonstationarity
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …
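The "hedging" in the title evokes the exponential-weights family of adversarial algorithms, which remain sound when rewards change arbitrarily. As a loose illustration of that family (a generic EXP3 sketch in Python, not this paper's construction):

```python
import math
import random

def exp3(reward, K, T, gamma=0.1):
    """EXP3: exponential weights over arms with importance-weighted
    reward estimates; makes no stationarity assumption at all.
    `reward(arm, t)` must return a value in [0, 1]."""
    weights = [1.0] * K
    for t in range(1, T + 1):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        r = reward(arm, t)
        # Dividing by the play probability keeps the estimate unbiased.
        weights[arm] *= math.exp(gamma * (r / probs[arm]) / K)
        m = max(weights)
        if m > 1e100:                    # renormalize to avoid overflow
            weights = [w / m for w in weights]
```

Meta-algorithms in this area sometimes run such an adversarial learner on top of a base stochastic learner, e.g. to tune a forgetting rate online; the snippet above does not show whether this paper takes that route.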
Efficient contextual bandits in non-stationary worlds
Most contextual bandit algorithms minimize regret against the best fixed policy, a
questionable benchmark for non-stationary environments that are ubiquitous in applications …
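The contrast is between static regret, measured against the single best fixed policy in hindsight, and dynamic regret, measured against the best policy at every round. In contextual-bandit notation (standard definitions, ours):

$$\mathrm{Reg}_{\mathrm{stat}}(T) = \max_{\pi \in \Pi} \sum_{t=1}^{T} \big( r_t(\pi(x_t)) - r_t(a_t) \big), \qquad \mathrm{Reg}_{\mathrm{dyn}}(T) = \sum_{t=1}^{T} \Big( \max_{\pi \in \Pi} r_t(\pi(x_t)) - r_t(a_t) \Big).$$

A policy that is best on average over $T$ rounds can be badly suboptimal in every individual phase of a drifting environment, which is why the fixed-policy benchmark is called questionable here.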
Dynamic regret of policy optimization in non-stationary environments
We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information
reward feedback and unknown fixed transition kernels. We propose two model-free policy …
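Model-free policy optimization against adversarial rewards typically runs online mirror descent on the policy simplex, updating multiplicatively with an estimated action-value. A representative update (the standard exponentiated-gradient form, assumed here rather than quoted from the paper) is

$$\pi_{t+1}(a \mid s) \;\propto\; \pi_t(a \mid s) \exp\!\big( \eta\, \widehat{Q}_t(s,a) \big),$$

with learning rate $\eta$ and action-value estimate $\widehat{Q}_t$; dynamic-regret analyses then quantify how quickly this update can chase a moving sequence of comparator policies.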
Non-stationary experimental design under linear trends
D Simchi-Levi, C Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Experimentation has been critical and increasingly popular across various domains, such as
clinical trials and online platforms, due to its widely recognized benefits. One of the primary …
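A linear trend means the baseline outcome drifts steadily while the experiment runs, so a naive treatment-minus-control difference can absorb the trend instead of the effect. The NumPy sketch below is our own illustration of that failure mode and of the obvious least-squares correction; it is not the paper's design or estimator, and the phased rollout and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
t = np.arange(T)
treat = (t >= T // 2).astype(float)   # phased rollout: treat the second half
tau, slope = 1.0, 0.01                # true effect and linear trend
y = 0.5 + slope * t + tau * treat + rng.normal(0.0, 1.0, size=T)

# The naive difference in means confounds the treatment with time:
# it converges to tau + slope * T/2, not tau.
naive = y[treat == 1].mean() - y[treat == 0].mean()

# Fitting intercept, trend, and effect jointly removes the trend bias.
X = np.column_stack([np.ones(T), t, treat])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"naive: {naive:.2f}   detrended: {coef[2]:.2f}   true tau: {tau}")
```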
Non-stationary reinforcement learning under general function approximation
General function approximation is a powerful tool to handle large state and action spaces in
a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding …