Stochastic multi-armed-bandit problem with non-stationary rewards

O Besbes, Y Gur, A Zeevi - Advances in Neural Information Processing Systems, 2014 - proceedings.neurips.cc
In a multi-armed bandit (MAB) problem, a gambler needs to choose at each round of play
one of K arms, each characterized by an unknown reward distribution. Reward realizations …
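
The approach this line of work builds on is to restart an adversarial-bandit algorithm such as Exp3 on a fixed schedule, so that weight estimates never mix rewards from distributions that have since drifted apart. A minimal sketch of that restarting idea, assuming rewards in [0, 1]; the names pull and batch_len are illustrative, and the paper tunes the batch length to its variation budget rather than leaving it free:

    import numpy as np

    def restarted_exp3(pull, K, T, batch_len, seed=0):
        """Exp3 restarted every `batch_len` rounds (sketch).

        pull(arm) -> reward in [0, 1]; K arms; horizon T.
        """
        rng = np.random.default_rng(seed)
        # Standard Exp3 exploration rate for a horizon of batch_len rounds.
        gamma = min(1.0, np.sqrt(K * np.log(K) / ((np.e - 1) * batch_len)))
        rewards = []
        for start in range(0, T, batch_len):
            w = np.ones(K)  # restart: discard estimates from earlier batches
            for _ in range(start, min(start + batch_len, T)):
                p = (1 - gamma) * w / w.sum() + gamma / K
                arm = rng.choice(K, p=p)
                r = pull(arm)
                rewards.append(r)
                # Importance-weighted update for the pulled arm only.
                w[arm] *= np.exp(gamma * (r / p[arm]) / K)
        return rewards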

Learning to optimize under non-stationarity

WC Cheung, D Simchi-Levi, R Zhu - The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019 - proceedings.mlr.press
We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-
stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic …
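
The standard forgetting mechanism in this setting is a sliding window: the ridge-regression estimate is fit only to the most recent observations, so data generated under long-gone parameters is discarded. A sketch of one decision round under that idea (the confidence width beta and the interface are illustrative, not the paper's exact tuning):

    import numpy as np

    def sw_linucb_choose(history, x_arms, window, lam=1.0, beta=1.0):
        """One round of sliding-window LinUCB (sketch).

        history: list of (feature_vector, reward) pairs; only the last
        `window` pairs are used, so stale data is forgotten.
        x_arms: (K, d) array of candidate arm features for this round.
        """
        d = x_arms.shape[1]
        V = lam * np.eye(d)          # regularized design matrix
        b = np.zeros(d)
        for x, r in history[-window:]:
            V += np.outer(x, x)
            b += r * x
        V_inv = np.linalg.inv(V)
        theta = V_inv @ b            # windowed ridge-regression estimate
        # Optimistic index: estimated reward plus an exploration bonus.
        bonus = beta * np.sqrt(np.einsum('ki,ij,kj->k', x_arms, V_inv, x_arms))
        return int(np.argmax(x_arms @ theta + bonus))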

Reinforcement learning for non-stationary Markov decision processes: The blessing of (more) optimism

WC Cheung, D Simchi-Levi, R Zhu - International Conference on Machine Learning (ICML), 2020 - proceedings.mlr.press
We consider undiscounted reinforcement learning (RL) in Markov decision processes
(MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions …
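
In this drifting model, the non-stationarity is quantified by variation budgets on the rewards and the transition kernels; in the notation standard for this literature (the symbols below follow that convention, not the truncated snippet):

$$\sum_{t=1}^{T-1} \max_{s,a} \bigl| r_{t+1}(s,a) - r_t(s,a) \bigr| \le B_r, \qquad \sum_{t=1}^{T-1} \max_{s,a} \bigl\| p_{t+1}(\cdot \mid s,a) - p_t(\cdot \mid s,a) \bigr\|_1 \le B_p,$$

so the regret bounds scale with $B_r$ and $B_p$ rather than requiring the environment to be piecewise-stationary.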

A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free

Y Chen, CW Lee, H Luo, CY Wei - Conference on Learning Theory (COLT), 2019 - proceedings.mlr.press
We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal
in terms of dynamic regret. Specifically, our algorithm achieves $\mathcal{O}(\min\{\sqrt …
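
For reference, the bound truncated above is, in the published version, the minimax dynamic-regret rate for this problem (restated here from the paper rather than the snippet, so treat it as a reconstruction; $S$ is the number of distribution switches, $\Delta$ the total variation, and $T$ the horizon):

$$\widetilde{\mathcal{O}}\bigl(\min\{\sqrt{S T},\; \Delta^{1/3} T^{2/3}\}\bigr).$$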

Near-optimal model-free reinforcement learning in non-stationary episodic MDPs

W Mao, K Zhang, R Zhu, D Simchi-Levi, T Başar - International Conference on Machine Learning (ICML), 2021 - proceedings.mlr.press
We consider model-free reinforcement learning (RL) in non-stationary Markov decision
processes. Both the reward functions and the state transition functions are allowed to vary …
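
The workhorse mechanism in model-free approaches of this kind is periodic restarting: the learner discards its value estimates on a schedule matched to the drift, so no estimate ever mixes data from very different regimes. A sketch of that loop, without the confidence bonuses the actual algorithm adds on top (every interface name here, such as env_step, is illustrative):

    import numpy as np

    def restarted_q_learning(env_reset, env_step, S, A, H, episodes, epochs):
        """Tabular episodic Q-learning with periodic restarts (sketch).

        env_reset() -> initial state index; env_step(h, s, a) -> (reward,
        next_state) at step h. The Q-table is reset at every epoch boundary.
        """
        returns = []
        for _ in range(epochs):
            Q = np.full((H, S, A), float(H))   # optimistic initialization
            N = np.zeros((H, S, A))
            for _ in range(episodes // epochs):
                s, total = env_reset(), 0.0
                for h in range(H):
                    a = int(np.argmax(Q[h, s]))
                    r, s2 = env_step(h, s, a)
                    N[h, s, a] += 1
                    lr = (H + 1) / (H + N[h, s, a])  # standard step size
                    future = Q[h + 1, s2].max() if h + 1 < H else 0.0
                    Q[h, s, a] += lr * (r + future - Q[h, s, a])
                    total += r
                    s = s2
                returns.append(total)
        return returns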

Hedging the drift: Learning to optimize under nonstationarity

WC Cheung, D Simchi-Levi, R Zhu - Management Science, 2022 - pubsonline.informs.org
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …
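
The distinctive ingredient in this journal version is a bandit-over-bandit (BOB) construction: a master adversarial bandit picks the forgetting parameter of the base algorithm (e.g., its window length) block by block, which removes the need to know the variation budget in advance. A sketch with Exp3 as the master (run_block and the [0, 1] reward normalization are assumptions):

    import numpy as np

    def bandit_over_bandit(run_block, windows, num_blocks, seed=0):
        """Master Exp3 that tunes a base algorithm's window length (sketch).

        run_block(w) -> total reward, normalized to [0, 1], of running the
        base sliding-window algorithm with window length w for one block.
        """
        rng = np.random.default_rng(seed)
        J = len(windows)
        gamma = min(1.0, np.sqrt(J * np.log(J) / ((np.e - 1) * num_blocks)))
        weights = np.ones(J)
        total = 0.0
        for _ in range(num_blocks):
            p = (1 - gamma) * weights / weights.sum() + gamma / J
            j = rng.choice(J, p=p)
            reward = run_block(windows[j])
            total += reward
            # Importance-weighted Exp3 update on the chosen window length.
            weights[j] *= np.exp(gamma * (reward / p[j]) / J)
        return total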

Efficient contextual bandits in non-stationary worlds

H Luo, CY Wei, A Agarwal, J Langford - Conference on Learning Theory (COLT), 2018 - proceedings.mlr.press
Most contextual bandit algorithms minimize regret against the best fixed policy, a
questionable benchmark for non-stationary environments that are ubiquitous in applications …

Dynamic regret of policy optimization in non-stationary environments

Y Fei, Z Yang, Z Wang, Q Xie - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information
reward feedback and unknown fixed transition kernels. We propose two model-free policy …

Non-stationary experimental design under linear trends

D Simchi-Levi, C Wang… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
Experimentation has been critical and increasingly popular across various domains, such as
clinical trials and online platforms, due to its widely recognized benefits. One of the primary …

Non-stationary reinforcement learning under general function approximation

S Feng, M Yin, R Huang, YX Wang… - International Conference on Machine Learning (ICML), 2023 - proceedings.mlr.press
General function approximation is a powerful tool to handle large state and action spaces in
a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding …