Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

[BOOK][B] Algorithms for reinforcement learning

C Szepesvári - 2022 - books.google.com
Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …

Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach

CY Wei, H Luo - Conference on learning theory, 2021 - proceedings.mlr.press
We propose a black-box reduction that turns a certain reinforcement learning algorithm with
optimal regret in a (near-) stationary environment into another algorithm with optimal …

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Learning adversarial Markov decision processes with bandit feedback and unknown transition

C Jin, T Jin, H Luo, S Sra, T Yu - International Conference on …, 2020 - proceedings.mlr.press
We consider the task of learning in episodic finite-horizon Markov decision processes with
an unknown transition function, bandit feedback, and adversarial losses. We propose an …

Online convex optimization in adversarial Markov decision processes

A Rosenberg, Y Mansour - International Conference on …, 2019 - proceedings.mlr.press
We consider online learning in episodic loop-free Markov decision processes (MDPs),
where the loss function can change arbitrarily between episodes, and the transition function …

Corruption-robust algorithms with uncertainty weighting for nonlinear contextual bandits and Markov decision processes

C Ye, W Xiong, Q Gu, T Zhang - International Conference on …, 2023 - proceedings.mlr.press
Despite the significant interest and progress in reinforcement learning (RL) problems with
adversarial corruption, current works are either confined to the linear setting or lead to an …

Combinatorial pure exploration of multi-armed bandits

S Chen, T Lin, I King, MR Lyu… - Advances in neural …, 2014 - proceedings.neurips.cc
We study the combinatorial pure exploration (CPE) problem in the stochastic multi-armed
bandit setting, where a learner explores a set of arms with the objective of identifying …

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …