BYOL-Explore: Exploration by bootstrapped prediction

Z Guo, S Thakoor, M Pîslar… - Advances in neural …, 2022 - proceedings.neurips.cc
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven
exploration in visually complex environments. BYOL-Explore learns the world …
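
(For orientation: the snippet names curiosity-driven exploration but is cut off before describing the method, so what follows is a generic illustration of the curiosity idea, not BYOL-Explore's algorithm. A common construction pays the agent an intrinsic bonus equal to the prediction error of a learned model, so poorly modelled transitions attract exploration. A minimal NumPy sketch; all names are illustrative.)

    import numpy as np

    def curiosity_bonus(pred_next_emb: np.ndarray,
                        true_next_emb: np.ndarray) -> float:
        # Intrinsic reward = squared prediction error of the learned model:
        # large where the model is surprised, near zero where the world is
        # already well predicted.
        err = pred_next_emb - true_next_emb
        return float(err @ err)

    # A well-predicted transition earns almost no bonus; a surprising one does.
    print(curiosity_bonus(np.ones(8), np.ones(8) + 0.01))  # ~0.0008
    print(curiosity_bonus(np.ones(8), -np.ones(8)))        # 32.0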

Model-free representation learning and exploration in low-rank MDPs

A Modi, J Chen, A Krishnamurthy, N Jiang… - Journal of Machine …, 2024 - jmlr.org
The low-rank MDP has emerged as an important model for studying representation learning
and exploration in reinforcement learning. With a known representation, several model-free …
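
(Background on the snippet's model class, stated in the standard notation of the low-rank MDP literature rather than anything specific to this paper: the transition kernel factorizes through d-dimensional feature maps,

    P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle = \sum_{i=1}^{d} \phi_i(s, a) \, \mu_i(s'),

with both \phi and \mu unknown to the learner; the snippet's "known representation" case corresponds to \phi being given.)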

Fast active learning for pure exploration in reinforcement learning

P Ménard, OD Domingues, A Jonsson… - International …, 2021 - proceedings.mlr.press
Realistic environments often provide agents with very limited feedback. When the
environment is initially unknown, feedback can at first be completely absent …

Reward is enough for convex MDPs

T Zahavy, B O'Donoghue… - Advances in Neural …, 2021 - proceedings.neurips.cc
Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-
action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov …
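
(To unpack the claim: a Markov, stationary reward depends only on the current state-action pair, giving the standard objective on the left below; the convex-MDP generalization replaces it with an arbitrary convex function f of the policy's state-action occupancy measure d^\pi. In LaTeX, assuming the usual discounted formulation:

    \max_{\pi} \; \mathbb{E}_{\pi}\Big[ \textstyle\sum_{t \ge 0} \gamma^{t} \, r(s_t, a_t) \Big] \qquad \text{vs.} \qquad \min_{\pi} \; f(d^{\pi}), \; f \text{ convex},

and the linear case f(d^\pi) = -\langle d^\pi, r \rangle recovers the standard objective.)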

Unified algorithms for RL with decision-estimation coefficients: No-regret, PAC, and reward-free learning

F Chen, S Mei, Y Bai - arXiv preprint arXiv:2209.11745, 2022 - arxiv.org
Finding unified complexity measures and algorithms for sample-efficient learning is a central
topic of research in reinforcement learning (RL). The Decision-Estimation Coefficient (DEC) …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

On the statistical efficiency of reward-free exploration in non-linear RL

J Chen, A Modi, A Krishnamurthy… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study reward-free reinforcement learning (RL) under general non-linear function
approximation, and establish sample efficiency and hardness results under various standard …

The challenges of exploration for offline reinforcement learning

N Lambert, M Wulfmeier, W Whitney, A Byravan… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked
processes of reinforcement learning: collecting informative experience and inferring optimal …

DrM: Mastering visual reinforcement learning through dormant ratio minimization

G Xu, R Zheng, Y Liang, X Wang, Z Yuan, T Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite
its progress, current algorithms are still unsatisfactory in virtually every aspect of the …
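
(The "dormant ratio" in the title has a concrete meaning in the dormant-neuron literature: the fraction of a layer's units whose mean activation magnitude, normalized by the layer-wide mean, falls below a threshold tau. A minimal NumPy sketch of that quantity, assuming this standard definition; the threshold default and names are illustrative, not the paper's code.)

    import numpy as np

    def dormant_ratio(activations: np.ndarray, tau: float = 0.025) -> float:
        # activations: (batch, units) post-activation outputs of one layer.
        # A unit is tau-dormant if its mean |activation|, divided by the
        # layer-wide mean, is at most tau; the ratio is the dormant fraction.
        per_unit = np.abs(activations).mean(axis=0)
        layer_mean = per_unit.mean()
        if layer_mean == 0.0:  # fully silent layer: every unit is dormant
            return 1.0
        return float((per_unit / layer_mean <= tau).mean())

    # Example: a layer where half the units never fire -> ratio ~ 0.5.
    acts = np.concatenate([np.random.rand(64, 32), np.zeros((64, 32))], axis=1)
    print(dormant_ratio(acts))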

Provably efficient reward-agnostic navigation with linear value iteration

A Zanette, A Lazaric… - Advances in Neural …, 2020 - proceedings.neurips.cc
There has been growing progress on theoretical analyses for provably efficient learning in
MDPs with linear function approximation, but much of the existing work has made strong …