- Academic Search


Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms

C Jin, Q Liu, S Miryoosefi - Advances in neural information …, 2021 - proceedings.neurips.cc

Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …

Cited by: 244

Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Cited by: 59

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation. For episodic
time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

Cited by: 296

FLAMBE: Structural complexity and representation learning of low rank MDPs

A Agarwal, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common
practice to make parametric assumptions where values or policies are functions of some low …

Cited by: 257

Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press

We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Cited by: 68

Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc

The success of reinforcement learning in a variety of challenging sequential
decision-making problems has been much discussed, but often ignored in this discussion is the …

Cited by: 76


The role of coverage in online reinforcement learning

T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade - arXiv preprint arXiv …, 2022 - arxiv.org

Coverage conditions--which assert that the data logging distribution adequately covers the
state space--play a fundamental role in determining the sample complexity of offline …

Cited by: 183

Reinforcement learning with general value function approximation: Provably efficient approach via bounded eluder dimension

R Wang, RR Salakhutdinov… - Advances in Neural …, 2020 - proceedings.neurips.cc

Value function approximation has demonstrated phenomenal empirical success in
reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …