Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic time-
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press
We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc
The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …

PC-PG: Policy cover directed exploration for provable policy gradient learning

A Agarwal, M Henaff, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc
Direct policy gradient methods for reinforcement learning are a successful approach for a
variety of reasons: they are model free, they directly optimize the performance metric of …

Reward-free RL is no harder than reward-aware RL in linear Markov decision processes

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …

Optimism in reinforcement learning with generalized linear function approximation

Y Wang, R Wang, SS Du, A Krishnamurthy - arxiv preprint arxiv …, 2019 - arxiv.org
We design a new provably efficient algorithm for episodic reinforcement learning with
generalized linear function approximation. We analyze the algorithm under a new …

Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium

Q Xie, Y Chen, Z Wang, Z Yang - Conference on learning …, 2020 - proceedings.mlr.press
In this work, we develop provably efficient reinforcement learning algorithms for two-player
zero-sum Markov games with simultaneous moves. We consider a family of Markov games …