Google Scholar

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Cited by 450

[PDF] neurips.cc

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

Cited by 315

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com

A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Cited by 158

[PDF] mlr.press

Jump-start reinforcement learning

I Uchendu, T Xiao, Y Lu, B Zhu, M Yan… - International …, 2023 - proceedings.mlr.press

Reinforcement learning (RL) provides a theoretical framework for continuously improving an
agent's behavior via trial and error. However, efficiently learning policies from scratch can be …

Cited by 127

[PDF] neurips.cc

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …

Cited by 145

[PDF] neurips.cc

Natural policy gradient primal-dual method for constrained markov decision processes

D Ding, K Zhang, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc

We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …

Cited by 221

[PDF] jmlr.org

On the convergence rates of policy gradient methods

L Xiao - Journal of Machine Learning Research, 2022 - jmlr.org

We consider infinite-horizon discounted Markov decision problems with finite state and
action spaces and study the convergence rates of the projected policy gradient method and …

Cited by 116

[PDF] mlr.press

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press

This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

Cited by 76

[PDF] mlr.press

Stochastic policy gradient methods: Improved sample complexity for fisher-non-degenerate policies

I Fatkhullin, A Barakat, A Kireeva… - … Conference on Machine …, 2023 - proceedings.mlr.press

Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …

Cited by 44

[PDF] mlr.press

Provably efficient safe exploration via primal-dual policy optimization

D Ding, X Wei, Z Yang, Z Wang… - … conference on artificial …, 2021 - proceedings.mlr.press

We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …

Cited by 186

Saved to My library

Variational policy gradient method for reinforcement learning with general utilities

Is pessimism provably efficient for offline RL?

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

[BOOK][B] Control systems and reinforcement learning

Jump-start reinforcement learning

Provable benefits of actor-critic methods for offline reinforcement learning

Natural policy gradient primal-dual method for constrained markov decision processes

On the convergence rates of policy gradient methods

Policy gradient method for robust reinforcement learning

Stochastic policy gradient methods: Improved sample complexity for fisher-non-degenerate policies

Provably efficient safe exploration via primal-dual policy optimization