A policy gradient method for confounded POMDPs

M Hong, Z Qi, Y Xu - arXiv preprint arXiv:2305.17083, 2023 - arxiv.org
In this paper, we propose a policy gradient method for confounded partially observable
Markov decision processes (POMDPs) with continuous state and observation spaces in the …
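
For readers unfamiliar with the technique named in the title, here is a minimal REINFORCE-style policy gradient sketch on a toy fully observed MDP. This is a generic illustration of the score-function gradient, not the confounded-POMDP estimator the paper develops; the toy MDP, step size, and horizon are all assumptions made for the example.

```python
import numpy as np

# Toy 2-state, 2-action MDP, defined inline so the example is self-contained.
# P[(s, a)] -> (next_state, reward); action 1 is the rewarding action.
P = {(0, 0): (0, 0.0), (0, 1): (1, 1.0),
     (1, 0): (0, 0.0), (1, 1): (1, 1.0)}

rng = np.random.default_rng(0)
theta = np.zeros((2, 2))              # logits: theta[state, action]

def policy(s):
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

def rollout(horizon=20):
    s, traj = 0, []
    for _ in range(horizon):
        a = rng.choice(2, p=policy(s))
        s2, r = P[(s, a)]
        traj.append((s, a, r))
        s = s2
    return traj

alpha, gamma = 0.1, 0.95
for episode in range(500):
    G = 0.0
    # Walk the trajectory backwards, accumulating the discounted return G,
    # and apply the REINFORCE update: theta += alpha * grad log pi(a|s) * G.
    for s, a, r in reversed(rollout()):
        G = r + gamma * G
        grad = -policy(s)
        grad[a] += 1.0                # gradient of log-softmax w.r.t. logits
        theta[s] += alpha * grad * G

print(policy(0), policy(1))          # action 1 should dominate in both states
```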

Provably efficient offline reinforcement learning in regular decision processes

R Cipollone, A Jonsson, A Ronca… - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular
Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes …

Provably efficient UCB-type algorithms for learning predictive state representations

R Huang, Y Liang, J Yang - arXiv preprint arXiv:2307.00405, 2023 - arxiv.org
The general sequential decision-making problem, which includes Markov decision
processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at …
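
For context on what "UCB-type" means, here is a sketch of the classic UCB1 rule on a Bernoulli multi-armed bandit. This shows only the optimism principle such algorithms build on, not the predictive-state-representation algorithm from the paper; the arm means, horizon, and seed are illustrative choices.

```python
import math
import random

def ucb1(arm_means, horizon=10000, seed=0):
    """UCB1: pull the arm maximizing empirical mean + sqrt(2 ln t / n_i)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k                  # number of pulls per arm
    sums = [0.0] * k                  # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1               # pull each arm once to initialize
        else:
            arm = max(range(k), key=lambda i:
                      sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        sums[arm] += reward
    return counts

print(ucb1([0.3, 0.5, 0.7]))  # the 0.7 arm should dominate the pull counts
```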

Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

J Hong, A Dragan, S Levine - arXiv preprint arXiv:2310.20663, 2023 - arxiv.org
Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a
dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" …

Learn to teach: Improve sample efficiency in teacher-student learning for sim-to-real transfer

F Wu, Z Gu, Y Zhao, A Wu - arXiv preprint arXiv:2402.06783, 2024 - arxiv.org
Simulation-to-reality (sim-to-real) transfer is a fundamental problem for robot learning.
Domain Randomization, which adds randomization during training, is a powerful technique …
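
To illustrate Domain Randomization in isolation, here is a hypothetical training loop that resamples simulator dynamics every episode so the policy must be robust to a distribution of physics. `Simulator`, `update_policy`, and the parameter ranges are placeholders for the sketch, not an API or values from the paper.

```python
import random

random.seed(0)

class Simulator:
    """Stand-in physics simulator parameterized by randomized dynamics."""
    def __init__(self, mass, friction):
        self.mass, self.friction = mass, friction

    def run_episode(self):
        # Stand-in for a full rollout; returns a scalar episode return.
        return 1.0 / (self.mass * self.friction)

def update_policy(episode_return):
    pass  # placeholder for any RL update (PPO, SAC, ...)

for episode in range(100):
    # Core of Domain Randomization: resample dynamics within plausible
    # ranges each episode, then train on the randomized environment.
    sim = Simulator(mass=random.uniform(0.8, 1.2),
                    friction=random.uniform(0.5, 1.5))
    update_policy(sim.run_episode())
```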