A policy gradient method for confounded POMDPs

M Hong, Z Qi, Y Xu - arXiv preprint arXiv:2305.17083, 2023 - arxiv.org
In this paper, we propose a policy gradient method for confounded partially observable
Markov decision processes (POMDPs) with continuous state and observation spaces in the …
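
For readers unfamiliar with the technique named in the title, here is a minimal REINFORCE-style policy gradient sketch on a toy fully observed MDP. This is a generic illustration of the score-function gradient, not the confounded-POMDP estimator the paper develops; the toy MDP, step size, and horizon are all assumptions made for the example.

```python
import numpy as np

# Toy 2-state, 2-action MDP, defined inline so the example is self-contained.
# P[(s, a)] -> (next_state, reward); action 1 is the rewarding action.
P = {(0, 0): (0, 0.0), (0, 1): (1, 1.0),
     (1, 0): (0, 0.0), (1, 1): (1, 1.0)}

rng = np.random.default_rng(0)
theta = np.zeros((2, 2))              # logits: theta[state, action]

def policy(s):
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

def rollout(horizon=20):
    s, traj = 0, []
    for _ in range(horizon):
        a = rng.choice(2, p=policy(s))
        s2, r = P[(s, a)]
        traj.append((s, a, r))
        s = s2
    return traj

alpha, gamma = 0.1, 0.95
for episode in range(500):
    G = 0.0
    # Walk the trajectory backwards, accumulating the discounted return G,
    # and apply the REINFORCE update: theta += alpha * grad log pi(a|s) * G.
    for s, a, r in reversed(rollout()):
        G = r + gamma * G
        grad = -policy(s)
        grad[a] += 1.0                # gradient of log-softmax w.r.t. logits
        theta[s] += alpha * grad * G

print(policy(0), policy(1))          # action 1 should dominate in both states
```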

Provably efficient offline reinforcement learning in regular decision processes

R Cipollone, A Jonsson, A Ronca… - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular
Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes …

Provably efficient UCB-type algorithms for learning predictive state representations

R Huang, Y Liang, J Yang - arXiv preprint arXiv:2307.00405, 2023 - arxiv.org
The general sequential decision-making problem, which includes Markov decision
processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at …
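
For context on what "UCB-type" means, here is a sketch of the classic UCB1 rule on a Bernoulli multi-armed bandit. This shows only the optimism principle such algorithms build on, not the predictive-state-representation algorithm from the paper; the arm means, horizon, and seed are illustrative choices.

```python
import math
import random

def ucb1(arm_means, horizon=10000, seed=0):
    """UCB1: pull the arm maximizing empirical mean + sqrt(2 ln t / n_i)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k                  # number of pulls per arm
    sums = [0.0] * k                  # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1               # pull each arm once to initialize
        else:
            arm = max(range(k), key=lambda i:
                      sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        sums[arm] += reward
    return counts

print(ucb1([0.3, 0.5, 0.7]))  # the 0.7 arm should dominate the pull counts
```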

Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

J Hong, A Dragan, S Levine - arXiv preprint arXiv:2310.20663, 2023 - arxiv.org
Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a
dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" …

Learn to teach: Improve sample efficiency in teacher-student learning for sim-to-real transfer

F Wu, Z Gu, Y Zhao, A Wu - arXiv preprint arXiv:2402.06783, 2024 - arxiv.org
Simulation-to-reality (sim-to-real) transfer is a fundamental problem for robot learning.
Domain Randomization, which adds randomization during training, is a powerful technique …
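
To illustrate Domain Randomization in isolation, here is a hypothetical training loop that resamples simulator dynamics every episode so the policy must be robust to a distribution of physics. `Simulator`, `update_policy`, and the parameter ranges are placeholders for the sketch, not an API or values from the paper.

```python
import random

random.seed(0)

class Simulator:
    """Stand-in physics simulator parameterized by randomized dynamics."""
    def __init__(self, mass, friction):
        self.mass, self.friction = mass, friction

    def run_episode(self):
        # Stand-in for a full rollout; returns a scalar episode return.
        return 1.0 / (self.mass * self.friction)

def update_policy(episode_return):
    pass  # placeholder for any RL update (PPO, SAC, ...)

for episode in range(100):
    # Core of Domain Randomization: resample dynamics within plausible
    # ranges each episode, then train on the randomized environment.
    sim = Simulator(mass=random.uniform(0.8, 1.2),
                    friction=random.uniform(0.5, 1.5))
    update_policy(sim.run_episode())
```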