Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org

[PDF] Trust Region Policy Optimization

J Schulman - arXiv preprint arXiv:1502.05477, 2015 - arxiv.org

Stabilizing off-policy Q-learning via bootstrapping error reduction
A Kumar, J Fu, M Soh, G Tucker… - Advances in neural …, 2019 - proceedings.neurips.cc
Off-policy reinforcement learning aims to leverage experience collected from prior policies
for sample-efficient learning. However, in practice, commonly used off-policy approximate …

MOReL: Model-based offline reinforcement learning

R Kidambi, A Rajeswaran… - Advances in neural …, 2020 - proceedings.neurips.cc
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based
solely on a dataset of historical interactions with the environment. This serves as an extreme …

Constrained policy optimization

J Achiam, D Held, A Tamar… - … conference on machine …, 2017 - proceedings.mlr.press
For many applications of reinforcement learning it can be more convenient to specify both a
reward function and constraints, rather than trying to design behavior through the reward …