Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org

[PDF] Trust Region Policy Optimization

J Schulman - arXiv preprint arXiv:1502.05477, 2015 - arxiv.org

Stabilizing off-policy Q-learning via bootstrapping error reduction
A Kumar, J Fu, M Soh, G Tucker… - Advances in neural …, 2019 - proceedings.neurips.cc
Off-policy reinforcement learning aims to leverage experience collected from prior policies
for sample-efficient learning. However, in practice, commonly used off-policy approximate …

MOReL: Model-based offline reinforcement learning

R Kidambi, A Rajeswaran… - Advances in neural …, 2020 - proceedings.neurips.cc
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based
solely on a dataset of historical interactions with the environment. This serves as an extreme …

Constrained policy optimization

J Achiam, D Held, A Tamar… - … conference on machine …, 2017 - proceedings.mlr.press
For many applications of reinforcement learning it can be more convenient to specify both a
reward function and constraints, rather than trying to design behavior through the reward …