Offline reinforcement learning with implicit Q-learning

I Kostrikov, A Nair, S Levine - arXiv preprint arXiv:2110.06169, 2021 - arxiv.org
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that
improves over the behavior policy that collected the dataset, while at the same time …

A minimalist approach to offline reinforcement learning

S Fujimoto, SS Gu - Advances in neural information …, 2021 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc
Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

COMBO: Conservative offline model-based policy optimization

T Yu, A Kumar, R Rafailov… - Advances in neural …, 2021 - proceedings.neurips.cc
Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …

What matters in learning from offline human demonstrations for robot manipulation

A Mandlekar, D Xu, J Wong, S Nasiriany… - arXiv preprint arXiv …, 2021 - arxiv.org
Imitating human demonstrations is a promising approach to endow robots with various
manipulation capabilities. While recent advances have been made in imitation learning and …

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

IDQL: Implicit Q-learning as an actor-critic method with diffusion policies

P Hansen-Estruch, I Kostrikov, M Janner… - arXiv preprint arXiv …, 2023 - arxiv.org
Effective offline RL methods require properly handling out-of-distribution actions. Implicit
Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a …

Mildly conservative Q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …

Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble

S Lee, Y Seo, K Lee, P Abbeel… - Conference on Robot …, 2022 - proceedings.mlr.press
Recent advance in deep offline reinforcement learning (RL) has made it possible to train
strong robotic agents from offline datasets. However, depending on the quality of the trained …