Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2023 - proceedings.neurips.cc
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

IDQL: Implicit Q-learning as an actor-critic method with diffusion policies

P Hansen-Estruch, I Kostrikov, M Janner… - arXiv preprint arXiv …, 2023 - arxiv.org
Effective offline RL methods require properly handling out-of-distribution actions. Implicit
Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a …

FurnitureBench: Reproducible real-world benchmark for long-horizon complex manipulation

M Heo, Y Lee, D Lee, JJ Lim - The International Journal of …, 2023 - journals.sagepub.com
Reinforcement learning (RL), imitation learning (IL), and task and motion planning (TAMP)
have demonstrated impressive performance across various robotic manipulation tasks …

Bootstrap your own skills: Learning to solve new tasks with large language model guidance

J Zhang, J Zhang, K Pertsch, Z Liu, X Ren… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose BOSS, an approach that automatically learns to solve new long-horizon,
complex, and meaningful tasks by growing a learned skill library with minimal supervision …

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

SERL: A software suite for sample-efficient robotic reinforcement learning

J Luo, Z Hu, C Xu, YL Tan, J Berg… - … on Robotics and …, 2024 - ieeexplore.ieee.org
In recent years, significant progress has been made in the field of robotic reinforcement
learning (RL), enabling methods that handle complex image observations, train in the real …

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Reconciling reality through simulation: A real-to-sim-to-real approach for robust manipulation

M Torne, A Simeonov, Z Li, A Chan, T Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Imitation learning methods need significant human supervision to learn policies robust to
changes in object poses, physical disturbances, and visual distractors. Reinforcement …

Dataset reset policy optimization for RLHF

JD Chang, W Zhan, O Oertell, K Brantley… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning (RL) from Human Preference-based feedback is a popular
paradigm for fine-tuning generative models, which has produced impressive models such as …