Offline reinforcement learning with implicit Q-learning
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that
improves over the behavior policy that collected the dataset, while at the same time …
A minimalist approach to offline reinforcement learning
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …
Offline reinforcement learning as one big sequence modeling problem
Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …
COMBO: Conservative offline model-based policy optimization
Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …
What matters in learning from offline human demonstrations for robot manipulation
Imitating human demonstrations is a promising approach to endow robots with various
manipulation capabilities. While recent advances have been made in imitation learning and …
Bridging offline reinforcement learning and imitation learning: A tale of pessimism
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …
Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …
IDQL: Implicit Q-learning as an actor-critic method with diffusion policies
Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-
learning (IQL) addresses this by training a Q-function using only dataset actions through a …
Mildly conservative Q-learning for offline reinforcement learning
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …
Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble
Recent advances in deep offline reinforcement learning (RL) have made it possible to train
strong robotic agents from offline datasets. However, depending on the quality of the trained …