Survival instinct in offline reinforcement learning
We present a novel observation about the behavior of offline reinforcement learning (RL)
algorithms: on many benchmark datasets, offline RL can produce well-performing and safe …
Mahalo: Unifying offline reinforcement learning and imitation learning from observations
We study a new paradigm for sequential decision making, called offline policy learning from
observations (PLfO). Offline PLfO aims to learn policies using datasets with substandard …
Unsupervised behavior extraction via random intent priors
Reward-free data is abundant and contains rich prior knowledge of human behaviors, but it
is not well exploited by offline reinforcement learning (RL) algorithms. In this paper, we …
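The snippet stops before the mechanism, but the title names it: random intent priors. As a rough illustration of that idea (not the paper's exact algorithm), one can relabel reward-free transitions with randomly initialized, frozen networks, each acting as one "intent," and hand each relabeled copy to a standard offline RL learner. A minimal numpy sketch; all names and shapes here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_intent(state_dim, hidden=32, rng=rng):
    """A randomly initialized (and frozen) MLP used as a pseudo-reward.

    Each random network encodes one 'intent' under which reward-free
    data can be relabeled -- the idea named in the paper's title.
    """
    W1 = rng.normal(size=(state_dim, hidden)) / np.sqrt(state_dim)
    W2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)
    return lambda s: np.tanh(s @ W1).dot(W2).squeeze(-1)

# Reward-free dataset: states without reward labels (toy stand-in).
states = rng.normal(size=(1000, 4))

# Relabel the same data under several random intents; each relabeled
# copy can then be fed to any off-the-shelf offline RL algorithm.
intents = [random_intent(state_dim=4) for _ in range(5)]
relabeled = [(states, r(states)) for r in intents]
print(relabeled[0][1][:3])  # pseudo-rewards for the first transitions
```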
Offline multitask representation learning for reinforcement learning
We study offline multitask representation learning in reinforcement learning (RL), where a
learner is provided with an offline dataset from different tasks that share a common …
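The abstract's premise is a representation shared across tasks. As a generic illustration of that setup (not this paper's method or theory), a single encoder can be trained on pooled offline data with one lightweight head per task. A minimal PyTorch sketch; the architecture and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class SharedRepModel(nn.Module):
    """Shared encoder + per-task heads: one common low-dimensional
    representation of the input, reused by every task's head."""

    def __init__(self, obs_dim, num_tasks, rep_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, rep_dim)
        )
        self.heads = nn.ModuleList(nn.Linear(rep_dim, 1) for _ in range(num_tasks))

    def forward(self, obs, task_id):
        return self.heads[task_id](self.encoder(obs))

model = SharedRepModel(obs_dim=8, num_tasks=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy offline batches: one (obs, regression target) pair per task.
for task_id in range(4):
    obs, target = torch.randn(32, 8), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(obs, task_id), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```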
Offline imitation learning with model-based reverse augmentation
In offline Imitation Learning (IL), one of the main challenges is the covariate shift between
the expert observations and the actual distribution encountered by the agent, because it is …
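A plausible reading of "model-based reverse augmentation": fit a reverse dynamics model (predict the predecessor state from the current state and action), then roll backward from expert states so the learner sees states that lead into the expert distribution, countering covariate shift. A minimal numpy sketch under that assumption; the linear model and random rollout policy are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline data: (s, a, s') triples from a suboptimal policy.
S = rng.normal(size=(500, 3))
A = rng.normal(size=(500, 2))
S_next = S + 0.1 * A @ rng.normal(size=(2, 3))  # unknown true dynamics

# Fit a linear *reverse* model s ~ [s', a] @ W by least squares:
# given where you ended up and what you did, predict where you came from.
X = np.hstack([S_next, A])
W, *_ = np.linalg.lstsq(X, S, rcond=None)

def roll_backward(expert_state, horizon=5):
    """Synthesize a trajectory that *ends* at an expert state."""
    traj, s = [], expert_state
    for _ in range(horizon):
        a = rng.normal(size=2)           # placeholder rollout policy
        s_prev = np.hstack([s, a]) @ W   # reverse-model prediction
        traj.append((s_prev, a, s))
        s = s_prev
    return traj[::-1]  # reorder forward in time

aug = roll_backward(expert_state=rng.normal(size=3))
```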
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning
Offline preference-based reinforcement learning (PbRL) typically operates in two phases:
first, use human preferences to learn a reward model and annotate rewards for a reward …
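The abstract describes the standard two-phase PbRL pipeline. Phase one is commonly implemented with a Bradley-Terry loss over pairs of trajectory segments; a minimal PyTorch sketch of that phase (generic PbRL practice, not this paper's regularizer):

```python
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def bt_loss(seg_a, seg_b, pref):
    """Bradley-Terry preference loss over two trajectory segments.

    pref = 1 means segment A was preferred; a segment's return is the
    sum of predicted per-step rewards along the segment.
    """
    ra = reward_net(seg_a).sum(dim=1)   # (batch, T, 1) -> (batch, 1)
    rb = reward_net(seg_b).sum(dim=1)
    logits = torch.cat([ra, rb], dim=1)  # (batch, 2)
    return nn.functional.cross_entropy(logits, 1 - pref)

# Toy preference batch: 8 pairs of length-10 segments with labels.
seg_a, seg_b = torch.randn(8, 10, 4), torch.randn(8, 10, 4)
pref = torch.randint(0, 2, (8,))
loss = bt_loss(seg_a, seg_b, pref)
loss.backward()
opt.step()

# Phase two (not shown): use reward_net to annotate the reward-free
# dataset, then run any off-the-shelf offline RL algorithm on it.
```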
Leveraging unlabeled data sharing through kernel function approximation in offline reinforcement learning
Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires
large amounts of data. The challenge arises when labeled datasets are expensive …
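The title pairs unlabeled data sharing with kernel function approximation. As an illustration of that combination (not the paper's guarantees), one can fit a reward function by kernel ridge regression on the small labeled set and use it to relabel the large unlabeled set. A self-contained numpy sketch; the RBF kernel, bandwidth, and toy reward are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Small labeled set (expensive) and a large unlabeled set (cheap).
X_lab = rng.normal(size=(100, 4))
r_lab = np.sin(X_lab[:, 0])              # stand-in for true rewards
X_unl = rng.normal(size=(2000, 4))

# Kernel ridge regression: fit a reward function in an RKHS on the
# labeled transitions, then relabel the unlabeled ones with it.
lam = 1e-2
K = rbf(X_lab, X_lab)
alpha = np.linalg.solve(K + lam * np.eye(len(X_lab)), r_lab)
r_unl = rbf(X_unl, X_lab) @ alpha        # predicted rewards

# The relabeled (X_unl, r_unl) pairs can now be pooled with the
# labeled data and passed to an offline RL learner.
```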
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which
uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual …
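One plausible reading of the augmentation in CODA: combine the two cheap data sources by treating each context-goal pair as a synthetic reward-1 transition into an absorbing goal state, while steps from unlabeled trajectories carry reward 0 under a sampled context. A toy sketch under that assumption; the data shapes and pairing rule are hypothetical, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two commonly available data sources (toy stand-ins):
trajectories = [rng.normal(size=(20, 3)) for _ in range(10)]  # unlabeled
context_goal = [(rng.normal(size=2), rng.normal(size=3)) for _ in range(50)]

dataset = []

# Unlabeled trajectory steps: reward 0 under a sampled context.
for traj in trajectories:
    ctx, _ = context_goal[rng.integers(len(context_goal))]
    for s, s_next in zip(traj[:-1], traj[1:]):
        dataset.append((ctx, s, s_next, 0.0, False))

# Context-goal pairs become synthetic reward-1 transitions into an
# absorbing goal state, linking contexts to goals without action labels.
for ctx, goal in context_goal:
    dataset.append((ctx, goal, goal, 1.0, True))

print(len(dataset), "context-conditioned transitions")
```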
Augmenting Offline RL with Unlabeled Data
Recent advancements in offline Reinforcement Learning (Offline RL) have led to an
increased focus on methods based on conservative policy updates to address the Out-of …
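The conservative policy updates the abstract refers to are often instantiated as a CQL-style regularizer: push Q-values down on sampled (likely out-of-distribution) actions and up on dataset actions. A minimal PyTorch sketch of that penalty, as a generic illustration rather than this paper's method:

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4 + 2, 64), nn.ReLU(), nn.Linear(64, 1))

def conservative_penalty(obs, data_act, num_random=10):
    """CQL-style regularizer: raise Q on dataset actions and lower it
    on uniformly sampled (likely out-of-distribution) actions."""
    b = obs.shape[0]
    rand_act = torch.empty(b, num_random, 2).uniform_(-1, 1)
    obs_rep = obs.unsqueeze(1).expand(-1, num_random, -1)
    q_rand = q_net(torch.cat([obs_rep, rand_act], dim=-1)).squeeze(-1)
    q_data = q_net(torch.cat([obs, data_act], dim=-1)).squeeze(-1)
    # logsumexp over sampled actions is a soft maximum of Q(s, a).
    return (torch.logsumexp(q_rand, dim=1) - q_data).mean()

obs, act = torch.randn(32, 4), torch.rand(32, 2) * 2 - 1
loss = conservative_penalty(obs, act)  # added to the usual TD loss
loss.backward()
```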
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
The reasoning abilities of large language models (LLMs) have improved with chain-of-
thought (CoT) prompting, allowing models to solve complex tasks in a stepwise manner …
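The self-taught reasoner (STaR) framework the title analyzes follows a simple loop: sample CoT rationales, keep the ones whose final answer verifies, and fine-tune on the kept rationales before sampling again. A self-contained toy sketch of that loop, with a stub in place of the LLM (everything here is illustrative, not the paper's analysis):

```python
import random

random.seed(0)

def sample_rationale(question, a, b):
    """Stub 'model': emits a chain of thought whose intermediate step is
    sometimes wrong, standing in for an LLM sampled at temperature > 0."""
    step = a + b + random.choice([0, 0, 1, -1])  # noisy reasoning step
    return f"{a} + {b} = {step}, so the answer is {step}.", step

problems = [(f"What is {a} + {b}?", a, b)
            for a, b in [(2, 3), (7, 5), (4, 9), (6, 6)]]

finetune_set = []
for _ in range(3):                       # a few STaR iterations
    for q, a, b in problems:
        for _ in range(4):               # several samples per problem
            cot, answer = sample_rationale(q, a, b)
            if answer == a + b:          # keep only verified rationales
                finetune_set.append((q, cot))
                break
    # In the real framework the model is fine-tuned on finetune_set
    # here, then sampling repeats with the improved model.

print(f"{len(finetune_set)} verified rationales collected")
```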