Survival instinct in offline reinforcement learning

A Li, D Misra, A Kolobov… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We present a novel observation about the behavior of offline reinforcement learning (RL)
algorithms: on many benchmark datasets, offline RL can produce well-performing and safe …

Mahalo: Unifying offline reinforcement learning and imitation learning from observations

A Li, B Boots, CA Cheng - International Conference on Machine Learning, 2023 - proceedings.mlr.press
We study a new paradigm for sequential decision making, called offline policy learning from
observations (PLfO). Offline PLfO aims to learn policies using datasets with substandard …

Unsupervised behavior extraction via random intent priors

H Hu, Y Yang, J Ye, Z Mai… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
Reward-free data is abundant and contains rich prior knowledge of human behaviors, but it
is not well exploited by offline reinforcement learning (RL) algorithms. In this paper, we …

Offline multitask representation learning for reinforcement learning

H Ishfaq, T Nguyen-Tang, S Feng, R Arora… - arXiv preprint arXiv …, 2024 - arxiv.org
We study offline multitask representation learning in reinforcement learning (RL), where a
learner is provided with an offline dataset from different tasks that share a common …
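A common way to formalize the "shared structure" in this setting is a low-rank model in which every task's value function is linear in one shared feature map, Q_k(s, a) ≈ w_k^T φ(s, a). A hedged sketch of that decomposition; the network sizes and training details are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SharedRepresentation(nn.Module):
    """One feature map phi(s, a) shared across tasks, plus per-task linear heads."""
    def __init__(self, obs_dim, act_dim, feat_dim, num_tasks):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )
        # Each task k learns only a weight vector w_k on top of the shared phi.
        self.heads = nn.ModuleList(nn.Linear(feat_dim, 1) for _ in range(num_tasks))

    def forward(self, obs, act, task_id):
        features = self.phi(torch.cat([obs, act], dim=-1))
        return self.heads[task_id](features).squeeze(-1)

model = SharedRepresentation(obs_dim=8, act_dim=2, feat_dim=16, num_tasks=5)
q = model(torch.randn(32, 8), torch.randn(32, 2), task_id=3)
print(q.shape)  # torch.Size([32])
```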

Offline imitation learning with model-based reverse augmentation

JJ Shao, HS Shi, LZ Guo, YF Li - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
In offline Imitation Learning (IL), one of the main challenges is the covariate shift between
the expert observations and the actual distribution encountered by the agent, because it is …
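One way to read "model-based reverse augmentation": learn a reverse dynamics model from the offline data, then roll it backwards from expert states, so the learner sees synthetic trajectories that flow into expert-covered regions and the covariate shift shrinks. A toy sketch under that reading; the ridge-regression model and rollout scheme are assumptions, not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline data: (prev_state, state) pairs from suboptimal trajectories.
states = rng.normal(size=(5000, 4))
prev_states = states + rng.normal(scale=0.1, size=states.shape)

# "Reverse model": predict the predecessor state with ridge regression.
# (A real method would use a learned stochastic model; this shows the idea only.)
X = np.hstack([states, np.ones((len(states), 1))])
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(5), X.T @ prev_states)

def reverse_step(s):
    return np.append(s, 1.0) @ W

# Roll backwards from an expert state to synthesize states that lead into it.
expert_state = rng.normal(size=4)
augmented = [expert_state]
for _ in range(10):
    augmented.append(reverse_step(augmented[-1]))
print(np.stack(augmented).shape)  # (11, 4): a synthetic path ending at the expert state
```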

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning

S Tu, J Sun, Q Zhang, Y Zhang, J Liu, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline preference-based reinforcement learning (PbRL) typically operates in two phases:
first, use human preferences to learn a reward model and annotate rewards for a reward …
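A minimal sketch of the first phase the abstract describes: fit a reward model to pairwise human preferences with a Bradley-Terry loss, then use it to annotate reward-free trajectories for the second, offline RL phase. The network sizes and toy preference data are assumptions for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
reward_model = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=3e-4)

def trajectory_return(traj):          # traj: (T, obs_dim + act_dim)
    return reward_model(traj).sum()

# Toy preference pairs: segment_a was preferred by the annotator over segment_b.
pairs = [(torch.randn(20, 6), torch.randn(20, 6)) for _ in range(64)]

for segment_a, segment_b in pairs:
    # Bradley-Terry: P(a > b) = sigmoid(R(a) - R(b)); maximize its log-likelihood.
    loss = -torch.nn.functional.logsigmoid(
        trajectory_return(segment_a) - trajectory_return(segment_b))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase two input: annotate a reward-free dataset with the learned model.
dataset = torch.randn(1000, 6)
with torch.no_grad():
    rewards = reward_model(dataset).squeeze(-1)
print(rewards.shape)  # torch.Size([1000])
```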

Leveraging unlabeled data sharing through kernel function approximation in offline reinforcement learning

YR Lai, FC Chang, PY Wu - arXiv preprint arXiv:2408.12307, 2024 - arxiv.org
Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires
large amounts of data. The challenge arises when labeled datasets are expensive …
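The data-sharing trick this line of work builds on is simple: merge the small labeled set with the large unlabeled one after assigning the unlabeled transitions a pessimistic constant reward, typically zero; the paper studies when this helps under kernel function approximation. A hedged sketch of the relabeling step with toy arrays (the shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small labeled dataset: (state, action, reward) triples.
labeled_s, labeled_a = rng.normal(size=(200, 4)), rng.normal(size=(200, 2))
labeled_r = rng.normal(size=200)

# Large unlabeled dataset: states and actions only, no reward signal.
unlabeled_s, unlabeled_a = rng.normal(size=(5000, 4)), rng.normal(size=(5000, 2))

# Share the unlabeled data by assigning it zero reward; this is pessimistic
# but lets offline RL exploit the extra coverage of the (s, a) space.
shared_s = np.vstack([labeled_s, unlabeled_s])
shared_a = np.vstack([labeled_a, unlabeled_a])
shared_r = np.concatenate([labeled_r, np.zeros(len(unlabeled_s))])
print(shared_s.shape, shared_r.shape)  # (5200, 4) (5200,)
```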

How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

Y Fan, J Li, A Swaminathan, A Modi… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which
uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual …
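One plausible reading of the setup: the trajectories carry no rewards, while a separate set of (context, goal) pairs says which states count as success for which context. Augmentation can then stitch the two into a single reward-labeled, goal-conditioned dataset, e.g. by marking goal states as absorbing with reward 1 and all other transitions 0. The construction below is an illustrative assumption, not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled trajectories: (state, next_state) transitions without rewards.
transitions = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(100)]

# Context-goal supervision: context c is solved when the agent reaches goal g.
context_goal_pairs = [(rng.normal(size=2), rng.normal(size=3)) for _ in range(10)]

augmented = []
for context, goal in context_goal_pairs:
    # Ordinary transitions get reward 0 under every context.
    for s, s_next in transitions:
        augmented.append((np.concatenate([s, context]), s_next, 0.0))
    # Synthetic transition: the goal state is absorbing and pays reward 1.
    augmented.append((np.concatenate([goal, context]), goal, 1.0))

print(len(augmented))  # (100 transitions + 1 goal transition) x 10 contexts = 1010
```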

Augmenting Offline RL with Unlabeled Data

Z Wang, B Gangopadhyay, JF Yeh… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in offline Reinforcement Learning (Offline RL) have led to an
increased focus on methods based on conservative policy updates to address the Out-of …
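"Conservative policy updates" here usually means penalizing value estimates on actions outside the dataset; CQL is the canonical example, adding a term that pushes Q down on all actions while pushing it up on in-dataset actions. A minimal sketch of that regularizer for discrete actions; the shapes, penalty weight, and toy TD target are illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_actions = 4
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, num_actions))

obs = torch.randn(32, 8)                     # batch of dataset states
acts = torch.randint(0, num_actions, (32,))  # actions actually taken in the data
td_target = torch.randn(32)                  # stand-in for r + gamma * max Q'

q_all = q_net(obs)                                       # Q(s, .) for every action
q_data = q_all.gather(1, acts.unsqueeze(1)).squeeze(1)   # Q(s, a) on dataset actions

# CQL-style penalty: logsumexp over all actions minus the in-dataset Q.
# This depresses Q-values on out-of-distribution actions relative to seen ones.
conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()
td_loss = ((q_data - td_target) ** 2).mean()
loss = td_loss + 1.0 * conservative
loss.backward()
print(float(loss))
```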

RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner

FC Chang, YT Lee, HY Shih, PY Wu - arXiv preprint arXiv:2410.23912, 2024 - arxiv.org
The reasoning abilities of large language models (LLMs) have improved with chain-of-
thought (CoT) prompting, allowing models to solve complex tasks in a stepwise manner …
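The STaR loop the title refers to, in outline: sample chain-of-thought rationales from the current model, keep those whose final answer is correct, and fine-tune on the kept traces; the paper analyzes when such iterations provably improve the reasoner. A schematic of the loop, where sample_cot, is_correct, and finetune are hypothetical placeholders rather than a real API:

```python
def star_iteration(model, problems, sample_cot, is_correct, finetune, k=4):
    """One round of Self-Taught Reasoner (STaR)-style self-improvement.

    sample_cot(model, problem) -> (rationale, answer)   # hypothetical
    is_correct(problem, answer) -> bool                 # hypothetical
    finetune(model, traces) -> model                    # hypothetical
    """
    kept = []
    for problem in problems:
        for _ in range(k):  # draw k chain-of-thought samples per problem
            rationale, answer = sample_cot(model, problem)
            if is_correct(problem, answer):
                # Only reasoning traces ending in a correct answer become data.
                kept.append((problem, rationale, answer))
    # Fine-tune on the filtered traces; repeat the loop with the new model.
    return finetune(model, kept)
```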