Reward-agnostic fine-tuning: Provable statistical benefits of hybrid reinforcement learning

G Li, W Zhan, JD Lee, Y Chi… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes
access to both an offline dataset and online interactions with the unknown environment. A …

Provably safe reinforcement learning with step-wise violation constraints

N Xiong, Y Du, L Huang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We investigate a novel safe reinforcement learning problem with step-wise violation
constraints. Our problem differs from existing works in that we focus on stricter step-wise …

Minimax-optimal reward-agnostic exploration in reinforcement learning

G Li, Y Yan, Y Chen, J Fan - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
This paper studies reward-agnostic exploration in reinforcement learning (RL)—a scenario
where the learner is unaware of the reward functions during the exploration stage—and …

Provable safe reinforcement learning with binary feedback

A Bennett, D Misra, N Kallus - International Conference on …, 2023 - proceedings.mlr.press
Safety is a crucial necessity in many applications of reinforcement learning (RL), whether
robotic, automotive, or medical. Many existing approaches to safe RL rely on receiving …

Near-optimal conservative exploration in reinforcement learning under episode-wise constraints

D Li, R Huang, C Shen, J Yang - … Conference on Machine …, 2023 - proceedings.mlr.press
This paper investigates conservative exploration in reinforcement learning where the
performance of the learning agent is guaranteed to be above a certain threshold throughout …

AED: Adaptable Error Detection for Few-shot Imitation Policy

JF Yeh, KH Hung, PC Lo, CM Chung, TH Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
We study how to report few-shot imitation (FSI) policies' behavior errors in novel
environments, a novel task named adaptable error detection (AED). The potential to cause …