On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …
Corruption Robust Offline Reinforcement Learning with Human Feedback
We study data corruption robustness for reinforcement learning with human feedback
(RLHF) in an offline setting. Given an offline dataset of pairs of trajectories along with …
A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Low-Rank MDPs
Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected
cumulative reward using a pre-collected dataset. Offline RL with low-rank MDPs or general …
The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation
In this paper, we study the offline RL problem with linear function approximation. Our main
structural assumption is that the MDP has low inherent Bellman error, which stipulates that …
Performative Reinforcement Learning with Linear Markov Decision Process
We study the setting of performative reinforcement learning, where the deployed
policy affects both the reward and the transition of the underlying Markov decision process …
Offline RL via Feature-Occupancy Gradient Ascent
We study offline Reinforcement Learning in large infinite-horizon discounted Markov
Decision Processes (MDPs) when the reward and transition models are linearly realizable …
A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs
We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon
discounted setting, which aims to learn a policy that maximizes the expected discounted …
Offline Reinforcement Learning via Inverse Optimization
I Dimanidis, T Ok, PM Esfahani - 2024 - dcsc.tudelft.nl
Inspired by the recent successes of Inverse Optimization (IO) across various application
domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for …
Reinforcement learning under general function approximation and novel interaction settings
J Chen - 2023 - ideals.illinois.edu
Reinforcement Learning (RL) is an area of machine learning where an intelligent agent
solves sequential decision-making problems based on experience. Recent advances in the …