Google Scholar

Corruption Robust Offline Reinforcement Learning with Human Feedback

D Mandal, A Nika, P Kamalaruban, A Singla… - arXiv preprint arXiv …, 2024 - arxiv.org

We study data corruption robustness for reinforcement learning with human feedback
(RLHF) in an offline setting. Given an offline dataset of pairs of trajectories along with …

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Low-Rank MDPs

K Hong, A Tewari - arXiv preprint arXiv:2402.04493, 2024 - arxiv.org

Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected
cumulative reward using a pre-collected dataset. Offline RL with low-rank MDPs or general …

The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation

N Golowich, A Moitra - arXiv preprint arXiv:2406.11686, 2024 - arxiv.org

In this paper, we study the offline RL problem with linear function approximation. Our main
structural assumption is that the MDP has low inherent Bellman error, which stipulates that …

Performative Reinforcement Learning with Linear Markov Decision Process

D Mandal, G Radanovic - arXiv preprint arXiv:2411.05234, 2024 - arxiv.org

We study the setting of performative reinforcement learning, where the deployed
policy affects both the reward and the transition of the underlying Markov decision process …

Offline RL via Feature-Occupancy Gradient Ascent

G Neu, N Okolo - arXiv preprint arXiv:2405.13755, 2024 - arxiv.org

We study offline Reinforcement Learning in large infinite-horizon discounted Markov
Decision Processes (MDPs) when the reward and transition models are linearly realizable …

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

K Hong, A Tewari - Forty-first International Conference on Machine … - openreview.net

We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon
discounted setting which aims to learn a policy that maximizes the expected discounted …

Offline Reinforcement Learning via Inverse Optimization

I Dimanidis, T Ok, PM Esfahani - 2024 - dcsc.tudelft.nl

Inspired by the recent successes of Inverse Optimization (IO) across various application
domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for …

Reinforcement learning under general function approximation and novel interaction settings

J Chen - 2023 - ideals.illinois.edu

Reinforcement Learning (RL) is an area of machine learning where an intelligent agent
solves sequential decision-making problems based on experience. Recent advances in the …