Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342 …

RORL: Robust offline reinforcement learning via conservative smoothing

R Yang, C Bai, X Ma, Z Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …

Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint

W Xiong, H Dong, C Ye, Z Wang, H Zhong… - … on Machine Learning, 2024 - openreview.net
This paper studies the theoretical framework of the alignment process of generative models
with Reinforcement Learning from Human Feedback (RLHF). We consider a standard …

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game

W Xiong, H Zhong, C Shi, C Shen, L Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-
collected dataset without further interactions with the environment. While various algorithms …

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Optimal conservative offline RL with general function approximation via augmented Lagrangian

P Rashidinejad, H Zhu, K Yang, S Russell… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL), which refers to decision-making from a previously-
collected dataset of interactions, has received significant attention over the past years. Much …

Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …

Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …