Unifying principles of generalization: past, present, and future

CM Wu, B Meder, E Schulz - Annual Review of Psychology, 2024 - annualreviews.org
Generalization, defined as applying limited experiences to novel situations, represents a
cornerstone of human intelligence. Our review traces the evolution and continuity of …

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
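The intuition behind pessimism can be illustrated, very loosely, in a logged multi-armed bandit. Instead of trusting each action's empirical mean, the learner penalizes actions that the dataset rarely covers and acts on a lower confidence bound. The sketch below is a one-step simplification with a hypothetical penalty `beta / sqrt(n)`. It is not the algorithm analyzed in the paper, and the names `greedy_arm` and `pessimistic_arm` are illustrative.

```python
import math
from collections import defaultdict


def _arm_stats(dataset):
    """Sum rewards and count pulls per arm in a logged list of (arm, reward) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for arm, reward in dataset:
        totals[arm] += reward
        counts[arm] += 1
    return totals, counts


def greedy_arm(dataset):
    """Pick the arm with the highest empirical mean reward (no pessimism)."""
    totals, counts = _arm_stats(dataset)
    return max(counts, key=lambda a: totals[a] / counts[a])


def pessimistic_arm(dataset, beta=1.0):
    """Pick the arm with the highest lower confidence bound: mean - beta / sqrt(n).

    The beta / sqrt(n) penalty is an illustrative choice, not a tuned bound.
    """
    totals, counts = _arm_stats(dataset)
    return max(counts, key=lambda a: totals[a] / counts[a] - beta / math.sqrt(counts[a]))


# Arm "a" was logged once with a lucky reward; arm "b" was logged 100 times.
data = [("a", 1.0)] + [("b", 0.8)] * 100
print(greedy_arm(data))       # a
print(pessimistic_arm(data))  # b
```

With `beta = 0` the pessimistic rule reduces to the greedy one. The penalty changes the decision only when some actions are poorly covered by the data.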

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Bilinear classes: A structural framework for provable generalization in RL

S Du, S Kakade, J Lee, S Lovett… - International …, 2021 - proceedings.mlr.press
This work introduces Bilinear Classes, a new structural framework, which permits
generalization in reinforcement learning in a wide variety of settings through the use of …

Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
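In a linear mixture MDP, as the term is commonly used, the transition kernel is linear in known features: P(s'|s,a) = <phi(s'|s,a), theta*> for an unknown parameter theta*. One simple way to build such a kernel is to mix d known base kernels P_1, ..., P_d with weights theta, so that phi(s'|s,a) = (P_1(s'|s,a), ..., P_d(s'|s,a)). The sketch below assumes tabular states and actions stored as nested Python lists and weights on the probability simplex. `mixture_kernel` is a hypothetical helper, not code from the paper.

```python
def mixture_kernel(base_kernels, theta):
    """Return P(s'|s,a) = sum_i theta[i] * P_i(s'|s,a) = <phi(s'|s,a), theta>.

    base_kernels: list of d nested lists P_i[s][a][s'], each a valid transition kernel.
    theta: d nonnegative weights summing to 1 (assumed, so the mixture stays a distribution).
    """
    d = len(base_kernels)
    n_s = len(base_kernels[0])
    n_a = len(base_kernels[0][0])
    return [[[sum(theta[i] * base_kernels[i][s][a][s2] for i in range(d))
              for s2 in range(n_s)]   # next state s'
             for a in range(n_a)]     # action a
            for s in range(n_s)]      # current state s


# Two states, one action. P_1 always moves to state 0, P_2 always moves to state 1.
p1 = [[[1.0, 0.0]], [[1.0, 0.0]]]
p2 = [[[0.0, 1.0]], [[0.0, 1.0]]]
P = mixture_kernel([p1, p2], theta=[0.3, 0.7])
print(P[0][0])  # [0.3, 0.7]
```

Learning in this model reduces to estimating the d-dimensional vector theta rather than a full table of transition probabilities.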

Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

T Xie, N Jiang, H Wang, C Xiong… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …

Human-in-the-loop: Provably efficient preference-based reinforcement learning with general function approximation

X Chen, H Zhong, Z Yang, Z Wang… - … on Machine Learning, 2022 - proceedings.mlr.press
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where
instead of receiving a numeric reward at each step, the RL agent only receives preferences …
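To make the feedback model concrete: the agent never observes per-step rewards, only which of two trajectories a labeler prefers. The sketch below simulates such a labeler with a Bradley-Terry (logistic) model on hidden trajectory returns. That link function is a common modeling assumption chosen here for illustration, not necessarily the one used in the paper, and all function names are hypothetical.

```python
import math
import random


def trajectory_return(rewards):
    """Hidden total reward of a trajectory (never shown to the agent)."""
    return sum(rewards)


def preference_prob(traj_a, traj_b):
    """P(traj_a preferred over traj_b) under an assumed Bradley-Terry model on returns."""
    diff = trajectory_return(traj_a) - trajectory_return(traj_b)
    return 1.0 / (1.0 + math.exp(-diff))


def query_preference(traj_a, traj_b, rng):
    """Return 1 if the simulated labeler prefers traj_a, else 0; only this bit is revealed."""
    return 1 if rng.random() < preference_prob(traj_a, traj_b) else 0


rng = random.Random(0)
good = [1.0, 1.0, 1.0]  # hidden per-step rewards of trajectory A
bad = [0.0, 0.0, 1.0]   # hidden per-step rewards of trajectory B
labels = [query_preference(good, bad, rng) for _ in range(1000)]
rate = sum(labels) / len(labels)
print(rate)  # empirical preference rate, close to preference_prob(good, bad)
```

The agent must infer which behavior is better from these binary labels alone, which is what distinguishes this setting from standard reward-based RL.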

Guarantees for epsilon-greedy reinforcement learning with function approximation

C Dann, Y Mansour, M Mohri… - International …, 2022 - proceedings.mlr.press
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to
explore efficiently in some reinforcement learning tasks and yet they perform well in many …
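For reference, the two discrete-action myopic rules named above fit in a few lines. Epsilon-greedy acts uniformly at random with probability epsilon and greedily otherwise, while softmax (Boltzmann) exploration samples each action with probability proportional to exp(Q / temperature). A minimal sketch, assuming a list of estimated action values `q_values`; the Gaussian-noise variant for continuous actions is omitted.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions; a lower temperature is greedier."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_action(q_values, temperature, rng):
    """Sample one action index from the Boltzmann distribution."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


rng = random.Random(1)
q = [0.1, 0.5, 0.2]
print(epsilon_greedy(q, 0.0, rng))  # 1 (epsilon = 0 is purely greedy)
print(softmax_probs(q, 1.0))
print(softmax_action(q, 1.0, rng))
```

Neither rule plans ahead; each only perturbs the current greedy choice, which is what makes them myopic.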

The role of coverage in online reinforcement learning

T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade - arXiv preprint arXiv …, 2022 - arxiv.org
Coverage conditions--which assert that the data logging distribution adequately covers the
state space--play a fundamental role in determining the sample complexity of offline …
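One standard way to quantify coverage in the tabular case is the concentrability coefficient: the largest ratio between a target policy's state-action occupancy d_pi(s, a) and the data distribution mu(s, a). It is infinite whenever the target visits a pair that the data never logs. A minimal sketch, assuming finite state-action spaces with distributions stored as dicts; this is a generic coefficient, not necessarily the exact coverage notion studied in the paper, and `concentrability` is a hypothetical name.

```python
import math


def concentrability(d_pi, mu):
    """Return max over (s, a) of d_pi(s, a) / mu(s, a); inf if mu misses mass d_pi needs.

    d_pi, mu: dicts mapping (state, action) pairs to probabilities (assumed tabular).
    """
    ratio = 0.0
    for sa, p in d_pi.items():
        if p == 0.0:
            continue  # pairs the target never visits impose no constraint
        q = mu.get(sa, 0.0)
        if q == 0.0:
            return math.inf  # target visits a pair the data never covers
        ratio = max(ratio, p / q)
    return ratio


# Uniform logging distribution over four state-action pairs.
mu = {("s0", "a0"): 0.25, ("s0", "a1"): 0.25, ("s1", "a0"): 0.25, ("s1", "a1"): 0.25}
print(concentrability({("s0", "a0"): 1.0}, mu))  # 4.0
print(concentrability({("s9", "a0"): 1.0}, mu))  # inf
```

Smaller values mean the logged data covers the target policy better; a coefficient of 1 means the data distribution matches the target occupancy exactly.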

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …