Unifying principles of generalization: past, present, and future
Generalization, defined as applying limited experiences to novel situations, represents a
cornerstone of human intelligence. Our review traces the evolution and continuity of …
Is pessimism provably efficient for offline RL?
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
Bilinear classes: A structural framework for provable generalization in RL
This work introduces Bilinear Classes, a new structural framework, which permits
generalization in reinforcement learning in a wide variety of settings through the use of …
Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
Policy finetuning: Bridging sample-efficient offline and online reinforcement learning
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …
Human-in-the-loop: Provably efficient preference-based reinforcement learning with general function approximation
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where
instead of receiving a numeric reward at each step, the RL agent only receives preferences …
Guarantees for epsilon-greedy reinforcement learning with function approximation
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to
explore efficiently in some reinforcement learning tasks, and yet they perform well in many …
The role of coverage in online reinforcement learning
Coverage conditions--which assert that the data logging distribution adequately covers the
state space--play a fundamental role in determining the sample complexity of offline …
Corruption-robust offline reinforcement learning with general function approximation
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …