The curious price of distributional robustness in reinforcement learning with a generative model
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
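The following is a minimal Python sketch of robust value iteration that makes the robust formulation concrete. It assumes a total-variation (TV) uncertainty set of radius sigma around a nominal transition kernel. The TV choice, the helper names `worst_case_expectation_tv` and `robust_value_iteration`, and the two-state toy MDP are all illustrative assumptions. They are not the paper's algorithm or notation.

```python
import numpy as np


def worst_case_expectation_tv(p0, v, sigma):
    """Return min_p p @ v over distributions p with TV(p, p0) <= sigma.

    TV is taken as half the L1 distance, so moving a total mass m between
    states costs exactly m. The adversary shifts up to sigma mass from the
    highest-value states onto the single lowest-value state.
    """
    p = np.array(p0, dtype=float)  # copy of the nominal next-state distribution
    target = int(np.argmin(v))
    budget = sigma
    for s in np.argsort(v)[::-1]:  # highest-value states first
        if budget <= 0:
            break
        if s == target:
            continue
        moved = min(p[s], budget)
        p[s] -= moved
        p[target] += moved
        budget -= moved
    return float(p @ v)


def robust_value_iteration(p0, r, gamma, sigma, iters=500):
    """Iterate the robust Bellman operator around the nominal kernel p0 (shape S x A x S)."""
    num_states, num_actions = r.shape
    q = np.zeros((num_states, num_actions))
    for _ in range(iters):
        v = q.max(axis=1)  # new array, held fixed during this sweep
        for s in range(num_states):
            for a in range(num_actions):
                q[s, a] = r[s, a] + gamma * worst_case_expectation_tv(p0[s, a], v, sigma)
    return q


# Hypothetical toy MDP: 2 states, 2 actions, gamma = 0.9.
P0 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[0.0, 1.0], [0.5, 0.0]])
q_nominal = robust_value_iteration(P0, R, gamma=0.9, sigma=0.0)
q_robust = robust_value_iteration(P0, R, gamma=0.9, sigma=0.1)
```

With `sigma=0.0` the inner minimization is vacuous, so the sketch reduces to ordinary value iteration. Any positive radius can only lower the Q-values, because the adversary picks the worst kernel in the ball at every backup.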
Settling the sample complexity of model-based offline reinforcement learning
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …
Distributionally robust model-based offline reinforcement learning with near-optimal sample complexity
This paper concerns the central issues of model robustness and sample efficiency in offline
reinforcement learning (RL), which aims to learn to perform decision making from history …
Breaking the sample size barrier in model-based reinforcement learning with a generative model
We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …
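Model-based RL with a generative model can be sketched as a plug-in recipe. You query the simulator a fixed number of times for every state-action pair, form the empirical transition kernel, and plan in it. The Python below is a generic illustration under stated assumptions: a hypothetical two-state toy MDP, plain value iteration as the planner, and made-up helper names. It does not reproduce the paper's estimator or its sample-complexity analysis.

```python
import numpy as np


def empirical_kernel(sample_next_state, num_states, num_actions, n_samples, rng):
    """Query the generative model n_samples times per (s, a) and return the empirical P-hat."""
    p_hat = np.zeros((num_states, num_actions, num_states))
    for s in range(num_states):
        for a in range(num_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return p_hat / n_samples


def value_iteration(p, r, gamma, iters=500):
    """Plan in a known (here: empirical) MDP; returns Q and the greedy policy."""
    q = np.zeros(r.shape)
    for _ in range(iters):
        v = q.max(axis=1)
        q = r + gamma * (p @ v)  # (S, A, S) @ (S,) -> (S, A)
    return q, q.argmax(axis=1)


# Hypothetical toy MDP standing in for the unknown environment.
P_TRUE = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[0.0, 1.0], [0.5, 0.0]])


def simulator(s, a, rng):
    """Generative model: one independent next-state draw from P_TRUE[s, a]."""
    return int(rng.choice(P_TRUE.shape[2], p=P_TRUE[s, a]))


rng = np.random.default_rng(0)
P_hat = empirical_kernel(simulator, 2, 2, n_samples=2000, rng=rng)
q_hat, pi_hat = value_iteration(P_hat, R, gamma=0.9)
```

Sampling and planning are kept separate here, so the planner never touches the simulator. Only the empirical kernel `P_hat` carries information about the environment.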
Adversarial model for offline reinforcement learning
We propose a novel model-based offline Reinforcement Learning (RL) framework, called
Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn …
Reinforcement learning with human feedback: Learning dynamic choices via pessimism
In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where
we aim to learn the human's underlying reward and the MDP's optimal policy from a set of …
Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …
Is Q-learning minimax optimal? A tight sample complexity analysis
Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP)
in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the …
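The model-free update mentioned here can be illustrated with synchronous Q-learning under a generative model. At every iteration, each state-action pair gets one fresh next-state sample, and Q moves toward the resulting one-sample Bellman target. This is a minimal sketch. The rescaled linear step size 1/(1 + (1 - gamma) t), the toy MDP, and the function names are assumptions made for illustration.

```python
import numpy as np


def synchronous_q_learning(sample_next_state, r, gamma, num_iters, rng):
    """Model-free update: no P-hat is stored; each iteration draws one fresh next state per (s, a)."""
    num_states, num_actions = r.shape
    q = np.zeros((num_states, num_actions))
    for t in range(1, num_iters + 1):
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)  # rescaled linear step size (one common choice)
        v = q.max(axis=1)  # V(s') = max_a' Q(s', a') from the previous iterate
        target = np.empty_like(q)
        for s in range(num_states):
            for a in range(num_actions):
                target[s, a] = r[s, a] + gamma * v[sample_next_state(s, a, rng)]
        q = (1.0 - eta) * q + eta * target  # Q <- (1 - eta) Q + eta * one-sample Bellman target
    return q


# Hypothetical toy MDP: P1[s, a] is the probability that the next state is 1.
P1 = np.array([[0.1, 0.8], [0.5, 0.9]])
R = np.array([[0.0, 1.0], [0.5, 0.0]])


def simulator(s, a, rng):
    """Generative model for the two-state toy MDP."""
    return int(rng.random() < P1[s, a])


q = synchronous_q_learning(simulator, R, gamma=0.9, num_iters=20000, rng=np.random.default_rng(0))
```

Unlike a plug-in approach, this loop keeps only the Q-table. Every sample is used once and then discarded.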
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
Offline reinforcement learning (RL) aims at learning an optimal strategy using a
pre-collected dataset without further interactions with the environment. While various algorithms …
The blessing of heterogeneity in federated Q-learning: Linear speedup and beyond
In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function
by periodically aggregating local Q-estimates trained on local data alone. Focusing on …
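Periodic aggregation of local Q-estimates can be sketched as follows. Each agent starts from the shared estimate and runs a few synchronous Q-learning steps on its own samples. The server then averages the local tables. The constant step size, the synchronization period, and the toy MDP are hypothetical choices. The sketch also gives every agent the same environment, so it does not capture the heterogeneity in local data that the title refers to.

```python
import numpy as np


def federated_q_learning(samplers, r, gamma, eta, sync_period, num_rounds, seed=0):
    """Agents run local Q-learning on their own samples; the server averages every sync_period steps."""
    num_states, num_actions = r.shape
    rngs = [np.random.default_rng(seed + k) for k in range(len(samplers))]
    q_global = np.zeros((num_states, num_actions))
    for _ in range(num_rounds):
        local_estimates = []
        for sample_next_state, rng in zip(samplers, rngs):
            q = q_global.copy()  # each round starts from the shared estimate
            for _ in range(sync_period):  # local updates use this agent's samples only
                v = q.max(axis=1)
                target = np.empty_like(q)
                for s in range(num_states):
                    for a in range(num_actions):
                        target[s, a] = r[s, a] + gamma * v[sample_next_state(s, a, rng)]
                q = (1.0 - eta) * q + eta * target
            local_estimates.append(q)
        q_global = np.mean(local_estimates, axis=0)  # periodic aggregation at the server
    return q_global


# Hypothetical toy MDP: P1[s, a] is the probability that the next state is 1.
P1 = np.array([[0.1, 0.8], [0.5, 0.9]])
R = np.array([[0.0, 1.0], [0.5, 0.0]])


def simulator(s, a, rng):
    """Generative model shared by all agents in this homogeneous sketch."""
    return int(rng.random() < P1[s, a])


q_fed = federated_q_learning([simulator] * 4, R, gamma=0.9, eta=0.05, sync_period=10, num_rounds=200)
```

Passing a different sampler per agent is the natural hook for experimenting with heterogeneous local data.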