Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism

M Yin, Y Duan, M Wang, YX Wang - arXiv preprint arXiv:2203.05804, 2022 - arxiv.org
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …

Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game

W Xiong, H Zhong, C Shi, C Shen, L Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-
collected dataset without further interactions with the environment. While various algorithms …

Learn to match with no regret: Reinforcement learning in Markov matching markets

Y Min, T Wang, R Xu, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …

Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Noise-adaptive Thompson sampling for linear contextual bandits

R Xu, Y Min, T Wang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Linear contextual bandits represent a fundamental class of models with numerous real-
world applications, and it is critical to develop algorithms that can effectively manage noise …

Sample complexity of offline distributionally robust linear Markov decision processes

H Wang, L Shi, Y Chi - arXiv preprint arXiv:2403.12946, 2024 - arxiv.org
In offline reinforcement learning (RL), the absence of active exploration calls for attention to
model robustness to tackle the sim-to-real gap, where the discrepancy between the …

Provable benefit of multitask representation learning in reinforcement learning

Y Cheng, S Feng, J Yang, H Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
As representation learning becomes a powerful technique to reduce sample complexity in
reinforcement learning (RL) in practice, theoretical understanding of its advantage is still …

Cooperative multi-agent reinforcement learning: asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …

Minimax optimal and computationally efficient algorithms for distributionally robust offline reinforcement learning

Z Liu, P Xu - arXiv preprint arXiv:2403.09621, 2024 - arxiv.org
Distributionally robust offline reinforcement learning (RL), which seeks robust policy training
against environment perturbation by modeling dynamics uncertainty, calls for function …