Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism

M Yin, Y Duan, M Wang, YX Wang - arXiv preprint arXiv:2203.05804, 2022 - arxiv.org
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …

Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game

W Xiong, H Zhong, C Shi, C Shen, L Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-
collected dataset without further interactions with the environment. While various algorithms …

Learn to match with no regret: Reinforcement learning in Markov matching markets

Y Min, T Wang, R Xu, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …

Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Noise-adaptive Thompson sampling for linear contextual bandits

R Xu, Y Min, T Wang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Linear contextual bandits represent a fundamental class of models with numerous real-
world applications, and it is critical to develop algorithms that can effectively manage noise …

Sample complexity of offline distributionally robust linear Markov decision processes

H Wang, L Shi, Y Chi - arXiv preprint arXiv:2403.12946, 2024 - arxiv.org
In offline reinforcement learning (RL), the absence of active exploration calls for attention to
model robustness to tackle the sim-to-real gap, where the discrepancy between the …

Provable benefit of multitask representation learning in reinforcement learning

Y Cheng, S Feng, J Yang, H Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
As representation learning becomes a powerful technique to reduce sample complexity in
reinforcement learning (RL) in practice, theoretical understanding of its advantage is still …

Cooperative multi-agent reinforcement learning: asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …

Minimax optimal and computationally efficient algorithms for distributionally robust offline reinforcement learning

Z Liu, P Xu - arXiv preprint arXiv:2403.09621, 2024 - arxiv.org
Distributionally robust offline reinforcement learning (RL), which seeks robust policy training
against environment perturbation by modeling dynamics uncertainty, calls for function …