- Academic Search

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation. For episodic
time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

Cited by 60

A theoretical analysis of optimistic proximal policy optimization in linear Markov decision processes

H Zhong, T Zhang - Advances in Neural Information …, 2023 - proceedings.neurips.cc

The proximal policy optimization (PPO) algorithm stands as one of the most prosperous
methods in the field of reinforcement learning (RL). Despite its success, the theoretical …

Cited by 33

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Cited by 48

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2023 - proceedings.neurips.cc

We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Cited by 18

Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency

H Zhao, J He, D Zhou, T Zhang… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved,
zhou2022computationally} have provided variance-dependent regret bounds for linear …

Cited by 32

Reinforcement learning from human feedback with active queries

K Ji, J He, Q Gu - arXiv preprint arXiv:2402.09401, 2024 - arxiv.org

Aligning large language models (LLMs) with human preference plays a key role in building
modern generative models and can be achieved by reinforcement learning from human …

Cited by 19

Noise-adaptive Thompson sampling for linear contextual bandits

R Xu, Y Min, T Wang - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Linear contextual bandits represent a fundamental class of models with numerous
real-world applications, and it is critical to develop algorithms that can effectively manage noise …

Cited by 9

Cooperative multi-agent reinforcement learning: asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press

We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …

Cited by 11

A nearly optimal and low-switching algorithm for reinforcement learning with general function approximation

H Zhao, J He, Q Gu - arXiv preprint arXiv:2311.15238, 2023 - arxiv.org

The exploration-exploitation dilemma has been a central challenge in reinforcement
learning (RL) with complex model classes. In this paper, we propose a new algorithm …

Cited by 12

Tackling heavy-tailed rewards in reinforcement learning with function approximation: Minimax optimal and instance-dependent regret bounds

J Huang, H Zhong, L Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc

While numerous works have focused on devising efficient algorithms for reinforcement
learning (RL) with uniformly bounded rewards, it remains an open question whether sample …

Cited by 8

Computationally efficient horizon-free reinforcement learning for linear mixture MDPs

Nearly minimax optimal reinforcement learning for linear Markov decision processes
