Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342 …

RORL: Robust offline reinforcement learning via conservative smoothing

R Yang, C Bai, X Ma, Z Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …

Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint

W Xiong, H Dong, C Ye, Z Wang, H Zhong… - … on Machine Learning, 2024 - openreview.net
This paper studies the theoretical framework of the alignment process of generative models
with Reinforcement Learning from Human Feedback (RLHF). We consider a standard …

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game

W Xiong, H Zhong, C Shi, C Shen, L Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-
collected dataset without further interactions with the environment. While various algorithms …

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Optimal conservative offline RL with general function approximation via augmented Lagrangian

P Rashidinejad, H Zhu, K Yang, S Russell… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL), which refers to decision-making from a previously-
collected dataset of interactions, has received significant attention over the past years. Much …

Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …

Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …