
Bilinear classes: A structural framework for provable generalization in rl

S Du, S Kakade, J Lee, S Lovett… - International …, 2021 - proceedings.mlr.press

This work introduces Bilinear Classes, a new structural framework, which permit
generalization in reinforcement learning in a wide variety of settings through the use of …

Cited by 248 · All 8 versions

[PDF] neurips.cc

Bellman eluder dimension: New rich classes of rl problems, and sample-efficient algorithms

C **, Q Liu, S Miryoosefi - Advances in neural information …, 2021 - proceedings.neurips.cc

Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …

Cited by 268 · All 11 versions

[PDF] neurips.cc

Flambe: Structural complexity and representation learning of low rank mdps

A Agarwal, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common
practice to make parametric assumptions where values or policies are functions of some low …

Cited by 298 · All 10 versions

[PDF] mlr.press

Nearly minimax optimal reinforcement learning for linear mixture markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Cited by 247 · All 8 versions

[PDF] mlr.press

Learning near optimal policies with low inherent bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press

We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Cited by 258 · All 5 versions

[PDF] arxiv.org

The role of coverage in online reinforcement learning

T **: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc

The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …

Cited by 70 · All 9 versions

[PDF] mlr.press

Provably efficient safe exploration via primal-dual policy optimization

D Ding, X Wei, Z Yang, Z Wang… - … conference on artificial …, 2021 - proceedings.mlr.press

We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …

Cited by 189 · All 9 versions

[PDF] neurips.cc

Reinforcement learning with general value function approximation: Provably efficient approach via bounded eluder dimension

R Wang, RR Salakhutdinov… - Advances in Neural …, 2020 - proceedings.neurips.cc

Value function approximation has demonstrated phenomenal empirical success in
reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …

Cited by 186 · All 7 versions

Optimism in reinforcement learning with generalized linear function approximation

Bilinear classes: A structural framework for provable generalization in rl
