Model-free policy learning with reward gradients

Q Lan, S Tosatto, H Farrahi, AR Mahmood - arXiv preprint arXiv …, 2021 - arxiv.org
Despite the increasing popularity of policy gradient methods, they are yet to be widely
utilized in sample-scarce applications, such as robotics. The sample efficiency could be …

Diminishing return of value expansion methods in model-based reinforcement learning

D Palenicek, M Lutter, J Carvalho, J Peters - arXiv preprint arXiv …, 2023 - arxiv.org
Model-based reinforcement learning is one approach to increase sample efficiency.
However, the accuracy of the dynamics model and the resulting compounding error over …
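
As a point of reference for this and the following entry, the H-step value-expansion critic target that such methods typically build on can be written in standard form (generic notation, not necessarily the paper's own): with a learned dynamics model $\hat{f}$ and reward model $\hat{r}$,

$$
\hat{Q}^{H}(s_0, a_0) = \sum_{t=0}^{H-1} \gamma^{t}\, \hat{r}(\hat{s}_t, \hat{a}_t) + \gamma^{H} Q_\phi(\hat{s}_H, \hat{a}_H),
\qquad \hat{s}_{t+1} = \hat{f}(\hat{s}_t, \hat{a}_t), \quad \hat{a}_t \sim \pi(\cdot \mid \hat{s}_t), \quad \hat{s}_0 = s_0,
$$

so errors in $\hat{f}$ and $\hat{r}$ compound as the rollout horizon $H$ grows.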

Diminishing Return of Value Expansion Methods

D Palenicek, M Lutter, J Carvalho, D Dennert… - arXiv preprint arXiv …, 2024 - arxiv.org
Model-based reinforcement learning aims to increase sample efficiency, but the accuracy of
dynamics models and the resulting compounding errors are often seen as key limitations …

A gradient critic for policy gradient estimation

S Tosatto, A Patterson, M White… - … European Workshop on …, 2023 - openreview.net
The policy gradient theorem (Sutton et al., 2000) prescribes the usage of the on-policy state
distribution to approximate the gradient. Most algorithms based on this theorem, in practice …
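
For context, the theorem cited in this entry is standard and can be stated as follows (generic notation, not the paper's own):

$$
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right],
$$

where $d^{\pi_\theta}$ is the (discounted) on-policy state distribution and $Q^{\pi_\theta}$ the action-value function.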

Analysis of Measure-Valued Derivatives in a Reinforcement Learning Actor-Critic Framework

K Van Den Houten, E Van Krieken… - 2022 Winter …, 2022 - ieeexplore.ieee.org
Policy gradient methods are successful for a wide range of reinforcement learning tasks.
Traditionally, such methods utilize the score function as stochastic gradient estimator. We …
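
For orientation, the score-function estimator mentioned in the snippet and the measure-valued derivative named in the title are both standard constructions (notation below is generic, not taken from the paper). The score-function (likelihood-ratio) estimator is

$$
\nabla_\theta\, \mathbb{E}_{x \sim p_\theta}[f(x)] = \mathbb{E}_{x \sim p_\theta}\!\left[ f(x)\, \nabla_\theta \log p_\theta(x) \right],
$$

while the measure-valued derivative with respect to a single parameter $\theta_i$ takes the form

$$
\nabla_{\theta_i}\, \mathbb{E}_{x \sim p_\theta}[f(x)] = c_{\theta_i} \left( \mathbb{E}_{x \sim p^{+}_{\theta_i}}[f(x)] - \mathbb{E}_{x \sim p^{-}_{\theta_i}}[f(x)] \right),
$$

where $(c_{\theta_i}, p^{+}_{\theta_i}, p^{-}_{\theta_i})$ is a decomposition of $\partial p_\theta / \partial \theta_i$ into weighted positive and negative distributions.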

[PDF] Randomizing physics simulations for robot learning

F Muratore - 2021 - d-nb.info
The ability to mentally evaluate variations of the future may well be the key to intelligence.
Combined with the ability to reason, it makes humans excellent at handling new and …

[PDF] Trust region optimization of optimistic actor critic

N Kappes, P Herrmann - 2022 - ias.informatik.tu-darmstadt.de
The exploration-exploitation trade-off is a fundamental challenge in reinforcement learning.
While off-policy algorithms like Soft Actor-Critic (SAC) yield good performance, they can …