Off-policy actor-critic with emphatic weightings
A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due
to the policy gradient theorem, which provides a simplified form for the gradient. The off …
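For context, the "simplified form" the snippet refers to is the standard statement of the policy gradient theorem, which expresses the gradient of the return objective without differentiating through the state distribution:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi_\theta(\cdot \mid s)}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a) \right]
```

Here $d^{\pi}$ is the (discounted) state visitation distribution under $\pi$ and $Q^{\pi}$ is the action-value function; in the off-policy setting this expectation is taken under the behavior policy's state distribution, which is the mismatch the emphatic-weighting approach addresses.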
Curious Explorer: a provable exploration strategy in Policy Learning
A coverage assumption is critical with policy gradient methods, because while the objective
function is insensitive to updates in unlikely states, the agent may need improvements in …