Dynamic policy programming

Deep reinforcement learning based energy management strategies for electrified vehicles: Recent advances and perspectives

H He, X Meng, Y Wang, A Khajepour, X An… - … and Sustainable Energy …, 2024 - Elsevier

Electrified vehicles provide an effective solution to address the unfavorable impacts of fossil
fuel use in the transportation sector. Energy management strategy (EMS) is the core …

Cited by 42 · All 6 versions

[PDF] jmlr.org

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org

Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Cited by 515 · All 13 versions

[PDF] mlr.press

Optimality and approximation with policy gradient methods in Markov decision processes

A Agarwal, SM Kakade, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press

Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …

Cited by 400 · All 3 versions

[PDF] springer.com

A survey of inverse reinforcement learning

S Adams, T Cody, PA Beling - Artificial Intelligence Review, 2022 - Springer

Learning from demonstration, or imitation learning, is the process of learning to act in an
environment from examples provided by a teacher. Inverse reinforcement learning (IRL) is a …

Cited by 125 · All 8 versions

[PDF] mlr.press

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Cited by 324 · All 10 versions

[PDF] mlr.press

A theory of regularized Markov decision processes

M Geist, B Scherrer, O Pietquin - … conference on machine …, 2019 - proceedings.mlr.press

Many recent successful (deep) reinforcement learning algorithms make use of
regularization, generally based on entropy or Kullback-Leibler divergence. We propose a …

Cited by 360 · All 8 versions

[PDF] neurips.cc

Bridging the gap between value and policy based reinforcement learning

O Nachum, M Norouzi, K Xu… - Advances in neural …, 2017 - proceedings.neurips.cc

We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …

Cited by 564 · All 14 versions

[PDF] neurips.cc

Neural trust region/proximal policy optimization attains globally optimal policy

B Liu, Q Cai, Z Yang, Z Wang - Advances in neural …, 2019 - proceedings.neurips.cc

Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor
and critic parametrized by neural networks achieve significant empirical success in deep …

Cited by 235 · All 9 versions

[PDF] arxiv.org

Taming the noise in reinforcement learning via soft updates

R Fox, A Pakman, N Tishby - arXiv preprint arXiv:1512.08562, 2015 - arxiv.org

Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the
early stages of learning in noisy environments, because much effort is spent unlearning …

Cited by 389 · All 11 versions

[PDF] arxiv.org

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org

We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Cited by 296 · All 7 versions