Deep reinforcement learning based energy management strategies for electrified vehicles: Recent advances and perspectives

H He, X Meng, Y Wang, A Khajepour, X An… - … and Sustainable Energy …, 2024 - Elsevier
Electrified vehicles provide an effective solution to address the unfavorable impacts of fossil
fuel use in the transportation sector. Energy management strategy (EMS) is the core …

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org
Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Optimality and approximation with policy gradient methods in Markov decision processes

A Agarwal, SM Kakade, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press
Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …

A survey of inverse reinforcement learning

S Adams, T Cody, PA Beling - Artificial Intelligence Review, 2022 - Springer
Learning from demonstration, or imitation learning, is the process of learning to act in an
environment from examples provided by a teacher. Inverse reinforcement learning (IRL) is a …

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

A theory of regularized Markov decision processes

M Geist, B Scherrer, O Pietquin - … conference on machine …, 2019 - proceedings.mlr.press
Many recent successful (deep) reinforcement learning algorithms make use of
regularization, generally based on entropy or Kullback-Leibler divergence. We propose a …

Bridging the gap between value and policy based reinforcement learning

O Nachum, M Norouzi, K Xu… - Advances in neural …, 2017 - proceedings.neurips.cc
We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …

Neural trust region/proximal policy optimization attains globally optimal policy

B Liu, Q Cai, Z Yang, Z Wang - Advances in neural …, 2019 - proceedings.neurips.cc
Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor
and critic parametrized by neural networks achieve significant empirical success in deep …

Taming the noise in reinforcement learning via soft updates

R Fox, A Pakman, N Tishby - arXiv preprint arXiv:1512.08562, 2015 - arxiv.org
Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the
early stages of learning in noisy environments, because much effort is spent unlearning …

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …